Introduction to AIAhren Stevens-Taylor
(For more resources related to this topic, see here.)
Artificial Intelligence (AI)
Living organisms such as animals and humans have some sort of intelligence that helps us in making a particular decision to perform something. On the other hand, computers are just electronic devices that can accept data, perform logical and mathematical operations at high speeds, and output the results. So, Artificial Intelligence (AI) is essentially the subject of making computers able to think and decide like living organisms to perform specific operations.
So, apparently this is a huge subject. But it is really important to understand the basics of AI being used in different domains. AI is just a general term; its implementations and applications are different for different purposes, solving different sets of problems.
Before we move on to game-specific techniques, we’ll take a look at the following research areas in AI applications:
- Computer vision: It is the ability to take visual input from sources such as videos and cameras, and analyze them to do particular operations such as facial recognition, object recognition, and optical-character recognition.
- Natural language processing (NLP): It is the ability that allows a machine to read and understand the languages, as we normally write and speak. The problem is that the languages we use today are difficult for machines to understand. There are many different ways to say the same thing, and the same sentence can have different meanings according to the context. NLP is an important step for machines, since they need to understand the languages and expressions we use, before they can process them and respond accordingly. Fortunately, there’s an enormous amount of data sets available on the Web that can help researchers to do automatic analysis of a language.
- Common sense reasoning: This is a technique that our brains can easily use to draw answers even from the domains we don’t fully understand. Common sense knowledge is a usual and common way for us to attempt certain questions, since our brains can mix and interplay between the context, background knowledge, and language proficiency. But making machines to apply such knowledge is very complex, and still a major challenge for researchers.
AI in games
Game AI needs to complement the quality of a game. For that we need to understand the fundamental requirement that every game must have. The answer should be easy. It is the fun factor. So, what makes a game fun to play? This is the subject of game design, and a good reference is The Art of Game Design by Jesse Schell. Let’s attempt to tackle this question without going deep into game design topics. We’ll find that a challenging game is indeed fun to play. Let me repeat: it’s about making a game challenging. This means the game should not be so difficult that it’s impossible for the player to beat the opponent, or too easy to win. Finding the right challenge level is the key to make a game fun to play.
And that’s where the AI kicks in. The role of AI in games is to make it fun by providing challenging opponents to compete, and interesting non-player characters (NPCs) that behave realistically inside the game world. So, the objective here is not to replicate the whole thought process of humans or animals, but to make the NPCs seem intelligent by reacting to the changing situations inside the game world in a way that makes sense to the player.
The reason that we don’t want to make the AI system in games so computationally expensive is that the processing power required for AI calculations needs to be shared between other operations such as graphic rendering and physics simulation. Also, don’t forget that they are all happening in real time, and it’s also really important to achieve a steady framerate throughout the game. There were even attempts to create dedicated processor for AI calculations (AI Seek’s Intia Processor). With the ever-increasing processing power, we now have more and more room for AI calculations. However, like all the other disciplines in game development, optimizing AI calculations remains a huge challenge for the AI developers.
In this section, we’ll walk through some of the AI techniques being used in different types of games. So, let’s just take it as a crash course, before actually going into implementation. If you want to learn more about AI for games, there are some really great books out there, such as Programming Game AI by Example by Mat Buckland and Artificial Intelligence for Games by Ian Millington and John Funge. The AI Game Programming Wisdom series also contain a lot of useful resources and articles on the latest AI techniques.
Finite State Machines (FSM)
Finite State Machines (FSM) can be considered as one of the simplest AI model form, and are commonly used in the majority of games. A state machine basically consists of a finite number of states that are connected in a graph by the transitions between them. A game entity starts with an initial state, and then looks out for the events and rules that will trigger a transition to another state. A game entity can only be in exactly one state at any given time.
For example, let’s take a look at an AI guard character in a typical shooting game. Its states could be as simple as patrolling, chasing, and shooting.
Simple FSM of an AI guard character
There are basically four components in a simple FSM:
- States: This component defines a set of states that a game entity or an NPC can choose from (patrol, chase, and shoot)
- Transitions: This component defines relations between different states
- Rules: This component is used to trigger a state transition (player on sight, close enough to attack, and lost/killed player)
- Events: This is the component, which will trigger to check the rules (guard’s visible area, distance with the player, and so on)
So, a monster in Quake 2 might have the following states: standing, walking, running, dodging, attacking, idle, and searching.
FSMs are widely used in game AI especially, because they are really easy to implement and more than enough for both simple and somewhat complex games. Using simple if/else statements or switch statements, we can easily implement an FSM. It can get messy, as we start to have more states and more transitions.
Random and probability in AI
Imagine an enemy bot in an FPS game that can always kill the player with a headshot, an opponent in a racing game that always chooses the best route, and overtakes without collision with any obstacle. Such a level of intelligence will make the game so difficult that it becomes almost impossible to win. On the other hand, imagine an AI enemy that always chooses the same route to follow, or tries to escape from the player. AI controlled entities behaving the same way every time the player encounters them, makes the game predictable and easy to win.
Both of the previous situations obviously affect the fun aspect of the game, and make the player feel like the game is not challenging or fair enough anymore. One way to fix this sort of perfect AI and stupid AI is to introduce some errors in their intelligence. In games, randomness and probabilities are applied in the decision making process of AI calculations. The following are the main situations when we would want to let our AI entities change a random decision:
- Non-intentional: This situation is sometimes a game agent, or perhaps an NPC might need to make a decision randomly, just because it doesn’t have enough information to make a perfect decision, and/or it doesn’t really matter what decision it makes. Simply making a decision randomly and hoping for the best result is the way to go in such a situation.
- Intentional: This situation is for perfect AI and stupid AI. As we discussed in the previous examples, we will need to add some randomness purposely, just to make them more realistic, and also to match the difficulty level that the player is comfortable with. Such randomness and probability could be used for things such as hit probabilities, plus or minus random damage on top of base damage. Using randomness and probability we can add a sense of realistic uncertainty to our game and make our AI system somewhat unpredictable.
We can also use probability to define different classes of AI characters. Let’s look at the hero characters from Defense of the Ancient (DotA), which is a popular action real-time strategy (RTS) game mode of Warcraft III. There are three categories of heroes based on the three main attributes: strength, intelligence, and agility. Strength is the measure of the physical power of the hero, while intellect relates to how well the hero can control spells and magic. Agility defines a hero’s ability to avoid attacks and attack quickly. An AI hero from the strength category will have the ability to do more damage during close combat, while an intelligence hero will have more chance of success to score higher damage using spells and magic. Carefully balancing the randomness and probability between different classes and heroes, makes the game a lot more challenging, and makes DotA a lot fun to play.
The sensor system
Our AI characters need to know about their surroundings, and the world they are interacting with, in order to make a particular decision. Such information could be as follows:
- Position of the player: This information is used to decide whether to attack or chase, or keep patrolling
- Buildings and objects nearby: This information is used to hide or take cover
- Player’s health and its own health: This remaining information is used to decide whether to retreat or advance
- Location of resources on the map in an RTS game: This information is used to occupy and collect resources, required for constructing and producing other units
As you can see, it could vary a lot depending on the type of game we are trying to build. So, how do we collect that information?
One method to collect such information is polling. We can simply do if/else or switch checks in the FixedUpdate method of our AI character. AI character just polls the information they are interested in from the game world, does the checks, and takes action accordingly. Polling methods works great, if there aren’t too many things to check. However, some characters might not need to poll the world states every frame. Different characters might require different polling rates. So, usually in larger games with more complex AI systems, we need to deploy an event-driven method using a global messaging system.
The messaging system
AI does decision making in response to the events in the world. The events are communicated between the AI entity and the player, the world, or the other AI entities through a messaging system. For example, when the player attacks an enemy unit from a group of patrol guards, the other AI units need to know about this incident as well, so that they can start searching for and attacking the player. If we were using the polling method, our AI entities will need to check the state of all the other AI entities, in order to know about this incident. But with an event-driven messaging system, we can implement this in a more manageable and scalable way. The AI characters interested in a particular event can be registered as listeners, and if that event happens, our messaging system will broadcast to all listeners. The AI entities can then proceed to take appropriate actions, or perform further checks.
The event-driven system does not necessarily provide faster mechanism than polling. But it provides a convenient, central checking system that senses the world and informs the interested AI agents, rather than each individual agent having to check the same event in every frame. In reality, both polling and messaging system are used together most of the time. For example, AI might poll for more detailed information when it receives an event from the messaging system.
Flocking, swarming, and herding
Many living beings such as birds, fish, insects, and land animals perform certain operations such as moving, hunting, and foraging in groups. They stay and hunt in groups, because it makes them stronger and safer from predators than pursuing goals individually. So, let’s say you want a group of birds flocking, swarming around in the sky; it’ll cost too much time and effort for animators to design the movement and animations of each bird. But if we apply some simple rules for each bird to follow, we can achieve emergent intelligence of the whole group with complex, global behavior.
One pioneer of this concept is Craig Reynolds, who presented such a flocking algorithm in his SIGGRAPH paper, 1987, Flocks, Herds and Schools – A Distributed Behavioral Model. He coined the term “boid” that sounds like “bird”, but referring to a “bird-like” object. He proposed three simple rules to apply to each unit, which are as follows:
- Separation: This rule is used to maintain a minimum distance with neighboring boids to avoid hitting them
- Alignment: This rule is used to align itself with the average direction of its neighbors, and then move in the same velocity with them as a flock
- Cohesion: This step is used to maintain a minimum distance with the group’s center of mass
These three simple rules are all that we need to implement a realistic and a fairly complex flocking behavior for birds. They can also be applied to group behaviors of any other entity type with little or no modifications.
Path following and steering
Sometimes we want our AI characters to roam around in the game world, following a roughly guided or thoroughly defined path. For example in a racing game, the AI opponents need to navigate on the road. And the decision-making algorithms such as our flocking boid algorithm discussed already, can only do well in making decisions. But in the end, it all comes down to dealing with actual movements and steering behaviors. Steering behaviors for AI characters have been in research topics for a couple of decades now. One notable paper in this field is Steering Behaviors for Autonomous Characters, again by Craig Reynolds, presented in 1999 at the Game Developers Conference (GDC). He categorized steering behaviors into the following three layers:
Hierarchy of motion behaviors
Let me quote the original example from his paper to understand these three layers:
“Consider, for example, some cowboys tending a herd of cattle out on the range. A cow wanders away from the herd. The trail boss tells a cowboy to fetch the stray. The cowboy says “giddy-up” to his horse, and guides it to the cow, possibly avoiding obstacles along the way. In this example, the trail boss represents action selection, noticing that the state of the world has changed (a cow left the herd), and setting a goal (retrieve the stray). The steering level is represented by the cowboy who decomposes the goal into a series of simple sub goals (approach the cow, avoid obstacles, and retrieve the cow). A sub goal corresponds to a steering behavior for the cowboy-and-horse team. Using various control signals (vocal commands, spurs, and reins), the cowboy steers his horse towards the target. In general terms, these signals express concepts like go faster, go slower, turn right, turn left, and so on. The horse implements the locomotion level. Taking the cowboy’s control signals as input, the horse moves in the indicated direction. This motion is the result of a complex interaction of the horse’s visual perception, its sense of balance, and its muscles applying torques to the joints of its skeleton.”
Then he presented how to design and implement some common and simple steering behaviors for individual AI characters and pairs. Such behaviors include seek and flee, pursue and evade, wander, arrival, obstacle avoidance, wall following, and path following.
There are many games where you can find monsters or enemies that follow the player, or go to a particular point while avoiding obstacles. For example, let’s take a look at a typical RTS game. You can select a group of units and click a location where you want them to move or click on the enemy units to attack them. Your units then need to find a way to reach the goal without colliding with the obstacles. The enemy units also need to be able to do the same. Obstacles could be different for different units. For example, an air force unit might be able to pass over a mountain, while the ground or artillery units need to find a way around it.
A* (pronounced “A star”) is a pathfinding algorithm widely used in games, because of its performance and accuracy. Let’s take a look at an example to see how it works. Let’s say we want our unit to move from point A to point B, but there’s a wall in the way, and it can’t go straight towards the target. So, it needs to find a way to point B while avoiding the wall.
Top-down view of our map
We are looking at a simple 2D example. But the same idea can be applied to 3D environments. In order to find the path from point A to point B, we need to know more about the map such as the position of obstacles. For that we can split our whole map into small tiles, representing the whole map in a grid format, as shown in the following figure:
Map represented in a 2D grid
The tiles can also be of other shapes such as hexagons and triangles. But we’ll just use square tiles here, as that’s quite simple and enough for our scenario. Representing the whole map in a grid, makes the search area more simplified, and this is an important step in pathfinding. We can now reference our map in a small 2D array.
Our map is now represented by a 5 x 5 grid of square tiles with a total of 25 tiles. We can start searching for the best path to reach the target. How do we do this? By calculating the movement score of each tile adjacent to the starting tile, which is a tile on the map not occupied by an obstacle, and then choosing the tile with the lowest cost.
There are four possible adjacent tiles to the player, if we don’t consider the diagonal movements. Now, we need to know two numbers to calculate the movement score for each of those tiles. Let’s call them G and H, where G is the cost of movement from starting tile to current tile, and H is the cost to reach the target tile from current tile.
By adding G and H, we can get the final score of that tile; let’s call it F. So we’ll be using this formula: F = G + H.
Valid adjacent tiles
In this example, we’ll be using a simple method called Manhattan length (also known as Taxicab geometry), in which we just count the total number of tiles between the starting tile and the target tile to know the distance between them.
The preceding figure shows the calculations of G with two different paths. We just add one (which is the cost to move one tile) to the previous tile’s G score to get the current G score of the current tile. We can give different costs to different tiles. For example, we might want to give a higher movement cost for diagonal movements (if we are considering them), or to specific tiles occupied by, let’s say a pond or a muddy road. Now we know how to get G. Let’s look at the calculation of H. The following figure shows different H values from different starting tiles to the target tile. You can try counting the squares between them to understand how we get those values.
So, now we know how to get G and H. Let’s go back to our original example to figure out the shortest path from A to B. We first choose the starting tile, and then determine the valid adjacent tiles, as shown in the following figure. Then we calculate the G and H scores of each tile, shown in the lower-left and right corners of the tile respectively. And then the final score F, which is G + H is shown at the top-left corner. Obviously, the tile to the immediate right of the start tile has got the lowest F score.
So, we choose this tile as our next movement, and store the previous tile as its parent. This parent stuff will be useful later, when we trace back our final path.
From the current tile, we do the similar process again, determining valid adjacent tiles. This time there are only two valid adjacent tiles at the top and bottom. The left tile is a starting tile, which we’ve already examined, and the obstacle occupies the right tile. We calculate the G, the H, and then the F score of those new adjacent tiles. This time we have four tiles on our map with all having the same score, six. So, which one do we choose? We can choose any of them. It doesn’t really matter in this example, because we’ll eventually find the shortest path with whichever tile we choose, if they have the same score. Usually, we just choose the tile added most recently to our adjacent list. This is because later we’ll be using some sort of data structure, such as a list to store those tiles that are being considered for the next move. So, accessing the tile most recently added to that list could be faster than searching through the list to reach a particular tile that was added previously.
In this demo, we’ll just randomly choose the tile for our next test, just to prove that it can actually find the shortest path.
So, we choose this tile, which is highlighted with a red border. Again we examine the adjacent tiles. In this step, there’s only one new adjacent tile with a calculated F score of 8. So, the lowest score right now is still 6. We can choose any tile with the score 6.
So, we choose a tile randomly from all the tiles with the score 6. If we repeat this process until we reach our target tile, we’ll end up with a board complete with all the scores for each valid tile.
Now all we have to do is to trace back starting from the target tile using its parent tile. This will give a path that looks something like the following figure:
Path traced back
So this is the concept of A* pathfinding in a nutshell, without displaying any code. A* is an important concept in the AI pathfinding area, but since Unity 3.5, there are a couple of new features such as automatic navigation mesh generation and the Nav Mesh Agent, which we’ll see roughly in the next section and then in more detail later. These features make implementing pathfinding in your games very much easier. In fact, you may not even need to know about A* to implement pathfinding for your AI characters. Nonetheless, knowing how the system is actually working behind the scenes will help you to become a solid AI programmer. Unfortunately, those advanced navigation features in Unity are only available in the Pro version at this moment.
A navigation mesh
Now we have some idea of A* pathfinding techniques. One thing that you might notice is that using a simple grid in A* requires quite a number of computations to get a path which is the shortest to the target, and at the same time avoids the obstacles. So, to make it cheaper and easier for AI characters to find a path, people came up with the idea of using waypoints as a guide to move AI characters from the start point to the target point. Let’s say we want to move our AI character from point A to point B, and we’ve set up three waypoints as shown in the following figure:
All we have to do now is to pick up the nearest waypoint, and then follow its connected node leading to the target waypoint. Most of the games use waypoints for pathfinding, because they are simple and quite effective in using less computation resources. However, they do have some issues. What if we want to update the obstacles in our map? We’ll also have to place waypoints for the updated map again, as shown in the following figure:
Following each node to the target can mean the AI character moves in zigzag directions. Look at the preceding figures; it’s quite likely that the AI character will collide with the wall where the path is close to the wall. If that happens, our AI will keep trying to go through the wall to reach the next target, but it won’t be able to and it will get stuck there. Even though we can smooth out the zigzag path by transforming it to a spline and do some adjustments to avoid such obstacles, the problem is the waypoints don’t give any information about the environment, other than the spline connected between two nodes. What if our smoothed and adjusted path passes the edge of a cliff or a bridge? The new path might not be a safe path anymore. So, for our AI entities to be able to effectively traverse the whole level, we’re going to need a tremendous number of waypoints, which will be really hard to implement and manage.
Let’s look at a better solution, navigation mesh. A navigation mesh is another graph structure that can be used to represent our world, similar to the way we did with our square tile-based grid or waypoints graph.
A navigation mesh uses convex polygons to represent the areas in the map that an AI entity can travel. The most important benefit of using a navigation mesh is that it gives a lot more information about the environment than a waypoint system. Now we can adjust our path safely, because we know the safe region in which our AI entities can travel. Another advantage of using a navigation mesh is that we can use the same mesh for different types of AI entities. Different AI entities can have different properties such as size, speed, and movement abilities. A set of waypoints is tailored for human, AI may not work nicely for flying creatures or AI controlled vehicles. Those might need different sets of waypoints. Using a navigation mesh can save a lot of time in such cases.
But generating a navigation mesh programmatically based on a scene, is a somewhat complicated process. Fortunately, Unity 3.5 introduced a built-in navigation mesh generator (Pro only feature). Instead, we’ll learn how to use Unity’s navigation mesh for generating features to easily implement our AI pathfinding.
The behavior trees
Behavior trees are the other techniques used to represent and control the logic behind AI characters. They have become popular for the applications in AAA games such as Halo and Spore. Previously, we have briefly covered FSM. FSMs provide a very simple way to define the logic of an AI character, based on the different states and transitions between them. However, FSMs are considered difficult to scale and re-use existing logic. We need to add many states and hard-wire many transitions, in order to support all the scenarios, which we want our AI character to consider. So, we need a more scalable approach when dealing with large problems. behavior trees are a better way to implement AI game characters that could potentially become more and more complex.
The basic elements of behavior trees are tasks, where states are the main elements for FSMs. There are a few different tasks such as Sequence, Selector, and Parallel Decorator. This is quite confusing. The best way to understand this is to look at an example. Let’s try to translate our example from the FSM section using a behavior tree. We can break all the transitions and states into tasks.
Let’s look at a Selector task for this Behavior tree. Selector tasks are represented with a circle and a question mark inside. First it’ll choose to attack the player. If the Attack task returns success, the Selector task is done and will go back to the parent node, if there is one. If the Attack task fails, it’ll try the Chase task. If the Chase task fails, it’ll try the Patrol task.
What about the tests? They are also one of the tasks in the behavior trees. The following diagram shows the use of Sequence tasks, denoted by a rectangle with an arrow inside it. The root selector may choose the first Sequence action. This Sequence action’s first task is to check whether the player character is close enough to attack. If this task succeeds, it’ll proceed with the next task, which is to attack the player. If the Attack task also returns success, the whole sequence will return success, and the selector is done with this behavior, and will not continue with other Sequence tasks. If the Close enough to attack? task fails, then the Sequence action will not proceed to the Attack task, and will return a failed status to the parent selector task. Then the selector will choose the next task in the sequence, Lost or Killed Player?.
The other two common components are Parallel and Decorator. A Parallel task will execute all of its child tasks at the same time, while the Sequence and Selector tasks only execute their child tasks one by one. Decorator is another type of task that has only one child. It can change the behavior of its own child’s tasks, which includes whether to run its child’s task or not, how many times it should run, and so on.
We’ll study how to implement a basic behavior tree system later. There’s a free add-on for Unity called Behave in the Unity Asset Store. Behave is a useful, free GUI editor to set up behavior trees of AI characters, and we’ll look at it in more detail later as well.
Animals (including humans) have a very complex musculoskeletal system (the locomotor system) that gives them the ability to move around the body using the muscular and skeletal systems. We know where to put our steps when climbing a ladder, stairs, or on uneven terrain, and we know how to balance our body to stabilize all the fancy poses we want to make. We can do all this using our bones, muscles, joints, and other tissues, collectively described as our locomotor system.
Now put that into our game development perspective. Let’s say we’ve a human character who needs to walk on both even and uneven surfaces, or on small slopes, and we have only one animation for a “walk” cycle. With the lack of a locomotor system in our virtual character, this is how it would look:
Climbing stair without locomotion
First we play the walk animation and advance the player forward. Now the character knows it’s penetrating the surface. So, the collision detection system will pull the character up above the surface to prevent this penetration. This is how we usually set up the movement on an uneven surface. Even though it doesn’t give a realistic look and feel, it does the job and is cheap to implement.
Let’s take a look at how we really walk up stairs. We put our step firmly on the staircase, and using this force we pull up the rest of our body for the next step. This is how we do it in real life with our advanced locomotor system. However, it’s not so simple to implement this level of realism inside games. We’ll need a lot of animations for different scenarios, which include climbing ladders, walking/running up stairs, and so on. So, only the large studios with a lot of animators could pull this off in the past, until we came up with an automated system.
With a locomotion system
Fortunately, Unity 3D has an extension that can do just that, which is a locomotion system.
Locomotion system Unity extension
This system can automatically blend our animated walk/run cycles, and adjust the movements of the bones in the legs to ensure that the feet step correctly on the ground. It can also adjust the original animations made for a specific speed and direction on any surface, arbitrary steps, and slopes. We’ll see how to use this locomotion system to apply realistic movement to our AI characters.
The Dijkstra’s algorithm, named after professor Edsger Dijkstra, who devised the algorithm, is one of the most famous algorithms for finding the shortest paths in a graph with non-negative edge path costs. The algorithm was originally designed to solve the shortest path problem in the context of mathematical graph theory. And it’s designed to find all the shortest paths from a starting node to all the other nodes in the graph. Since most of the games only need the shortest path between one starting point and one target point, all the other paths generated or found by this algorithm are not really useful. We can stop the algorithm, once we find the shortest path from a single starting point to a target point. But still it’ll try to find all the shortest paths from all the points it has visited. So, this algorithm is not efficient enough to be used in most games. And we won’t be doing a Unity demo of Dijkstra’s algorithm in this article as well.
However, Dijkstra’s algorithm is an important algorithm for the games that require strategic AI that needs as much information as possible about the map to make tactical decisions. It has many applications other than games, such as finding the shortest path in network routing protocols.
Game AI and academic AI have different objectives. Academic AI researches try to solve real-world problems, and prove a theory without much limited resources. Game AI focuses on building NPCs within limited resources that seems to be intelligent to the player. Objective of AI in games is to provide a challenging opponent that makes the game more fun to play with. We also learned briefly about the different AI techniques that are widely used in games such as finite state machines (FSMs), random and probability, sensor and input system, flocking and group behaviors, path following and steering behaviors, AI path finding, navigation mesh generation, and behavior trees.
Resources for Article:
- Unity 3-0 Enter the Third Dimension [Article]
- Introduction to Game Development Using Unity 3D [Article]
- Unity Game Development: Welcome to the 3D world [Article]