How can a data scientist get into game development?alejandro_vasile
One of the most interesting uses for data science is within the aspects and process around game development. While not immediately obvious that data science can be applicable to game development, it is increasingly becoming an enticing area both from a user engagement perspective, and as a source of data collection for deep learning and data science related tasks.
Games and data collection
With the increase of reinforcement learning oriented deep learning tasks in the past few years, the concept of using games as a method for collection of data (somewhat in parallel to collecting data on mturk or various other crowdsourcing platforms) has never been greater. The main idea behind data collection for these types of tasks is capturing the graphical display at some time and recording the user input for that image frame. From this data, it’s possible to connect these inputs into some end result (such that the final score) that can later be optimized and used as an objective cost function to be minimized or maximized.
With this, it’s possible to collect a large corpus of a user’s data for deep learning algorithms to initially train off of, which they can then use for the computer to play itself (something akin to this was done for AlphaGo and various other game related reinforcement learning bots). With the incredible influx of processing power now available, it’s possible for computers to play themselves thousands and millions of times to learn from themselves and their own shortcomings.
Deep learning uses
Practical uses of this type of deep learning that a data scientist may find interesting range from creating smart AI systems that are more engaging to a player, to finding transferable algorithms and data sources that can be used elsewhere. For example, many of the OpenAI algorithms are intended to be trained in one game with the hope that they will be transferable to another game and still do well (albeit with new parameters and learned cost function). This type of deep learning is incredibly interesting from a data scientist perspective because it is useful to not have to focus on highly optimizing each game or task that a data scientist may be working on and instead find commonalities and generalizable methodologies that translate across systems and games.
Many of the technical skills for creating pipelines of data collection from game development are much more development oriented than a traditional data scientist may be used to, and it may require learning new skills. These skills are much broader and encompassing of traditional developer roles, and initially include things such as data collection and data pipelining from the games, to scaling deep learning training and implementing new algorithms during training.These are becoming more vital to a data scientist as the need to both provide insight as well as create integrations into a product is becoming an incredibly vital skillset.
Exploring projects and tools
A data scientist may go about getting into this area by exploring such projects and tools such as OpenAI’s gym and Facebook’s MazeBase. These projects are very deep learning oriented though, and may not be what a traditional data scientist thinks of when they are interested in game development.
Data oriented/driven game design
Another approach is data oriented/driven game design. While this is not a new concept by any means, it has become increasingly ubiquitous as in-app purchasing and subscription based gaming plans have become a common theme among mobile and other gaming platforms. These types of data science tasks are not unlike normal data science related projects, in that they seek to understand from a statistical perspective what is happening to users at specific points along the games. There is a pretty big overlap in projects like this and projects that aim to understand, for instance, when a user abandons a cart during an online order. The data for the games may be oriented around when the gamer gave up on a quest, or at what point users are willing to make an in-app purchase to quicker achieve a goal. Since these are quantifiable and objective goals, they are an incredibly fit for traditional supervised learning tasks and can be approached with traditional supervised learning baselines and algorithms.
The end result of these tasks may include things such as making a quest or goal easier, or making an in-app purchase cheaper during some specific interval that the user would be more inclined to purchase (much like offering a user a coupon if a cart is abandoned during checkout often entices the user to come back and finish the purchase).
While both of these paths are game development oriented, they differ quite a lot in that one is much more traditionally data analytical, and one is much more deep learning engineering oriented. They both are highly interesting areas to explore from a professional standpoint, but data driven game development may be somewhat limited from a hobbyist standpoint outside of Kaggle competitions (which a quick search didn’t seem to show any previous competitions having this sort of data) since many companies would be quite hesitant to provide this sort of data if their entire business model is based around in-app purchases and recurring revenue from players.
Overall, these are both incredibly enticing areas and are great avenues to pursue and provide plenty of interesting problems that you may not encounter outside of game development.
About the Author
Graham Annett is an NLP Engineer at Kip (Kipthis.com). He has been interested in deep learning for a bit over a year and has worked with and contributed to Keras (https://github.com/fchollet/keras). He can be found on Github at http://github.com/grahamannett or via http://grahamannett.me.