Let’s consider a different approach: using a game to evaluate new technology and learn new skills.
Enter the arena
Battlesnake isn’t the snake game from your indestructible Nokia candy-bar phone. It isn’t even an updated Google Snake spin-off (though do try to unlock the secret rainbow snake); this is something very different and much more useful.
On the surface, Battlesnake seems like a simple game with a small number of basic rules:
- Each turn, every snake moves one square up, down, left, or right.
- Eating food grows your snake and restores its health, which drains a little every turn.
- Running into a wall, another snake’s body, or your own body eliminates you.
- The last snake slithering wins.
Once you get past the basic premise, you’ll soon realize it is a lot more complicated than that.
There are many ways to build your own Battlesnake and enter it into a competition. Depending on your team’s experience level, you may want to try one of the starter projects that Battlesnake makes available. Alternatively, you may wade into the deeper end of the competitive pool by enhancing your snake with health-based heuristics, or cannonball into the pool with a reinforcement learning approach.
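To make the heuristic option concrete, here is a minimal sketch of a health-based move chooser. This is our own illustration, not Battlesnake starter code; the `board` and `you` dictionaries mirror (in simplified form) the JSON the Battlesnake API sends on each turn.

```python
# Hypothetical health-based heuristic: when health is low, steer toward
# the nearest listed food; otherwise just take any move that stays on
# the board. A real snake would also avoid bodies, but this shows the idea.

def choose_move(board, you):
    head = you["head"]
    candidates = {
        "up":    (head["x"], head["y"] + 1),
        "down":  (head["x"], head["y"] - 1),
        "left":  (head["x"] - 1, head["y"]),
        "right": (head["x"] + 1, head["y"]),
    }
    # Discard moves that would leave the board.
    safe = {m: (x, y) for m, (x, y) in candidates.items()
            if 0 <= x < board["width"] and 0 <= y < board["height"]}
    # Hungry? Head for food using Manhattan distance.
    if you["health"] < 30 and board["food"] and safe:
        food = board["food"][0]
        return min(safe, key=lambda m: abs(safe[m][0] - food["x"])
                                     + abs(safe[m][1] - food["y"]))
    # Otherwise take the first safe move (or give up gracefully).
    return next(iter(safe), "up")
```

Even a heuristic this simple is enough to survive early rounds, which makes it a useful baseline to measure an ML model against.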
The approach we took to our first competition was to hedge our bets a little: get something into competition quickly, gather some data to iterate on, then improve the initial snake’s performance through a series of ML model tweaks, ultimately building a reinforcement learning model that we were sure was going to win (in the most virtuous and collaborative sporting way, of course). More on the results later, but here is a walkthrough of how our architecture and development progressed:
Introduction to reinforcement learning
Reinforcement learning (often referred to as RL) has a long history as a way to build AI models. From games like chess, Go, and StarCraft II to industry-specific problems like manufacturing and supply chain optimization, reinforcement learning is being used to build best-in-class AI that tackles increasingly difficult challenges.
For those unfamiliar with RL, here is a quick primer:
- Traditionally, machine learning models learn to make predictions from massive amounts of labeled example data. In RL, agents instead learn through experimentation.
- Each iteration is scored based on a reward function. As an example for Battlesnake, a basic set of rewards might be a 1 for winning and a -1 for losing.
- The rewards are fed back into the model so that it “learns” which moves earn the highest reward in any given scenario. Much like a human learning not to touch a hot stove, the model learns that running a snake head first into a wall produces a negative reward, and it will remember not to do that (most of the time).
- For complex systems this reward structure might consist of dozens of different inputs that help to shape the reward based on the current state of the overall system.
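The basic win/lose reward described above takes only a few lines to express. The sketch below is illustrative: the `done`/`won`/`ate_food` flags are our assumptions about how an environment might report game state, not part of any official API.

```python
# Basic reward scheme from the text: +1 for a win, -1 for a loss,
# and 0 for every intermediate step while the game is in progress.

def basic_reward(done, won):
    if not done:
        return 0.0           # game still running: no signal yet
    return 1.0 if won else -1.0

# A more complex system shapes the reward with additional inputs.
# Here, a small (made-up) bonus for eating food while health is low.
def shaped_reward(done, won, ate_food, health):
    reward = basic_reward(done, won)
    if ate_food and health < 30:
        reward += 0.1        # nudge the agent toward eating when hungry
    return reward
```

Sparse rewards like the basic version are easy to get right but slow to learn from; shaping terms like the food bonus speed up training at the risk of teaching unintended habits, which is why tuning them takes iteration.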
Our team did not have a classically trained machine learning expert, but we had enough expertise to take concepts we learned from others who had attempted this approach and apply them using Google Cloud’s Vertex AI platform.
How we trained our snake
One of the key starting points for building an RL model is setting up an environment that knows how to play the game. OpenAI’s Gym toolkit gives developers a simple interface and many examples for getting started, so you can begin training a model quickly. This allows you to focus purely on the parts of the model that matter, like….
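To show the shape of that interface, here is a toy environment following Gym’s `reset()`/`step()` convention. It only tracks a single snake head on an empty board (our simplification, not a full Battlesnake simulator); a real environment would subclass `gym.Env` and implement the complete rule set.

```python
class SnakeEnv:
    """Toy environment using the classic gym interface:
    reset() -> observation, step(action) -> (observation, reward, done, info).
    A real version would subclass gym.Env and simulate food, health,
    and opponents; here we only track the head on an empty board."""

    ACTIONS = {0: (0, 1), 1: (0, -1), 2: (-1, 0), 3: (1, 0)}  # up/down/left/right

    def __init__(self, size=11):
        self.size = size
        self.reset()

    def reset(self):
        self.x = self.y = self.size // 2   # start in the center
        return (self.x, self.y)

    def step(self, action):
        dx, dy = self.ACTIONS[action]
        self.x += dx
        self.y += dy
        # Crashing into a wall ends the episode with the -1 "loss" reward.
        done = not (0 <= self.x < self.size and 0 <= self.y < self.size)
        reward = -1.0 if done else 0.0
        return (self.x, self.y), reward, done, {}
```

Once the environment speaks this interface, any Gym-compatible training loop or library can drive it, which is exactly what makes the toolkit such a convenient starting point.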