Reinforcement Learning


Deep Reinforcement Learning With Unity ML Agents


After completing Andrew Ng's 11-week course on machine learning, including the math behind it and several projects in MATLAB/Octave, I felt well equipped to branch out and build my own machine learning models with Unity's ML-Agents framework. Unity's machine learning framework combines TensorFlow or PyTorch, Unity's ML-Agents package, and a Python API that communicates between the two.

There are three basic components to the Unity-facing side of this framework (a short code sketch of how they fit together follows the list):

1. Observations - the information the reinforcement learning model perceives about its environment. These observations may be values such as positions, velocities, and rotations. They can also be more abstract, such as the RGB color values of each pixel of an image, or even bools or enumerations. The number of observations determines the number of input neurons in the model's neural network.

2. Actions - all the actions the model can take; these correspond to the output neurons of the neural network. For example, the outputs of my final model control movement, rotation, and firing the gun.

3. Reward Signals - the model tries to maximize its reward, so you can shape its behavior by adding reward for desired behavior and subtracting reward for undesired behavior.
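
To make that concrete, here is a minimal sketch of how those three pieces map onto the ML-Agents Agent class in C#. The class name, fields, and the exact observation/action layout are illustrative stand-ins, not the actual code from my project.

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;

// Illustrative agent; "enemy" and the action layout are assumptions for the sketch.
public class ShooterAgent : Agent
{
    public Transform enemy;        // hypothetical reference to the nearest enemy
    public float moveSpeed = 5f;

    // 1. Observations: each AddObservation call feeds one or more input neurons.
    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.localPosition);                       // 3 values
        sensor.AddObservation(transform.localRotation.eulerAngles.y / 360f);  // 1 value
        sensor.AddObservation(enemy.localPosition);                           // 3 values
    }

    // 2. Actions: the network's outputs arrive here as continuous/discrete buffers.
    public override void OnActionReceived(ActionBuffers actions)
    {
        float move   = actions.ContinuousActions[0];    // forward/backward
        float rotate = actions.ContinuousActions[1];    // turn left/right
        bool  fire   = actions.DiscreteActions[0] == 1; // pull the trigger

        transform.Translate(Vector3.forward * move * moveSpeed * Time.deltaTime);
        transform.Rotate(Vector3.up, rotate * 180f * Time.deltaTime);
        if (fire) { /* raycast for a hit, then hand out a reward signal (item 3) */ }
    }
}
```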

Creating a decent reward structure for my model was one of the more challenging aspects of this project. Best practice seems to be to normalize rewards to a scale of -1 to 1. There are many interesting tactics for shaping the behavior of reinforcement learning models with rewards. For example, if you add a very small negative reward over time, such as -0.02 per second, the model is incentivized to act quickly so that its cumulative reward isn't eaten away. I found a good balance for this project was +0.25 for every successfully shot enemy, +1 for collecting a red cube, -1 for letting an enemy touch you, -1 for shooting the gun and missing, and about -0.001 per frame that the model is running to help speed him up.
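
In ML-Agents these signals are handed out with AddReward (and EndEpisode) calls from the agent's C# script. A rough sketch of the reward structure above, continuing the ShooterAgent example, might look like this; the event hooks (OnEnemyShot, OnCubeCollected, and so on) are hypothetical helpers the game code would call, and only AddReward and EndEpisode are actual ML-Agents API.

```csharp
// Reward handling added to the ShooterAgent sketch above.
public void OnEnemyShot()     { AddReward(0.25f); }   // successful hit
public void OnCubeCollected() { AddReward(1.0f);  }   // grabbed a red cube
public void OnMissedShot()    { AddReward(-1.0f); }   // fired the gun and missed

public void OnTouchedByEnemy()
{
    AddReward(-1.0f);   // let an enemy reach the agent
    EndEpisode();       // reset the scene and start a new episode
}

private void Update()
{
    AddReward(-0.001f); // tiny per-frame penalty so the agent doesn't dawdle
}
```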

It was extremely interesting to see how rewards shape the behavior of reinforcement learning models. It can also be quite frustrating, because the models are very good at optimizing their rewards in ways that do not produce your desired outcome. I found that it is generally best practice to reward only end goals, not the micro behaviors you think will lead to completing a goal. For example, I only reward the agent for successfully shooting an enemy; I do not reward him for rotating to face an enemy. That micro behavior of rotating to face enemies is needed for the model to shoot them, and I originally tried rewarding it because I thought it would help, but it led to some wacky results where the model prioritized aiming at enemies over everything else. Rewarding completed goals over intermediate behaviors was a key takeaway from this project, and it seems to align with what I've seen online about structuring rewards.
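
The fix was simply deleting the shaping term. Something along the lines of the snippet below (an illustrative guess at what that aim reward looked like, not my exact code) was paying out for the micro behavior, and removing it so that only the successful shot earned reward cured the aiming obsession.

```csharp
// Inside the agent's action logic, "enemy" being the illustrative field above.
// A small reward just for pointing at the enemy is exactly the kind of
// micro-behavior shaping the agent over-optimized, so it was removed.
float facing = Vector3.Dot(transform.forward,
                           (enemy.position - transform.position).normalized);
AddReward(0.01f * facing);   // removed: reward the completed shot instead
```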