The idea of applying reinforcement learning to a novel, not-yet-explored problem has always intrigued me. I had a lot of free time to fill amidst the coronavirus pandemic, and I used to play this game back when I was younger, so this project made a perfect side hustle to work on during that time. I discuss the unique and notable aspects of the problem, the techniques I used, and why. Hopefully, readers interested in Deep Reinforcement Learning will find all of this insightful.

In Part 1,
I started out by attacking a smaller, easier version of the original problem, reducing the size of the game table. This shrank the state space significantly, making the problem more approachable. For this smaller problem, a relatively simple technique geared towards small state spaces is used: Tabular Q-learning. It helped build intuition about the problem, and that intuition was very useful later on when solving the problem with Deep Q-Networks (DQN).

In Part 2,
After getting the validation and early success needed from the tabular approach, a DQN is implemented for the same problem as in Part 1, using the Keras library. However, the learning process became unstable due to the nature of the project. Several methods are borrowed from research papers and implemented to mitigate this, namely Prioritized Experience Replay, a Target Q-Network, and a Cyclical Learning Rate with the Learning Rate Finder. Tracking and plotting learning progress is key in deep learning research, so heuristic metrics are explained and plotted along with the MSE loss over episodes. Finally, the trained models are challenged against each other to understand which learning process yields better agents.

In Part 3,
Agents are trained for the different rounds of the game (1, 2, 3 and 4) and put together to perform online. The framework for taking input from the computer screen and performing actions is explained. Taking its input from the screen and clicking on the appropriate spots, the agent was able to beat the majority of the players.

GitHub repository for Part 1/3: td-deepreinforcementlearning-part1

Our problem domain, the game of Treasure Drop, is explained and broken into pieces in terms of the five major elements of any reinforcement learning application: Environment, Agent, State, Action and Reward. Following that, the learning algorithm is briefly touched upon, and reasons are given for why it's suitable for this problem. Next, heuristic metrics for convergence are defined, and our model is evaluated both by those metrics and by (the fun part) playing against it.

Let's start with defining the Environment.

Environment
Each player takes turns dropping one coin at a time into the top of the board. The goal of the game is to make drops that cause coins to fall out of the bottom of the board into the point slots. The greater the point values, the better. Hence the name, Treasure Drop! For more description of the game: Link to Official Website

If you are curious about how the final agent performs against real players online, feel free to take a look at the video below. Watching it is strongly recommended in order to get a better understanding of the game, as it's challenging to explain the game dynamics merely through a textual description.

WATCH: How agent performs in “real world”
As you might have seen in the video, the game is played in four rounds. For each round, the scores at the bottom of the board are different.
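
To make those five elements concrete, here is a minimal sketch of how the environment could be framed in code. This is not the project's actual implementation: the board size, slot values and drop physics below are made-up placeholders, just enough to show where State, Action and Reward live.

```python
import random

class TreasureDropEnv:
    """Minimal sketch of the game as an RL environment (placeholder physics)."""

    N_COLUMNS = 7                          # hypothetical number of drop positions
    SLOT_VALUES = [1, 3, 5, 10, 5, 3, 1]   # hypothetical point slots at the bottom

    def __init__(self):
        self.reset()

    def reset(self):
        # State: the coin layout on the board, here a 6x7 grid of 0/1 flags.
        self.board = [[0] * self.N_COLUMNS for _ in range(6)]
        return self._state()

    def _state(self):
        # Flatten the grid so it can serve as a table key or a network input.
        return tuple(cell for row in self.board for cell in row)

    def step(self, action):
        # Action: the column to drop the coin into.
        assert 0 <= action < self.N_COLUMNS
        # Placeholder dynamics: sometimes the drop pushes a coin off the
        # bottom edge into a point slot, which is the Reward.
        reward = 0
        if random.random() < 0.3:
            reward = self.SLOT_VALUES[random.randrange(self.N_COLUMNS)]
        done = False  # a real round would end after a fixed number of drops
        return self._state(), reward, done
```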
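
Part 1's Tabular Q-learning then boils down to the one-step update rule Q(s,a) ← Q(s,a) + α(r + γ·max Q(s′,·) − Q(s,a)) applied to a lookup table. A sketch against the toy environment above, with illustrative hyperparameters rather than the ones actually used in the project:

```python
from collections import defaultdict
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1   # illustrative hyperparameters
N_ACTIONS = TreasureDropEnv.N_COLUMNS

# One row of Q-values per state, created lazily on first visit.
q_table = defaultdict(lambda: [0.0] * N_ACTIONS)

def choose_action(state):
    # Epsilon-greedy exploration: mostly exploit, occasionally try a random drop.
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    values = q_table[state]
    return values.index(max(values))

def update(state, action, reward, next_state, done):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = reward if done else reward + GAMMA * max(q_table[next_state])
    q_table[state][action] += ALPHA * (target - q_table[state][action])

env = TreasureDropEnv()
state = env.reset()
for _ in range(10_000):
    action = choose_action(state)
    next_state, reward, done = env.step(action)
    update(state, action, reward, next_state, done)
    state = env.reset() if done else next_state
```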
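
For Part 2, the heart of the DQN plus the Target Q-Network trick looks roughly like the Keras sketch below. The layer sizes and hyperparameters are assumptions, not the project's actual architecture, and Prioritized Experience Replay and the cyclical learning rate are left out for brevity.

```python
import numpy as np
from tensorflow import keras

STATE_DIM = 42   # e.g. a flattened 6x7 board; placeholder dimension
N_ACTIONS = 7
GAMMA = 0.99

def build_model():
    # Small fully connected Q-network mapping a state to one Q-value per action.
    model = keras.Sequential([
        keras.layers.Input(shape=(STATE_DIM,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(N_ACTIONS, activation="linear"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(1e-3), loss="mse")
    return model

online_net = build_model()
target_net = build_model()
target_net.set_weights(online_net.get_weights())  # start the two networks in sync

def train_on_batch(states, actions, rewards, next_states, dones):
    # Bootstrap targets come from the frozen target network; chasing the
    # online network's own moving estimates is what destabilises training.
    next_q = target_net.predict(next_states, verbose=0).max(axis=1)
    targets = online_net.predict(states, verbose=0)
    targets[np.arange(len(actions)), actions] = rewards + GAMMA * next_q * (1.0 - dones)
    online_net.fit(states, targets, epochs=1, verbose=0)  # MSE loss, as plotted in Part 2

# Every few hundred training steps, copy the online weights across:
# target_net.set_weights(online_net.get_weights())
```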
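
Finally, the Part 3 screen framework amounts to reading pixels and issuing clicks. The post doesn't name the library used, so the sketch below uses pyautogui as one common choice, with entirely hypothetical screen coordinates:

```python
import pyautogui

# Hypothetical coordinates; in practice these would be calibrated to
# wherever the game window sits on the screen.
BOARD_REGION = (100, 200, 640, 480)           # left, top, width, height
DROP_X = [150, 230, 310, 390, 470, 550, 630]  # x pixel of each drop column
DROP_Y = 220                                  # y pixel of the drop row

def read_board():
    # Grab the part of the screen containing the board. Converting the
    # pixels into the agent's state (coin positions) is game-specific
    # image processing, omitted here.
    return pyautogui.screenshot(region=BOARD_REGION)

def perform_action(column):
    # Translate the agent's discrete action into a mouse click on the board.
    pyautogui.click(DROP_X[column], DROP_Y)
```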