Here, I implement a popular deep reinforcement learning algorithm known as Proximal Policy Optimization (PPO). This implementation really gets into the details, you should check it out. I demonstrate how PPO allows agents to learn simple tasks like balancing a pole on a cart and complicated games like Breakout.
Let's start by teaching the agent to balance this pole.
Now, let's teach it a classic Atari game: Breakout. Here, we use a CNN, so the only inputs are the pixles you can see below!
Here is our starting place after just a few steps.
And here is where we get after more training!