Proximal Policy Optimization

GitHub Logo

Project Overview

Here, I implement a popular deep reinforcement learning algorithm known as Proximal Policy Optimization (PPO). This implementation really gets into the details, you should check it out. I demonstrate how PPO allows agents to learn simple tasks like balancing a pole on a cart and complicated games like Breakout.

Cart Pole Environment

Let's start by teaching the agent to balance this pole.

Breakout Environment

Now, let's teach it a classic Atari game: Breakout. Here, we use a CNN, so the only inputs are the pixles you can see below!

Here is our starting place after just a few steps.

And here is where we get after more training!