
Teach an AI to play QWOP

2023-06-19 | By ShawnHymel

License: Attribution

Reinforcement learning is a form of machine learning in which a model improves by interacting with an environment and collecting rewards. We will use reinforcement learning to play the video game QWOP. If you’d like to see the agent in action, check out the following video:


The code for this project can be found here: https://github.com/ShawnHymel/qwop-ai

What is reinforcement learning?

Reinforcement learning, like other forms of machine learning, uses mathematical models that attempt to improve at some task through experience. Whereas supervised learning requires labeled data, reinforcement learning updates models through rewards gained by interacting with an environment.

Training reinforcement learning AI

The model is considered the agent. The entire purpose of the agent is to take in observations and make a decision about what to do next. Observations can be any interpretation of the environment, such as a screenshot, audio data, or sensor readings. Note that the environment is anything we want our AI to interact with. It can be bounded, like a video game or race track, or it can be unbounded to include anything in our physical world.

We (the godlike programmer) are responsible for assigning rewards based on what happens in the environment. For example, we might assign -100 reward points (a penalty of 100 points) for crashing in QWOP and assign 100 reward points for crossing the finish line. The training algorithm looks at the rewards obtained along with the actions the agent took to get there, and it updates the model to take better actions next time. Ideally, the agent should choose actions that have a higher probability of obtaining a higher reward, even if those rewards happen in the future.
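As a concrete sketch, a reward scheme like the one described above might look like the following. The function name, signature, and shaping term are illustrative assumptions for this article, not the exact scheme used in the project notebook:

```python
def compute_reward(distance_m, prev_distance_m, crashed, finished):
    """Assign a reward for one step of QWOP (illustrative values)."""
    if crashed:
        return -100.0  # large penalty for falling over
    if finished:
        return 100.0   # large bonus for crossing the finish line
    # Small shaping reward for forward progress (meters gained this step),
    # so the agent gets feedback before it ever reaches the finish line
    return distance_m - prev_distance_m

# Example: the runner moved from 12.3 m to 12.5 m without crashing
print(compute_reward(12.5, 12.3, crashed=False, finished=False))
```

The small per-step shaping term is a common trick: without it, the agent would receive no signal at all until it crashed or finished, which makes learning far slower.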

In addition to defining the observation space and rewards, we are also responsible for determining how the agent interacts with its environment. Note that the agent does not directly interact with the environment; it is only responsible for choosing an action to perform. In our case, we take the decision from the agent (e.g. do action ‘3’) and perform that action (e.g. if action 3, then press the ‘p’ key).
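QWOP is controlled with the ‘q’, ‘w’, ‘o’, and ‘p’ keys, so the agent’s discrete action space can simply enumerate key combinations. The mapping below is an illustrative assumption (the notebook’s actual action space may differ); it preserves the text’s example of action 3 meaning “press ‘p’”:

```python
# Illustrative mapping from discrete action index to QWOP keys.
# 'q'/'w' control the thighs; 'o'/'p' control the calves.
ACTION_MAP = {
    0: ('q',),
    1: ('w',),
    2: ('o',),
    3: ('p',),
    4: ('q', 'p'),
    5: ('w', 'o'),
}

def keys_for_action(action):
    """Translate the agent's chosen action index into keys to press."""
    return ACTION_MAP[action]

# With pynput, the chosen keys could then be pressed like this:
#   from pynput.keyboard import Controller
#   kb = Controller()
#   for k in keys_for_action(3):
#       kb.press(k)   # and later kb.release(k)

print(keys_for_action(3))  # ('p',)
```

Note the separation of concerns: the agent only outputs an integer, and our glue code is what actually touches the keyboard.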

The agent is analogous to a brain: it doesn’t have any direct connections to the outside world, but it receives inputs from our senses along with rewards (for the brain, this is often in the form of neurotransmitters like dopamine and serotonin). Similarly, it does not have any direct means of interacting with the environment; it relies on muscles to carry out such actions. The brain forms new (neurological) connections to associate actions with current and future rewards, just like we do in reinforcement learning.

Install software

If you would like to try training your own model to play QWOP, like we did in the video, then you will need to install some software dependencies. First, install the Anaconda environment (%%%LINK). Open an Anaconda terminal and run the following commands:

conda create --name pytorch-gpu
conda activate pytorch-gpu
conda install -c conda-forge opencv=4.7.0 tesserocr=2.5.2 jupyterlab

Next, you will need to install PyTorch for your system. Follow the instructions here. I highly recommend using the GPU version of PyTorch (if you have a GPU in your system), as it will make training much faster. For the GPU version of PyTorch, you will need to install CUDA. Look at the required CUDA version when you download PyTorch, and install that version of CUDA from here.

In your Anaconda prompt, run the following:

python -m pip install mss pynput gymnasium==0.28.1 stable-baselines3[extra]==2.0.0a1 wandb

Finally, create an account on Weights & Biases. You will use it to perform remote logging of your AI agent training.

Start training

Clone the following repository somewhere on your system. Specifically, you will want the v07 version, as the hyperparameters there seemed to work the best: https://github.com/ShawnHymel/qwop-ai

From an Anaconda prompt, activate your virtual environment (if it is not already activated) and start a Jupyter Lab instance in the repository:

cd qwop-ai/
conda activate pytorch-gpu
jupyter-lab

Execute the cells in qwop-ai.ipynb. It should walk you through the process of capturing screens and scores from QWOP and interacting with the game. Note that you will want to have your QWOP window positioned in the upper-left corner of the screen and sized so that the scroll bars just barely disappear.

Training a reinforcement learning agent on QWOP

Note that you will be asked to log in to your Weights & Biases account with the wandb.login() function. Once training starts, you can head to wandb.ai to view training metrics.

We train this agent using Proximal Policy Optimization (PPO), which you can read about here: %%%LINK

Training can take over 6 hours (assuming you have a decently powerful GPU on your computer), so be patient. Once done, check the run metrics in Weights & Biases. Ideally, you want to see the average reward increase over time.

Watching the QWOP AI agent learn over time in Weights & Biases

You can save the final model, but in reinforcement learning, you often want to use the model that had the highest average reward. Note that models are saved every 25k steps, so you can always go back to some previous checkpoint.
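Picking the best checkpoint rather than the last one can be sketched as follows. The data structure here (a mapping from checkpoint step to evaluation episode rewards) is an assumption for illustration; in practice you would read these numbers off your Weights & Biases charts:

```python
def best_checkpoint(eval_rewards):
    """Return the checkpoint step with the highest mean evaluation reward.

    eval_rewards maps checkpoint step -> list of episode rewards.
    """
    return max(eval_rewards,
               key=lambda step: sum(eval_rewards[step]) / len(eval_rewards[step]))

# Checkpoints saved every 25k steps, with hypothetical evaluation rewards:
history = {
    25_000: [-40.0, -35.0, -50.0],
    50_000: [10.0, 5.0, 20.0],
    75_000: [8.0, 12.0, 6.0],  # the final model is not always the best
}
print(best_checkpoint(history))  # 50000
```

This is why saving intermediate checkpoints matters: reinforcement learning training curves are noisy, and performance can degrade late in a run.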

Test agent

The final few cells load the model (you want to load qwop_model_v07.zip to use my model, or some previous checkpoint from your training) and then test it. For each step, the model is fed an observation (a stack of 4 grayscale, resized frames from the game), the model makes a prediction based on that observation (e.g. perform action “press w+o”), and then we perform that action (actually pressing the ‘w’ and ‘o’ keys).
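The observation described above, a rolling stack of 4 grayscale, downsized frames, can be sketched in plain NumPy. The frame size, downsampling factor, and class names here are assumptions; the notebook uses OpenCV for the actual grayscale conversion and resize:

```python
from collections import deque

import numpy as np

def to_grayscale(frame_rgb):
    """Convert an H x W x 3 RGB frame to grayscale using luma weights."""
    return frame_rgb @ np.array([0.299, 0.587, 0.114])

def downsample(gray, factor=4):
    """Naive resize by striding (the notebook uses a proper OpenCV resize)."""
    return gray[::factor, ::factor]

class FrameStack:
    """Keep the 4 most recent preprocessed frames as one observation."""
    def __init__(self, n=4):
        self.frames = deque(maxlen=n)

    def add(self, frame_rgb):
        self.frames.append(downsample(to_grayscale(frame_rgb)))

    def observation(self):
        return np.stack(self.frames)  # shape: (4, H/4, W/4)

stack = FrameStack()
for _ in range(4):
    stack.add(np.zeros((120, 160, 3)))
print(stack.observation().shape)  # (4, 30, 40)
```

Stacking several consecutive frames is what lets a feedforward policy infer motion (velocity and direction of the runner's limbs), which a single still frame cannot convey.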

Testing the RL agent on QWOP

My model simply scoots along the ground in a safe manner. If you watch the video, you can see that it struggles to get past the hurdle at 50 meters. As a challenge, see if you can get the agent to make the character actually take strides! While this ML engineer beat the human record at QWOP, he needed to train the model with previously recorded play, which is known as imitation learning.

Going further

If you would like to learn more about reinforcement learning, I highly recommend checking out the following resources:

I specifically want to thank Nicholas Renotte for his series of videos that helped me understand how to work with gym and Stable Baselines! You can find them on his YouTube channel.

TechForum

Have questions or comments? Continue the conversation on TechForum, DigiKey's online community and technical resource.

Visit TechForum