Training a Reinforcement Learning Policy and Running It on the Bittle Robot
Make Bittle walk without coding in the skill
Happy New Year! I spent part of my holidays trying to train a Reinforcement Learning (RL) policy and apply it to make Bittle walk. I feel that I am still scratching the surface, so expect to see more posts on this topic as I learn the material.
Reinforcement Learning (RL)
RL has a lot of appeal when it comes to teaching skills to robots, and there is an abundant body of literature on training RL policies for quadruped robots; papers I have read include this and this. RL is popular because it lets the user think in terms of a reward function, which encodes the behaviour the user wants from the robot, instead of directly controlling the servos for each individual degree of freedom the robot may have.
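To make that concrete, a locomotion reward is often just a few lines of code. The sketch below is purely illustrative (the terms, weights, and function name are mine, not taken from any particular paper or the Bittle notebook): it rewards forward velocity and penalizes wasted energy, tilting, and falling over.

import numpy as np

def walking_reward(forward_velocity, joint_torques, body_roll, fell_over):
    # Illustrative reward for a quadruped walking task, not the actual Bittle reward.
    forward_term = 1.0 * forward_velocity                         # reward forward progress
    energy_term = -0.005 * float(np.sum(np.square(joint_torques)))  # penalize large torques
    stability_term = -0.1 * abs(body_roll)                        # penalize tilting sideways
    fall_penalty = -10.0 if fell_over else 0.0                    # large penalty for toppling
    return forward_term + energy_term + stability_term + fall_penalty

# Example call with made-up numbers for 8 leg joints:
print(walking_reward(0.3, np.full(8, 0.1), 0.05, False))

The RL algorithm then searches for a policy that maximizes this kind of score over time, which is much easier to specify than a servo trajectory for every joint.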
Specifically for the Petoi Bittle robot, there is a great thread in the Petoi user forums on how to train an RL policy for Bittle; I learned everything I needed from that thread. I have also previously written a detailed post on some of the more technical aspects of RL, including how it works internally, here.
This article is a bit different: it mainly documents the steps I had to take to train my own RL model from scratch, in case they are not easy to decipher from the original thread. Here is what you need and the steps I followed.
Training an RL Model
Things you would need:
A Petoi Bittle robot, preferably equipped with a BiBoard (the original thread's author mentions that he tried these steps with Nybble and NyBoard, but my experience is limited to Bittle/BiBoard). You will also need a USB cable to connect your desktop/laptop to the BiBoard.
A Windows or Linux desktop or laptop with the Arduino IDE and Python 3.9+ installed. You will need some familiarity with compiling and uploading firmware from the Arduino IDE.
Steps:
Step 1: The easiest way to train an RL model is to open this Jupyter notebook and train it in the Google Colab environment. Colab's free tier allows you to connect to a Tesla T4 GPU; with it, I was easily able to train the model to about 2 million steps. Alternatively, I could create a Colab runtime without a GPU and train the model to 5 million steps. You can control the number of steps with the following variable:
MAXIMUM_LENGTH = 2e6 # Number of total steps for entire training
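For orientation, the core of the training loop with stable-baselines3 PPO (the algorithm behind the repository's trained_agent_PPO.zip) looks roughly like the sketch below. The environment is a stand-in so the snippet runs anywhere; the notebook builds its own simulated Bittle environment and uses its own hyperparameters, so treat this only as the shape of the code, not a copy of it. It assumes stable-baselines3 2.x, which accepts Gymnasium environments.

import gymnasium as gym
from stable_baselines3 import PPO

MAXIMUM_LENGTH = 2e6  # total number of training steps, same variable as in the notebook

# The notebook constructs a simulated Bittle environment; "Pendulum-v1" is only a
# stand-in here so the snippet is runnable with just gymnasium and stable-baselines3.
env = gym.make("Pendulum-v1")

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=int(MAXIMUM_LENGTH))
model.save("trained_agent_PPO")  # in the notebook this points to a Google Drive path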
Step 2: Once the model is trained, it is saved to the Google Drive location you specify in the following line of the notebook:
model.save("<GDrive Path to save>")
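If you are new to Colab, remember that Google Drive has to be mounted in the notebook before a path under it becomes writable; the standard snippet is below, after which paths of the form /content/drive/MyDrive/... persist even after the runtime is recycled.

from google.colab import drive

# Mount your Google Drive inside the Colab runtime so model.save() can
# write to a location that survives the session ending.
drive.mount('/content/drive')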
Step 3: To run the model on your Bittle robot, do the following:
First, flash a slightly different firmware onto your BiBoard. The usual method of opening the firmware in the Arduino IDE, compiling it, and uploading it works. If you are unfamiliar with the Arduino IDE, Petoi has excellent documentation available here.
Once Bittle is running the newly flashed firmware, clone this repository and install the Python packages from requirements.txt. Download the trained model from Step 2 into that directory (it will be a zip file), and edit enjoy.py to point to the name of your trained model (the repository ships with a pre-trained model, trained_agent_PPO.zip, which is used by default). With Bittle connected to your laptop/desktop over the USB-C cable, run the model with the following command:
python enjoy.py
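Conceptually, enjoy.py loads the saved PPO policy and streams its predicted joint targets to Bittle over the serial connection. The snippet below is only my mental model of the loading-and-prediction part; the dummy observation is an assumption for illustration, and the real script builds its observations from Bittle's sensor feedback and handles the serial communication, so refer to the repository for the details.

import numpy as np
from stable_baselines3 import PPO

# Load the policy; swap in the name of your own trained zip file if you have one.
model = PPO.load("trained_agent_PPO")

# A zero observation with the right shape, just to show the call signature;
# the actual script feeds in Bittle's sensor state and sends the resulting
# joint targets to the robot over the serial port.
obs = np.zeros(model.observation_space.shape, dtype=np.float32)
action, _ = model.predict(obs, deterministic=True)
print(action)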
Results
I trained the model for 1 million and 2 million steps and got interesting but different results. Both of my models fared worse than the pre-trained model provided in the original repository (trained_agent_PPO.zip). There was definitely some improvement from 1 million to 2 million steps, as the robot made some forward progress without toppling. Here is an illustration of the results.
1 million steps:
2 million steps:
Pre-trained model from the repository:
As you can see, there is a lot to improve. Please have patience with me as I explore this territory and post updates.
Conclusion
Have you applied RL to any of your pet projects? If so, please chime in with your thoughts and experiences.