Making robots perform more generalized tasks
We explore Robotics Transformer 2 (RT-2) from Google DeepMind
Introduction
Making robots perform general tasks is one of the hardest barriers to overcome before robots can take part in more of the everyday activities in our lives. In this forum, we have explored how Generative AI can be used to build datasets that can then be used to train robots to perform generic tasks (if you are interested, please check our overviews of GenAug and ROSIE).
In this article, we discuss a different approach to the same problem, based on Vision-Language-Action models (VLAs). Google DeepMind recently released its work on Robotics Transformer 2 (RT-2), which received quite a bit of attention in the press. Below, we dive into the details of RT-2.
RT-1
Before we delve into RT-2, let us go through the basics of RT-1, the first version of Robotics Transformer from the Google DeepMind team. RT-1 had the same goal as RT-2: to train a robot to perform general-purpose tasks. To train RT-1, researchers at Google DeepMind developed a real-world robotics dataset of 130…