Generative AI and Robotics - Part 2
Let's explore ROSIE from Google Research
Generative AI is the next frontier in robotics, and we have seen a lot of progress in the application of Generative AI to robotics this year. I am writing a series of posts discussing the major technical breakthroughs through which Generative AI is set to transform and bootstrap exponential growth in use cases for robots. This article is the second in the series on Generative AI and robotics.
In our previous post, we explored how Generative AI has made great progress in the last two years, be it in the form of Large Language Models (LLMs) such as ChatGPT, which can generate answers to our queries, or text-to-image models such as DALL-E and Stable Diffusion. We discussed one way to use Generative AI in robots, called GenAug. GenAug was presented at the Robotics: Science and Systems conference (RSS 2023) this week in Daegu, Korea. In GenAug, the authors augmented their training dataset with generated images that changed attributes such as the shape of the object being picked, thus creating a richer dataset. A richer dataset leads to a more robust trained robot.
Another Generative AI technology for robots presented at RSS 2023 is called ROSIE: RObot Learning with Semantically Imagined Experience. ROSIE comes from Robotics at Google Research and is an interesting application of text-to-image generation to robot training. ROSIE uses text-guided diffusion models for data augmentation in robot learning. With these augmentations generated in an automated, pipelined manner, ROSIE can produce highly convincing images for an entire sequence of frames of a robot action, which can then be used to train a robot capable of learning other tasks.
Details of ROSIE
Let us explore in detail what this means with the help of the following illustration, which I have borrowed from the paper.
We start with the assumption that we have a sequence of images of a robot placing an object in an empty open drawer. Having such detailed labelled images has traditionally been the main way to train a robot to perform repetitive tasks. Like GenAug, ROSIE attempts to scale up the number of images in the dataset using Generative AI. However, ROSIE's contribution differs in that it builds a more automated and scalable pipeline for generating images.
ROSIE relies on human prompts to decide how to alter a given image sequence. The figure above shows two prompts: (1) “Add a can of coke into the drawer”, and (2) “Add a toy block in the drawer, the block has different colors”. Given a prompt, ROSIE generates a mask of the region of interest relevant to that prompt; these masks appear as red rectangular boxes in the middle picture. Next, guided by the prompt text, ROSIE performs in-painting on the selected mask to insert objects that follow the text instruction. For the first prompt, “Add a can of coke into the drawer”, the following image sequence is created.
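To make the mask-and-inpaint step concrete, here is a minimal sketch of that pipeline shape. This is not ROSIE's code: the region selection and the diffusion in-painter are mocked with trivial stand-ins, and all function names (`detect_region_of_interest`, `inpaint_region`, `augment_episode`) are hypothetical. In the real system, a text-conditioned diffusion model fills the masked region.

```python
import numpy as np

def detect_region_of_interest(frame, prompt):
    """Hypothetical stand-in for prompt-driven region selection:
    return a boolean mask over the pixels to be edited.
    Here we simply mark the central patch of the frame."""
    h, w = frame.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    mask[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4] = True
    return mask

def inpaint_region(frame, mask, prompt, rng):
    """Mock of text-guided diffusion in-painting: a real system would
    call a diffusion model conditioned on the prompt; here we fill the
    masked region with random pixels so the sketch runs end to end."""
    out = frame.copy()
    out[mask] = rng.integers(0, 256, size=(mask.sum(), 3))
    return out

def augment_episode(frames, prompt, seed=0):
    """Apply the same prompt-driven edit to every frame of an episode,
    leaving unmasked pixels (robot arm, background) untouched."""
    rng = np.random.default_rng(seed)
    return [inpaint_region(f, detect_region_of_interest(f, prompt), prompt, rng)
            for f in frames]

# A toy episode: four frames of a 32x32 RGB scene, uniform gray.
episode = [np.full((32, 32, 3), 127, dtype=np.uint8) for _ in range(4)]
augmented = augment_episode(episode, "add a can of coke into the drawer")
```

The key property the sketch preserves is that the edit touches only the masked region on every frame, so the robot's recorded motion still matches the (imagined) scene.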
The first image on the left is the original. The latter three are generated images based on the input prompt.
The generated images can now augment the original dataset and be used to train a model.
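At the dataset level, each augmented episode keeps the original robot actions, swaps in the edited frames, and takes the in-painting prompt as its new language instruction; that is how the imagined experience supports learning new tasks. The sketch below illustrates that bookkeeping only; the structure and names are illustrative assumptions, not ROSIE's data format.

```python
import numpy as np

def make_episode(instruction, n_frames=4):
    """Illustrative episode record: instruction, frames, and actions."""
    frames = [np.zeros((32, 32, 3), dtype=np.uint8) for _ in range(n_frames)]
    actions = [np.zeros(7) for _ in range(n_frames)]  # e.g. 7-DoF arm commands
    return {"instruction": instruction, "frames": frames, "actions": actions}

original = [make_episode("place the object in the open drawer")]

# Augmentation keeps the actions, would swap in the in-painted frames,
# and relabels the instruction with the prompt used for editing.
augmented = []
for ep in original:
    for prompt in ["place the coke can in the drawer",
                   "place the colored toy block in the drawer"]:
        augmented.append({
            "instruction": prompt,
            "frames": [f.copy() for f in ep["frames"]],  # edited frames in a real system
            "actions": ep["actions"],
        })

dataset = original + augmented  # 1 original + 2 imagined variants
```

One original episode thus yields several training examples, each teaching the policy a semantically different task at no extra robot time.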
A note to our paying subscribers. Thank you for your generosity. As Andra Keay writes in her newsletter on Robots and Startups, you are angel investors in my entrepreneurial journey, and my goal is to make you feel thrilled with what you get from this forum. Your contribution helps pay for many efforts to gather knowledge and present it in the most consumable way. If you feel that you learn a lot, please keep referring us to your friends.