What is Generative AI?
If you have not heard about ChatGPT by now, you must be an alien from another planet. Some of the hottest technology trends, ChatGPT among them, have their roots in a technology known as Generative Artificial Intelligence, or Generative AI. In the simplest terms, Generative AI usually refers to a trained neural network that can generate never-before-seen content based on some hints. In the case of ChatGPT, the hint is the question that the user asks. A GPT-3.5/4 model then generates tokens from these hints, and a well-trained model can generate the right tokens to answer the user’s question correctly. Stable Diffusion and DALL-E work in a similar way: a text hint is used to generate never-before-seen image content. With the scale of neural networks that can now be trained (GPT-3 had 175 billion parameters) on distributed GPUs (or hardware accelerators such as those from SambaNova Systems), Generative AI models have become sophisticated enough to accomplish some tasks better than humans. Hence, there is a serious risk of Generative AI replacing some basic jobs such as mine (blog writing), contract creation, generating websites, or writing code.
But what about robots?
The success of Generative AI raises an obvious question: if 2022 was the year when millions of people were awed by the human-quality answers generated by ChatGPT, when will people be just as impressed by tasks performed by robots? There is certainly immense scope for applying Generative AI in the field of robotics. In this article, we discuss Generative Augmentation for robots, which may open your eyes to the possibilities around us.
GenAug (Generative Augmentation)
GenAug is probably the most promising piece of work when it comes to applying Generative AI to robots. There is a great video explaining GenAug on the project website. There is also a great thread on Twitter by one of the authors of GenAug.
The idea behind GenAug is relatively simple. Training a robot to accomplish a task is usually done by imitation learning, in which a canned demonstration of the task is prepared, images of it are recorded, and a robot is trained to replicate the behavior shown in the demonstration.
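To make the idea of imitation learning concrete, here is a minimal sketch of its simplest form, behavior cloning: the policy is fitted by plain supervised regression on (observation, action) pairs taken from demonstrations. Everything in it is an assumption made for illustration and not the GenAug setup: the observation is a 2D object position rather than a camera image, the policy is linear, the demonstration data is made up, and the names `fit_linear_policy` and `predict_action` are hypothetical.

```python
import numpy as np


def fit_linear_policy(observations, actions):
    """Fit a linear policy to demonstration pairs with least squares."""
    # Append a constant column so the policy can also learn a bias term.
    features = np.hstack([observations, np.ones((observations.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(features, actions, rcond=None)
    return weights


def predict_action(weights, observation):
    """Apply the fitted linear policy to a single observation."""
    return np.append(observation, 1.0) @ weights


# Made-up demonstrations: the "expert" always moves the gripper
# to the position of the object it observes.
rng = np.random.default_rng(0)
demo_observations = rng.uniform(0.0, 1.0, size=(50, 2))
demo_actions = demo_observations.copy()

policy = fit_linear_policy(demo_observations, demo_actions)
predicted = predict_action(policy, np.array([0.25, 0.75]))
```

A real system would replace the linear model with a neural network that takes images as input, but the training signal is the same: match the demonstrated action for each recorded observation.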
The limitation is that while the trained robot usually succeeds in replicating the task in the same environment, it tends to fail if even small changes are made to that environment. As a result, the robot is rarely good enough to perform the task in a real-life setting. GenAug uses Generative AI to augment the images that are used to train the robot. As an example, for a task that involves picking up an object and placing it on a table, the images can be augmented in the following ways:
Changing the shape of the object which is being picked,
Changing the shape and number of other distracting objects on the table, and
Changing the background or the dimensions of the table.
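The augmentation loop can be sketched roughly as follows. This is not the GenAug implementation: `augment_demo`, `random_background`, and the mask layout are hypothetical, and a random flat color stands in for the output of a real generative model. The sketch covers only the background case from the list above and assumes that the action label recorded in the demonstration stays valid for every augmented image; changing the picked object or the distractors would use a different mask.

```python
import numpy as np


def augment_demo(image, keep_mask, action, generate_fn, num_variants, seed=0):
    """Return (augmented_image, action) pairs; the action label is reused as-is."""
    rng = np.random.default_rng(seed)
    augmented = []
    for _ in range(num_variants):
        generated = generate_fn(image.shape, rng)
        # Keep the original pixels where keep_mask is True and take the
        # generated pixels everywhere else.
        new_image = np.where(keep_mask[..., None], image, generated)
        augmented.append((new_image, action))
    return augmented


def random_background(shape, rng):
    """Placeholder for a generative model: fills the image with one random color."""
    color = rng.integers(0, 256, size=3, dtype=np.uint8)
    return np.broadcast_to(color, shape).copy()


# A tiny 4x4 RGB "scene" with a white object in the middle.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[1:3, 1:3] = 255
keep_mask = np.zeros((4, 4), dtype=bool)
keep_mask[1:3, 1:3] = True
action = (1, 1, 3, 3)  # hypothetical pick and place pixel coordinates

variants = augment_demo(image, keep_mask, action, random_background, num_variants=5)
```

Each demonstration image yields several training images that share one action label, which is how a small set of demonstrations can be stretched into a much larger and more varied dataset.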
Evaluation
For every task, an augmented dataset of 1000 images is generated. The authors evaluated their work on 10 tasks, for which they chose: (i) 10 unseen environments, (ii) 10 scenes with unseen objects to pick, and (iii) 10 scenes with unseen objects to place. The evaluation measured the performance of a robot arm with 6 degrees of freedom and a vacuum gripper. While the GenAug-trained robot was successful in 85% of cases for Category (i) (unseen environments), the success rates for Categories (ii) and (iii) were lower, at 45% and 52% respectively. When compared with the same robot arm trained without GenAug, the improvement stood at 40%.
What do we think?
While the tabletop setting and the chosen tasks are simplistic, there is definitely great promise if this work can be extended to demonstrate success in more realistic environments. We look forward to seeing further research developments in this domain. The complete details of the GenAug paper are available here.