Meet an Open Source Vision-Language-Action (VLA) Model - OpenVLA

An open source model for generalist robot manipulation

Jul 13, 2024

Generalist Robot Manipulation

In the rapidly evolving field of robotics, a significant challenge is developing generalist models that enable a robot to perform a diverse range of tasks. We have written before about the growing investment in general purpose robots here. A general purpose robot could be far more useful than today's robots. Those are largely confined to factory settings where they are trained for specific tasks, and they cannot operate in environments such as people's homes, where retraining and recalibrating them is difficult. With the goal of enabling such a robot, a team of researchers led by Stanford University developed OpenVLA, a generalist robot manipulation policy that can perform a wide variety of tasks. A summary of OpenVLA is captured in the following screenshot from the paper.

OpenVLA

OpenVLA is a robot manipulation policy trained on a carefully curated open source dataset of 970,000 robot episodes. As a Vision-Language-Action (VLA) model, it takes an image from the robot's camera and a text instruction as input and generates robot actions represented as tokens. OpenVLA is built on top of Meta's open source Llama 2 7B model. Because the training dataset comprises actions performed by many different robots, OpenVLA learns a diverse set of skills, which lets it tackle a wide range of tasks with strong success rates.
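To make "actions represented as tokens" concrete, here is a minimal sketch in Python (NumPy) of per-dimension action discretization. The OpenVLA paper describes splitting each action dimension into 256 uniform bins between the 1st and 99th percentiles of the training actions and reusing 256 rarely used tokens at the end of the Llama vocabulary for those bins. The function names, the bin-center reconstruction, and the exact bin-to-token mapping below are our own assumptions, not the official implementation.

```python
import numpy as np

N_BINS = 256  # bins per action dimension, as described in the OpenVLA paper


def tokenize_action(action, low, high, n_bins=N_BINS):
    """Map a continuous action vector to one bin index (0..n_bins-1) per dimension.

    Assumes high > low in every dimension; low/high would come from dataset statistics.
    """
    action = np.clip(np.asarray(action, dtype=np.float64), low, high)
    scaled = (action - low) / (high - low)          # each dimension now in [0, 1]
    bins = np.floor(scaled * n_bins).astype(np.int64)
    return np.minimum(bins, n_bins - 1)             # a value exactly at `high` goes in the last bin


def detokenize_action(bins, low, high, n_bins=N_BINS):
    """Map bin indices back to continuous values at the center of each bin."""
    centers = (np.asarray(bins, dtype=np.float64) + 0.5) / n_bins
    return low + centers * (high - low)


def bins_to_token_ids(bins, vocab_size, n_bins=N_BINS):
    """Hypothetical mapping of bins onto the last n_bins IDs of the vocabulary."""
    return vocab_size - n_bins + np.asarray(bins)


def token_ids_to_bins(token_ids, vocab_size, n_bins=N_BINS):
    """Inverse of bins_to_token_ids."""
    return np.asarray(token_ids) - (vocab_size - n_bins)


# Usage with a hypothetical 7-D action (e.g. end-effector deltas plus gripper),
# normalized to [-1, 1] in every dimension.
low, high = np.full(7, -1.0), np.full(7, 1.0)
action = np.array([0.0, 0.5, -1.0, 1.0, 0.25, -0.25, 2.0])  # last value is out of range
bins = tokenize_action(action, low, high)
token_ids = bins_to_token_ids(bins, vocab_size=32000)  # Llama 2's vocabulary size
recovered = detokenize_action(token_ids_to_bins(token_ids, 32000), low, high)
print(bins)        # [128 192   0 255 160  96 255]
print(recovered)
```

With a range of 2 split into 256 bins, each in-range value is recovered to within half a bin (about 0.004). Out-of-range values, like the final 2.0, are clipped to the nearest bound before binning. More bins would improve precision but would take over more of the vocabulary.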

The researchers put OpenVLA through a rigorous evaluation, testing it on a WidowX robot using the BridgeData V2 setup and on the Google robot evaluation suite. These evaluations assess performance across visual, motion, physical, and semantic generalization challenges, as well as language grounding.

The results are impressive. OpenVLA outperforms other state-of-the-art generalist policies on the majority of the evaluated tasks, including the 55B-parameter RT-2-X (read our post on previous policies here), despite having roughly 7x fewer parameters.

Pushing the Boundaries of Generalization

One of OpenVLA's key strengths is its ability to generalize across distribution shifts and out-of-distribution (OOD) scenarios. The researchers deliberately introduced distribution shifts into the evaluation tasks, such as object appearances, backgrounds, and initial conditions that differ from the training data.

Despite these shifts, OpenVLA maintained a high level of performance, demonstrating strong generalization. You can see how different robots perform with OpenVLA on the project's GitHub page.

New Robot Setups

Beyond the benchmark tasks, the researchers also explored data-efficient adaptation of OpenVLA to new robot setups. They fine-tuned the pretrained model on a small number of demonstrations (10 to 150) for each new task and were able to adapt the policy quickly to these new settings. This X thread discusses how to train OpenVLA for your own robot. As you may expect, the toughest step is generating a dataset for your specific robot. We intend to create a dataset for the Vector robot, fine-tune OpenVLA, and evaluate how well it can lift a cube.
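Because building the dataset is the hard part, here is a minimal sketch of how you might record demonstrations for a new robot. Each episode is a sequence of (camera image, instruction, action) steps. The sketch then computes per-dimension 1st/99th percentile action bounds, which are the statistics the OpenVLA paper describes using for its action bins. The `Step`/`Episode` classes, the `action_bounds` helper, and the three-dimensional Vector action layout (left wheel speed, right wheel speed, lift height) are all hypothetical. As far as we know, OpenVLA's own fine-tuning code expects data in the RLDS format used by the Open X-Embodiment dataset, so a real pipeline would also need a conversion step.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Step:
    image: np.ndarray   # camera frame, H x W x 3, uint8
    instruction: str    # natural-language task, e.g. "lift the cube"
    action: np.ndarray  # command issued at this step (hypothetical layout for Vector)


@dataclass
class Episode:
    steps: list = field(default_factory=list)

    def record(self, image, instruction, action):
        """Append one timestep of a teleoperated demonstration."""
        self.steps.append(Step(image=np.asarray(image, dtype=np.uint8),
                               instruction=instruction,
                               action=np.asarray(action, dtype=np.float32)))


def action_bounds(episodes, low_pct=1.0, high_pct=99.0):
    """Per-dimension percentile bounds over every action in every episode."""
    actions = np.concatenate([np.stack([s.action for s in ep.steps]) for ep in episodes])
    return (np.percentile(actions, low_pct, axis=0),
            np.percentile(actions, high_pct, axis=0))


# Usage: 20 fake demonstrations with blank 64x64 frames and made-up actions
# [left_wheel_speed, right_wheel_speed, lift_height]; replace with real teleop logs.
rng = np.random.default_rng(0)
demos = []
for _ in range(20):
    ep = Episode()
    for t in range(30):
        ep.record(np.zeros((64, 64, 3)), "lift the cube",
                  rng.uniform([-50, -50, 0], [50, 50, 1]))
    demos.append(ep)
low, high = action_bounds(demos)
print(low, high)
```

Using the 1st and 99th percentiles instead of the raw minimum and maximum keeps a few outlier commands from stretching the bins and wasting resolution.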

Implications

The development of generalist robot manipulation policies like OpenVLA represents a significant step forward in the field of robotics. By creating a versatile system that can handle a wide range of tasks, researchers are paving the way for more adaptable and capable robotic systems that can be deployed in a variety of real-world applications. This advancement has the potential to transform industries such as manufacturing, logistics, and healthcare, as well as domestic settings, where robots can automate a diverse set of tasks.

Conclusion

The research article showcases the impressive capabilities of OpenVLA, a generalist robot manipulation policy that outperforms state-of-the-art alternatives on challenging benchmarks.

As the field of robotics continues to evolve, the development of versatile and adaptable systems like OpenVLA will be crucial in unlocking the full potential of robotic technologies. This research represents an exciting step forward, and we can't wait to see how it will shape the future of robotics and the industries it serves.

Learn With A Robot
