Gemini Robotics
Google extends Gemini 2.0 to the robotics world with its VLA model called Gemini Robotics
This week, Google DeepMind previewed its Vision Language Action (VLA) model called Gemini Robotics. The demos and results on the Gemini Robotics website are astounding. Check them out.
Vision Language Action (VLA) models
But before we get to Gemini Robotics, let's first revisit what Vision Language Action (VLA) models are. Typically, a VLA model takes a multimodal machine learning model and trains it to output a set of actions, given multimodal inputs from a robot. We have seen a lot of progress in VLA models in the last year, from 𝝅0, released by Physical Intelligence, to OpenVLA, built by researchers at Stanford. The high-profile humanoid robotics startup Figure also seems to have made major advances with its VLA model, Helix, which led it to end its contract with OpenAI and deploy Helix in Figure robots. Google took a similar approach in building its VLA: it took its best multimodal model, Gemini 2.0, and trained it for the physical world to produce the Gemini Robotics model.
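To make the input/output contract concrete, here is a minimal PyTorch sketch of the VLA pattern. Everything here is hypothetical, the layer sizes, the `ToyVLA` name, the tiny conv "vision tower", and it does not reflect the actual architecture of Gemini Robotics, 𝝅0, or OpenVLA; it only illustrates the shape of the problem: camera frames plus an instruction go in, a short sequence of robot actions comes out.

```python
import torch
import torch.nn as nn

class ToyVLA(nn.Module):
    """Illustrative VLA-style model: (images, instruction tokens) -> action chunk.

    All sizes are made up; a real VLA reuses a large pretrained VLM backbone.
    """

    def __init__(self, vocab_size=32000, d_model=256, action_dim=7, chunk_len=8):
        super().__init__()
        # Stand-in for a pretrained vision encoder (a ViT in a real VLM).
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=8),  # 224x224 -> 28x28 patches
            nn.Flatten(2),                              # (B, 32, 784)
        )
        self.patch_proj = nn.Linear(32, d_model)
        # Stand-in for the language tower of the backbone.
        self.token_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)
        # Action head: decode a short "chunk" of continuous joint commands.
        self.action_head = nn.Linear(d_model, action_dim * chunk_len)
        self.action_dim, self.chunk_len = action_dim, chunk_len

    def forward(self, images, instruction_ids):
        # images: (B, 3, 224, 224); instruction_ids: (B, T) token ids
        patches = self.patch_proj(self.vision(images).transpose(1, 2))  # (B, 784, D)
        words = self.token_emb(instruction_ids)                         # (B, T, D)
        fused = self.trunk(torch.cat([patches, words], dim=1))  # joint attention
        pooled = fused.mean(dim=1)
        actions = self.action_head(pooled)                       # (B, A * L)
        return actions.view(-1, self.chunk_len, self.action_dim)  # (B, L, A)

model = ToyVLA()
acts = model(torch.randn(2, 3, 224, 224), torch.randint(0, 32000, (2, 12)))
print(acts.shape)  # torch.Size([2, 8, 7]): 8 timesteps of 7-DoF commands
```

The chunked output (several future timesteps per forward pass, rather than one action at a time) mirrors a design choice common in recent VLAs, though the exact action representation varies from model to model.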