Gemini Robotics
Google extends Gemini 2.0 to the robotics world with its VLA model called Gemini Robotics
This week, Google DeepMind previewed its Vision Language Action (VLA) model called Gemini Robotics. The demos and results on the Gemini Robotics website are astounding. Check them out.
Vision Language Action (VLA) models
But before we get to Gemini Robotics, let's first revisit what Vision Language Action (VLA) models are. Typically, a VLA model takes a multimodal machine learning model and trains it to output a set of actions, given multimodal inputs from a robot. We have seen a lot of progress in VLA models in the last year, from 𝝅0, released by Physical Intelligence, to OpenVLA, built by researchers at Stanford. The high-profile humanoid robotics startup Figure also seems to have made major advances with its VLA model, Helix, which led it to end its contract with OpenAI and deploy Helix in Figure robots. Google took a similar approach in building its VLA: it took its best multimodal model, Gemini 2.0, and trained it for the physical world to produce the Gemini Robotics model.
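To make the input/output contract concrete, here is a minimal PyTorch sketch of the VLA pattern. Everything here is hypothetical, the layer sizes, the `ToyVLA` name, the tiny conv "vision tower", and it does not reflect the actual architecture of Gemini Robotics, 𝝅0, or OpenVLA; it only illustrates the shape of the problem: camera frames plus an instruction go in, a short sequence of robot actions comes out.

```python
import torch
import torch.nn as nn

class ToyVLA(nn.Module):
    """Illustrative VLA-style model: (images, instruction tokens) -> action chunk.

    All sizes are made up; a real VLA reuses a large pretrained VLM backbone.
    """

    def __init__(self, vocab_size=32000, d_model=256, action_dim=7, chunk_len=8):
        super().__init__()
        # Stand-in for a pretrained vision encoder (a ViT in a real VLM).
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=8),  # 224x224 -> 28x28 patches
            nn.Flatten(2),                              # (B, 32, 784)
        )
        self.patch_proj = nn.Linear(32, d_model)
        # Stand-in for the language tower of the backbone.
        self.token_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)
        # Action head: decode a short "chunk" of continuous joint commands.
        self.action_head = nn.Linear(d_model, action_dim * chunk_len)
        self.action_dim, self.chunk_len = action_dim, chunk_len

    def forward(self, images, instruction_ids):
        # images: (B, 3, 224, 224); instruction_ids: (B, T) token ids
        patches = self.patch_proj(self.vision(images).transpose(1, 2))  # (B, 784, D)
        words = self.token_emb(instruction_ids)                         # (B, T, D)
        fused = self.trunk(torch.cat([patches, words], dim=1))  # joint attention
        pooled = fused.mean(dim=1)
        actions = self.action_head(pooled)                       # (B, A * L)
        return actions.view(-1, self.chunk_len, self.action_dim)  # (B, L, A)

model = ToyVLA()
acts = model(torch.randn(2, 3, 224, 224), torch.randint(0, 32000, (2, 12)))
print(acts.shape)  # torch.Size([2, 8, 7]): 8 timesteps of 7-DoF commands
```

The chunked output (several future timesteps per forward pass, rather than one action at a time) mirrors a design choice common in recent VLAs, though the exact action representation varies from model to model.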