Learn With A Robot

Gemini Robotics

Google extends Gemini 2.0 to the robotics world with its VLA model called Gemini Robotics

Amitabha Banerjee
Mar 16, 2025

This week, Google DeepMind previewed its Vision Language Action (VLA) model, called Gemini Robotics. The demos and results on the Gemini Robotics website are astounding. Check them out.

Vision Language Action (VLA) models

But before we get to Gemini Robotics, let’s first revisit what Vision Language Action (VLA) models are. Typically, a VLA model is built by taking a multimodal machine learning model and training it to output a set of actions, given multimodal inputs from a robot. We have seen a lot of progress in VLA models in the last year, from 𝝅0, released by Physical Intelligence, to OpenVLA, built by researchers at Stanford. The high-profile humanoid robotics startup Figure also seems to have made major advances with its VLA model, Helix, which led it to scrap its contract with OpenAI and deploy Helix in Figure robots. Google took a similar approach in building its VLA: it took its best multimodal model, Gemini 2.0, and trained it for the physical world to produce the Gemini Robotics mode…
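
To make the general VLA idea above concrete, here is a minimal Python sketch of the interface such a model exposes: multimodal observations go in, robot actions come out, and the loop repeats. Everything in it is an assumption made for illustration. The names `Observation`, `Action`, `ToyPolicy`, and `run_episode` are hypothetical, and the hand-written rule inside `ToyPolicy` merely stands in for a trained model. It is not how Gemini Robotics, 𝝅0, OpenVLA, or Helix is implemented.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Observation:
    image: List[List[int]]        # camera frame (toy grayscale grid)
    instruction: str              # natural-language command
    joint_positions: List[float]  # the robot's current joint state


@dataclass
class Action:
    joint_deltas: List[float]     # per-joint position change for one step


class VLAPolicy(Protocol):
    """Anything that maps an observation to an action."""

    def predict(self, obs: Observation) -> Action: ...


class ToyPolicy:
    """Hypothetical stand-in for a trained VLA model (nothing is learned)."""

    def __init__(self, step_size: float = 0.1) -> None:
        self.step_size = step_size

    def predict(self, obs: Observation) -> Action:
        # A real VLA would encode the image and the text with a multimodal
        # backbone and decode actions from it. This toy rule ignores the
        # image and only checks the instruction for the word "lower".
        direction = -1.0 if "lower" in obs.instruction.lower() else 1.0
        return Action([direction * self.step_size] * len(obs.joint_positions))


def run_episode(policy: VLAPolicy, instruction: str,
                joints: List[float], steps: int) -> List[float]:
    """Closed loop: observe, predict an action, apply it, repeat."""
    frame = [[0] * 4 for _ in range(4)]  # placeholder camera image
    for _ in range(steps):
        obs = Observation(frame, instruction, list(joints))
        action = policy.predict(obs)
        joints = [j + d for j, d in zip(joints, action.joint_deltas)]
    return joints
```

Calling `run_episode(ToyPolicy(0.5), "raise the arm", [0.0, 1.0], steps=2)` moves each joint up by 0.5 per step. The point of the sketch is only the shape of the loop: swapping `ToyPolicy` for a trained model would leave `run_episode` unchanged.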
