Physical Intelligence at CoRL 2024
Our last post about Physical Intelligence was timely; last Monday we received news that Physical Intelligence (better known as π) raised $400 million in new funding at a valuation of $2.4 billion, with Jeff Bezos and OpenAI leading the round. A $2.4 billion valuation is a meteoric leap for an eight-month-old company, and it shows the commercial interest in robotic foundation models: no one is willing to be left behind in the competition to dominate this space. Both Jeff Bezos and OpenAI already have heavy investments in Figure Robotics.
π also had a very strong presence at the Conference on Robot Learning (CoRL 2024) in Munich, Germany last week. CoRL is a great venue for the latest research on how robots can learn. You can find some great videos from CoRL on X.
At CoRL, Chelsea Finn from π gave a talk on how π0 (the first robotic foundation model from π) was post-trained with robotic data, while Sergey Levine gave an overview talk on π0. Sergey's talk is available online and is a great watch (the link has the complete video for the CoRL 2024 Workshop on Mastering Robot Manipulation in a World of Abundant Data, and Sergey's is the first talk in the session). If you have a keen interest in how robotic foundation models will shape our future, I can vouch that you will learn a lot from this talk.
The gist of the talk is that π is trying to replicate the success of language foundation models in the field of robotics. Consider Large Language Models (LLMs): OpenAI, Meta, and others have had great success training these models on carefully curated, Internet-scale datasets. Once trained, these foundation models become experts at answering domain-specific questions either zero-shot (i.e., the model used directly, without task-specific examples) or when prompted well. π aims to achieve the same success in robotics. A Vision Language Action (VLA) model trained on general-purpose data from the Internet (along with data from a few sets of robots and open-source robotic data archives) can be made to perform very well on a special-purpose robot by fine-tuning the model on approximately 20 hours of data collected from that robot. In addition to the cool demos from the Physical Intelligence blog, Sergey's talk also had some previously unseen videos and behind-the-scenes material.
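To make the recipe concrete, here is a minimal, purely illustrative sketch of the pre-train-then-fine-tune pattern described above. This is not π's actual code or API: the class name `VLAPolicy`, the commented-out checkpoint path, and the toy random-tensor dataset are all hypothetical stand-ins. The only point it illustrates is that a generalist policy is loaded and then adapted with supervised learning on a small, task-specific batch of demonstrations.

```python
# Illustrative sketch only -- NOT Physical Intelligence's code or API.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class VLAPolicy(nn.Module):
    """Toy stand-in for a vision-language-action policy."""
    def __init__(self, obs_dim=512, act_dim=7):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.action_head = nn.Linear(256, act_dim)

    def forward(self, obs):
        return self.action_head(self.backbone(obs))

# 1) Start from a generalist checkpoint (random weights here as a placeholder).
policy = VLAPolicy()
# policy.load_state_dict(torch.load("generalist_checkpoint.pt"))  # hypothetical path

# 2) Fine-tune on a small task-specific dataset (roughly 20 hours of demos in
#    the talk; random tensors here so the script runs end to end).
obs = torch.randn(1024, 512)     # encoded observations
actions = torch.randn(1024, 7)   # demonstrated actions
loader = DataLoader(TensorDataset(obs, actions), batch_size=64, shuffle=True)

opt = torch.optim.AdamW(policy.parameters(), lr=1e-4)
for epoch in range(3):
    for batch_obs, batch_act in loader:
        loss = nn.functional.mse_loss(policy(batch_obs), batch_act)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```

The appeal of the recipe, per the talk, is that the fine-tuning stage needs only on the order of 20 hours of demonstrations on the target robot, because most of the general capability comes from the pre-trained generalist model.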
An interesting illustration was how the robots recover when they fail. Another was what happens when a robot is deliberately made to fail, such as when a piece of clothing is thrown at the robot while it is folding laundry. There is also discussion of some interesting things these robots were made to do, such as removing a block from a Jenga pile.
The outstanding paper award at CoRL 2024 went to researchers from the Allen Institute for AI for work that makes a robot an excellent navigator in a domestic setting. You can check out the videos at PoliFormer and see how well the robot does at navigating obstacles and finding items such as an apple on a kitchen table. The work is impressive.
We will discuss Physical Intelligence and PoliFormer further in our next post. If any of you attended CoRL 2024, I would be delighted to speak to you. Meanwhile, we also have our Thanksgiving deals in case you want to become a member and support our work. This Thanksgiving, we are offering 40% off our annual membership for the entire duration of your membership. (The discount applies each year you renew your membership with us.) Please use the following link to take advantage of this offer. Thanks for your patronage.