GPT-5 is here
The most anticipated model of the year has finally arrived. Let’s examine what it means for robotics.
After a long year of endless social media debates about whether GPT-5 would bring about Artificial General Intelligence (AGI), OpenAI’s successor to its most popular GPT model, which serves several hundred million people through ChatGPT, is finally here for all to explore and test. OpenAI was already having a solid week, having released its first open-weight models in six years, gpt-oss-120b and gpt-oss-20b, two days earlier. I am not an expert on Large Language Models (LLMs) or their evaluation. If you are interested in the gory details of GPT-5, I refer you to OpenAI’s detailed release article on GPT-5 and Nathan Lambert’s excellent article on Interconnects. In this article, we will mainly explore what GPT-5 means for robotics.
Evaluations
If you were rooting for OpenAI to achieve AGI this year (AGI meaning a system with human-like intelligence), GPT-5 is far from it. What we got are mostly incremental improvements over previous releases, largely bringing GPT-5 up to speed with the plethora of LLM releases we have seen this spring and summer: Google’s Gemini 2.5, Anthropic’s Claude Opus 4.1, and xAI’s Grok-4. OpenAI has been lagging behind for most of this year and has finally caught up. For the most part, the improvements are in coding tasks and agentic AI, two use cases whose popularity is growing rapidly and where OpenAI simply had to catch up. GPT-5 also ranks number 1 on LMArena across a bunch of evaluated dimensions, something that has proved exceedingly hard to accomplish. But on one key benchmark, Humanity’s Last Exam, the best version, GPT-5-pro, falls short of the best score from Grok-4 (42% for GPT-5-pro vs 44.4% for Grok-4). This reinforces the message that no single benchmark settles which model is best; the answer depends on the kind of evaluation you care about, and we will likely see these benchmark bragging games between competing models continue for a few years.
Real-life performance
Hence, what really matters is the real-life performance you get from a model, and so far the Internet seems to be of the opinion that GPT-5 is faring better than previous models from OpenAI and its competitors. OpenAI makes a similar claim in its post:
With web search enabled on anonymized prompts representative of ChatGPT production traffic, GPT‑5’s responses are ~45% less likely to contain a factual error than GPT‑4o, and when thinking, GPT‑5’s responses are ~80% less likely to contain a factual error than OpenAI o3.
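To make those percentages concrete, here is a small back-of-the-envelope sketch in Python. The 10% baseline error rate is purely hypothetical (OpenAI’s quote gives only relative reductions, not absolute rates), and the function name is my own. Note also that the two figures use different baselines (GPT‑4o and OpenAI o3), so they cannot be compared directly with each other.

```python
def error_rate_after_reduction(baseline_rate: float, relative_reduction: float) -> float:
    """Return the new error rate after a relative reduction.

    "45% less likely" is a relative claim: the new rate is 55% of the old one,
    not the old rate minus 45 percentage points.
    """
    return baseline_rate * (1.0 - relative_reduction)


# Hypothetical baseline: assume 10 out of 100 responses contain a factual error.
# This number is an illustration only; it does not come from OpenAI.
HYPOTHETICAL_BASELINE = 0.10

vs_gpt4o = error_rate_after_reduction(HYPOTHETICAL_BASELINE, 0.45)
vs_o3 = error_rate_after_reduction(HYPOTHETICAL_BASELINE, 0.80)

print(f"~45% reduction from a 10% baseline: {vs_gpt4o:.1%}")
print(f"~80% reduction from a 10% baseline: {vs_o3:.1%}")
```

In other words, under that assumed baseline, roughly 5 or 6 responses in 100 would still contain an error in the first case, and about 2 in 100 in the second.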
I am also delighted to see the following message from OpenAI regarding the real-life use case of GPT-5 as a companion. This would indeed make this model very suitable for robots like the Vector robot.
Overall, GPT‑5 is less effusively agreeable, uses fewer unnecessary emojis, and is more subtle and thoughtful in follow‑ups compared to GPT‑4o. It should feel less like “talking to AI” and more like chatting with a helpful friend with PhD‑level intelligence.
I am still evaluating how GPT-5 works with Vector, and in particular whether it makes Vector a better companion. For now, it seems that OpenAI is heavily loaded with GPT-5 queries, and the API takes long enough to generate answers that Vector’s requests time out. I hope the situation improves in the coming days. I could get a few simple questions answered, and I have posted one video below. I have also been posting many updated videos in the Notes section of this newsletter, so please look out for the latest videos of Vector integrated with GPT-5 if you are accessing the website directly or using the Substack app. (Note: Please remember that anything posted to Notes does not reach your inbox.)
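For readers wondering how a robot could stay responsive while the API is slow, one common pattern is to put a deadline on each request and fall back to a short spoken message when it is missed. The Python sketch below illustrates the idea only; it is not how Wirepod or Vector handle this, and every name in it (ask_with_deadline, fake_fast_model, fake_slow_model) is hypothetical. The two "models" are stand-in functions, not real API calls.

```python
import concurrent.futures
import time


def ask_with_deadline(ask, prompt, timeout_s, fallback):
    """Run ask(prompt) in a worker thread and return its answer.

    If no answer arrives within timeout_s seconds, return fallback instead,
    so the caller (for example, a robot) is never left waiting indefinitely.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(ask, prompt)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return fallback
    finally:
        # Do not block on the slow call; the worker finishes in the background.
        pool.shutdown(wait=False)


def fake_fast_model(prompt):
    """Stand-in for a model that answers immediately (hypothetical)."""
    return f"Answer to: {prompt}"


def fake_slow_model(prompt):
    """Stand-in for an overloaded API that answers too late (hypothetical)."""
    time.sleep(0.5)
    return f"Late answer to: {prompt}"


if __name__ == "__main__":
    busy = "Sorry, I am still thinking. Please ask me again."
    print(ask_with_deadline(fake_fast_model, "What is 2 + 2?", 0.2, busy))
    print(ask_with_deadline(fake_slow_model, "What is 2 + 2?", 0.2, busy))
```

The design choice here is that the robot always says something within the deadline. The trade-off is that a late answer is simply discarded, so a real integration might prefer streaming or a longer deadline instead.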
I am also offering a flavor of GPT-5 (gpt-5-nano) on my Knowledge Graph for Vector, which paid subscribers can access for free. So, if you are a paid subscriber, and would like free access to GPT-5 for your Vector robot (assuming you use Wirepod, please see this article for details), please DM me for a free key. If you are not a paid subscriber yet, please consider becoming one.
I would like to leave you with my first interaction with GPT-5 in the following video. Enjoy watching it! I certainly liked the answer. Note that Vector first tells you the answer, then says that it will run a math check tool, and then validates the answer before announcing “All Done!”
Your monetary support helps me buy credits and get access to these fancy models, so thanks a lot to all paid subscribers.
Wirepod seems to have a bug that prevents it from working smoothly with GPT-5. I will try to post a fix. The bug is documented here: https://github.com/kercre123/wire-pod/issues/456