Introduction
In a previous post, we described how Vector can be integrated with GPT-4o via Wirepod. GPT-4o is a multimodal model from OpenAI that can reason across audio, vision, and text in real time (multimodal meaning it supports multiple input modes: text, images, and speech in this case). With the help of GPT-4o, Vector can understand the details in a picture it takes, which can potentially help it make sense of its surroundings. We also posted a video demonstrating how Vector can describe its surroundings with help from GPT-4o.
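Wirepod handles the image request internally, but as a rough sketch of what the underlying API call looks like (not Wirepod's actual code), here is how a camera frame could be sent to GPT-4o for description using the OpenAI Python SDK. The file name vector_snapshot.jpg is a hypothetical snapshot saved from Vector's camera.

```python
# Minimal sketch (not Wirepod's implementation) of asking GPT-4o to
# describe a picture taken by Vector's camera.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical snapshot saved from Vector's camera
with open("vector_snapshot.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what you see in this picture."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
    max_tokens=200,
)

print(response.choices[0].message.content)
```

The model's text reply can then be spoken back through Vector, which is essentially what the scene-description demo in the earlier post does.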
While GPT-4o is great, it is also a pricey model: OpenAI currently charges about $10 per 1 million output tokens for it. OpenAI also offers a smaller version, GPT-4o mini, which costs about 60 cents per 1 million output tokens. We have not tried GPT-4o mini yet, but we intend to in the near future. Either way, these options from OpenAI mean that you must shell out a few bucks on your credit card if …