A quick post on the Llama 4 models released by Meta AI today. The announcement came as something of a surprise… I mean, we were all expecting Llama 4 to arrive fairly soon, but why on Earth would Meta AI release a new model midday on a Saturday? Nathan Lambert at Interconnects may well be right in predicting that Meta needed to preempt something big coming next week. Here is a screenshot of his tweet.
What is different about Llama 4?
The major change in Llama 4 compared to previous versions of Llama is that Llama 4 is a Mixture of Experts (MoE) model, which means that, unlike its dense predecessor Llama 3.3 70B, only a fraction of the model's parameters needs to be computed during inference. This is a growing trend in AI models (DeepSeek R1, for example, is also a Mixture of Experts). The main justification for this approach is to reduce the cost of computation while running inference, that is, to generate accurate tokens at a lower cost. As an example, Llama 4 Maverick is a 400 billion parameter model, but it has only 17 billion active parameters for any given token.
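To make the idea concrete, here is a minimal toy sketch of an MoE layer in PyTorch. This is not Meta's implementation, and all names (`MoELayer`, `n_experts`, `top_k`) are illustrative: a small router scores the experts for each token, only the top-k experts actually run, and their outputs are combined with the router's weights. That's how a model can have a huge total parameter count while computing only a slice of it per token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy Mixture of Experts layer: a router picks top-k experts per token,
    so only a fraction of the total parameters is used for each token."""

    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        scores = self.router(x)                           # (n_tokens, n_experts)
        weights, picked = scores.topk(self.top_k, dim=-1)  # keep only top-k experts
        weights = F.softmax(weights, dim=-1)               # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = picked[:, slot] == e               # tokens routed to expert e
                if mask.any():
                    # Only these tokens pay for expert e's parameters
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

# 8 experts, but only 2 run per token: roughly 1/4 of the expert
# parameters are "active" for any single token.
layer = MoELayer(d_model=64, n_experts=8, top_k=2)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

The ratio in this toy (2 of 8 experts active) is the same trick, at small scale, that lets Llama 4 Maverick serve tokens at a fraction of the compute its total parameter count would suggest.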