Meta Launches Llama 4 Series: First Open-Weight Natively Multimodal Models

Meta introduces Llama 4 Scout and Maverick with mixture-of-experts architecture, 10M context window, and claims to beat GPT-4o on benchmarks.


Meta just handed every AI engineer a multimodal model that doesn't phone home to San Francisco.

The Llama 4 launch marks the first time we have open-weight multimodal models that can credibly compete with GPT-4o. This isn't just another LLM release — it's the moment multimodal AI stops being a closed-source monopoly.

The engineering implications are immediate

Llama 4 Scout fitting on a single H100 GPU changes who can afford to experiment with vision-language models. Until now, if you wanted GPT-4o-level multimodal capabilities, you paid OpenAI's API rates and accepted their content policies. Now you can run comparable models on your own hardware.
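The single-GPU claim is easy to sanity-check with back-of-envelope arithmetic. The figures below are the ones Meta quoted at launch (roughly 109B total parameters for Scout, of which 17B are active per token) and an H100's 80 GB of memory; treat them as assumptions, and note this counts only the weights, not KV cache or activations.

```python
# Back-of-envelope check: does a ~109B-parameter MoE checkpoint fit on one
# 80 GB H100? Parameter count is an assumed figure from Meta's launch post.

def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in gigabytes."""
    return n_params * bytes_per_param / 1e9

total_params = 109e9  # all experts combined (assumed)

bf16 = weight_memory_gb(total_params, 2.0)  # 16-bit weights
int4 = weight_memory_gb(total_params, 0.5)  # 4-bit quantised weights

print(f"bf16: {bf16:.1f} GB, int4: {int4:.1f} GB")
# bf16 does not fit in 80 GB; int4 does, with room left for the KV cache.
```

So "fits on a single H100" implicitly means a quantised deployment, which is how Meta framed it.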

The mixture-of-experts architecture is particularly clever here. Instead of scaling up a monolithic model, Meta uses sparse expert layers in which a learned router activates only a small subset of experts for each token. This means you get the capacity of a very large model without the computational overhead of running every parameter for every query.
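The routing idea can be sketched in a few lines. This is a generic top-k MoE layer in NumPy, not Meta's implementation; the dimensions, gate, and expert matrices are all illustrative.

```python
import numpy as np

# Toy per-token mixture-of-experts routing: a learned gate scores every
# expert for each token, only the top_k experts actually run, and their
# outputs are combined using softmax weights over the chosen scores.

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

gate = rng.normal(size=(d_model, n_experts))              # router weights
experts = rng.normal(size=(n_experts, d_model, d_model))  # one matrix per expert

def moe_layer(x: np.ndarray) -> np.ndarray:
    """x: (tokens, d_model) -> (tokens, d_model), top_k experts per token."""
    scores = x @ gate                              # (tokens, n_experts)
    top = np.argsort(scores, axis=-1)[:, -top_k:]  # best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = scores[t, top[t]]
        weights = np.exp(chosen) / np.exp(chosen).sum()  # softmax over chosen
        for w, e in zip(weights, top[t]):
            out[t] += w * (x[t] @ experts[e])      # only top_k matmuls run
    return out

tokens = rng.normal(size=(3, d_model))
print(moe_layer(tokens).shape)  # (3, 8)
```

Each token pays for `top_k` experts instead of all of them, which is exactly the "capacity without proportional compute" trade the article describes.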

Meta's benchmark claims deserve scrutiny, but the broader point stands: we now have a credible open alternative to closed multimodal models. The 10M context window puts it in the same league as Gemini and Claude, whilst the open weights mean you can fine-tune it for your specific use case.
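"Fine-tune it for your specific use case" typically means parameter-efficient methods such as LoRA, since few teams can update 100B+ weights directly. Below is a generic low-rank adaptation sketch in NumPy; the shapes and initialisation are illustrative and not Llama 4's actual configuration.

```python
import numpy as np

# Generic LoRA sketch: freeze the pretrained weight W and train only a
# low-rank delta B @ A on top of it. Illustrative shapes, not Llama 4's.

rng = np.random.default_rng(1)
d, r = 16, 2                        # hidden size and LoRA rank (r << d)

W = rng.normal(size=(d, d))         # frozen pretrained weight
A = rng.normal(size=(d, r)) * 0.01  # trainable down-projection
B = np.zeros((r, d))                # trainable up-projection, zero-init

def adapted_forward(x: np.ndarray) -> np.ndarray:
    """Frozen base path plus the low-rank trainable delta."""
    return x @ W + x @ A @ B        # at init, B == 0 so output == base model

x = rng.normal(size=(4, d))
assert np.allclose(adapted_forward(x), x @ W)  # starts identical to base

# Only A and B (2 * d * r = 64 values) receive gradient updates,
# versus d * d = 256 values in the frozen W.
```

The appeal of open weights is precisely that this kind of adaptation happens on your hardware, against your data, with no API in the loop.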

For product engineers, this opens up entirely new possibilities. Computer vision applications that were previously locked behind expensive APIs can now run locally. Document analysis, image understanding, and multimodal search become viable for smaller companies and research teams.

The timing isn't coincidental. OpenAI's o3 announcement showed impressive reasoning capabilities, but it also highlighted how expensive frontier AI is becoming. Meta's response is characteristically different — instead of building the most capable model, they're building the most accessible one.

This matters because accessibility drives adoption, and adoption drives the ecosystem. When developers can experiment freely with multimodal AI, they build things that wouldn't have been viable under the API model. We saw this with the original Llama releases spurring the open-source LLM ecosystem.

The real test isn't whether Llama 4 beats GPT-4o on benchmarks — it's whether it's good enough for production use cases whilst offering the deployment flexibility that closed models can't match. Early indicators suggest it might be.

What happens when every engineering team can run their own multimodal AI stack? We're about to find out.


Read the original on Meta AI

ai.meta.com
