Meta · Llama 4 family

Llama 4 Maverick

MoE · other · Released Apr 2025

Meta's quality-focused mixture-of-experts (MoE) model. It activates 17B parameters per token out of 400B total, spread across 128 experts. It targets quality-critical tasks, with benchmark scores competitive with GPT-4o and Gemini 2.5 Pro.

Capabilities

chat · completion · function-calling · vision · code-generation · reasoning

Modalities

text · image · code

Context Window

1.0M tokens

Providers

Available from 8 providers

Cheapest

Anyscale

$0.50 per 1M tokens (input + output prices combined)

Amazon Bedrock, Anyscale, Azure AI, Deep Infra, Fireworks, Hugging Face Inference, Replicate, Together AI

Providers (8)

Sorted by total cost (input price + output price, per 1M tokens).

Provider	Pricing (USD per 1M tokens)	Rate Limits	Regions	Health	Latency
Anyscale	In: $0.25 / Out: $0.25	600 RPM / 1.0M TPM	us-east-1, us-west-2	Healthy	0ms
Together AI	In: $0.27 / Out: $0.27	600 RPM / 1.0M TPM	us-east-1, us-west-2	Healthy	0ms
Hugging Face Inference	In: $0.30 / Out: $0.30	300 RPM / 500K TPM	us-east-1, eu-west-1	Healthy	0ms
Azure AI	In: $0.37 / Out: $0.37	200 RPM / 600K TPM	us-east-1, us-west-2, eu-west-1	Healthy	0ms
Deep Infra	In: $0.20 / Out: $0.60	600 RPM / 1.0M TPM	us-east-1, eu-west-1	Healthy	0ms
Fireworks	In: $0.22 / Out: $0.88	600 RPM / 1.0M TPM	us-east-1, us-west-2	Healthy	0ms
Replicate	In: $0.30 / Out: $0.95	300 RPM / 500K TPM	us-east-1, us-west-2	Healthy	0ms
Amazon Bedrock	In: $0.34 / Out: $0.99	100 RPM / 400K TPM	us-east-1, us-west-2	Healthy	0ms
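
The ranking above adds the input and output prices together, which implicitly assumes equal input and output volume. For a workload with a different mix, the cheapest provider can change. The sketch below recomputes cost for a given token mix; the prices are copied from the table as a snapshot, and the names `ProviderPrice`, `workloadCost`, and `cheapestFor` are hypothetical helpers, not part of any provider SDK.

```typescript
// Snapshot of the provider table (USD per 1M tokens). Prices may change.
interface ProviderPrice {
  name: string;
  inputPerM: number;
  outputPerM: number;
}

const providers: ProviderPrice[] = [
  { name: "Anyscale", inputPerM: 0.25, outputPerM: 0.25 },
  { name: "Together AI", inputPerM: 0.27, outputPerM: 0.27 },
  { name: "Hugging Face Inference", inputPerM: 0.30, outputPerM: 0.30 },
  { name: "Azure AI", inputPerM: 0.37, outputPerM: 0.37 },
  { name: "Deep Infra", inputPerM: 0.20, outputPerM: 0.60 },
  { name: "Fireworks", inputPerM: 0.22, outputPerM: 0.88 },
  { name: "Replicate", inputPerM: 0.30, outputPerM: 0.95 },
  { name: "Amazon Bedrock", inputPerM: 0.34, outputPerM: 0.99 },
];

// Hypothetical helper: cost in USD for a given number of input and output tokens.
function workloadCost(p: ProviderPrice, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * p.inputPerM + (outputTokens / 1e6) * p.outputPerM;
}

// Hypothetical helper: provider with the lowest cost for this token mix.
function cheapestFor(inputTokens: number, outputTokens: number): ProviderPrice {
  return providers.reduce((best, p) =>
    workloadCost(p, inputTokens, outputTokens) < workloadCost(best, inputTokens, outputTokens)
      ? p
      : best
  );
}
```

With 1M input and 1M output tokens, `cheapestFor` returns Anyscale ($0.50), matching the table's ordering. With an input-heavy mix of 900K input and 100K output tokens, it returns Deep Infra (about $0.24 versus $0.25 for Anyscale), because Deep Infra's lower input price outweighs its higher output price.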

Quick Start

Use this model via Anyscale with an OpenAI-compatible SDK.

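// Requires the "openai" npm package. Top-level await needs an ES module context.
import OpenAI from "openai";

// Point the OpenAI SDK at Anyscale's OpenAI-compatible endpoint.
const client = new OpenAI({
  baseURL: "https://api.anyscale.com/v1",
  apiKey: process.env.ANYSCALE_API_KEY, // read the key from the environment
});

// Send a single-turn chat request to Llama 4 Maverick.
const response = await client.chat.completions.create({
  model: "meta-llama/Llama-4-Maverick-17B-128E-Instruct",
  messages: [
    { role: "user", content: "Hello!" }
  ],
});

// Print the assistant's reply from the first choice.
console.log(response.choices[0].message.content);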
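
Every provider listed above enforces RPM and TPM limits, so a request like the one above can be rejected once a limit is hit. A minimal sketch of retrying with exponential backoff follows. It assumes the rejection surfaces as a thrown error carrying a numeric `status` of 429 (as the OpenAI SDK's API errors do); `backoffDelayMs` and `withRetry` are hypothetical helpers, and the delay values are illustrative rather than provider recommendations.

```typescript
// Hypothetical helper: exponential backoff delay in ms, capped at maxMs.
// attempt 0 -> baseMs, attempt 1 -> 2 * baseMs, attempt 2 -> 4 * baseMs, ...
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical helper: run fn, retrying only on HTTP 429 (rate limited).
// Any other error, or a 429 on the final attempt, is rethrown to the caller.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 4): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Assumption: rate-limit errors expose a numeric `status` field.
      const status = (err as { status?: number } | null)?.status;
      const isLastAttempt = attempt >= maxAttempts - 1;
      if (status !== 429 || isLastAttempt) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

Usage would look like `await withRetry(() => client.chat.completions.create({ ... }))`. The OpenAI SDK also accepts a `maxRetries` option on the client constructor, which may be enough on its own; the explicit wrapper is mainly useful when the backoff schedule needs to be controlled.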