Meta · Llama 4 family

Llama 4 Scout

MoE · other · Released Apr 2025

Meta's efficient mixture-of-experts (MoE) model, with 17B active parameters out of 109B total across 16 experts. It supports a context window of up to 10M tokens, among the longest of any production model at its release, and performs strongly on reasoning and multilingual tasks.
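To make the active-versus-total figures concrete, here is a back-of-the-envelope split under a deliberately simplified assumption: every token passes through a shared block of weights plus exactly one of 16 equally sized experts. Meta's actual layer layout is more involved, so treat the result as an estimate. `splitMoeParams` is a hypothetical helper, not part of any library.

```typescript
// Hypothetical helper: infer per-expert and shared parameter counts from
// headline figures. ASSUMPTIONS: all experts are the same size, and each
// token uses all shared weights plus `expertsPerToken` experts.
interface MoeSplit {
  perExpert: number; // parameters in one expert (billions)
  shared: number; // parameters every token uses (billions)
}

function splitMoeParams(
  totalB: number,
  activeB: number,
  numExperts: number,
  expertsPerToken = 1,
): MoeSplit {
  // total  = shared + numExperts * perExpert
  // active = shared + expertsPerToken * perExpert
  const perExpert = (totalB - activeB) / (numExperts - expertsPerToken);
  const shared = activeB - expertsPerToken * perExpert;
  return { perExpert, shared };
}

const split = splitMoeParams(109, 17, 16);
console.log(`~${split.perExpert.toFixed(2)}B per expert, ~${split.shared.toFixed(2)}B shared`);
// prints "~6.13B per expert, ~10.87B shared"
```

Under this simplified model only about 16% of the weights (17B of 109B) are used per token, which is why per-token compute tracks the 17B figure while memory must still hold all 109B.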

Capabilities

chat · completion · function-calling · vision · code-generation

Modalities

text · image · code

Context Window

10.0M tokens
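The context window has to hold both the prompt and the generated output. A minimal budget check might look like the sketch below. It assumes "10.0M" means 10,000,000 tokens and uses a hypothetical `fitsContext` helper; pass a smaller `contextWindow` if the provider you call serves a shorter window than the model maximum.

```typescript
// Hypothetical helper: does a request fit in the context window?
// Prompt tokens and generated tokens share the same window.
// ASSUMPTION: 10.0M is read as exactly 10,000,000 tokens; a provider may
// configure a smaller limit, so pass that limit when you know it.
const MODEL_CONTEXT_TOKENS = 10_000_000;

function fitsContext(
  promptTokens: number,
  maxOutputTokens: number,
  contextWindow: number = MODEL_CONTEXT_TOKENS,
): boolean {
  return promptTokens + maxOutputTokens <= contextWindow;
}

console.log(fitsContext(9_990_000, 8_000)); // true: 9,998,000 <= 10,000,000
console.log(fitsContext(9_995_000, 8_000)); // false: 10,003,000 > 10,000,000
console.log(fitsContext(200_000, 4_000, 131_072)); // false under a hypothetical 128K cap
```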

Providers


Available from 5 providers

Cheapest

Deep Infra

$0.24 per 1M tokens (input + output)

Fastest

Groq

165 ms time to first token (TTFT)

Cerebras, Deep Infra, Groq, Replicate, Together AI

Providers (5)

Sorted by total cost (input price + output price per 1M tokens).

Provider	Input ($/1M tokens)	Output ($/1M tokens)	Rate limits	Regions	Health	TTFT
Deep Infra	$0.06	$0.18	600 RPM / 1.0M TPM	us-east-1, eu-west-1	Healthy	n/a
Together AI	$0.18	$0.18	600 RPM / 1.0M TPM	us-east-1, us-west-2	Healthy	n/a
Groq	$0.11	$0.34	30 RPM / 100K TPM	us-east-1, eu-west-1	Healthy	165 ms
Replicate	$0.15	$0.40	300 RPM / 500K TPM	us-east-1, us-west-2	Healthy	n/a
Cerebras	$0.60	$0.60	30 RPM / 60K TPM	us-east-1	Healthy	n/a
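The table is ordered by the sum of input and output prices. That is a convenient headline, but actual spend depends on each request's mix of input and output tokens, and throughput is capped by whichever rate limit binds first. The sketch below copies the table's figures into a hypothetical `Provider` record and compares the two rankings. It assumes TPM counts both input and output tokens, which the page does not specify.

```typescript
// Figures copied from the provider table (prices in USD per 1M tokens).
interface Provider {
  name: string;
  inputPerM: number;
  outputPerM: number;
  rpm: number; // requests per minute
  tpm: number; // tokens per minute
}

const providers: Provider[] = [
  { name: "Deep Infra", inputPerM: 0.06, outputPerM: 0.18, rpm: 600, tpm: 1_000_000 },
  { name: "Together AI", inputPerM: 0.18, outputPerM: 0.18, rpm: 600, tpm: 1_000_000 },
  { name: "Groq", inputPerM: 0.11, outputPerM: 0.34, rpm: 30, tpm: 100_000 },
  { name: "Replicate", inputPerM: 0.15, outputPerM: 0.4, rpm: 300, tpm: 500_000 },
  { name: "Cerebras", inputPerM: 0.6, outputPerM: 0.6, rpm: 30, tpm: 60_000 },
];

// The table's ordering: input price + output price, cheapest first.
function rankByListedTotal(list: Provider[]): string[] {
  return [...list]
    .sort((a, b) => a.inputPerM + a.outputPerM - (b.inputPerM + b.outputPerM))
    .map((p) => p.name);
}

// Cost in USD of one request with the given token counts.
function requestCost(p: Provider, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1_000_000;
}

// Ordering by the cost of a specific request shape, cheapest first.
function rankByRequestCost(list: Provider[], inputTokens: number, outputTokens: number): string[] {
  return [...list]
    .sort((a, b) => requestCost(a, inputTokens, outputTokens) - requestCost(b, inputTokens, outputTokens))
    .map((p) => p.name);
}

// Requests per minute you can sustain: the tighter of RPM and TPM.
// ASSUMPTION: both input and output tokens count toward TPM.
function sustainableRpm(p: Provider, tokensPerRequest: number): number {
  return Math.min(p.rpm, Math.floor(p.tpm / tokensPerRequest));
}

console.log(rankByListedTotal(providers).join(", "));
// prints "Deep Infra, Together AI, Groq, Replicate, Cerebras"
console.log(rankByRequestCost(providers, 10_000, 500).join(", "));
// prints "Deep Infra, Groq, Replicate, Together AI, Cerebras"
console.log(providers.map((p) => `${p.name}: ${sustainableRpm(p, 10_500)}`).join(", "));
// prints "Deep Infra: 95, Together AI: 95, Groq: 9, Replicate: 47, Cerebras: 5"
```

For an input-heavy workload (10,000 input and 500 output tokens per request), Groq moves ahead of Together AI on cost. Groq's 100K TPM limit, not its 30 RPM limit, then becomes the binding constraint at about 9 requests per minute.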

Quick Start

Call this model through Deep Infra's OpenAI-compatible API using the official OpenAI SDK for JavaScript/TypeScript.

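```typescript
// Requires the `openai` npm package and an ES module context (top-level await).
import OpenAI from "openai";

// Point the OpenAI SDK at Deep Infra's OpenAI-compatible endpoint.
const client = new OpenAI({
  baseURL: "https://api.deepinfra.com/v1/openai",
  apiKey: process.env.DEEPINFRA_API_KEY, // read the key from the environment
});

// Send a single-turn chat request to Llama 4 Scout.
const response = await client.chat.completions.create({
  model: "meta-llama/Llama-4-Scout-17B-16E-Instruct",
  messages: [
    { role: "user", content: "Hello!" },
  ],
});

// Print the assistant's reply.
console.log(response.choices[0].message.content);
```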

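The provider comparison above ranks providers by time to first token (TTFT). The page does not say how its figures were measured, so the following is only a minimal client-side sketch. It times any async stream of chunks shaped like OpenAI chat-completion stream chunks, and it can be tested offline with a fake stream. `timeStream`, `fakeStream`, and `ChunkLike` are hypothetical names, and client-side timings include network latency, so they will not match the listed numbers exactly.

```typescript
// Minimal sketch: measure TTFT and total time for a streamed reply.
// Accepts any async iterable of chunks shaped like OpenAI stream chunks.
type ChunkLike = { choices: { delta?: { content?: string | null } }[] };

interface StreamTiming {
  text: string;
  ttftMs: number | null; // null if no text arrived
  totalMs: number;
}

async function timeStream(
  chunks: AsyncIterable<ChunkLike>,
  startedAt: number = Date.now(), // capture this BEFORE sending the request
): Promise<StreamTiming> {
  let ttftMs: number | null = null;
  let text = "";
  for await (const chunk of chunks) {
    const delta = chunk.choices[0]?.delta?.content ?? "";
    if (delta.length > 0 && ttftMs === null) {
      ttftMs = Date.now() - startedAt; // first non-empty content chunk
    }
    text += delta;
  }
  return { text, ttftMs, totalMs: Date.now() - startedAt };
}

// Hypothetical fake stream for offline testing: three chunks, 20 ms apart.
async function* fakeStream(): AsyncGenerator<ChunkLike> {
  for (const piece of ["Hel", "lo", "!"]) {
    await new Promise<void>((resolve) => setTimeout(resolve, 20));
    yield { choices: [{ delta: { content: piece } }] };
  }
}

timeStream(fakeStream()).then((t) => {
  // t.text is "Hello!"; the timings vary by machine.
  console.log(t.text, t.ttftMs, t.totalMs);
});
```

With a real client, record `const t0 = Date.now()` first. Then await `client.chat.completions.create({ model, messages, stream: true })` and pass the resulting stream to `timeStream(stream, t0)`, so that connection setup counts toward TTFT.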