Meta's efficient mixture-of-experts (MoE) model with 17B active parameters (109B total, 16 experts). Supports a context window of up to 10M tokens, the longest of any production model. Strong performance on reasoning and multilingual tasks.
Context window: 10.0M tokens
Providers: 5 available (Cerebras, Deep Infra, Groq, Replicate, Together AI)
Cheapest: Deep Infra, $0.24/1M tokens
Fastest: Groq, 80ms TTFT
Sorted by total cost (input + output per 1M tokens).
| Provider | Pricing (per 1M tokens) | Rate Limits | Regions | Health | Latency |
|---|---|---|---|---|---|
| Deep Infra | In: $0.06, Out: $0.18 | 600 RPM / 1.0M TPM | us-east-1, eu-west-1 | Healthy | 0ms |
| | In: $0.18, Out: $0.18 | 600 RPM / 1.0M TPM | us-east-1, us-west-2 | Healthy | 0ms |
| Groq | In: $0.11, Out: $0.34 | 30 RPM / 100K TPM | us-east-1, eu-west-1 | Healthy | 80ms |
| | In: $0.15, Out: $0.40 | 300 RPM / 500K TPM | us-east-1, us-west-2 | Healthy | 0ms |
| | In: $0.60, Out: $0.60 | 30 RPM / 60K TPM | us-east-1 | Healthy | 0ms |
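The ranking and the $0.24/1M headline figure follow from simple arithmetic on the listed prices. As a rough sketch (the `Offer` type, the `row N` labels, and the helper names are illustrative and not part of any provider API), the total and a per-request estimate could be computed like this:

```typescript
// Per-1M-token prices (USD) copied from the table, in table order.
// The "row N" labels are placeholders, not provider names.
interface Offer {
  label: string;
  inputPerM: number;
  outputPerM: number;
}

const offers: Offer[] = [
  { label: "row 1", inputPerM: 0.06, outputPerM: 0.18 },
  { label: "row 2", inputPerM: 0.18, outputPerM: 0.18 },
  { label: "row 3", inputPerM: 0.11, outputPerM: 0.34 },
  { label: "row 4", inputPerM: 0.15, outputPerM: 0.40 },
  { label: "row 5", inputPerM: 0.60, outputPerM: 0.60 },
];

// Total cost as the table defines it: input price + output price per 1M tokens.
function totalPerM(o: Offer): number {
  return o.inputPerM + o.outputPerM;
}

// Estimated cost (USD) of a single request with the given token counts.
function requestCost(o: Offer, inputTokens: number, outputTokens: number): number {
  return (inputTokens * o.inputPerM + outputTokens * o.outputPerM) / 1_000_000;
}

// Rank by the blended total, as the table does.
const ranked = [...offers].sort((a, b) => totalPerM(a) - totalPerM(b));
for (const o of ranked) {
  console.log(`${o.label}: $${totalPerM(o).toFixed(2)} per 1M in + 1M out`);
}

// Rank by the estimated cost of one long-context request instead.
const byRequest = [...offers].sort(
  (a, b) => requestCost(a, 100_000, 2_000) - requestCost(b, 100_000, 2_000),
);
console.log(byRequest.map((o) => o.label).join(", "));
```

The blended total is only a rough guide. For an input-heavy request (100K input tokens, 2K output tokens), the $0.11/$0.34 offer works out cheaper than the $0.18/$0.18 offer even though its total is higher, so it is worth estimating against your own input/output ratio.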
Use this model via Deep Infra with an OpenAI-compatible SDK.
```typescript
import OpenAI from "openai";

// Point the OpenAI SDK at Deep Infra's OpenAI-compatible endpoint.
const client = new OpenAI({
  baseURL: "https://api.deepinfra.com/v1/openai",
  apiKey: process.env.DEEPINFRA_API_KEY,
});

// Send a single-turn chat request to the model.
const response = await client.chat.completions.create({
  model: "meta-llama/Llama-4-Scout-17B-16E-Instruct",
  messages: [
    { role: "user", content: "Hello!" }
  ],
});

console.log(response.choices[0].message.content);
```
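Several of the listed rate limits are low (30 RPM on two providers), so a caller may want to back off when a request is rejected for rate limiting. The following is a minimal sketch, not a provider-recommended approach: `withBackoff` and `HttpLikeError` are hypothetical names, and it assumes the thrown error exposes a numeric `status` of 429 when the rate limit is hit. The OpenAI SDK also has its own `maxRetries` client option, which may be enough on its own.

```typescript
// Assumed error shape: rate-limit failures carry `status === 429`.
interface HttpLikeError {
  status?: number;
}

// Retry `call` with exponential backoff, but only for rate-limit errors.
async function withBackoff<T>(
  call: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const status = (err as HttpLikeError | null | undefined)?.status;
      // Rethrow anything that is not a rate limit, or once attempts run out.
      if (status !== 429 || attempt >= maxAttempts) throw err;
      // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

With the client from the snippet above, a call could be wrapped as `await withBackoff(() => client.chat.completions.create({ model, messages }))`. Only 429 responses are retried here so that authentication or validation errors surface immediately.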