Meta's flagship open-weight model with 70 billion parameters. Strong multilingual capabilities with competitive performance on reasoning and coding benchmarks. Available for self-hosting and through various inference providers.
Context window: 131K tokens
14 providers available
Cheapest: Inference.net, $0.60/1M tokens (input + output)
Fastest: Groq, 80ms TTFT
Anyscale, Baseten, Cerebras, Fireworks, Groq, Hyperbolic, Inference.net, Nebius, NLP Cloud, Novita, Perplexity, SambaNova, SiliconFlow, Together AI
Sorted by total cost (input + output per 1M tokens).
| Provider | Pricing (per 1M) | Rate Limits | Regions | Health | Latency |
|---|---|---|---|---|---|
| Inference.net | In: $0.30, Out: $0.30 | 60 RPM / 200K TPM | us-east-1, eu-west-1 | Healthy | 0ms |
| | In: $0.35, Out: $0.35 | 60 RPM / 200K TPM | us-east-1 | Healthy | 0ms |
| | In: $0.35, Out: $0.35 | 600 RPM / 1.0M TPM | ap-east-1, global | Healthy | 0ms |
| | In: $0.40, Out: $0.40 | 60 RPM / 200K TPM | us-west-2 | Healthy | 0ms |
| | In: $0.50, Out: $0.50 | 600 RPM / 1.0M TPM | us-east-1, us-west-2 | Healthy | 0ms |
| | In: $0.50, Out: $0.50 | 60 RPM / 300K TPM | eu-west-1 | Healthy | 0ms |
| | In: $0.60, Out: $0.60 | 30 RPM / 60K TPM | us-east-1 | Healthy | 0ms |
| | In: $0.60, Out: $0.60 | 100 RPM / 500K TPM | us-west-2 | Healthy | 0ms |
| | In: $0.65, Out: $0.65 | 120 RPM / 500K TPM | us-east-1, us-west-2 | Healthy | 0ms |
| Groq | In: $0.59, Out: $0.79 | 30 RPM / 100K TPM | us-east-1, eu-west-1 | Healthy | 80ms |
| | In: $0.88, Out: $0.88 | 600 RPM / 1.0M TPM | us-east-1, us-west-2 | Healthy | 0ms |
| | In: $0.90, Out: $0.90 | 600 RPM / 1.0M TPM | us-east-1, us-west-2 | Healthy | 0ms |
| | In: $1.00, Out: $1.00 | 100 RPM / 200K TPM | us-east-1, us-west-2 | Healthy | 0ms |
| | In: $1.20, Out: $1.20 | 60 RPM / 200K TPM | us-east-1, eu-west-1 | Healthy | 0ms |
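
The table is ordered by total cost, meaning input price plus output price per 1M tokens. The sketch below shows one way to reproduce that ordering and to estimate the cost of a single request. The `ProviderPrice` type and the `totalCost`, `sortByTotalCost`, and `estimateRequestCost` helpers are hypothetical names for illustration, not part of any provider SDK. The two sample entries use prices from the table, matched to providers through the "Cheapest" and "Fastest" figures.

```typescript
// Hypothetical shape for one pricing row; prices are USD per 1M tokens.
interface ProviderPrice {
  name: string;
  inputPerM: number;
  outputPerM: number;
}

// Total cost as the table defines it: input + output price per 1M tokens.
function totalCost(p: ProviderPrice): number {
  return p.inputPerM + p.outputPerM;
}

// Return a new array ordered from lowest to highest total cost.
function sortByTotalCost(rows: ProviderPrice[]): ProviderPrice[] {
  return [...rows].sort((a, b) => totalCost(a) - totalCost(b));
}

// Estimate the USD cost of one request from its token counts.
function estimateRequestCost(
  p: ProviderPrice,
  inputTokens: number,
  outputTokens: number,
): number {
  return (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1_000_000;
}

// Sample rows taken from the table above.
const inferenceNet: ProviderPrice = { name: "Inference.net", inputPerM: 0.3, outputPerM: 0.3 };
const groq: ProviderPrice = { name: "Groq", inputPerM: 0.59, outputPerM: 0.79 };

const sorted = sortByTotalCost([groq, inferenceNet]);
console.log(sorted.map((p) => `${p.name}: $${totalCost(p).toFixed(2)}`));

// Cost of a request with 2,000 input tokens and 500 output tokens.
console.log(estimateRequestCost(inferenceNet, 2_000, 500).toFixed(6));
```

Asymmetric pricing (such as $0.59 in, $0.79 out) means the cheapest provider for a given workload can depend on its input-to-output ratio, so estimating with real token counts is more informative than comparing totals alone.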
Use this model via Inference.net with an OpenAI-compatible SDK.
```typescript
import OpenAI from "openai";

// Point the OpenAI SDK at Inference.net's OpenAI-compatible endpoint.
const client = new OpenAI({
  baseURL: "https://api.inference-net.com/v1",
  apiKey: process.env.INFERENCE_NET_API_KEY, // read the key from the environment
});

// Send a single-turn chat request to the model.
const response = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct",
  messages: [
    { role: "user", content: "Hello!" }
  ],
});

// Print the assistant's reply.
console.log(response.choices[0].message.content);
```
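
TTFT (time to first token), the metric quoted for the fastest provider, is typically measured on a streamed response: start a timer, send the request, and stop at the first chunk that carries content. The following is a minimal sketch, assuming the stream is exposed as an `AsyncIterable<string>`. `measureTtft` and `fakeStream` are hypothetical helpers, and `fakeStream` stands in for a real streamed completion so the example runs without network access or an API key.

```typescript
// Measure the time until the first non-empty chunk arrives, then keep
// reading so the full text is returned as well.
async function measureTtft(
  stream: AsyncIterable<string>,
): Promise<{ ttftMs: number | null; text: string }> {
  const start = Date.now();
  let ttftMs: number | null = null;
  let text = "";
  for await (const chunk of stream) {
    if (ttftMs === null && chunk.length > 0) {
      ttftMs = Date.now() - start;
    }
    text += chunk;
  }
  return { ttftMs, text };
}

// Stand-in for a streamed completion (an assumption for this sketch):
// waits delayMs before the first chunk, then yields all chunks.
async function* fakeStream(chunks: string[], delayMs: number): AsyncGenerator<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  for (const chunk of chunks) {
    yield chunk;
  }
}

const ttftDemo = measureTtft(fakeStream(["Hel", "lo!"], 20));
ttftDemo.then(({ ttftMs, text }) => {
  console.log(`first chunk after ~${ttftMs} ms, text: ${text}`);
});
```

With the OpenAI SDK, passing `stream: true` to `client.chat.completions.create` returns an async iterable of chunks whose text lives in `chunk.choices[0]?.delta?.content`, so those chunks would need to be mapped to strings before being handed to a helper like this one. Measured values include network latency from the caller's location, so they will not necessarily match the figures listed on this page.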