NVIDIA's flagship open Mixture-of-Experts model, with 550B total parameters and 55B active parameters, built for frontier reasoning and orchestration in long-running agentic systems. It features a hybrid Mamba-Transformer architecture, LatentMoE routing, multi-token prediction, and NVFP4 precision for 5x higher throughput. It achieves 30% lower cost-to-task-completion on agentic benchmarks and supports a 1M+ token context window with 95% accuracy on RULER@1M.
Context window: 1.0M tokens
Providers: 4 available (Deep Infra, NVIDIA NIM, OpenRouter, Together AI)
Cheapest: NVIDIA NIM at $0.00/1M tokens
Sorted by total cost (input + output per 1M tokens).
| Provider | Pricing (per 1M) | Rate Limits | Regions | Health | Latency |
|---|---|---|---|---|---|
| NVIDIA NIM | In: Free / Out: Free | 100 RPM / 500K TPM | us-east-1, us-west-2, global | Healthy | 0ms |
| | In: $0.40 / Out: $1.60 | 200 RPM / 500K TPM | us-east-1, eu-west-1 | Healthy | 0ms |
| | In: $0.40 / Out: $1.60 | 200 RPM / 1.0M TPM | us-east-1 | Healthy | 0ms |
| | In: $0.50 / Out: $1.50 | 600 RPM / 1.0M TPM | us-east-1, us-west-2 | Healthy | 0ms |
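The ranking rule and the per-request cost implied by these prices can be sketched as follows. This is a minimal illustration, not the listing's actual code: the `Offer` type, the helper names, and the `row-2` to `row-4` labels are assumptions, since only the free row can be tied to a named provider (NVIDIA NIM).

```typescript
// Per-1M-token prices copied from the table. Labels other than "NVIDIA NIM"
// are placeholders: the listing does not say which provider owns which row.
interface Offer {
  label: string;
  inputPerM: number;  // USD per 1M input tokens
  outputPerM: number; // USD per 1M output tokens
}

const offers: Offer[] = [
  { label: "row-4", inputPerM: 0.5, outputPerM: 1.5 },
  { label: "NVIDIA NIM", inputPerM: 0, outputPerM: 0 },
  { label: "row-2", inputPerM: 0.4, outputPerM: 1.6 },
  { label: "row-3", inputPerM: 0.4, outputPerM: 1.6 },
];

// Ranking key described by the listing: input price + output price per 1M tokens.
function totalPerM(o: Offer): number {
  return o.inputPerM + o.outputPerM;
}

// Estimated USD cost of a single request with the given token counts.
function requestCost(o: Offer, inputTokens: number, outputTokens: number): number {
  return (inputTokens * o.inputPerM + outputTokens * o.outputPerM) / 1_000_000;
}

// Copy before sorting so the original array is left untouched.
const ranked = [...offers].sort((a, b) => totalPerM(a) - totalPerM(b));
console.log(ranked.map((o) => `${o.label}: $${totalPerM(o).toFixed(2)}`));
```

Note that the three paid rows tie on total cost, so their relative order depends on the input/output mix of the actual workload rather than on the ranking key alone.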
Use this model via NVIDIA NIM with an OpenAI-compatible SDK.
```typescript
import OpenAI from "openai";

// Point the OpenAI SDK at the NVIDIA NIM endpoint; the API key is read
// from the NVIDIA_API_KEY environment variable.
const client = new OpenAI({
  baseURL: "https://api.nvidia.com/v1",
  apiKey: process.env.NVIDIA_API_KEY,
});

// Send a single-turn chat completion request.
const response = await client.chat.completions.create({
  model: "nvidia/nemotron-3-ultra-550b-a55b",
  messages: [
    { role: "user", content: "Hello!" }
  ],
});

// Print the text of the first returned choice.
console.log(response.choices[0].message.content);
```
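Because every provider in the listing enforces a requests-per-minute (RPM) limit, callers may want to pace requests and back off when a limit is hit. The following is a minimal sketch under stated assumptions: the helper names and default delays are illustrative, and the retry check assumes the thrown error exposes a numeric `status` field (HTTP 429 for rate limiting), as the `openai` SDK's API errors do.

```typescript
// Sketch only: helper names and default delays are illustrative assumptions.

// Minimum spacing between requests that stays under a requests-per-minute cap.
function minIntervalMs(rpm: number): number {
  if (rpm <= 0) throw new RangeError("rpm must be positive");
  return Math.ceil(60_000 / rpm);
}

// Exponential backoff delay for a 0-based retry attempt, capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `call`, retrying only when the thrown error carries HTTP status 429.
async function withRateLimitRetry<T>(
  call: () => Promise<T>,
  maxRetries = 3,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const status = (err as { status?: number } | null)?.status;
      // Rethrow anything that is not a rate-limit error, or when out of retries.
      if (status !== 429 || attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}

// With the 100 RPM limit listed for the free tier, requests should be
// spaced at least this many milliseconds apart.
console.log(minIntervalMs(100));
```

A request would then be wrapped as `withRateLimitRetry(() => client.chat.completions.create({ ... }))`. The `openai` SDK also accepts a `maxRetries` client option, which may be sufficient on its own for simple cases.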