Models

Browse 34 canonical LLM models across all providers

Gemini 3 Flash

2 providers, 1.0M ctx

Google's balanced model combining Gemini 3 Pro's reasoning capabilities with the Flash line's latency, efficiency, and cost. Features configurable thinking levels, multimodal function responses, and streaming function calling for complex agentic workflows.

$0.50 / $3.00 per 1M tokens

chat, completion, function-calling, vision, +3
text, image, audio, video, code

Gemini 3.1 Flash-Lite

2 providers, 1.0M ctx

Google's most cost-efficient Gemini model optimized for high-volume, low-latency use cases. Delivers 2.5x faster time to first token versus Gemini 2.5 Flash with full multimodal support. Ideal for agentic tasks, data extraction, translation, and classification.

$0.25 / $1.50 per 1M tokens

chat, completion, function-calling, vision, +2
text, image, audio, video, code

Granite 4.1 8B

2 providers, 131K ctx

IBM's dense decoder-only 8B parameter language model from the Granite 4.1 family. Supports 131K-token context, tool calling, RAG, code generation with fill-in-the-middle, text summarization, classification, and extraction across 12 languages. Released under Apache 2.0.

$0.05 / $0.40 per 1M tokens

chat, completion, function-calling, code-generation
text, code

Granite 4.1 30B

524K ctx

IBM's largest dense decoder-only language model in the Granite 4.1 family, with 30B parameters. Trained on approximately 15T tokens with long-context extension up to 512K tokens. Supports tool calling, RAG, code generation, and multilingual tasks across 12 languages. Released under Apache 2.0.

$0.60 / $1.20 per 1M tokens

chat, completion, function-calling, code-generation, +1
text, code

Laguna M.1

128K ctx

Poolside AI's flagship agentic coding model with 225B total parameters and 23B active (MoE). Trained from scratch in-house on 30T tokens across 6,144 NVIDIA Hopper GPUs. Optimized for complex multi-step software engineering tasks including codebase exploration, file editing, test running, and iterative debugging.

$0.00 / $0.00 per 1M tokens

chat, completion, function-calling, code-generation, +1
text, code

GPT-5.5

1.0M ctx

OpenAI's most capable model designed for complex real-world work including coding, online research, information analysis, and document creation. Features advanced agentic capabilities with tool search and multi-step task execution.

$12.00 / $48.00 per 1M tokens

chat, completion, function-calling, vision, +3
text, image, audio, code

GPT-5.4 Mini

1.1M ctx

OpenAI's compact reasoning model optimized for coding, computer use, and subagent tasks. Approaches GPT-5.4 performance on several benchmarks while running more than 2x faster.

$0.75 / $4.50 per 1M tokens

chat, completion, function-calling, vision, +2
text, image, code

Muse Spark

256K ctx

Meta Superintelligence Labs' first model, featuring advanced reasoning, multimodal understanding, and agentic capabilities. Processes voice, text, and image inputs with tool use and multi-agent orchestration. Powers Meta AI across its product ecosystem.

$5.00 / $25.00 per 1M tokens

chat, completion, function-calling, vision, +3
text, image, audio, code

Nemotron 3 Super 120B

1.0M ctx

NVIDIA's open hybrid Mamba-Transformer MoE model with 120B total parameters (12B active). Features 1M token context window and excels at agentic reasoning, coding, planning, and tool calling.

$0.00 / $0.00 per 1M tokens

chat, completion, function-calling, code-generation, +1
text, code

Claude Opus 4.7

300K ctx

Anthropic's latest and most advanced model with state-of-the-art reasoning, coding, and analysis capabilities. Features improved tool use, extended thinking, and enhanced safety alignment.

$15.00 / $75.00 per 1M tokens

chat, completion, function-calling, vision, +2
text, image, code

GPT-OSS 120B

2 providers, 131K ctx

OpenAI's first open-weight large model with 120 billion parameters. Released under Apache 2.0 license, offering strong performance on reasoning and coding tasks while being fully self-hostable.

$1.80 / $6.00 per 1M tokens

chat, completion, function-calling, code-generation, +1
text, code

GPT-OSS 20B

131K ctx

OpenAI's compact open-weight model with 20 billion parameters. Released under Apache 2.0 license, designed for efficient deployment on consumer hardware while maintaining strong coding and reasoning capabilities.

$0.50 / $1.50 per 1M tokens

chat, completion, function-calling, code-generation
text, code

Grok 4.3

1.0M ctx

xAI's latest and most intelligent model with strong agentic tool calling, minimal hallucinations, and configurable reasoning. Supports 1M token context window with competitive pricing.

$1.25 / $2.50 per 1M tokens

chat, completion, function-calling, vision, +2
text, image, code

GPT-5.4

2 providers, 1.1M ctx

OpenAI's frontier reasoning model combining advances in coding, reasoning, and agentic workflows. Features 1.1M token context window and strong performance on complex multi-step problems.

$2.50 / $15.00 per 1M tokens

chat, completion, function-calling, vision, +2
text, image, code

GPT-5.5 Pro

256K ctx

OpenAI's premium tier model with extended reasoning capabilities, higher accuracy on complex tasks, and priority access. Optimized for professional and enterprise workloads requiring maximum quality.

$30.00 / $120.00 per 1M tokens

chat, completion, function-calling, vision, +3
text, image, audio, code

Gemini 3.1 Pro

2 providers, 2.0M ctx

Google's latest flagship multimodal model with state-of-the-art performance on reasoning, coding, and multimodal understanding. Features native tool use, grounding, and million-token context window.

$7.00 / $21.00 per 1M tokens

chat, completion, function-calling, vision, +3
text, image, audio, video, code

Grok 4

256K ctx

xAI's latest model with real-time information access, strong reasoning capabilities, and competitive performance on coding and analysis tasks. Features improved tool use and multimodal understanding.

$10.00 / $30.00 per 1M tokens

chat, completion, function-calling, vision, +2
text, image, code

Grok 4.20

2.0M ctx

xAI's multi-agent capable model with 2M token context window. Available in reasoning, non-reasoning, and multi-agent variants for diverse enterprise workloads.

$1.25 / $2.50 per 1M tokens

chat, completion, function-calling, vision, +2
text, image, code

Claude Opus 4.6

2 providers, 300K ctx

Anthropic's most capable model in the Claude 4 family, excelling at complex analysis, extended reasoning, scientific research, and advanced code generation. Features significantly improved accuracy and reduced hallucinations.

$15.00 / $75.00 per 1M tokens

chat, completion, function-calling, vision, +2
text, image, code

Claude Sonnet 4.6

2 providers, 200K ctx

Anthropic's balanced model offering strong performance at lower cost and latency than Opus. Excellent for everyday coding, analysis, and content generation tasks with good reasoning capabilities.

$3.00 / $15.00 per 1M tokens

chat, completion, function-calling, vision, +2
text, image, code

Grok 4.1 Fast

2.0M ctx

xAI's fast and cost-effective model with 2M token context window. Offers both reasoning and non-reasoning modes at significantly lower pricing than flagship models.

$0.20 / $0.50 per 1M tokens

chat, completion, function-calling, vision, +2
text, image, code

Claude Haiku 4.5

2 providers, 200K ctx

Anthropic's fastest model with near-frontier intelligence. Optimized for high-throughput, low-latency applications requiring quick responses at minimal cost. Supports extended thinking.

$0.80 / $5.00 per 1M tokens

chat, completion, function-calling, vision, +2
text, image, code

Claude Sonnet 4.5

200K ctx

Anthropic's previous-generation balanced model with strong coding and analysis capabilities. Offers excellent price-performance ratio for production workloads requiring reliable quality.

$3.00 / $15.00 per 1M tokens

chat, completion, function-calling, vision, +2
text, image, code

Gemini 2.5 Flash

2 providers, 1.0M ctx

Google's cost-effective model optimized for high throughput tasks. Balances speed and intelligence with strong multimodal capabilities and 1M token context window.

$0.15 / $0.60 per 1M tokens

chat, completion, function-calling, vision, +3
text, image, audio, video, code

Gemini 2.5 Pro

2 providers, 1.0M ctx

Google's high-capability reasoning model with adaptive thinking for complex agentic and multimodal challenges. Features 1M token context window and strong performance on coding and scientific tasks.

$2.50 / $15.00 per 1M tokens

chat, completion, function-calling, vision, +3
text, image, audio, video, code

GPT-5

2 providers, 256K ctx

OpenAI's fifth-generation flagship model with significant improvements in reasoning, multimodal understanding, and code generation. Features enhanced instruction following and expanded context window.

$10.00 / $40.00 per 1M tokens

chat, completion, function-calling, vision, +3
text, image, audio, code

Llama 4 Maverick

8 providers, 1.0M ctx

Meta's quality-focused MoE model with 17B active parameters (400B total, 128 experts). Targets quality-critical tasks with benchmark scores competitive with GPT-4o and Gemini 2.5 Pro.

$0.20 / $0.99 per 1M tokens

chat, completion, function-calling, vision, +2
text, image, code

Llama 4 Scout

5 providers, 10.0M ctx

Meta's efficient MoE model with 17B active parameters (109B total, 16 experts). Supports up to 10M token context — the longest of any production model. Strong performance on reasoning and multilingual tasks.

$0.06 / $0.60 per 1M tokens

chat, completion, function-calling, vision, +1
text, image, code

Command A

256K ctx

Cohere's flagship 111B parameter model optimized for demanding enterprises requiring fast, secure, and high-quality AI. Excels at RAG, tool use, and multilingual tasks with strong reasoning capabilities.

$2.50 / $10.00 per 1M tokens

chat, completion, function-calling, code-generation, +1
text, code

Llama 3.3 70B Instruct

14 providers, 131K ctx

Meta's flagship open-weight model with 70 billion parameters. Strong multilingual capabilities with competitive performance on reasoning and coding benchmarks. Available for self-hosting and through various inference providers.

$0.30 / $1.20 per 1M tokens

chat, completion, function-calling, code-generation
text, code

Command R7B

128K ctx

Cohere's compact 7B parameter model optimized for RAG, tool use, and code tasks. Delivers top-tier speed and efficiency on commodity GPUs and edge devices with 128K context window.

$0.04 / $0.15 per 1M tokens

chat, completion, function-calling, code-generation
text, code

Llama 3.1 8B Instruct

2 providers, 131K ctx

Meta's efficient open-weight model with 8 billion parameters from the Llama 3.1 family. Optimized for instruction following with strong performance on general tasks, coding, and multilingual benchmarks. Ideal for cost-effective deployment and edge inference scenarios.

$0.20 / $0.30 per 1M tokens

chat, completion, function-calling, code-generation
text, code

Claude 3 Opus

200K ctx

Anthropic's most powerful model in the Claude 3 family, excelling at complex analysis, nuanced content generation, scientific reasoning, and code generation with extended context support.

$15.00 / $75.00 per 1M tokens

chat, completion, function-calling, vision, +2
text, image

GPT-4

128K ctx

OpenAI's flagship large language model with advanced reasoning, instruction following, and code generation capabilities. Supports multimodal inputs including text and images.

$10.00 / $30.00 per 1M tokens

chat, completion, function-calling, vision, +1
text, image
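
Estimating request cost

Each entry above lists two prices per 1M tokens. The listing does not label them. The sketch below assumes the common convention that the first figure is the input (prompt) price and the second is the output (completion) price. The `ModelPrice` class and `estimate_cost` function are hypothetical names used only for illustration and are not part of any provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Two listed prices in USD per 1M tokens (hypothetical helper type)."""
    name: str
    input_usd_per_m: float   # ASSUMPTION: first listed price is the input price
    output_usd_per_m: float  # ASSUMPTION: second listed price is the output price


def estimate_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request under the assumptions above."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * price.input_usd_per_m
             + output_tokens * price.output_usd_per_m)
    return total / 1_000_000


# Prices copied from the listing above.
gemini_3_flash = ModelPrice("Gemini 3 Flash", 0.50, 3.00)
claude_sonnet_46 = ModelPrice("Claude Sonnet 4.6", 3.00, 15.00)

if __name__ == "__main__":
    for model in (gemini_3_flash, claude_sonnet_46):
        cost = estimate_cost(model, input_tokens=200_000, output_tokens=50_000)
        print(f"{model.name}: ${cost:.2f}")
```

If the listing turns out to use a different convention (for example, a minimum and maximum price across providers), only the two field comments and the formula in `estimate_cost` would need to change.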