Models

Browse 52 canonical LLM models across all providers

Gemini 3 Flash

2 providers · 1.0M ctx

Google's balanced model combining Gemini 3 Pro's reasoning capabilities with the Flash line's latency, efficiency, and cost. Features configurable thinking levels, multimodal function responses, and streaming function calling for complex agentic workflows.

$0.50 input / $3.00 output per 1M tokens

chat · completion · function-calling · vision · +3 more
text · image · audio · video · code
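Prices in this catalog are quoted per 1M tokens. Below is a minimal sketch of turning them into a per-request estimate. It assumes the first listed figure is the input (prompt) price and the second is the output (completion) price. The function name `estimate_cost_usd` is hypothetical and not part of any provider SDK. With the Gemini 3 Flash prices above, 10,000 input tokens plus 2,000 output tokens come to about one cent.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the USD cost of one request from per-1M-token list prices.

    Assumption: the first listed price applies to input (prompt) tokens and
    the second to output (completion) tokens.
    """
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Gemini 3 Flash as listed: $0.50 input / $3.00 output per 1M tokens.
cost = estimate_cost_usd(10_000, 2_000, 0.50, 3.00)
print(f"${cost:.4f}")  # prints $0.0110
```

Token counts should come from the provider's own tokenizer or usage report, because different models tokenize the same text differently.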

Gemini 3.1 Flash-Lite

2 providers · 1.0M ctx

Google's most cost-efficient Gemini model, optimized for high-volume, low-latency use cases. Delivers 2.5x faster time to first token than Gemini 2.5 Flash, with full multimodal support. Ideal for agentic tasks, data extraction, translation, and classification.

$0.25 input / $1.50 output per 1M tokens

chat · completion · function-calling · vision · +2 more
text · image · audio · video · code

Granite 4.1 30B

524K ctx

IBM's largest model in the Granite 4.1 family: a dense, decoder-only language model with 30B parameters. Trained on approximately 15T tokens, with long-context extension up to 512K tokens. Supports tool calling, RAG, code generation, and multilingual tasks across 12 languages. Released under the Apache 2.0 license.

$0.60 input / $1.20 output per 1M tokens

chat · completion · function-calling · code-generation · +1 more
text · code
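The "524K ctx" label and the "512K tokens" in the description are not in conflict. The label appears to be a decimal rendering of a power-of-two size: 512 × 1024 = 524,288 tokens. The same pattern fits 33K vs. 32K, 131K vs. 128K, and 262K vs. 256K elsewhere in this list. This is an inference from the numbers, not documented behavior of the catalog. The sketch below uses a hypothetical `format_ctx` helper to reproduce those labels.

```python
def format_ctx(tokens: int) -> str:
    """Render a token count as a short decimal label such as '131K' or '1.0M'."""
    if tokens >= 1_000_000:
        return f"{tokens / 1_000_000:.1f}M"
    if tokens >= 1_000:
        return f"{round(tokens / 1_000)}K"
    return str(tokens)


# Binary context sizes (n * 1024 tokens) next to the decimal label they produce.
for n in (32, 128, 256, 512, 1024):
    tokens = n * 1024
    print(f"{n}K = {tokens:,} tokens -> {format_ctx(tokens)}")
```

Under this reading, a "1.0M ctx" entry can be either 1,000,000 or 1,048,576 tokens. The exact limit should be checked in the provider's own documentation.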

Granite 4.1 8B

2 providers · 131K ctx

IBM's dense, decoder-only 8B-parameter language model from the Granite 4.1 family. Supports a 131K-token context, tool calling, RAG, code generation with fill-in-the-middle, text summarization, classification, and extraction across 12 languages. Released under the Apache 2.0 license.

$0.05 input / $0.40 output per 1M tokens

chat · completion · function-calling · code-generation
text · code

Laguna M.1

128K ctx

Poolside AI's flagship agentic coding model: a mixture-of-experts (MoE) design with 225B total and 23B active parameters. Trained from scratch in-house on 30T tokens across 6,144 NVIDIA Hopper GPUs. Optimized for complex multi-step software engineering tasks, including codebase exploration, file editing, test running, and iterative debugging.

$0.00 input / $0.00 output per 1M tokens

chat · completion · function-calling · code-generation · +1 more
text · code
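Several entries distinguish total parameters from active parameters: Laguna M.1 above, and Nemotron 3 Super 120B, Llama 4 Scout, and Llama 4 Maverick further down. In a mixture-of-experts model, a router sends each token through only a subset of the experts. Per-token compute therefore tracks the active count, while the memory needed to hold the weights tracks the total.

The sketch below computes the active share and a rough per-token compute figure. It uses the common rule of thumb of about 2 FLOPs per active parameter per token. Both helper names are hypothetical, and the estimate ignores attention and routing costs.

```python
def active_share(active_params_b: float, total_params_b: float) -> float:
    """Fraction of an MoE model's parameters that are used for each token."""
    if not 0 < active_params_b <= total_params_b:
        raise ValueError("active parameters must be positive and <= total")
    return active_params_b / total_params_b


def approx_flops_per_token(active_params_b: float) -> float:
    """Rule of thumb: ~2 FLOPs per active parameter per token (forward pass only).

    Ignores attention and routing overhead; treat it as an order-of-magnitude figure.
    """
    return 2 * active_params_b * 1e9


# (active, total) parameter counts in billions, as stated in this catalog.
MOE_MODELS = {
    "Laguna M.1": (23, 225),
    "Nemotron 3 Super 120B": (12, 120),
    "Llama 4 Scout": (17, 109),
    "Llama 4 Maverick": (17, 400),
}

for name, (active, total) in MOE_MODELS.items():
    print(f"{name}: {active_share(active, total):.1%} of weights active, "
          f"~{approx_flops_per_token(active):.1e} FLOPs/token")
```

By this measure, Llama 4 Scout and Maverick have similar per-token compute because both have 17B active parameters. However, Maverick has to store almost four times as many weights.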

GPT-5.5

1.0M ctx

OpenAI's most capable model designed for complex real-world work including coding, online research, information analysis, and document creation. Features advanced agentic capabilities with tool search and multi-step task execution.

$12.00 input / $48.00 output per 1M tokens

chat · completion · function-calling · vision · +3 more
text · image · audio · code

GPT-5.4 Mini

1.1M ctx

OpenAI's compact reasoning model optimized for coding, computer use, and subagent tasks. Approaches GPT-5.4 performance on several benchmarks while running more than 2x faster.

$0.75 input / $4.50 output per 1M tokens

chat · completion · function-calling · vision · +2 more
text · image · code

Muse Spark

256K ctx

Meta Superintelligence Labs' first model, featuring advanced reasoning, multimodal understanding, and agentic capabilities. Processes voice, text, and image inputs with tool use and multi-agent orchestration. Powers Meta AI across its product ecosystem.

$5.00 input / $25.00 output per 1M tokens

chat · completion · function-calling · vision · +3 more
text · image · audio · code

Gemma 4 E4B

33K ctx

Google's efficient 4-billion-parameter variant from the Gemma 4 family. Designed for resource-constrained environments while maintaining strong text generation quality. A text-only model with a 32K context window, balancing performance and efficiency.

chat · completion · code-generation
text · code

Gemma 4 31B

262K ctx

Google's flagship open-weight dense model from the Gemma 4 family, with 31 billion parameters. All parameters are active on every forward pass, and it delivers top-tier performance on reasoning benchmarks including AIME 2026 and MMLU Pro. Supports vision and an extended 256K context window.

chat · completion · vision · code-generation · +1 more
text · image · code

Gemma 4 26B

262K ctx

Google's high-performance open-weight dense model from the Gemma 4 family, with 26 billion parameters. Supports multimodal inputs, including text and images, with an extended 256K context window. Offers strong reasoning and code generation, with all parameters active on every forward pass.

chat · completion · vision · code-generation · +1 more
text · image · code

Gemma 4 31B

4 providers · 262K ctx

Google's flagship open-weight dense model with 31B parameters, all of which are active on every forward pass. Ranks among the top open models, with strong results on AIME 2026 (89.2%) and MMLU Pro (85.2%). Supports vision and extended context.

$0.00 input / $0.50 output per 1M tokens

chat · completion · vision · code-generation · +1 more
text · image · code

Gemma 4 E2B

33K ctx

Google's efficient 2-billion-parameter variant from the Gemma 4 family. Optimized for on-device and edge deployments with minimal resource requirements. A text-only model with a 32K context window, suitable for lightweight chat and completion tasks.

chat · completion
text

Nemotron 3 Super 120B

1.0M ctx

NVIDIA's open hybrid Mamba-Transformer MoE model with 120B total parameters (12B active). Features a 1M-token context window and excels at agentic reasoning, coding, planning, and tool calling.

$0.00 input / $0.00 output per 1M tokens

chat · completion · function-calling · code-generation · +1 more
text · code

Claude Opus 4.7

300K ctx

Anthropic's latest and most advanced model with state-of-the-art reasoning, coding, and analysis capabilities. Features improved tool use, extended thinking, and enhanced safety alignment.

$15.00 input / $75.00 output per 1M tokens

chat · completion · function-calling · vision · +2 more
text · image · code

GPT-OSS 20B

131K ctx

OpenAI's compact open-weight model with 20 billion parameters. Released under the Apache 2.0 license and designed for efficient deployment on consumer hardware while maintaining strong coding and reasoning capabilities.

$0.50 input / $1.50 output per 1M tokens

chat · completion · function-calling · code-generation
text · code

GPT-OSS 120B

2 providers · 131K ctx

OpenAI's first large open-weight model, with 120 billion parameters. Released under the Apache 2.0 license, it offers strong performance on reasoning and coding tasks while being fully self-hostable.

$1.80 input / $6.00 output per 1M tokens

chat · completion · function-calling · code-generation · +1 more
text · code

Grok 4.3

1.0M ctx

xAI's latest and most intelligent model, with strong agentic tool calling, minimal hallucinations, and configurable reasoning. Supports a 1M-token context window with competitive pricing.

$1.25 input / $2.50 output per 1M tokens

chat · completion · function-calling · vision · +2 more
text · image · code

GPT-5.4

2 providers · 1.1M ctx

OpenAI's frontier reasoning model, combining advances in coding, reasoning, and agentic workflows. Features a 1.1M-token context window and strong performance on complex multi-step problems.

$2.50 input / $15.00 output per 1M tokens

chat · completion · function-calling · vision · +2 more
text · image · code

Gemini 3.1 Pro

2 providers · 2.0M ctx

Google's latest flagship multimodal model, with state-of-the-art performance on reasoning, coding, and multimodal understanding. Features native tool use, grounding, and a 2M-token context window.

$7.00 input / $21.00 output per 1M tokens

chat · completion · function-calling · vision · +3 more
text · image · audio · video · code

GPT-5.5 Pro

256K ctx

OpenAI's premium tier model with extended reasoning capabilities, higher accuracy on complex tasks, and priority access. Optimized for professional and enterprise workloads requiring maximum quality.

$30.00 input / $120.00 output per 1M tokens

chat · completion · function-calling · vision · +3 more
text · image · audio · code

Tiny Aya

8K ctx

Compact multilingual language model from Cohere For AI with 3.35B parameters, optimized for efficient and balanced multilingual representation across 70+ languages including many lower-resourced ones. Designed for edge deployment without cloud dependency. Trained on 64 NVIDIA H100 GPUs with specialized regional variants available (Global, Earth, Fire).

chat · completion
text

Grok 4.20

2.0M ctx

xAI's multi-agent-capable model with a 2M-token context window. Available in reasoning, non-reasoning, and multi-agent variants for diverse enterprise workloads.

$1.25 input / $2.50 output per 1M tokens

chat · completion · function-calling · vision · +2 more
text · image · code

Grok 4

256K ctx

xAI's latest model with real-time information access, strong reasoning capabilities, and competitive performance on coding and analysis tasks. Features improved tool use and multimodal understanding.

$10.00 input / $30.00 output per 1M tokens

chat · completion · function-calling · vision · +2 more
text · image · code

Claude Opus 4.6

2 providers · 300K ctx

Anthropic's most capable model in the Claude 4 family, excelling at complex analysis, extended reasoning, scientific research, and advanced code generation. Features significantly improved accuracy and reduced hallucinations.

$15.00 input / $75.00 output per 1M tokens

chat · completion · function-calling · vision · +2 more
text · image · code

Claude Sonnet 4.6

2 providers · 200K ctx

Anthropic's balanced model offering strong performance at lower cost and latency than Opus. Excellent for everyday coding, analysis, and content generation tasks with good reasoning capabilities.

$3.00 input / $15.00 output per 1M tokens

chat · completion · function-calling · vision · +2 more
text · image · code

Grok 4.1 Fast

2.0M ctx

xAI's fast and cost-effective model with a 2M-token context window. Offers both reasoning and non-reasoning modes at significantly lower prices than flagship models.

$0.20 input / $0.50 output per 1M tokens

chat · completion · function-calling · vision · +2 more
text · image · code

Claude Sonnet 4.5

200K ctx

Anthropic's previous-generation balanced model with strong coding and analysis capabilities. Offers an excellent price-performance ratio for production workloads requiring reliable quality.

$3.00 input / $15.00 output per 1M tokens

chat · completion · function-calling · vision · +2 more
text · image · code

Claude Haiku 4.5

2 providers · 200K ctx

Anthropic's fastest model with near-frontier intelligence. Optimized for high-throughput, low-latency applications requiring quick responses at minimal cost. Supports extended thinking.

$0.80 input / $5.00 output per 1M tokens

chat · completion · function-calling · vision · +2 more
text · image · code

Gemini 2.5 Flash

2 providers · 1.0M ctx

Google's cost-effective model optimized for high-throughput tasks. Balances speed and intelligence with strong multimodal capabilities and a 1M-token context window.

$0.15 input / $0.60 output per 1M tokens

chat · completion · function-calling · vision · +3 more
text · image · audio · video · code

Gemini 2.5 Pro

2 providers · 1.0M ctx

Google's high-capability reasoning model with adaptive thinking for complex agentic and multimodal challenges. Features a 1M-token context window and strong performance on coding and scientific tasks.

$2.50 input / $15.00 output per 1M tokens

chat · completion · function-calling · vision · +3 more
text · image · audio · video · code

GPT-5

2 providers · 256K ctx

OpenAI's fifth-generation flagship model with significant improvements in reasoning, multimodal understanding, and code generation. Features enhanced instruction following and expanded context window.

$10.00 input / $40.00 output per 1M tokens

chat · completion · function-calling · vision · +3 more
text · image · audio · code

Nemotron Nano 9B v2

131K ctx

NVIDIA's compact 9B parameter model trained from scratch for both reasoning and non-reasoning tasks. Generates reasoning traces before final responses. Efficient for edge and on-device deployment.

$0.00 input / $0.00 output per 1M tokens

chat · completion · code-generation · reasoning
text · code

Llama 4 Scout

5 providers · 10.0M ctx

Meta's efficient MoE model with 17B active parameters (109B total, 16 experts). Supports up to a 10M-token context, the longest of any model in this catalog. Strong performance on reasoning and multilingual tasks.

$0.06 input / $0.60 output per 1M tokens

chat · completion · function-calling · vision · +1 more
text · image · code

Llama 4 Maverick

8 providers · 1.0M ctx

Meta's quality-focused MoE model with 17B active parameters (400B total, 128 experts). Targets quality-critical tasks, with benchmark scores competitive with GPT-4o and Gemini 2.0 Flash.

$0.20 input / $0.99 output per 1M tokens

chat · completion · function-calling · vision · +2 more
text · image · code

Gemma 3 1B

33K ctx

Google's lightweight open-weight model with 1 billion parameters from the Gemma 3 family. Designed for on-device and resource-constrained deployments. Supports text-only tasks with a 32K context window. Efficient for chat and basic completion workloads.

chat · completion
text

Gemma 3 12B

131K ctx

Google's mid-size open-weight model with 12 billion parameters from the Gemma 3 family. Supports multimodal inputs including text and images with a 128K context window. Strong performance on reasoning and code generation tasks at moderate compute cost.

chat · completion · vision · code-generation · +1 more
text · image · code

Gemma 3 27B

131K ctx

Google's largest open-weight model in the Gemma 3 family with 27 billion parameters. Supports multimodal inputs including text and images with a 128K context window. Delivers strong performance across reasoning, code generation, and vision tasks, competitive with larger proprietary models.

chat · completion · vision · code-generation · +1 more
text · image · code

Gemma 3 4B

131K ctx

Google's compact open-weight model with 4 billion parameters from the Gemma 3 family. Supports multimodal inputs including text and images with a 128K context window. Balances efficiency and capability for vision and language tasks.

chat · completion · vision · code-generation
text · image · code

Command A

256K ctx

Cohere's flagship 111B parameter model optimized for demanding enterprises requiring fast, secure, and high-quality AI. Excels at RAG, tool use, and multilingual tasks with strong reasoning capabilities.

$2.50 input / $10.00 output per 1M tokens

chat · completion · function-calling · code-generation · +1 more
text · code

Phi-4 Mini

128K ctx

Microsoft's 3.8B-parameter Phi-4 Mini model provides lightweight yet capable language understanding and code generation. It is optimized for resource-constrained deployments and offers a large 128K context window.

chat · completion · code-generation
text · code

Llama 3.3 70B Instruct

14 providers · 131K ctx

Meta's flagship open-weight model with 70 billion parameters. Strong multilingual capabilities with competitive performance on reasoning and coding benchmarks. Available for self-hosting and through various inference providers.

$0.30 input / $1.20 output per 1M tokens

chat · completion · function-calling · code-generation
text · code

Phi-4

16K ctx

Microsoft's 14B-parameter Phi-4 model excels at reasoning and code generation, delivering strong performance relative to its compact size with efficient inference.

chat · completion · reasoning · code-generation
text · code

Command R7B

128K ctx

Cohere's compact 7B parameter model optimized for RAG, tool use, and code tasks. Delivers top-tier speed and efficiency on commodity GPUs and edge devices with 128K context window.

$0.04 input / $0.15 output per 1M tokens

chat · completion · function-calling · code-generation
text · code

Aya Expanse 32B

8K ctx

Highly performant 32B multilingual language model from Cohere For AI, designed to rival monolingual model performance across 23 languages. Built using innovations in multilingual data arbitrage, direct preference optimization, and model merging techniques. Outperforms previous multilingual models on both automatic and human evaluations.

chat · completion · reasoning
text

Llama 3.2 3B Instruct

131K ctx

Meta's lightweight open-weight model with 3 billion parameters from the Llama 3.2 family. Designed for on-device and edge deployment with strong text generation capabilities relative to its size. Supports instruction following and general-purpose tasks.

chat · completion · code-generation
text · code

Llama 3.2 11B Vision Instruct

131K ctx

Meta's multimodal open-weight model with 11 billion parameters from the Llama 3.2 family. Supports both text and image inputs, enabling visual understanding tasks alongside standard text generation. Suitable for applications requiring vision capabilities at moderate scale.

chat · completion · vision · code-generation
text · image · code

Llama 3.2 90B Vision Instruct

131K ctx

Meta's largest multimodal open-weight model with 90 billion parameters from the Llama 3.2 family. Delivers strong performance on both text and image understanding tasks with competitive results on visual reasoning benchmarks. Designed for high-quality inference requiring vision capabilities.

chat · completion · vision · code-generation
text · image · code

Llama 3.1 8B Instruct

2 providers · 131K ctx

Meta's efficient open-weight model with 8 billion parameters from the Llama 3.1 family. Optimized for instruction following with strong performance on general tasks, coding, and multilingual benchmarks. Ideal for cost-effective deployment and edge inference scenarios.

$0.20 input / $0.30 output per 1M tokens

chat · completion · function-calling · code-generation
text · code

Claude 3 Opus

200K ctx

Anthropic's most powerful model in the Claude 3 family, excelling at complex analysis, nuanced content generation, scientific reasoning, and code generation with extended context support.

$15.00 input / $75.00 output per 1M tokens

chat · completion · function-calling · vision · +2 more
text · image

GPT-4

128K ctx

OpenAI's flagship large language model with advanced reasoning, instruction following, and code generation capabilities. Supports multimodal inputs including text and images.

$10.00 input / $30.00 output per 1M tokens

chat · completion · function-calling · vision · +1 more
text · image

Whisper

448 ctx

OpenAI's Whisper automatic speech recognition model capable of multilingual audio transcription and translation, trained on a large dataset of diverse audio for robust real-world performance.

audio
audio · text
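To narrow this list down programmatically, the entries can be treated as records and filtered by capability tags, context size, and price. The sketch below is hypothetical and rests on several assumptions:

- `CatalogEntry` and `shortlist` are illustrative names, not an API offered by this catalog.
- Context sizes are approximated from the displayed labels.
- Only the visible capability tags are included, because the tags hidden behind "+N more" are unknown.
- The two prices are taken to be input and output per 1M tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical record for one row of the catalog above."""
    name: str
    ctx_tokens: int        # approximated from the displayed label, e.g. "1.0M ctx"
    input_price: float     # USD per 1M input tokens (assumed ordering)
    output_price: float    # USD per 1M output tokens (assumed ordering)
    capabilities: frozenset


CORE = {"chat", "completion", "function-calling"}

# A few rows transcribed from the listing (visible tags only).
ENTRIES = [
    CatalogEntry("Gemini 3 Flash", 1_000_000, 0.50, 3.00, frozenset(CORE | {"vision"})),
    CatalogEntry("Gemini 3.1 Flash-Lite", 1_000_000, 0.25, 1.50, frozenset(CORE | {"vision"})),
    CatalogEntry("GPT-5.4 Mini", 1_100_000, 0.75, 4.50, frozenset(CORE | {"vision"})),
    CatalogEntry("Claude Haiku 4.5", 200_000, 0.80, 5.00, frozenset(CORE | {"vision"})),
    CatalogEntry("Llama 4 Scout", 10_000_000, 0.06, 0.60, frozenset(CORE | {"vision"})),
    CatalogEntry("Llama 3.3 70B Instruct", 131_000, 0.30, 1.20,
                 frozenset(CORE | {"code-generation"})),
]


def shortlist(entries: list[CatalogEntry], required: set[str], min_ctx: int) -> list[CatalogEntry]:
    """Keep entries with every required tag and enough context; cheapest input price first."""
    matches = [e for e in entries
               if required <= e.capabilities and e.ctx_tokens >= min_ctx]
    return sorted(matches, key=lambda e: (e.input_price, e.output_price))


for entry in shortlist(ENTRIES, {"vision", "function-calling"}, min_ctx=1_000_000):
    print(f"{entry.name}: ${entry.input_price:.2f} / ${entry.output_price:.2f} per 1M tokens")
```

Sorting on input price first, then output price, suits prompt-heavy workloads. For generation-heavy use, swapping the key order (or ranking by a blended per-request cost) is the more reasonable choice.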