Models

Browse 63 canonical LLM models across all providers

Showing 1–24 of 63 models

Claude Sonnet 5

USA · 1.0M ctx

Anthropic's most capable Sonnet-class model, bringing frontier coding, agentic, and professional-work performance to the midsize tier while closing the gap with Opus 4.8 at a lower price. Supports adaptive thinking with selectable reasoning effort levels, a 1M-token context window, and text, image, and file inputs. Codenamed Fennec.

$2.00 – $10.00 / 1M tokens

Command A+

USA · 256K ctx

Cohere's enterprise flagship model building on Command A with stronger reasoning, agentic tool use, and multilingual performance across 23 languages. Optimized for secure, high-throughput RAG, retrieval, and long-horizon agent workflows in regulated environments, with private and on-premise deployment options.

$2.50 – $10.00 / 1M tokens

DiffusionGemma

USA · 2 providers · 262K ctx

Google DeepMind's experimental diffusion-based member of the Gemma 4 open model family. Unlike autoregressive models that generate text one token at a time, DiffusionGemma denoises a canvas of placeholder tokens to produce up to 256 tokens in parallel, finalizing output in one block. It is a Mixture-of-Experts model with 26B total parameters and 3.8B active per inference, and delivers roughly 4x the throughput of similarly sized autoregressive Gemma models on local hardware. Excels at non-linear tasks like in-line editing, molecular sequencing, mathematical graphing, and self-correcting puzzles.

$0.00 – $0.15 / 1M tokens

Claude Mythos 5

USA · 300K ctx

Anthropic's frontier Mythos-class model: the same underlying model as Claude Fable 5, but with safeguards lifted in some areas. It has the strongest cybersecurity capabilities of any model in the world, alongside state-of-the-art performance in software engineering, knowledge work, vision, and scientific research. Access is restricted to a small group of trusted cyberdefenders and infrastructure providers through Project Glasswing.

Claude Fable 5

USA · 300K ctx

Anthropic's first publicly available Mythos-class model, exceeding the capabilities of any model the company has previously made generally available. State-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, vision, and scientific research. Its lead grows on longer and more complex tasks. Ships with built-in safeguards that route sensitive cybersecurity, biology, chemistry, and distillation queries to Claude Opus 4.8.

$15.00 – $75.00 / 1M tokens

Gemma 4 12B

USA · 3 providers · 262K ctx

Google's medium-size open-weight model with 12 billion parameters from the Gemma 4 family. Encoder-free unified multimodal architecture that natively processes text, image, audio, and video inputs without dedicated encoders. Features a 256K context window and supports 140+ languages. First medium-sized model capable of natively ingesting audio. Suitable for local deployment on GPUs with 16GB VRAM.

$0.00 – $0.10 / 1M tokens

Nemotron 3 Ultra

USA · 4 providers · 1.0M ctx

NVIDIA's flagship open 550B-parameter Mixture-of-Experts model with 55B active parameters, built for frontier reasoning and orchestration in long-running agentic systems. Features a hybrid Mamba-Transformer architecture, LatentMoE routing, multi-token prediction, and NVFP4 precision for 5x higher throughput. Achieves 30% lower cost-to-task-completion on agentic benchmarks. Supports a 1M+ token context window with 95% accuracy on RULER@1M.

$0.00 – $1.60 / 1M tokens

Claude Opus 4.8

USA · 300K ctx

Anthropic's most advanced model, building on Opus 4.7 with improvements across benchmarks in coding, agentic skills, reasoning, and knowledge work. Features enhanced honesty, more efficient tool use, support for dynamic workflows, and improved alignment.

$15.00 – $75.00 / 1M tokens

Palmyra X5

USA · 1.0M ctx

Writer's most advanced adaptive reasoning model, with a 1-million-token context window. Processes full million-token prompts in approximately 22 seconds and handles multi-turn function calls in 300ms. Optimized for enterprise agentic AI workflows at 3-4x lower cost than GPT-4.1.

$5.00 – $15.00 / 1M tokens

DBRX

USA · 33K ctx

Databricks' open-source 132B-parameter Mixture-of-Experts transformer model with 36B active parameters per input. Released under the Databricks Open Model License and optimized for enterprise workloads including SQL generation and coding tasks.

$0.60 – $1.80 / 1M tokens

Snowflake Arctic

USA · 4K ctx

Snowflake's enterprise-focused open LLM with 480B total parameters using a fine-grained MoE architecture with only 17B active parameters per input. Apache 2.0 licensed, excels at SQL generation, coding, and enterprise intelligence tasks with breakthrough training efficiency.

$0.00 – $0.00 / 1M tokens

Gemini 3.1 Flash-Lite

USA · 2 providers · 1.0M ctx

Google's most cost-efficient Gemini model optimized for high-volume, low-latency use cases. Delivers 2.5x faster time to first token versus Gemini 2.5 Flash with full multimodal support. Ideal for agentic tasks, data extraction, translation, and classification.

$0.25 – $1.50 / 1M tokens

Gemini 3 Flash

USA · 2 providers · 1.0M ctx

Google's balanced model combining Gemini 3 Pro's reasoning capabilities with the Flash line's latency, efficiency, and cost. Features configurable thinking levels, multimodal function responses, and streaming function calling for complex agentic workflows.

$0.50 – $3.00 / 1M tokens

Granite 4.1 30B

USA · 524K ctx

IBM's largest dense decoder-only 30B-parameter language model from the Granite 4.1 family. Trained on approximately 15T tokens with long-context extension up to 512K tokens. Supports tool calling, RAG, code generation, and multilingual tasks across 12 languages. Released under Apache 2.0.

$0.60 – $1.20 / 1M tokens

Granite 4.1 8B

USA · 2 providers · 131K ctx

IBM's dense decoder-only 8B parameter language model from the Granite 4.1 family. Supports 131K-token context, tool calling, RAG, code generation with fill-in-the-middle, text summarization, classification, and extraction across 12 languages. Released under Apache 2.0.

$0.05 – $0.40 / 1M tokens

Laguna M.1

USA · 128K ctx

Poolside AI's flagship agentic coding model with 225B total parameters and 23B active (MoE). Trained from scratch in-house on 30T tokens across 6,144 NVIDIA Hopper GPUs. Optimized for complex multi-step software engineering tasks including codebase exploration, file editing, test running, and iterative debugging.

$0.00 – $0.00 / 1M tokens

GPT-5.5

USA · 1.0M ctx

OpenAI's most capable model designed for complex real-world work including coding, online research, information analysis, and document creation. Features advanced agentic capabilities with tool search and multi-step task execution.

$12.00 – $48.00 / 1M tokens

GPT-5.4 Mini

USA · 1.1M ctx

OpenAI's compact reasoning model optimized for coding, computer use, and subagent tasks. Approaches GPT-5.4 performance on several benchmarks while running more than 2x faster.

$0.75 – $4.50 / 1M tokens

Muse Spark

USA · 256K ctx

Meta Superintelligence Labs' first model, featuring advanced reasoning, multimodal understanding, and agentic capabilities. Processes voice, text, and image inputs with tool use and multi-agent orchestration. Powers Meta AI across its product ecosystem.

$5.00 – $25.00 / 1M tokens

Gemma 4 31B

USA · 262K ctx

Google's flagship open-weight dense model with 31 billion parameters from the Gemma 4 family. All parameters are active per forward pass, with top-tier performance on reasoning benchmarks including AIME 2026 and MMLU Pro. Supports vision and an extended 256K context window.

Gemma 4 E2B

USA · 33K ctx

Google's efficient 2 billion parameter variant from the Gemma 4 family. Optimized for on-device and edge deployments with minimal resource requirements. Text-only model with a 32K context window, suitable for lightweight chat and completion tasks.

Gemma 4 31B

USA · 4 providers · 262K ctx

Google's flagship open-weight dense model with 31B parameters. All parameters active per forward pass. Ranks among top open models with strong performance on AIME 2026 (89.2%) and MMLU Pro (85.2%). Supports vision and extended context.

$0.00 – $0.50 / 1M tokens

Gemma 4 E4B

USA · 33K ctx

Google's efficient 4 billion parameter variant from the Gemma 4 family. Designed for resource-constrained environments while maintaining strong text generation quality. Text-only model with a 32K context window, balancing performance and efficiency.

Gemma 4 26B

USA · 262K ctx

Google's high-performance open-weight dense model with 26 billion parameters from the Gemma 4 family. Supports multimodal inputs including text and images with a 256K extended context window. Strong reasoning and code generation capabilities with all parameters active per forward pass.
