Models

Browse 43 canonical LLMs across all providers


Claude Sonnet 5 · 1.0M ctx

Anthropic's most capable Sonnet-class model, bringing frontier coding, agentic, and professional-work performance to the midsize tier while closing the gap with Opus 4.8 at a lower price. Supports adaptive thinking with selectable reasoning effort levels, a 1M-token context window, and text, image, and file inputs. Codenamed Fennec.

chat · completion · function-calling

Command A+ · 256K ctx

Cohere's enterprise flagship model building on Command A with stronger reasoning, agentic tool use, and multilingual performance across 23 languages. Optimized for secure, high-throughput RAG, retrieval, and long-horizon agent workflows in regulated environments, with private and on-premise deployment options.

chat · completion · function-calling

DiffusionGemma · 262K ctx

Google DeepMind's experimental diffusion-based member of the Gemma 4 open model family. Unlike autoregressive models, which generate text one token at a time, DiffusionGemma denoises a canvas of placeholder tokens to produce up to 256 tokens in parallel, finalizing them together as a single block. It is a Mixture-of-Experts model with 26B total parameters, 3.8B of which are active during inference, and delivers roughly 4x the throughput of similarly sized autoregressive Gemma models on local hardware. Excels at non-linear tasks such as in-line editing, molecular sequencing, mathematical graphing, and self-correcting puzzles.

chat · completion · code-generation

Claude Fable 5 · 300K ctx

Anthropic's first publicly available Mythos-class model, exceeding the capabilities of any model the company has previously made generally available. State-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, vision, and scientific research. Its lead grows on longer and more complex tasks. Ships with built-in safeguards that route sensitive cybersecurity, biology, chemistry, and distillation queries to Claude Opus 4.8.

chat · completion · function-calling

Claude Mythos 5 · 300K ctx

Anthropic's frontier Mythos-class model — the same underlying model as Claude Fable 5 but with safeguards lifted in some areas. It has the strongest cybersecurity capabilities of any model in the world, alongside state-of-the-art performance in software engineering, knowledge work, vision, and scientific research. Access is restricted to a small group of trusted cyberdefenders and infrastructure providers through Project Glasswing.

chat · completion · function-calling

Nemotron 3 Ultra · 1.0M ctx

NVIDIA's flagship open Mixture-of-Experts model, with 550B total and 55B active parameters, built for frontier reasoning and orchestration in long-running agentic systems. Features a hybrid Mamba-Transformer architecture, LatentMoE routing, multi-token prediction, and NVFP4 precision for 5x higher throughput. Achieves 30% lower cost-to-task-completion on agentic benchmarks. Supports a context window of over 1M tokens, with 95% accuracy on RULER at 1M tokens.

chat · completion · function-calling

Gemma 4 12B · 262K ctx

Google's medium-size open-weight model from the Gemma 4 family, with 12 billion parameters. Uses a unified multimodal architecture that natively processes text, image, audio, and video inputs without dedicated encoders. Features a 256K context window and supports 140+ languages. First medium-sized model capable of natively ingesting audio. Suitable for local deployment on GPUs with 16GB of VRAM.

chat · completion · vision

Claude Opus 4.8 · 300K ctx

Anthropic's most advanced Opus-class model, building on Opus 4.7 with improvements across coding, agentic, reasoning, and knowledge-work benchmarks. Features improved honesty, more efficient tool use, support for dynamic workflows, and better alignment.

chat · completion · function-calling

Palmyra X5 · 1.0M ctx

Writer's most advanced adaptive reasoning model, with a 1-million-token context window. Processes a full million-token prompt in approximately 22 seconds and executes multi-turn function calls in 300 ms. Optimized for enterprise agentic AI workflows at 3-4x lower cost than GPT-4.1.

chat · completion · function-calling

Gemini 3 Flash · 1.0M ctx

Google's balanced model combining Gemini 3 Pro's reasoning capabilities with the Flash line's latency, efficiency, and cost. Features configurable thinking levels, multimodal function responses, and streaming function calling for complex agentic workflows.

chat · completion · function-calling

Granite 4.1 30B · 524K ctx

The largest dense, decoder-only language model in IBM's Granite 4.1 family, with 30B parameters. Trained on approximately 15T tokens, with long-context extension up to 512K tokens. Supports tool calling, RAG, code generation, and multilingual tasks across 12 languages. Released under the Apache 2.0 license.

chat · completion · function-calling

Laguna M.1 · 128K ctx

Poolside AI's flagship agentic coding model, a Mixture-of-Experts (MoE) design with 225B total and 23B active parameters. Trained from scratch in-house on 30T tokens across 6,144 NVIDIA Hopper GPUs. Optimized for complex, multi-step software engineering tasks, including codebase exploration, file editing, test running, and iterative debugging.

chat · completion · function-calling

GPT-5.5 · 1.0M ctx

OpenAI's most capable model designed for complex real-world work including coding, online research, information analysis, and document creation. Features advanced agentic capabilities with tool search and multi-step task execution.

chat · completion · function-calling

GPT-5.4 Mini · 1.1M ctx

OpenAI's compact reasoning model optimized for coding, computer use, and subagent tasks. Approaches GPT-5.4 performance on several benchmarks while running more than 2x faster.

chat · completion · function-calling

Muse Spark · 256K ctx

Meta Superintelligence Labs' first model, featuring advanced reasoning, multimodal understanding, and agentic capabilities. Processes voice, text, and image inputs with tool use and multi-agent orchestration. Powers Meta AI across its product ecosystem.

chat · completion · function-calling

Gemma 4 31B · 262K ctx

Google's flagship open-weight dense model from the Gemma 4 family, with 31 billion parameters, all of which are active on every forward pass. Ranks among the top open models, scoring 89.2% on AIME 2026 and 85.2% on MMLU Pro. Supports vision and an extended 256K context window.

chat · completion · vision

Gemma 4 26B · 262K ctx

Google's high-performance open-weight dense model with 26 billion parameters from the Gemma 4 family. Supports multimodal inputs including text and images with a 256K extended context window. Strong reasoning and code generation capabilities with all parameters active per forward pass.

chat · completion · vision

GPT-OSS 120B · 131K ctx

OpenAI's first open-weight language model since GPT-2, with roughly 120 billion parameters. Released under the Apache 2.0 license, it offers strong performance on reasoning and coding tasks and is fully self-hostable.

chat · completion · function-calling

Claude Opus 4.7 · 300K ctx

The Anthropic model that preceded Claude Opus 4.8, offering state-of-the-art reasoning, coding, and analysis capabilities at release. Features improved tool use, extended thinking, and enhanced safety alignment.

chat · completion · function-calling

Nemotron 3 Super 120B · 1.0M ctx

NVIDIA's open hybrid Mamba-Transformer MoE model with 120B total parameters (12B active). Features 1M token context window and excels at agentic reasoning, coding, planning, and tool calling.

chat · completion · function-calling

Grok 4.3 · 1.0M ctx

xAI's latest and most intelligent model, with strong agentic tool calling, minimal hallucinations, and configurable reasoning. Supports a 1M-token context window at competitive pricing.

chat · completion · function-calling

GPT-5.4 · 1.1M ctx

OpenAI's frontier reasoning model, combining advances in coding, reasoning, and agentic workflows. Features a 1.1M-token context window and strong performance on complex, multi-step problems.

chat · completion · function-calling

Gemini 3.1 Pro · 2.0M ctx

Google's latest flagship multimodal model, with state-of-the-art performance on reasoning, coding, and multimodal understanding. Features native tool use, grounding, and a 2M-token context window.

chat · completion · function-calling