Models

Browse 107 canonical LLM models across all providers

Showing 1–24 of 107 models

Claude Sonnet 5

USA · 1.0M ctx

Anthropic's most capable Sonnet-class model, bringing frontier coding, agentic, and professional-work performance to the midsize tier while closing the gap with Opus 4.8 at a lower price. Supports adaptive thinking with selectable reasoning effort levels, a 1M-token context window, and text, image, and file inputs. Codenamed Fennec.

$2.00 – $10.00 / 1M tokens
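Each listing gives a price range per 1M tokens. The page does not say what the two figures mean. If the lower figure is the input price and the upper figure is the output price (an assumption), you can estimate the cost of a single request as sketched below. `request_cost` is a hypothetical helper and is not part of any provider's SDK.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: "$low – $high / 1M tokens" is read as (input price, output price)
# in USD per one million tokens. The listing itself does not state this.
PER_MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: str, output_price: str) -> Decimal:
    """Estimate the USD cost of one request from per-1M-token prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    cost = (Decimal(input_tokens) * Decimal(input_price)
            + Decimal(output_tokens) * Decimal(output_price)) / PER_MILLION
    # Round to whole cents for display; Decimal avoids float drift.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Claude Sonnet 5 figures from the listing: $2.00 and $10.00 per 1M tokens.
print(request_cost(200_000, 10_000, "2.00", "10.00"))  # 0.50
```

Prices are passed as strings so that `Decimal` keeps the listed cents exactly.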

Sakana Fugu

Japan · 256K ctx

Sakana AI's multi-agent orchestration model from Tokyo, delivered as a single OpenAI-compatible API. Fugu is itself a language model trained to call a pool of specialist LLMs (and recursive instances of itself), handling model selection, delegation, verification, and synthesis behind one endpoint. Built on Sakana AI's TRINITY and Conductor research, its routing intelligence is learned in model weights rather than hand-configured.

$2.00 – $12.00 / 1M tokens

Sakana Fugu Ultra

Japan · 256K ctx

The higher-quality tier of Sakana AI's Fugu multi-agent orchestration system, tuned for the hardest coding, reasoning, science, and agentic tasks. It coordinates a swappable pool of frontier LLMs through one OpenAI-compatible endpoint, delegating sub-tasks, verifying intermediate work, and synthesizing a single answer. Sakana's self-reported benchmark scores include 93.2 on LiveCodeBench, 73.7 on SWE-Bench Pro, and 82.1 on TerminalBench.

$5.00 – $30.00 / 1M tokens

Sarvam-M

India · 2 providers · 131K ctx

Sarvam AI's 24B-parameter instruction-tuned model derived from Mistral-Small-3.1-24B, post-trained on English plus ten major Indic languages (bn, hi, kn, gu, mr, ml, or, pa, ta, te). It delivers large relative gains over its base model on Indian-language, math, and programming benchmarks, and offers a hybrid reasoning mode for complex tasks.

$0.25 – $1.50 / 1M tokens

Command A+

USA · 256K ctx

Cohere's enterprise flagship model building on Command A with stronger reasoning, agentic tool use, and multilingual performance across 23 languages. Optimized for secure, high-throughput RAG, retrieval, and long-horizon agent workflows in regulated environments, with private and on-premise deployment options.

$2.50 – $10.00 / 1M tokens

Sarvam-105B

India · 128K ctx

Sarvam AI's sovereign 105B-parameter Mixture-of-Experts model activating ~9B parameters per token, with a 128K-token context window. Trained on 12 trillion tokens across 22 Indian languages using 128 sparse experts with Multi-head Latent Attention and a custom low-fertility Indic tokenizer. Wins the majority of pairwise comparisons on Indian-language and STEM benchmarks.

$1.00 – $3.00 / 1M tokens
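Sarvam-105B activates about 9B of its 105B parameters per token (roughly 8.6%). This sparsity comes from expert routing: each token is sent to only a few of the 128 experts. The listing does not describe Sarvam's router, including how many experts fire per token. The sketch below is therefore a generic top-k Mixture-of-Experts layer, not Sarvam's implementation. The sizes, the top_k value, the single-matrix experts, the random weights, and the `ToyMoELayer` class are all illustrative assumptions. The code uses only the Python standard library.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class ToyMoELayer:
    """Generic top-k Mixture-of-Experts layer with random weights (hypothetical)."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows):
            return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(rows)]

        self.top_k = top_k
        self.router = rand_matrix(num_experts)  # one scoring row per expert
        self.experts = [rand_matrix(dim) for _ in range(num_experts)]

    def forward(self, x):
        scores = matvec(self.router, x)
        # Keep the k highest-scoring experts; all others are never evaluated,
        # so only a fraction of the layer's parameters is "active" per token.
        chosen = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[: self.top_k]
        gates = softmax([scores[i] for i in chosen])
        out = [0.0] * len(x)
        for gate, i in zip(gates, chosen):
            y = matvec(self.experts[i], x)
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out, chosen, gates


layer = ToyMoELayer(dim=8, num_experts=128, top_k=4)
out, chosen, gates = layer.forward([1.0] * 8)
print(len(out), len(chosen))  # 8 4
print(f"{9 / 105:.1%}")       # 8.6%, the listing's ~9B active out of 105B
```

The gate weights are renormalized over the chosen experts only. The skipped experts therefore cost no compute for that token.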

GLM-5.2

China · 2 providers · 1.0M ctx

Flagship open-weight coding model from Z.ai (formerly Zhipu AI), with a 1M-token context window. It uses a Mixture-of-Experts architecture with 753B total parameters and ~40B active per token, and offers two reasoning modes for balancing cost. It tops several coding benchmarks while costing a fraction of comparable proprietary frontier models. MIT-licensed weights.

$0.60 – $2.00 / 1M tokens

Sarvam-30B

India · 128K ctx

Sarvam AI's 30B-parameter Mixture-of-Experts reasoning model, trained from scratch, with only 2.4B active parameters per token. It is optimized for real-time deployment and Indian languages, delivering strong reasoning, coding, and conversational performance while remaining efficient to serve. Released with open weights.

$0.60 – $1.80 / 1M tokens

Kimi K2.7 Code

China · 3 providers · 1.0M ctx

Moonshot AI's latest open-source, coding-focused model in the Kimi K2 family, built to complete end-to-end programming tasks reliably over long contexts. A 1-trillion-parameter model that cuts reasoning token usage by roughly 30% versus K2.6 while improving coding and agent performance — +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite for multi-language support. Released under a Modified MIT License and available via Kimi APIs and Hugging Face.

$0.60 – $3.00 / 1M tokens

DiffusionGemma

USA · 2 providers · 262K ctx

Google DeepMind's experimental diffusion-based member of the Gemma 4 open model family. Unlike autoregressive models that generate text one token at a time, DiffusionGemma denoises a canvas of placeholder tokens to produce up to 256 tokens in parallel, finalizing output in one block. A Mixture-of-Experts model with 26B total parameters and 3.8B active per inference, delivering roughly 4x the throughput of similarly sized autoregressive Gemma models on local hardware. Excels at non-linear tasks like in-line editing, molecular sequencing, mathematical graphing, and self-correcting puzzles.

$0.00 – $0.15 / 1M tokens
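The parallel denoising described for DiffusionGemma can be illustrated with a generic masked-diffusion decoding loop. The loop starts from a canvas of mask tokens, predicts every masked slot at once, commits the most confident predictions, and repeats. This is a toy sketch of that general technique, not Google DeepMind's implementation. `toy_denoiser` is a hypothetical stand-in that reads tokens from a known target and draws random confidences instead of running a model. The confidence-ranked commit schedule is also an assumption.

```python
import random

MASK = "<mask>"


def toy_denoiser(canvas, target, rng):
    """Hypothetical stand-in for the model.

    Proposes a (token, confidence) pair for every masked slot in a single
    call, i.e. in parallel. A real model would predict tokens from context;
    this toy simply reads them from `target` and draws random confidences.
    """
    return {i: (target[i], rng.random())
            for i, tok in enumerate(canvas) if tok == MASK}


def diffusion_decode(target, steps, seed=0):
    """Fill a fully masked canvas over at most `steps` rounds of parallel prediction."""
    rng = random.Random(seed)
    canvas = [MASK] * len(target)
    per_step = -(-len(canvas) // steps)  # ceiling division: slots committed per round
    history = []
    for _ in range(steps):
        proposals = toy_denoiser(canvas, target, rng)
        if not proposals:
            break
        # Commit only the most confident proposals; the rest stay masked and
        # are re-predicted next round (in a real model, with more context).
        ranked = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:per_step]:
            canvas[i] = proposals[i][0]
        history.append(list(canvas))
    return canvas, history


tokens = "the cat sat on the mat".split()
final, history = diffusion_decode(tokens, steps=3)
for snapshot in history:
    print(" ".join(snapshot))
print(final == tokens)  # True
```

With fewer steps, each round commits more tokens. This is faster, but each committed token rests on less context. As the step count approaches the sequence length, the loop degenerates into one-token-at-a-time decoding.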

Claude Fable 5

USA · 300K ctx

Anthropic's first publicly available Mythos-class model, exceeding the capabilities of any model the company has previously made generally available. State-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, vision, and scientific research. Its lead grows on longer and more complex tasks. Ships with built-in safeguards that route sensitive cybersecurity, biology, chemistry, and distillation queries to Claude Opus 4.8.

$15.00 – $75.00 / 1M tokens

Claude Mythos 5

USA · 300K ctx

Anthropic's frontier Mythos-class model — the same underlying model as Claude Fable 5 but with safeguards lifted in some areas. It has the strongest cybersecurity capabilities of any model in the world, alongside state-of-the-art performance in software engineering, knowledge work, vision, and scientific research. Access is restricted to a small group of trusted cyberdefenders and infrastructure providers through Project Glasswing.

Gemma 4 12B

USA · 3 providers · 262K ctx

Google's 12-billion-parameter, medium-size open-weight model from the Gemma 4 family. It uses an encoder-free, unified multimodal architecture that natively processes text, image, audio, and video inputs. Features a 256K context window and supports 140+ languages. First medium-sized model capable of natively ingesting audio. Suitable for local deployment on GPUs with 16 GB of VRAM.

$0.00 – $0.10 / 1M tokens

Nemotron 3 Ultra

USA · 4 providers · 1.0M ctx

NVIDIA's flagship open 550B-parameter Mixture-of-Experts model with 55B active parameters, built for frontier reasoning and orchestration in long-running agentic systems. It features a hybrid Mamba-Transformer architecture, LatentMoE routing, multi-token prediction, and NVFP4 precision for 5x higher throughput. Achieves 30% lower cost-to-task-completion on agentic benchmarks. Supports a context window of 1M+ tokens, with 95% accuracy on RULER at 1M tokens.

$0.00 – $1.60 / 1M tokens

MiniMax M3

China · 3 providers · 1.0M ctx

MiniMax's frontier open-weight model with a 1M-token context window, native multimodality (text, image, video), and strong coding capabilities. Built on the MiniMax Sparse Attention (MSA) architecture, it scores 59% on SWE-Bench Pro with significantly improved long-context efficiency.

$0.30 – $2.40 / 1M tokens

Claude Opus 4.8

USA · 300K ctx

Anthropic's most advanced Opus-class model, building on Opus 4.7 with improvements across coding, agentic, reasoning, and knowledge-work benchmarks. It features enhanced honesty, more efficient tool use, support for dynamic workflows, and improved alignment.

$15.00 – $75.00 / 1M tokens

Jamba Large 1.7

Israel · 2 providers · 262K ctx

AI21's latest hybrid SSM-Transformer model with a Mixture-of-Experts architecture. It features a 256K context window and improved grounding and instruction following. It has 398B total parameters, of which 94B are active, and is optimized for enterprise long-context tasks.

$2.00 – $8.00 / 1M tokens

Falcon 3 10B

UAE · 33K ctx

TII's open-source 10B-parameter model from the Falcon 3 family. It ranked first in its size category on Hugging Face's LLM leaderboard, outperforming Meta's Llama variants and other models under 13B parameters.

$0.10 – $0.20 / 1M tokens

Falcon-H1

UAE · 2 providers · 131K ctx

TII's hybrid Mamba-Transformer model, which outperforms comparable Meta Llama and Alibaba Qwen models in the 30–70B parameter range. Designed for real-world use on everyday devices and in resource-limited settings, with state-of-the-art efficiency.

$0.50 – $2.40 / 1M tokens

Snowflake Arctic

USA · 4K ctx

Snowflake's enterprise-focused open LLM with 480B total parameters. It uses a fine-grained MoE architecture with only 17B active parameters per input. Apache 2.0 licensed, it excels at SQL generation, coding, and enterprise intelligence tasks, and was trained with breakthrough efficiency.

$0.00 – $0.00 / 1M tokens

Palmyra X5

USA · 1.0M ctx

Writer's most advanced adaptive reasoning model, with a 1-million-token context window. It processes a full million-token prompt in approximately 22 seconds and completes multi-turn function calls in about 300 ms. Optimized for enterprise agentic AI workflows at 3–4x lower cost than GPT-4.1.

$5.00 – $15.00 / 1M tokens

StableLM 2 12B

UK · 4K ctx

Stability AI's 12.1-billion-parameter decoder-only language model, pre-trained on 2 trillion tokens of diverse multilingual and code data. It supports multiple languages, performs strongly for its compact size, and is also available as an instruction-tuned chat variant.

$0.10 – $0.10 / 1M tokens

DBRX

USA · 33K ctx

Databricks' open-source 132B-parameter Mixture-of-Experts transformer model with 36B active parameters per input. Released under the Databricks Open Model License and optimized for enterprise workloads, including SQL generation and coding tasks.

$0.60 – $1.80 / 1M tokens

Yi-Lightning

China · 131K ctx

01.AI's flagship large language model, built on an enhanced Mixture-of-Experts architecture. It ranked 6th on Chatbot Arena, with particularly strong results in the Chinese, Math, Coding, and Hard Prompts categories. Features advanced expert segmentation and optimized KV caching.

$0.99 – $0.99 / 1M tokens
