Models

Browse 118 canonical LLM models across all providers

Showing 1–24 of 118 models

Claude Sonnet 51.0M ctx

Anthropic's most capable Sonnet-class model, bringing frontier coding, agentic, and professional-work performance to the midsize tier while closing the gap with Opus 4.8 at a lower price. Supports adaptive thinking with selectable reasoning effort levels, a 1M-token context window, and text, image, and file inputs. Codenamed Fennec.

chatcompletionfunction-calling

Sakana Fugu Ultra256K ctx

The higher-quality tier of Sakana AI's Fugu multi-agent orchestration system, tuned for the hardest coding, reasoning, science, and agentic tasks. Coordinates a swappable pool of frontier LLMs through one OpenAI-compatible endpoint, delegating sub-tasks, verifying intermediate work, and synthesizing a single answer. Sakana reports strong vendor benchmarks including 93.2 on LiveCodeBench, 73.7 on SWE-Bench Pro, and 82.1 on TerminalBench.

chatcompletionfunction-calling

Sakana Fugu256K ctx

Sakana AI's multi-agent orchestration model from Tokyo, delivered as a single OpenAI-compatible API. Fugu is itself a language model trained to call a pool of specialist LLMs (and recursive instances of itself), handling model selection, delegation, verification, and synthesis behind one endpoint. Built on Sakana AI's TRINITY and Conductor research, its routing intelligence is learned in model weights rather than hand-configured.

chatcompletionfunction-calling

GLM-5.21.0M ctx

Z.ai's (formerly Zhipu AI) flagship open-weight coding model with a 1M-token context window. Mixture-of-Experts architecture with 753B total parameters and ~40B active per request, featuring two cost-balancing reasoning modes. Tops several coding benchmarks while remaining a fraction of the cost of comparable proprietary frontier models. MIT-licensed weights.

chatcompletionfunction-calling

Command A+256K ctx

Cohere's enterprise flagship model building on Command A with stronger reasoning, agentic tool use, and multilingual performance across 23 languages. Optimized for secure, high-throughput RAG, retrieval, and long-horizon agent workflows in regulated environments, with private and on-premise deployment options.

chatcompletionfunction-calling

Sarvam-105B128K ctx

Sarvam AI's sovereign 105B-parameter Mixture-of-Experts model activating ~9B parameters per token, with a 128K-token context window. Trained on 12 trillion tokens across 22 Indian languages using 128 sparse experts with Multi-head Latent Attention and a custom low-fertility Indic tokenizer. Wins the majority of pairwise comparisons on Indian-language and STEM benchmarks.

chatcompletionfunction-calling

Sarvam-M131K ctx

Sarvam AI's 24B-parameter instruction-tuned model derived from Mistral-Small-3.1-24B, post-trained on English plus eleven major Indic languages (bn, hi, kn, gu, mr, ml, or, pa, ta, te). Delivers large relative gains on Indian-language, math, and programming benchmarks over its base model, with a hybrid reasoning mode for complex tasks.

chatcompletionfunction-calling

Sarvam AI's compact 2B-parameter language model built from the ground up for Indian languages. Provides best-in-class performance across 10 Indic languages (bn, gu, hi, kn, ml, mr, or, pa, ta, te) alongside English, outperforming larger general-purpose models like Gemma-2-2B and Llama-3.2-3B thanks to careful data curation and an efficient Indic tokenizer. Edge-deployable.

Sarvam-30B128K ctx

Sarvam AI's 30B-parameter Mixture-of-Experts reasoning model trained from scratch with only 2.4B active parameters per token. Optimized for real-time deployment and Indian languages, delivering strong reasoning, coding, and conversational performance while remaining efficient to serve. Open-weights.

chatcompletionfunction-calling

Kimi K2.7 Code1.0M ctx

Moonshot AI's latest open-source, coding-focused model in the Kimi K2 family, built to complete end-to-end programming tasks reliably over long contexts. A 1-trillion-parameter model that cuts reasoning token usage by roughly 30% versus K2.6 while improving coding and agent performance — +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite for multi-language support. Released under a Modified MIT License and available via Kimi APIs and Hugging Face.

chatcompletionfunction-calling

DiffusionGemma262K ctx

Google DeepMind's experimental diffusion-based member of the Gemma 4 open model family. Unlike autoregressive models that generate text one token at a time, DiffusionGemma denoises a canvas of placeholder tokens to produce up to 256 tokens in parallel, finalizing output in one block. A Mixture-of-Experts model with 26B total parameters and 3.8B active per inference, delivering roughly 4x the throughput of similarly sized autoregressive Gemma models on local hardware. Excels at non-linear tasks like in-line editing, molecular sequencing, mathematical graphing, and self-correcting puzzles.

chatcompletioncode-generation

Claude Fable 5300K ctx

Anthropic's first publicly available Mythos-class model, exceeding the capabilities of any model the company has previously made generally available. State-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, vision, and scientific research. Its lead grows on longer and more complex tasks. Ships with built-in safeguards that route sensitive cybersecurity, biology, chemistry, and distillation queries to Claude Opus 4.8.

chatcompletionfunction-calling

Claude Mythos 5300K ctx

Anthropic's frontier Mythos-class model — the same underlying model as Claude Fable 5 but with safeguards lifted in some areas. It has the strongest cybersecurity capabilities of any model in the world, alongside state-of-the-art performance in software engineering, knowledge work, vision, and scientific research. Access is restricted to a small group of trusted cyberdefenders and infrastructure providers through Project Glasswing.

chatcompletionfunction-calling

Gemma 4 12B262K ctx

Google's medium-size open-weight model with 12 billion parameters from the Gemma 4 family. Encoder-free unified multimodal architecture that natively processes text, image, audio, and video inputs without dedicated encoders. Features a 256K context window and supports 140+ languages. First medium-sized model capable of natively ingesting audio. Suitable for local deployment on GPUs with 16GB VRAM.

chatcompletionvision

Nemotron 3 Ultra1.0M ctx

NVIDIA's flagship open 550B-parameter Mixture-of-Experts model with 55B active parameters, built for frontier reasoning and orchestration in long-running agentic systems. Features hybrid Mamba-Transformer architecture, LatentMoE routing, multi-token prediction, and NVFP4 precision for 5x higher throughput. Achieves 30% lower cost-to-task-completion on agentic benchmarks. Supports 1M+ token context window with 95% accuracy on Ruler@1M.

chatcompletionfunction-calling

MiniMax M31.0M ctx

MiniMax's frontier open-weight model with 1M-token context window, native multimodality (text, image, video), and strong coding capabilities. Built on MiniMax Sparse Attention (MSA) architecture, achieving 59% on SWE-Bench Pro with significantly improved efficiency at long context.

chatcompletionfunction-calling

Claude Opus 4.8300K ctx

Anthropic's most advanced model, building on Opus 4.7 with improvements across benchmarks in coding, agentic skills, reasoning, and knowledge work. Features enhanced honesty, better tool use efficiency, dynamic workflows support, and improved alignment.

chatcompletionfunction-calling

StableLM 2 12B4K ctx

Stability AI's 12.1 billion parameter decoder-only language model pre-trained on 2 trillion tokens of diverse multilingual and code datasets. Supports multiple languages and offers strong performance for its compact size with instruction-tuned chat variant available.

chatcompletioncode-generation

Jamba Large 1.7262K ctx

AI21's latest hybrid SSM-Transformer model with Mixture-of-Experts architecture. Features a 256K context window, improved grounding and instruction-following. 94B total parameters with 398B active, optimized for enterprise long-context tasks.

chatcompletionfunction-calling

Yi-Lightning131K ctx

01.AI's flagship large language model with enhanced Mixture-of-Experts architecture. Ranked 6th on Chatbot Arena with particularly strong results in Chinese, Math, Coding, and Hard Prompts categories. Features advanced expert segmentation and optimized KV-caching.

chatcompletioncode-generation

Snowflake Arctic4K ctx

Snowflake's enterprise-focused open LLM with 480B total parameters using a fine-grained MoE architecture with only 17B active parameters per input. Apache 2.0 licensed, excels at SQL generation, coding, and enterprise intelligence tasks with breakthrough training efficiency.

chatcompletioncode-generation

Databricks' open-source 132B parameter Mixture-of-Experts transformer model with 36B active parameters per input. Released under Databricks Open Model License, optimized for enterprise workloads including SQL generation and coding tasks.

chatcompletioncode-generation

Palmyra X51.0M ctx

Writer's most advanced adaptive reasoning model with a 1 million token context window. Processes full million-token prompts in approximately 22 seconds with multi-turn function calls in 300ms. Optimized for enterprise agentic AI workflows at 3-4x lower cost than GPT-4.1.

chatcompletionfunction-calling

Ring-2.6-1T131K ctx

InclusionAI's (Ant Group) trillion-parameter open-weights reasoning model with 63B active parameters per token. Built for real-world agent workflows with adaptive reasoning-effort modes. Features hybrid linear and MLA attention architecture with MIT license.

chatcompletionfunction-calling

1 2 3 4 5 Next →