Models

Browse 59 canonical LLMs across all providers

Solar Pro 3 (128K ctx)

Upstage's powerful Mixture-of-Experts language model with 102B total parameters and 12B active parameters per forward pass. Optimized for Korean with strong English and Japanese support. Excels at complex reasoning, structured output generation, and agentic workflows.

chat, completion, function-calling

Qwen 3.7 Max (131K ctx)

Alibaba's flagship proprietary model engineered for advanced agentic coding, complex reasoning, and long-horizon task execution. Ranked

chat, completion, function-calling

Qwen 3.7 Plus (131K ctx)

Alibaba's multimodal variant in the Qwen 3.7 family, optimized for vision understanding and multimodal tasks. Ranked

chat, completion, function-calling

Gemini 3 Flash (1.0M ctx)

Google's balanced model combining Gemini 3 Pro's reasoning capabilities with the Flash line's latency, efficiency, and cost. Features configurable thinking levels, multimodal function responses, and streaming function calling for complex agentic workflows.

chat, completion, function-calling

Gemini 3.1 Flash-Lite (1.0M ctx)

Google's most cost-efficient Gemini model optimized for high-volume, low-latency use cases. Delivers 2.5x faster time to first token versus Gemini 2.5 Flash with full multimodal support. Ideal for agentic tasks, data extraction, translation, and classification.

chat, completion, function-calling

Granite 4.1 30B (524K ctx)

IBM's largest dense decoder-only 30B parameter language model from the Granite 4.1 family. Trained on approximately 15T tokens with long-context extension up to 512K tokens. Supports tool calling, RAG, code generation, and multilingual tasks across 12 languages. Released under Apache 2.0.

chat, completion, function-calling

Granite 4.1 8B (131K ctx)

IBM's dense decoder-only 8B parameter language model from the Granite 4.1 family. Supports 131K-token context, tool calling, RAG, code generation with fill-in-the-middle, text summarization, classification, and extraction across 12 languages. Released under Apache 2.0.

chat, completion, function-calling

Laguna M.1 (128K ctx)

Poolside AI's flagship agentic coding model with 225B total parameters and 23B active (MoE). Trained from scratch in-house on 30T tokens across 6,144 NVIDIA Hopper GPUs. Optimized for complex multi-step software engineering tasks including codebase exploration, file editing, test running, and iterative debugging.

chat, completion, function-calling

DeepSeek V4 Pro (1.0M ctx)

DeepSeek's flagship V4 model with 1.6T total parameters (49B activated). MoE architecture supporting a 1M-token context. Closes the gap with frontier proprietary models on reasoning and coding benchmarks.

chat, completion, function-calling

GPT-5.5 (1.0M ctx)

OpenAI's most capable model designed for complex real-world work including coding, online research, information analysis, and document creation. Features advanced agentic capabilities with tool search and multi-step task execution.

chat, completion, function-calling

DeepSeek V4 Flash (1.0M ctx)

DeepSeek's efficient V4 model with 284B total parameters (13B activated). Optimized for speed and cost-efficiency while maintaining strong performance. Supports a 1M-token context window.

chat, completion, function-calling

MiMo-V2.5-Pro (1.0M ctx)

Xiaomi's flagship 1.02T-parameter Mixture-of-Experts model with 42B active parameters, built on a hybrid-attention architecture with 3-layer Multi-Token Prediction. Designed for complex agentic tasks, software engineering, and long-horizon instruction following with a 1M-token context window.

chat, completion, function-calling

Qwen 3.6 27B (131K ctx)

Alibaba's dense 27B parameter model that outperforms its own 397B MoE predecessor on agentic coding benchmarks. Strong multilingual and reasoning capabilities released under Apache 2.0.

chat, completion, function-calling

Hy3 Preview (256K ctx)

Tencent's flagship open-weight Mixture-of-Experts model from the Hunyuan family with 295B total parameters and 21B active. Integrates fast and slow thinking modes with configurable reasoning effort. Designed for agentic workflows, cross-file code refactoring, long-document analysis, and multi-step tool use.

chat, completion, function-calling

Qwen 3.6 35B-A3B (131K ctx)

Alibaba's efficient Mixture-of-Experts model with 35B total parameters and 3B active per token. Frontier-level agentic coding performance with 73.4% on SWE-bench Verified and 92.7 on AIME 2026. Released under Apache 2.0.

chat, completion, function-calling

GPT-5.4 Mini (1.1M ctx)

OpenAI's compact reasoning model optimized for coding, computer use, and subagent tasks. Approaches GPT-5.4 performance on several benchmarks while running more than 2x faster.

chat, completion, function-calling

Muse Spark (256K ctx)

Meta Superintelligence Labs' first model, featuring advanced reasoning, multimodal understanding, and agentic capabilities. Processes voice, text, and image inputs with tool use and multi-agent orchestration. Powers Meta AI across its product ecosystem.

chat, completion, function-calling

Qwen 3.6 Plus (131K ctx)

Alibaba's proprietary flagship model in the Qwen 3.6 family, targeting enterprise AI workflows with stronger agentic coding capability, visual coding support, and end-to-end enterprise engineering features.

chat, completion, function-calling

Nemotron 3 Super 120B (1.0M ctx)

NVIDIA's open hybrid Mamba-Transformer MoE model with 120B total parameters (12B active). Features a 1M-token context window and excels at agentic reasoning, coding, planning, and tool calling.

chat, completion, function-calling

GPT-OSS 120B (131K ctx)

OpenAI's first open-weight large model with 120 billion parameters. Released under Apache 2.0 license, offering strong performance on reasoning and coding tasks while being fully self-hostable.

chat, completion, function-calling

Grok 4.3 (1.0M ctx)

xAI's latest and most intelligent model with strong agentic tool calling, minimal hallucinations, and configurable reasoning. Supports a 1M-token context window with competitive pricing.

chat, completion, function-calling

Claude Opus 4.7 (300K ctx)

Anthropic's latest and most advanced model with state-of-the-art reasoning, coding, and analysis capabilities. Features improved tool use, extended thinking, and enhanced safety alignment.

chat, completion, function-calling

GPT-OSS 20B (131K ctx)

OpenAI's compact open-weight model with 20 billion parameters. Released under Apache 2.0 license, designed for efficient deployment on consumer hardware while maintaining strong coding and reasoning capabilities.

chat, completion, function-calling

GPT-5.4 (1.1M ctx)

OpenAI's frontier reasoning model combining advances in coding, reasoning, and agentic workflows. Features a 1.1M-token context window and strong performance on complex multi-step problems.

chat, completion, function-calling

Qwen 3.6 (131K ctx)

Alibaba's latest Qwen model with enhanced reasoning, multilingual capabilities, and improved instruction following. Features strong performance on coding, math, and general knowledge benchmarks.

chat, completion, function-calling

Mistral Small 4 (128K ctx)

Mistral AI's efficient hybrid model unifying instruct, reasoning, and coding in a single model. Open-weight under Apache 2.0 with strong performance for its size class.

chat, completion, function-calling

MiniMax M2.7 (200K ctx)

MiniMax's latest large language model with strong multilingual and multimodal capabilities. Competitive pricing with high-quality text generation and improved reasoning performance.

chat, completion, function-calling

GPT-5.5 Pro (256K ctx)

OpenAI's premium tier model with extended reasoning capabilities, higher accuracy on complex tasks, and priority access. Optimized for professional and enterprise workloads requiring maximum quality.

chat, completion, function-calling

Gemini 3.1 Pro (2.0M ctx)

Google's latest flagship multimodal model with state-of-the-art performance on reasoning, coding, and multimodal understanding. Features native tool use, grounding, and a million-token context window.

chat, completion, function-calling

GLM-5.1 (131K ctx)

Zhipu AI's latest bilingual model with strong Chinese and English capabilities. Features improved reasoning, coding, and tool use with competitive performance on academic benchmarks.

chat, completion, function-calling

Kimi K2.6 (1.0M ctx)

Moonshot AI's latest model with ultra-long context window support, strong reasoning capabilities, and excellent performance on complex multi-step tasks. Known for reliable long-document understanding.

chat, completion, function-calling

DeepSeek V4 (256K ctx)

DeepSeek's fourth-generation model with improved mixture-of-experts architecture, enhanced reasoning and coding capabilities, and stronger multilingual performance. Competitive with frontier proprietary models.

chat, completion, function-calling

Grok 4.20 (2.0M ctx)

xAI's multi-agent-capable model with a 2M-token context window. Available in reasoning, non-reasoning, and multi-agent variants for diverse enterprise workloads.

chat, completion, function-calling

xAI's latest model with real-time information access, strong reasoning capabilities, and competitive performance on coding and analysis tasks. Features improved tool use and multimodal understanding.

chat, completion, function-calling

Mistral Medium 3.5 (128K ctx)

Mistral AI's balanced model offering strong multilingual performance with excellent price-performance ratio. Optimized for production workloads requiring reliable quality across European and global languages.

chat, completion, function-calling

Claude Sonnet 4.6 (200K ctx)

Anthropic's balanced model offering strong performance at lower cost and latency than Opus. Excellent for everyday coding, analysis, and content generation tasks with good reasoning capabilities.

chat, completion, function-calling

Claude Opus 4.6 (300K ctx)

Anthropic's most capable model in the Claude 4 family, excelling at complex analysis, extended reasoning, scientific research, and advanced code generation. Features significantly improved accuracy and reduced hallucinations.

chat, completion, function-calling

Mistral Large 3 (256K ctx)

Mistral AI's largest open-weight model with 41B active parameters (675B total MoE). State-of-the-art general-purpose multimodal model with a 256K context window and powerful agentic capabilities. Released under Apache 2.0.

chat, completion, function-calling

Devstral 2 (128K ctx)

Mistral AI's frontier code-agent model designed for solving software engineering tasks. Open-weight model optimized for agentic coding workflows and complex development tasks.

chat, completion, function-calling

GigaChat 3.1 Lightning (8K ctx)

Sber's compact Mixture-of-Experts model with 10B total parameters and 1.8B active. Designed for fast multilingual assistant workloads, reasoning, code, function calling, and product-style deployment on edge devices.

chat, completion, function-calling

GigaChat 3.1 Ultra (32K ctx)

Sber's flagship large-scale Mixture-of-Experts model with 702B total parameters and 36B active. Designed for multilingual assistant workloads, reasoning, code generation, tool use, and large-cluster deployment. Open-weight release.

chat, completion, function-calling

Grok 4.1 Fast (2.0M ctx)

xAI's fast and cost-effective model with a 2M-token context window. Offers both reasoning and non-reasoning modes at significantly lower pricing than flagship models.

chat, completion, function-calling

Claude Sonnet 4.5 (200K ctx)

Anthropic's previous-generation balanced model with strong coding and analysis capabilities. Offers excellent price-performance ratio for production workloads requiring reliable quality.

chat, completion, function-calling

Claude Haiku 4.5 (200K ctx)

Anthropic's fastest model with near-frontier intelligence. Optimized for high-throughput, low-latency applications requiring quick responses at minimal cost. Supports extended thinking.

chat, completion, function-calling

GLM-4.7 (131K ctx)

Zhipu AI's multilingual agentic coding model with strong reasoning, tool use, and UI generation capabilities. Predecessor to GLM-5.1 with competitive performance on coding benchmarks.

chat, completion, function-calling

Gemini 2.5 Pro (1.0M ctx)

Google's high-capability reasoning model with adaptive thinking for complex agentic and multimodal challenges. Features a 1M-token context window and strong performance on coding and scientific tasks.

chat, completion, function-calling

Gemini 2.5 Flash (1.0M ctx)

Google's cost-effective model optimized for high-throughput tasks. Balances speed and intelligence with strong multimodal capabilities and a 1M-token context window.

chat, completion, function-calling

OpenAI's fifth-generation flagship model with significant improvements in reasoning, multimodal understanding, and code generation. Features enhanced instruction following and expanded context window.

chat, completion, function-calling

Llama 4 Maverick (1.0M ctx)

Meta's quality-focused MoE model with 17B active parameters (400B total, 128 experts). Targets quality-critical tasks with benchmark scores competitive with GPT-4o and Gemini 2.5 Pro.

chat, completion, function-calling

Llama 4 Scout (10.0M ctx)

Meta's efficient MoE model with 17B active parameters (109B total, 16 experts). Supports a context of up to 10M tokens, the longest of any production model. Strong performance on reasoning and multilingual tasks.

chat, completion, function-calling

Qwen3 32B (131K ctx)

Alibaba's Qwen3 32B dense language model with strong reasoning and multilingual capabilities, supporting function calling and code generation across diverse tasks.

chat, completion, function-calling

Qwen3 235B (131K ctx)

Alibaba's Qwen3 235B mixture-of-experts model delivering frontier-level performance with advanced reasoning, function calling, and code generation capabilities at massive scale.

chat, completion, function-calling

Qwen3 Coder (131K ctx)

Alibaba's Qwen3 Coder model optimized for software development tasks including code generation, debugging, code review, and technical documentation with strong multilingual programming support.

chat, completion, function-calling

Mistral Small 3.1 (128K ctx)

Mistral AI's Small 3.1 model with 24B parameters offering efficient multimodal capabilities including vision, function calling, and code generation with a large 128K context window.

chat, completion, function-calling

Command A (256K ctx)

Cohere's flagship 111B parameter model optimized for demanding enterprises requiring fast, secure, and high-quality AI. Excels at RAG, tool use, and multilingual tasks with strong reasoning capabilities.

chat, completion, function-calling

Llama 3.3 70B Instruct (131K ctx)

Meta's flagship open-weight model with 70 billion parameters. Strong multilingual capabilities with competitive performance on reasoning and coding benchmarks. Available for self-hosting and through various inference providers.

chat, completion, function-calling

Command R7B (128K ctx)

Cohere's compact 7B parameter model optimized for RAG, tool use, and code tasks. Delivers top-tier speed and efficiency on commodity GPUs and edge devices with a 128K context window.

chat, completion, function-calling

DeepSeek V3 (128K ctx)

DeepSeek's third-generation large language model featuring mixture-of-experts architecture, strong multilingual capabilities, and competitive performance on reasoning and coding benchmarks.

chat, completion, function-calling

Llama 3.1 8B Instruct (131K ctx)

Meta's efficient open-weight model with 8 billion parameters from the Llama 3.1 family. Optimized for instruction following with strong performance on general tasks, coding, and multilingual benchmarks. Ideal for cost-effective deployment and edge inference scenarios.

chat, completion, function-calling