Models
Browse 71 canonical LLM models across all providers
Nemotron 3 Ultra
NVIDIA's flagship open 550B-parameter Mixture-of-Experts model with 55B active parameters, built for frontier reasoning and orchestration in long-running agentic systems. Features hybrid Mamba-Transformer architecture, LatentMoE routing, multi-token prediction, and NVFP4 precision for 5x higher throughput. Achieves 30% lower cost-to-task-completion on agentic benchmarks. Supports 1M+ token context window with 95% accuracy on Ruler@1M.
$0.00 – $1.60 / 1M tokens
Gemma 4 12B
Google's medium-size open-weight model with 12 billion parameters from the Gemma 4 family. Encoder-free unified multimodal architecture that natively processes text, image, audio, and video inputs without dedicated encoders. Features a 256K context window and supports 140+ languages. First medium-sized model capable of natively ingesting audio. Suitable for local deployment on GPUs with 16GB VRAM.
$0.00 – $0.10 / 1M tokens
MiniMax M3
MiniMax's frontier open-weight model with 1M-token context window, native multimodality (text, image, video), and strong coding capabilities. Built on MiniMax Sparse Attention (MSA) architecture, achieving 59% on SWE-Bench Pro with significantly improved efficiency at long context.
$0.30 – $2.40 / 1M tokens
Claude Opus 4.8
Anthropic's most advanced model, building on Opus 4.7 with benchmark improvements in coding, agentic skills, reasoning, and knowledge work. Features enhanced honesty, more efficient tool use, support for dynamic workflows, and improved alignment.
$15.00 – $75.00 / 1M tokens
Jamba Large 1.7
AI21's latest hybrid SSM-Transformer model with a Mixture-of-Experts architecture. Features a 256K context window and improved grounding and instruction following. 398B total parameters with 94B active, optimized for enterprise long-context tasks.
$2.00 – $8.00 / 1M tokens
Ring-2.6-1T
A trillion-parameter open-weights reasoning model from InclusionAI (Ant Group) with 63B active parameters per token. Built for real-world agent workflows with adaptive reasoning-effort modes. Features a hybrid linear and MLA attention architecture and is released under the MIT license.
$0.00 – $0.00 / 1M tokens
Falcon-H1
TII's hybrid Mamba-Transformer model that outperforms comparable offerings from Meta's Llama and Alibaba's Qwen in the 30-70B parameter range. Designed for real-world AI on everyday devices and resource-limited settings with state-of-the-art efficiency.
$0.50 – $2.40 / 1M tokens
Palmyra X5
Writer's most advanced adaptive reasoning model with a 1 million token context window. Processes full million-token prompts in approximately 22 seconds with multi-turn function calls in 300ms. Optimized for enterprise agentic AI workflows at 3-4x lower cost than GPT-4.1.
$5.00 – $15.00 / 1M tokens
Yi-Lightning
01.AI's flagship large language model with enhanced Mixture-of-Experts architecture. Ranked 6th on Chatbot Arena with particularly strong results in Chinese, Math, Coding, and Hard Prompts categories. Features advanced expert segmentation and optimized KV-caching.
$0.99 – $0.99 / 1M tokens
Solar Pro 3
Upstage's powerful Mixture-of-Experts language model with 102B total parameters and 12B active parameters per forward pass. Optimized for Korean with strong English and Japanese support. Excels at complex reasoning, structured output generation, and agentic workflows.
$0.15 – $0.60 / 1M tokens
K2 Think
A 32 billion parameter open-weights reasoning model by LLM360/MBZUAI, built on Qwen2.5-32B. Trained with reinforcement learning and verifiable rewards for long chain-of-thought reasoning, agentic planning, and complex problem solving in math, science, and code.
$0.60 – $0.60 / 1M tokens
Qwen 3.7 Plus
Alibaba's multimodal variant in the Qwen 3.7 family, optimized for vision understanding and multimodal tasks. Ranked
$0.80 – $2.40 / 1M tokens
Qwen 3.7 Max
Alibaba's flagship proprietary model engineered for advanced agentic coding, complex reasoning, and long-horizon task execution. Ranked
$1.30 – $7.80 / 1M tokens
Gemini 3 Flash
Google's balanced model combining Gemini 3 Pro's reasoning capabilities with the Flash line's latency, efficiency, and cost. Features configurable thinking levels, multimodal function responses, and streaming function calling for complex agentic workflows.
$0.50 – $3.00 / 1M tokens
Granite 4.1 30B
IBM's largest dense decoder-only language model in the Granite 4.1 family, with 30B parameters. Trained on approximately 15T tokens with long-context extension up to 512K tokens. Supports tool calling, RAG, code generation, and multilingual tasks across 12 languages. Released under Apache 2.0.
$0.60 – $1.20 / 1M tokens
Laguna M.1
Poolside AI's flagship agentic coding model with 225B total parameters and 23B active (MoE). Trained from scratch in-house on 30T tokens across 6,144 NVIDIA Hopper GPUs. Optimized for complex multi-step software engineering tasks including codebase exploration, file editing, test running, and iterative debugging.
$0.00 – $0.00 / 1M tokens
GPT-5.5
OpenAI's most capable model designed for complex real-world work including coding, online research, information analysis, and document creation. Features advanced agentic capabilities with tool search and multi-step task execution.
$12.00 – $48.00 / 1M tokens
DeepSeek V4 Pro
DeepSeek's flagship V4 model with 1.6T total parameters (49B activated). MoE architecture supporting 1M token context. Closes the gap with frontier proprietary models on reasoning and coding benchmarks.
$0.00 – $2.19 / 1M tokens
Qwen 3.6 27B
Alibaba's dense 27B parameter model that outperforms its own 397B MoE predecessor on agentic coding benchmarks. Offers strong multilingual and reasoning capabilities. Released under Apache 2.0.
$0.20 – $0.60 / 1M tokens
MiMo-V2.5-Pro
Xiaomi's flagship 1.02T-parameter Mixture-of-Experts model with 42B active parameters, built on a hybrid-attention architecture with 3-layer Multi-Token Prediction. Designed for complex agentic tasks, software engineering, and long-horizon instruction following with a 1M-token context window.
$1.00 – $3.00 / 1M tokens
Hy3 Preview
Tencent's flagship open-weight Mixture-of-Experts model from the Hunyuan family with 295B total parameters and 21B active. Integrates fast and slow thinking modes with configurable reasoning effort. Designed for agentic workflows, cross-file code refactoring, long-document analysis, and multi-step tool use.
$0.00 – $0.28 / 1M tokens
Qwen 3.6 35B-A3B
Alibaba's efficient Mixture-of-Experts model with 35B total parameters and 3B active per token. Frontier-level agentic coding performance with 73.4% on SWE-bench Verified and 92.7 on AIME 2026. Released under Apache 2.0.
$0.14 – $0.42 / 1M tokens
GPT-5.4 Mini
OpenAI's compact reasoning model optimized for coding, computer use, and subagent tasks. Approaches GPT-5.4 performance on several benchmarks while running more than 2x faster.
$0.75 – $4.50 / 1M tokens
Muse Spark
Meta Superintelligence Labs' first model, featuring advanced reasoning, multimodal understanding, and agentic capabilities. Processes voice, text, and image inputs with tool use and multi-agent orchestration. Powers Meta AI across its product ecosystem.
$5.00 – $25.00 / 1M tokens
Qwen 3.6 Plus
Alibaba's proprietary flagship model in the Qwen 3.6 family, targeting enterprise AI workflows with stronger agentic coding capability, visual coding support, and end-to-end enterprise engineering features.
$0.80 – $2.40 / 1M tokens
Gemma 4 31B
Google's flagship open-weight dense model with 31 billion parameters from the Gemma 4 family. All parameters are active per forward pass. Ranks among the top open models on reasoning benchmarks, including AIME 2026 (89.2%) and MMLU Pro (85.2%). Supports vision and an extended 256K context window.
$0.00 – $0.50 / 1M tokens
Gemma 4 26B
Google's high-performance open-weight dense model with 26 billion parameters from the Gemma 4 family. Supports multimodal inputs including text and images with a 256K extended context window. Strong reasoning and code generation capabilities with all parameters active per forward pass.
Claude Opus 4.7
Anthropic's Opus release preceding Opus 4.8, with strong reasoning, coding, and analysis capabilities. Features improved tool use, extended thinking, and enhanced safety alignment.
$15.00 – $75.00 / 1M tokens
Grok 4.3
xAI's latest and most intelligent model with strong agentic tool calling, minimal hallucinations, and configurable reasoning. Supports 1M token context window with competitive pricing.
$1.25 – $2.50 / 1M tokens
Nemotron 3 Super 120B
NVIDIA's open hybrid Mamba-Transformer MoE model with 120B total parameters (12B active). Features 1M token context window and excels at agentic reasoning, coding, planning, and tool calling.
$0.00 – $0.00 / 1M tokens
GPT-OSS 120B
OpenAI's first open-weight large model with 120 billion parameters. Released under Apache 2.0 license, offering strong performance on reasoning and coding tasks while being fully self-hostable.
$1.80 – $6.00 / 1M tokens
GPT-5.4
OpenAI's frontier reasoning model combining advances in coding, reasoning, and agentic workflows. Features 1.1M token context window and strong performance on complex multi-step problems.
$2.50 – $15.00 / 1M tokens
GLM-5.1
Zhipu AI's latest bilingual model with strong Chinese and English capabilities. Features improved reasoning, coding, and tool use with competitive performance on academic benchmarks.
$0.00 – $3.00 / 1M tokens
Gemini 3.1 Pro
Google's latest flagship multimodal model with state-of-the-art performance on reasoning, coding, and multimodal understanding. Features native tool use, grounding, and million-token context window.
$7.00 – $21.00 / 1M tokens
Mistral Small 4
Mistral AI's efficient hybrid model unifying instruct, reasoning, and coding in a single model. Open-weight under Apache 2.0 with strong performance for its size class.
$0.00 – $0.30 / 1M tokens
GPT-5.5 Pro
OpenAI's premium tier model with extended reasoning capabilities, higher accuracy on complex tasks, and priority access. Optimized for professional and enterprise workloads requiring maximum quality.
$30.00 – $120.00 / 1M tokens
MiniMax M2.7
MiniMax's large language model in the M2 series, with strong multilingual and multimodal capabilities. Competitive pricing with high-quality text generation and improved reasoning performance.
$0.00 – $1.50 / 1M tokens
Qwen 3.6
Alibaba's base model in the Qwen 3.6 family, with enhanced reasoning, multilingual capabilities, and improved instruction following. Features strong performance on coding, math, and general knowledge benchmarks.
$0.30 – $0.90 / 1M tokens
Kimi K2.6
Moonshot AI's latest model with ultra-long context window support, strong reasoning capabilities, and excellent performance on complex multi-step tasks. Known for reliable long-document understanding.
$0.00 – $3.00 / 1M tokens
Grok 4
xAI's Grok 4 model with real-time information access, strong reasoning capabilities, and competitive performance on coding and analysis tasks. Features improved tool use and multimodal understanding.
$10.00 – $30.00 / 1M tokens
Grok 4.20
xAI's multi-agent capable model with 2M token context window. Available in reasoning, non-reasoning, and multi-agent variants for diverse enterprise workloads.
$1.25 – $2.50 / 1M tokens
Mistral Medium 3.5
Mistral AI's balanced model offering strong multilingual performance with excellent price-performance ratio. Optimized for production workloads requiring reliable quality across European and global languages.
$0.00 – $6.00 / 1M tokens
DeepSeek V4
DeepSeek's fourth-generation model with improved mixture-of-experts architecture, enhanced reasoning and coding capabilities, and stronger multilingual performance. Competitive with frontier proprietary models.
$0.14 – $1.10 / 1M tokens
Claude Opus 4.6
Anthropic's most capable model in the Claude 4 family, excelling at complex analysis, extended reasoning, scientific research, and advanced code generation. Features significantly improved accuracy and reduced hallucinations.
$15.00 – $75.00 / 1M tokens
Claude Sonnet 4.6
Anthropic's balanced model offering strong performance at lower cost and latency than Opus. Excellent for everyday coding, analysis, and content generation tasks with good reasoning capabilities.
$3.00 – $15.00 / 1M tokens
Mistral Large 3
Mistral AI's largest open-weight model with 41B active parameters (675B total MoE). State-of-the-art general-purpose multimodal model with 256K context window and powerful agentic capabilities. Released under Apache 2.0.
$1.80 – $6.00 / 1M tokens
GigaChat 3.1 Ultra
Sber's flagship large-scale Mixture-of-Experts model with 702B total parameters and 36B active. Designed for multilingual assistant workloads, reasoning, code generation, tool use, and large-cluster deployment. Open-weight release.
$3.50 – $10.50 / 1M tokens
Grok 4.1 Fast
xAI's fast and cost-effective model with 2M token context window. Offers both reasoning and non-reasoning modes at significantly lower pricing than flagship models.
$0.20 – $0.50 / 1M tokens
Claude Sonnet 4.5
Anthropic's previous-generation balanced model with strong coding and analysis capabilities. Offers excellent price-performance ratio for production workloads requiring reliable quality.
$3.00 – $15.00 / 1M tokens
Claude Haiku 4.5
Anthropic's fastest model with near-frontier intelligence. Optimized for high-throughput, low-latency applications requiring quick responses at minimal cost. Supports extended thinking.
$0.80 – $5.00 / 1M tokens
GLM-4.7
Zhipu AI's multilingual agentic coding model with strong reasoning, tool use, and UI generation capabilities. Predecessor to GLM-5.1 with competitive performance on coding benchmarks.
$0.00 – $1.50 / 1M tokens
AlemLLM
Kazakhstan's flagship Mixture-of-Experts language model developed by Astana Hub with technical support from 01.AI. Features 247B total parameters with 22B active per token, achieving state-of-the-art results on Kazakh, Russian, and English benchmarks. Outperforms GPT-4o on Kazakh language tasks.
$0.00 – $0.00 / 1M tokens
Trendyol LLM 8B T1
Turkish-optimized 8B chat model developed by Trendyol, Turkey's largest e-commerce platform. Built on Qwen3-8B and fine-tuned on large-scale Turkish e-commerce datasets. Features advanced chain-of-thought reasoning in Turkish with dual operation modes (/think and /no_think), strong instruction following, summarization, coding, and attribute extraction for catalogue enrichment. English reasoning capabilities are preserved alongside Turkish.
$0.00 – $0.00 / 1M tokens
Gemini 2.5 Pro
Google's high-capability reasoning model with adaptive thinking for complex agentic and multimodal challenges. Features 1M token context window and strong performance on coding and scientific tasks.
$2.50 – $15.00 / 1M tokens
Gemini 2.5 Flash
Google's cost-effective model optimized for high throughput tasks. Balances speed and intelligence with strong multimodal capabilities and 1M token context window.
$0.15 – $0.60 / 1M tokens
Nemotron Nano 9B v2
NVIDIA's compact 9B parameter model trained from scratch for both reasoning and non-reasoning tasks. Generates reasoning traces before final responses. Efficient for edge and on-device deployment.
$0.00 – $0.00 / 1M tokens
GPT-5
OpenAI's fifth-generation flagship model with significant improvements in reasoning, multimodal understanding, and code generation. Features enhanced instruction following and expanded context window.
$10.00 – $40.00 / 1M tokens
WiroAI Turkish LLM 9B
Turkish-specialized 9B language model developed by WiroAI, built on Google's Gemma 2 architecture. Trained with supervised fine-tuning (SFT) on over 500,000 curated, high-quality Turkish instructions adapted to Turkish culture and local context. Targets Turkish language processing tasks including conversation, reasoning, and instruction following.
$0.00 – $0.00 / 1M tokens
Llama 4 Maverick
Meta's quality-focused MoE model with 17B active parameters (400B total, 128 experts). Targets quality-critical tasks with benchmark scores competitive with GPT-4o and Gemini 2.5 Pro.
$0.20 – $0.99 / 1M tokens
Qwen3 235B
Alibaba's Qwen3 235B mixture-of-experts model delivering frontier-level performance with advanced reasoning, function calling, and code generation capabilities at massive scale.
Qwen3 32B
Alibaba's Qwen3 32B dense language model with strong reasoning and multilingual capabilities, supporting function calling and code generation across diverse tasks.
$0.16 – $2.24 / 1M tokens
Gemma 3 12B
Google's mid-size open-weight model with 12 billion parameters from the Gemma 3 family. Supports multimodal inputs including text and images with a 128K context window. Strong performance on reasoning and code generation tasks at moderate compute cost.
Gemma 3 27B
Google's largest open-weight model in the Gemma 3 family with 27 billion parameters. Supports multimodal inputs including text and images with a 128K context window. Delivers strong performance across reasoning, code generation, and vision tasks, competitive with larger proprietary models.
QwQ 32B
Alibaba's QwQ 32B reasoning-focused model designed for complex problem solving, mathematical reasoning, and step-by-step logical analysis with strong chain-of-thought capabilities.
Command A
Cohere's flagship 111B parameter model optimized for demanding enterprises requiring fast, secure, and high-quality AI. Excels at RAG, tool use, and multilingual tasks with strong reasoning capabilities.
$2.50 – $10.00 / 1M tokens
DeepSeek R1
DeepSeek's reasoning-focused model trained with reinforcement learning for complex multi-step reasoning. Excels at math, science, and coding problems requiring chain-of-thought reasoning.
$0.40 – $7.00 / 1M tokens
Phi-4
Microsoft's Phi-4 model with 14B parameters excelling at reasoning and code generation tasks, delivering strong performance relative to its compact size with efficient inference characteristics.
DeepSeek V3
DeepSeek's third-generation large language model featuring mixture-of-experts architecture, strong multilingual capabilities, and competitive performance on reasoning and coding benchmarks.
$0.27 – $1.10 / 1M tokens
Aya Expanse 32B
Highly performant 32B multilingual language model from Cohere For AI, designed to rival monolingual model performance across 23 languages. Built using innovations in multilingual data arbitrage, direct preference optimization, and model merging techniques. Outperforms previous multilingual models on both automatic and human evaluations.
Claude 3 Opus
Anthropic's most powerful model in the Claude 3 family, excelling at complex analysis, nuanced content generation, scientific reasoning, and code generation with extended context support.
$15.00 – $75.00 / 1M tokens