Models

Browse 8 canonical LLM models across all providers

8 models

Gemma 4 12B

3 providers · 262K ctx

Google's medium-sized open-weight model with 12 billion parameters from the Gemma 4 family. Encoder-free unified multimodal architecture that natively processes text, image, audio, and video inputs without dedicated encoders. Features a 256K context window and supports 140+ languages. First medium-sized model capable of natively ingesting audio. Suitable for local deployment on GPUs with 16GB VRAM.

$0.00 / $0.10 per 1M tokens

chat, completion, vision, audio, +3
text, image, audio, video, code

MiniMax M3

3 providers · 1.0M ctx

MiniMax's frontier open-weight model with a 1M-token context window, native multimodality (text, image, video), and strong coding capabilities. Built on the MiniMax Sparse Attention (MSA) architecture, it achieves 59% on SWE-Bench Pro with significantly improved efficiency at long context.

$0.30 / $2.40 per 1M tokens

chat, completion, function-calling, code-generation, +2
text, image, video, code

Gemini 3 Flash

2 providers · 1.0M ctx

Google's balanced model combining Gemini 3 Pro's reasoning capabilities with the Flash line's latency, efficiency, and cost. Features configurable thinking levels, multimodal function responses, and streaming function calling for complex agentic workflows.

$0.50 / $3.00 per 1M tokens

chat, completion, function-calling, vision, +3
text, image, audio, video, code

Gemini 3.1 Flash-Lite

2 providers · 1.0M ctx

Google's most cost-efficient Gemini model optimized for high-volume, low-latency use cases. Delivers 2.5x faster time to first token versus Gemini 2.5 Flash with full multimodal support. Ideal for agentic tasks, data extraction, translation, and classification.

$0.25 / $1.50 per 1M tokens

chat, completion, function-calling, vision, +2
text, image, audio, video, code

MiniCPM-V 4.6

256K ctx

Ultra-efficient multimodal language model from OpenBMB built on SigLIP2-400M and Qwen3.5-0.8B (~1B parameters). Supports single-image, multi-image, and video understanding with mixed 4x/16x visual token compression. Designed for edge deployment on iOS, Android, and HarmonyOS.

chat, completion, vision
text, image, video

Gemini 3.1 Pro

2 providers · 2.0M ctx

Google's latest flagship multimodal model with state-of-the-art performance on reasoning, coding, and multimodal understanding. Features native tool use, grounding, and million-token context window.

$7.00 / $21.00 per 1M tokens

chat, completion, function-calling, vision, +3
text, image, audio, video, code

Gemini 2.5 Pro

2 providers · 1.0M ctx

Google's high-capability reasoning model with adaptive thinking for complex agentic and multimodal challenges. Features a 1M-token context window and strong performance on coding and scientific tasks.

$2.50 / $15.00 per 1M tokens

chat, completion, function-calling, vision, +3
text, image, audio, video, code

Gemini 2.5 Flash

2 providers · 1.0M ctx

Google's cost-effective model optimized for high-throughput tasks. Balances speed and intelligence with strong multimodal capabilities and a 1M-token context window.

$0.15 / $0.60 per 1M tokens

chat, completion, function-calling, vision, +3
text, image, audio, video, code