Models

Browse 6 canonical LLMs across all providers

6 models

Google's most cost-efficient Gemini model optimized for high-volume, low-latency use cases. Delivers 2.5x faster time to first token versus Gemini 2.5 Flash with full multimodal support. Ideal for agentic tasks, data extraction, translation, and classification.

chat, completion, function-calling

Gemini 3 Flash (1.0M ctx)

Google's balanced model combining Gemini 3 Pro's reasoning capabilities with the Flash line's latency, efficiency, and cost. Features configurable thinking levels, multimodal function responses, and streaming function calling for complex agentic workflows.

chat, completion, function-calling

MiniCPM-V 4.6 (256K ctx)

Ultra-efficient multimodal language model from OpenBMB built on SigLIP2-400M and Qwen3.5-0.8B (~1B parameters). Supports single-image, multi-image, and video understanding with mixed 4x/16x visual token compression. Designed for edge deployment on iOS, Android, and HarmonyOS.

chat, completion, vision

Gemini 3.1 Pro (2.0M ctx)

Google's latest flagship multimodal model with state-of-the-art performance on reasoning, coding, and multimodal understanding. Features native tool use, grounding, and a 2M-token context window.

chat, completion, function-calling

Gemini 2.5 Flash (1.0M ctx)

Google's cost-effective model optimized for high-throughput tasks. Balances speed and intelligence with strong multimodal capabilities and a 1M-token context window.

chat, completion, function-calling

Gemini 2.5 Pro (1.0M ctx)

Google's high-capability reasoning model with adaptive thinking for complex agentic and multimodal challenges. Features a 1M-token context window and strong performance on coding and scientific tasks.

chat, completion, function-calling