Providers
Browse 42 inference providers and their offerings
Alibaba Model Studio
Alibaba Cloud AI platform providing access to Qwen family models and other large language models through DashScope API, offering OpenAI-compatible endpoints with multi-region availability.
Amazon Bedrock
Fully managed AWS service offering foundation models from leading AI companies including Anthropic, Meta, and Mistral through a unified API. Supports OpenAI-compatible endpoints via Mantle inference engine with cross-region inference capabilities.
Anthropic
AI safety company providing API access to the Claude family of models, known for helpfulness, harmlessness, and honesty with strong reasoning and analysis capabilities.
Anyscale
Serverless inference platform built on Ray, offering high-throughput access to popular open-weight models. OpenAI-compatible API with competitive pricing and enterprise-grade scalability.
Azure AI
Microsoft's cloud AI platform providing access to OpenAI models, open-source models, and enterprise AI services through Azure AI Studio. Offers global deployment with enterprise-grade security and compliance.
Baseten
ML infrastructure platform for deploying and serving machine learning models at scale, offering managed GPU inference with auto-scaling and OpenAI-compatible API endpoints for popular LLMs.
Cerebras
Ultra-fast inference provider powered by the Cerebras Wafer Scale Engine, known for extremely high tokens/sec throughput. Offers OpenAI-compatible API with free tier access.
Cloudflare Workers AI
Serverless AI inference platform running on Cloudflare's global edge network. Offers 10,000 neurons/day free allocation with OpenAI-compatible API and wide model selection including vision models.
Cohere
Enterprise AI platform specializing in language models for business applications including RAG, search, and text generation. Offers Command and Embed model families with strong multilingual support.
Deep Infra
Serverless inference platform offering fast and cost-effective access to popular open-weight models. OpenAI-compatible API with pay-per-token pricing and no minimum commitments.
DeepSeek
Chinese AI research company providing direct API access to their DeepSeek family of models with competitive pricing and strong performance on coding and reasoning tasks.
Fireworks
High-speed AI inference platform optimized for low-latency serving of open-source models, offering OpenAI-compatible API endpoints with custom model deployment and fine-tuning capabilities.
Google AI Studio
Google's developer platform for accessing Gemini and Gemma models via OpenAI-compatible API. Free tier available with generous rate limits. Data may be used for training outside EU/EEA/UK/CH regions.
Google Cloud Vertex AI
Google Cloud's enterprise AI platform providing access to Gemini models with enterprise-grade security, compliance, and global infrastructure. Supports streaming, function calling, and multimodal inputs.
Groq
Ultra-fast LPU (Language Processing Unit) inference provider offering extremely low latency. Supports streaming, function calling, and audio transcription via Whisper models. Per-model rate limits apply.
Hugging Face Inference
Inference API and dedicated endpoints for open-source models hosted on the Hugging Face Hub. Offers serverless inference for popular models and dedicated GPU endpoints for production workloads.
Hyperbolic
Decentralized AI compute platform offering affordable GPU inference for open-source models, providing OpenAI-compatible API endpoints with competitive pricing and global availability.
IBM watsonx.ai
IBM's enterprise AI platform providing access to Granite foundation models and third-party models via a REST API. Supports text generation, embeddings, and fine-tuning with regional deployments across IBM Cloud.
Inference.net
Distributed AI inference network providing affordable access to open-source language models through a decentralized GPU marketplace, offering OpenAI-compatible API endpoints with competitive per-token pricing.
Meta AI
Meta's AI research division providing the open-weight Llama family of models. Models are available through various inference providers and can be self-hosted.
MiniMax
Chinese AI company providing large language models with strong multilingual and multimodal capabilities. Known for competitive pricing and high-quality text generation.
Mistral AI
European AI company providing high-performance language models with strong multilingual capabilities. Offers both open-weight and proprietary models through an OpenAI-compatible API.
Modal
Serverless cloud platform for running AI workloads with on-demand GPU access, offering custom model deployment and OpenAI-compatible inference endpoints with automatic scaling and pay-per-second pricing.
Moonshot AI
Chinese AI company behind the Kimi series of models, known for ultra-long context windows and strong reasoning capabilities. Offers OpenAI-compatible API access.
Nebius
Cloud AI platform providing scalable GPU infrastructure and managed inference services for large language models, with data centers in Europe and competitive pricing for open-source model hosting.
NLP Cloud
Production-ready AI inference API offering managed deployment of open-source and proprietary language models with dedicated GPU instances, providing high availability and data privacy compliance.
Novita
AI model inference platform providing affordable access to open-source language models with OpenAI-compatible API endpoints, offering pay-per-token pricing and global availability.
NVIDIA NIM
NVIDIA's inference microservice platform providing optimized deployment of LLMs on GPU infrastructure. Offers free endpoints for select models and partner endpoints through Deep Infra, Together AI, Bitdeer, GMI Cloud, and CoreWeave.
OpenAI
Leading AI research company providing API access to GPT-4, GPT-3.5, DALL-E, and other foundation models through a developer-friendly REST API with global availability.
OpenRouter
Unified API gateway providing access to hundreds of models from multiple providers through a single OpenAI-compatible endpoint. Free models available with shared quota, up to 1000 requests/day.
Perplexity
AI-powered answer engine offering API access to proprietary and open-source models with built-in web search grounding. Specializes in providing accurate, cited responses with real-time information access.
Replicate
Cloud platform for running open-source AI models with a simple API. Hosts over 1000 community models with serverless GPU inference, pay-per-second pricing, and no infrastructure management required.
SambaNova
AI hardware and software platform offering high-performance inference services powered by custom DataScale systems, providing OpenAI-compatible API endpoints for open-source models with industry-leading throughput.
Sber
Russian banking and technology conglomerate providing access to GigaChat family of language models through a dedicated API. Offers models ranging from compact Lightning to flagship Ultra with strong multilingual and reasoning capabilities.
Scaleway
European cloud provider offering managed AI inference endpoints with GPU instances across European data centers, providing OpenAI-compatible API access to popular open-source models.
SiliconFlow
High-performance AI inference platform offering ultra-low latency and cost-effective access to open-source models. Supports models with up to 1M token context windows and OpenAI-compatible API endpoints.
Together AI
Cloud platform for running and fine-tuning open-source AI models, offering competitive pricing and OpenAI-compatible API endpoints for popular open-weight models.
Upstage
South Korean AI company providing enterprise-grade language models optimized for Korean, English, and Japanese. Offers the Solar model family through a direct API with competitive pricing and high throughput.
xAI
AI company founded by Elon Musk providing the Grok family of models. Known for real-time information access and strong reasoning capabilities with OpenAI-compatible API.
Xiaomi MiMo
Xiaomi's AI inference platform providing access to the MiMo family of models via an OpenAI-compatible API endpoint. Offers flagship agentic and multimodal models with competitive pricing.
Yandex Cloud
Russian cloud platform providing access to YandexGPT foundation models through Yandex Cloud AI Studio. Offers OpenAI-compatible API endpoints with strong Russian and English language capabilities.
Zhipu AI
Chinese AI company providing the GLM family of models with strong bilingual (Chinese/English) capabilities. Known for competitive performance on reasoning and coding benchmarks.