Providers
Browse 49 inference providers and their offerings
01.AI
Chinese AI company founded by Kai-Fu Lee, developing the Yi family of large language models. Offers Yi-Lightning and other models through their API platform with strong performance in Chinese, math, and coding tasks.
AI21 Labs
Israeli AI company specializing in enterprise LLMs with their unique hybrid SSM-Transformer Jamba architecture. Offers models with 256K context windows optimized for grounding, instruction-following, and long-context enterprise tasks.
Alibaba Model Studio
Alibaba Cloud AI platform providing access to Qwen family models and other large language models through DashScope API, offering OpenAI-compatible endpoints with multi-region availability.
Amazon Bedrock
Fully managed AWS service offering foundation models from leading AI companies including Anthropic, Meta, and Mistral through a unified API. Supports OpenAI-compatible endpoints via Mantle inference engine with cross-region inference capabilities.
Anthropic
AI safety company providing API access to the Claude family of models, known for helpfulness, harmlessness, and honesty with strong reasoning and analysis capabilities.
Anyscale
Serverless inference platform built on Ray, offering high-throughput access to popular open-weight models. OpenAI-compatible API with competitive pricing and enterprise-grade scalability.
Azure AI
Microsoft's cloud AI platform providing access to OpenAI models, open-source models, and enterprise AI services through Azure AI Studio. Offers global deployment with enterprise-grade security and compliance.
Baseten
ML infrastructure platform for deploying and serving machine learning models at scale, offering managed GPU inference with auto-scaling and OpenAI-compatible API endpoints for popular LLMs.
Cerebras
Ultra-fast inference provider powered by the Cerebras Wafer Scale Engine, known for extremely high tokens/sec throughput. Offers OpenAI-compatible API with free tier access.
Cloudflare Workers AI
Serverless AI inference platform running on Cloudflare's global edge network. Offers 10,000 neurons/day free allocation with OpenAI-compatible API and wide model selection including vision models.
Cohere
Enterprise AI platform specializing in language models for business applications including RAG, search, and text generation. Offers Command and Embed model families with strong multilingual support.
Deep Infra
Serverless inference platform offering fast and cost-effective access to popular open-weight models. OpenAI-compatible API with pay-per-token pricing and no minimum commitments.
DeepSeek
Chinese AI research company providing direct API access to their DeepSeek family of models with competitive pricing and strong performance on coding and reasoning tasks.
Featherless
Serverless LLM inference platform hosting 20,000+ open-source models from Hugging Face with flat-rate subscription pricing and unlimited token usage. OpenAI-compatible API with no per-token billing — access any model up to a given size based on subscription tier. Largest Hugging Face inference provider by model count.
Fireworks
High-speed AI inference platform optimized for low-latency serving of open-source models, offering OpenAI-compatible API endpoints with custom model deployment and fine-tuning capabilities.
Google AI Studio
Google's developer platform for accessing Gemini and Gemma models via OpenAI-compatible API. Free tier available with generous rate limits. Data may be used for training outside EU/EEA/UK/CH regions.
Google Cloud Vertex AI
Google Cloud's enterprise AI platform providing access to Gemini models with enterprise-grade security, compliance, and global infrastructure. Supports streaming, function calling, and multimodal inputs.
Groq
Ultra-fast LPU (Language Processing Unit) inference provider offering extremely low latency. Supports streaming, function calling, and audio transcription via Whisper models. Per-model rate limits apply.
Hugging Face Inference
Inference API and dedicated endpoints for open-source models hosted on the Hugging Face Hub. Offers serverless inference for popular models and dedicated GPU endpoints for production workloads.
Hyperbolic
Decentralized AI compute platform offering affordable GPU inference for open-source models, providing OpenAI-compatible API endpoints with competitive pricing and global availability.
IBM watsonx.ai
IBM's enterprise AI platform providing access to Granite foundation models and third-party models via a REST API. Supports text generation, embeddings, and fine-tuning with regional deployments across IBM Cloud.
InclusionAI
Ant Group's AI research lab focused on open-source large language models. Offers inference via ZenMux platform with OpenAI-compatible API. Develops the Ling (non-thinking) and Ring (reasoning) model families at trillion-parameter scale.
Inference.net
Distributed AI inference network providing affordable access to open-source language models through a decentralized GPU marketplace, offering OpenAI-compatible API endpoints with competitive per-token pricing.
Lambda
GPU cloud and inference provider offering on-demand access to NVIDIA GPUs for AI training and inference. Provides both cloud instances and managed inference API for open-source LLMs with competitive pricing.
Meta AI
Meta's AI research division providing the open-weight Llama family of models. Models are available through various inference providers and can be self-hosted.
MiniMax
Chinese AI company providing large language models with strong multilingual and multimodal capabilities. Known for competitive pricing and high-quality text generation.
Mistral AI
European AI company providing high-performance language models with strong multilingual capabilities. Offers both open-weight and proprietary models through an OpenAI-compatible API.
Modal
Serverless cloud platform for running AI workloads with on-demand GPU access, offering custom model deployment and OpenAI-compatible inference endpoints with automatic scaling and pay-per-second pricing.
Moonshot AI
Chinese AI company behind the Kimi series of models, known for ultra-long context windows and strong reasoning capabilities. Offers OpenAI-compatible API access.
Nebius
Cloud AI platform providing scalable GPU infrastructure and managed inference services for large language models, with data centers in Europe and competitive pricing for open-source model hosting.
NLP Cloud
Production-ready AI inference API offering managed deployment of open-source and proprietary language models with dedicated GPU instances, providing high availability and data privacy compliance.
Novita
AI model inference platform providing affordable access to open-source language models with OpenAI-compatible API endpoints, offering pay-per-token pricing and global availability.
NVIDIA NIM
NVIDIA's inference microservice platform providing optimized deployment of LLMs on GPU infrastructure. Offers free endpoints for select models and partner endpoints through Deep Infra, Together AI, Bitdeer, GMI Cloud, and CoreWeave.
OpenAI
Leading AI research company providing API access to GPT-4, GPT-3.5, DALL-E, and other foundation models through a developer-friendly REST API with global availability.
OpenRouter
Unified API gateway providing access to hundreds of models from multiple providers through a single OpenAI-compatible endpoint. Free models available with shared quota, up to 1000 requests/day.
Perplexity
AI-powered answer engine offering API access to proprietary and open-source models with built-in web search grounding. Specializes in providing accurate, cited responses with real-time information access.
Reka AI
AI research company building multimodal language models that process text, images, video, and audio in a single model. Offers Flash (21B) and Edge (7B) models optimized for reasoning, coding, and physical AI applications.
Replicate
Cloud platform for running open-source AI models with a simple API. Hosts over 1000 community models with serverless GPU inference, pay-per-second pricing, and no infrastructure management required.
SambaNova
AI hardware and software platform offering high-performance inference services powered by custom DataScale systems, providing OpenAI-compatible API endpoints for open-source models with industry-leading throughput.
Sber
Russian banking and technology conglomerate providing access to GigaChat family of language models through a dedicated API. Offers models ranging from compact Lightning to flagship Ultra with strong multilingual and reasoning capabilities.
Scaleway
European cloud provider offering managed AI inference endpoints with GPU instances across European data centers, providing OpenAI-compatible API access to popular open-source models.
SiliconFlow
High-performance AI inference platform offering ultra-low latency and cost-effective access to open-source models. Supports models with up to 1M token context windows and OpenAI-compatible API endpoints.
Snowflake Cortex AI
Snowflake's managed AI service providing access to Arctic family models and other LLMs through Cortex AI. Integrated with Snowflake's data platform for enterprise SQL generation, document understanding, and AI-powered analytics.
Together AI
Cloud platform for running and fine-tuning open-source AI models, offering competitive pricing and OpenAI-compatible API endpoints for popular open-weight models.
Upstage
South Korean AI company providing enterprise-grade language models optimized for Korean, English, and Japanese. Offers the Solar model family through a direct API with competitive pricing and high throughput.
xAI
AI company founded by Elon Musk providing the Grok family of models. Known for real-time information access and strong reasoning capabilities with OpenAI-compatible API.
Xiaomi MiMo
Xiaomi's AI inference platform providing access to the MiMo family of models via an OpenAI-compatible API endpoint. Offers flagship agentic and multimodal models with competitive pricing.
Yandex Cloud
Russian cloud platform providing access to YandexGPT foundation models through Yandex Cloud AI Studio. Offers OpenAI-compatible API endpoints with strong Russian and English language capabilities.
Zhipu AI
Chinese AI company providing the GLM family of models with strong bilingual (Chinese/English) capabilities. Known for competitive performance on reasoning and coding benchmarks.