Hugging Face Inference
Healthy
Telemetry updated 19m ago
Inference API and dedicated endpoints for open-source models hosted on the Hugging Face Hub. Offers serverless inference for popular models and dedicated GPU endpoints for production workloads.
API Base URL
Authentication
bearer
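With the `bearer` scheme, each request carries the API token in an HTTP `Authorization` header. The following is a minimal sketch of building that header. The helper name `bearer_headers` and the environment variable `HF_API_TOKEN` are assumptions chosen for illustration, and because no base URL is listed above, the sketch stops short of sending a request.

```python
import os


def bearer_headers(token: str) -> dict[str, str]:
    """Build HTTP headers for bearer-token authentication (hypothetical helper)."""
    if not token:
        raise ValueError("an API token is required")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


# HF_API_TOKEN is an assumed variable name; the fallback is a placeholder, not a real token.
headers = bearer_headers(os.environ.get("HF_API_TOKEN", "<your-token>"))
```

Pass the resulting dictionary as the headers of whichever HTTP client is in use.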
Uptime (24h)
100.0%
Uptime (7d)
100.0%
Supported Regions
us-east-1, eu-west-1
Latency (TTFT)
Time to first token percentiles
No latency data available
Health History
Uptime over the last 7 days
7-Day Uptime: 100.00% (Excellent)
24-Hour Uptime: 100.00% (Excellent)
Current Status
healthy
Last Checked
19m ago
Supported Models (11)
Models available through this provider.
| Model | Pricing (per 1M tokens) | Rate Limits | Regions |
|---|---|---|---|
| AlemLLM (`alemllm`) | In: $0.00, Out: $0.00 | 60 RPM / 300K TPM | us-east-1, eu-west-1 |
| Alloma 8B Instruct (`alloma-8b`) | In: $0.10, Out: $0.10 | 60 RPM / 200K TPM | us-east-1 |
| DeepSeek V4 (`deepseek-v4`) | In: $0.18, Out: $0.36 | 300 RPM / 500K TPM | us-east-1, eu-west-1 |
| Gemma 4 12B (`gemma-4-12b`) | In: $0.10, Out: $0.10 | 300 RPM / 500K TPM | us-east-1, eu-west-1 |
| ISSAI KazLLM 1.0 70B (`kazllm-1-0-70b`) | In: $0.00, Out: $0.00 | 60 RPM / 300K TPM | us-east-1, eu-west-1 |
| Kumru 7B (`kumru-7b`) | In: $0.00, Out: $0.00 | 60 RPM / 300K TPM | us-east-1, eu-west-1 |
| Llama 4 Maverick (`llama-4-maverick`) | In: $0.30, Out: $0.30 | 300 RPM / 500K TPM | us-east-1, eu-west-1 |
| Qwen3 32B (`qwen3-32b`) | In: $0.18, Out: $0.18 | 300 RPM / 500K TPM | us-east-1, eu-west-1 |
| StableLM 2 12B (`stablelm-2-12b`) | In: $0.10, Out: $0.10 | 60 RPM / 200K TPM | us-east-1 |
| Trendyol LLM 8B T1 (`trendyol-llm-8b`) | In: $0.00, Out: $0.00 | 60 RPM / 300K TPM | us-east-1, eu-west-1 |
| WiroAI Turkish LLM 9B (`wiroai-turkish-llm-9b`) | In: $0.00, Out: $0.00 | 60 RPM / 300K TPM | us-east-1, eu-west-1 |
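
Because prices in the table are quoted per 1M tokens, the cost of a single request is the token count times the price, divided by one million. The sketch below works through that arithmetic for a few rows copied from the table. It assumes "per 1M" means per one million tokens and that prices are in USD; the function name `estimate_cost` is illustrative only.

```python
# Per-1M-token prices in USD, copied from the table: (input, output).
PRICES_PER_1M = {
    "deepseek-v4": (0.18, 0.36),
    "llama-4-maverick": (0.30, 0.30),
    "qwen3-32b": (0.18, 0.18),
}


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    in_price, out_price = PRICES_PER_1M[model_id]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# 2,000 input tokens and 500 output tokens on deepseek-v4:
# (2000 * 0.18 + 500 * 0.36) / 1,000,000 = 0.00054 USD
cost = estimate_cost("deepseek-v4", 2_000, 500)
print(f"${cost:.6f}")  # prints $0.000540
```

The same function can be pointed at any other row by adding its model ID and prices to `PRICES_PER_1M`.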