DeepInfra
AGGREGATED INFERENCE
Uptime: N/A
Rating: N/A
30-Day Uptime: 100% (2026-05-22 to 2026-06-20)
Inference Latency
| Model | TTFT | TPS |
|---|---|---|
| Meta: Llama 3.1 8B Instruct | 294ms | 28 |
| Meta: Llama 3.1 8B Instruct | 286ms | 21 |
| Mistral: Mistral Nemo | 418ms | 31 |
| OpenAI: gpt-oss-20b | 1322ms | 89 |
| OpenAI: gpt-oss-120b (exacto) | 500ms | 64 |
| OpenAI: gpt-oss-120b | 768ms | 55 |
| NVIDIA: Nemotron Nano 9B V2 | 244ms | 119 |
| Sao10K: Llama 3 8B Lunaris | 165ms | 70 |
| Mistral: Mistral Small 3 | 264ms | 54 |
| NVIDIA: Nemotron 3 Nano 30B A3B | 1289ms | 81 |
| Google: Gemma 3 12B | 481ms | 37 |
| Google: Gemma 3 4B | 608ms | 20 |
| Z.ai: GLM 4.7 Flash | 819ms | 25 |
| Microsoft: Phi 4 | 263ms | 62 |
| Google: Gemma 4 26B A4B | 441ms | 30 |
| Mistral: Mistral Small 3.2 24B | 262ms | 61 |
| Google: Gemma 3 27B | 953ms | 15 |
| Qwen: Qwen3 32B | 276ms | 31 |
| Qwen: Qwen3 235B A22B Instruct 2507 | 341ms | 20 |
| StepFun: Step 3.5 Flash | 313ms | 56 |

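
The TTFT and TPS figures above can be combined into a rough end-to-end response time. The sketch below assumes TTFT means time to first token and TPS means output tokens per second, and that generation proceeds at a steady rate after the first token; the function name and the 500-token reply length are hypothetical, chosen only for illustration.

```python
def estimated_response_seconds(ttft_ms: float, tps: float, output_tokens: int) -> float:
    """Rough estimate: time to first token plus steady-state generation time.

    Assumption (not stated on the page): TTFT is in milliseconds and TPS is
    output tokens per second, measured independently of prompt length.
    """
    if tps <= 0:
        raise ValueError("tps must be positive")
    return ttft_ms / 1000.0 + output_tokens / tps


# Figures taken from the latency listing; the reply length is hypothetical.
OUTPUT_TOKENS = 500
fast = estimated_response_seconds(244, 119, OUTPUT_TOKENS)  # NVIDIA: Nemotron Nano 9B V2
slow = estimated_response_seconds(953, 15, OUTPUT_TOKENS)   # Google: Gemma 3 27B

print(f"Nemotron Nano 9B V2: {fast:.1f} s")  # 4.4 s
print(f"Gemma 3 27B: {slow:.1f} s")          # 34.3 s
```

For long replies the TPS term dominates, so a model with a low TTFT but low TPS can still feel slower overall than one with a higher TTFT and high TPS.
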
Inference Models
| Model | Input ($/1M tokens) | Output ($/1M tokens) | TTFT | TPS |
|---|---|---|---|---|
| Meta: Llama 3.1 8B Instruct | $0.02 | $0.05 | 294ms | 28 |
| Meta: Llama 3.1 8B Instruct | $0.02 | $0.03 | 286ms | 21 |
| Mistral: Mistral Nemo | $0.02 | $0.04 | 418ms | 31 |
| OpenAI: gpt-oss-20b | $0.03 | $0.14 | 1322ms | 89 |
| OpenAI: gpt-oss-120b (exacto) | $0.04 | $0.19 | 500ms | 64 |
| OpenAI: gpt-oss-120b | $0.04 | $0.19 | 768ms | 55 |
| NVIDIA: Nemotron Nano 9B V2 | $0.04 | $0.16 | 244ms | 119 |
| Sao10K: Llama 3 8B Lunaris | $0.04 | $0.05 | 165ms | 70 |
| Mistral: Mistral Small 3 | $0.05 | $0.08 | 264ms | 54 |
| NVIDIA: Nemotron 3 Nano 30B A3B | $0.05 | $0.20 | 1289ms | 81 |
| Google: Gemma 3 12B | $0.05 | $0.15 | 481ms | 37 |
| Google: Gemma 3 4B | $0.05 | $0.10 | 608ms | 20 |
| Z.ai: GLM 4.7 Flash | $0.06 | $0.40 | 819ms | 25 |
| Microsoft: Phi 4 | $0.07 | $0.14 | 263ms | 62 |
| Google: Gemma 4 26B A4B | $0.07 | $0.34 | 441ms | 30 |
| Mistral: Mistral Small 3.2 24B | $0.08 | $0.20 | 262ms | 61 |
| Google: Gemma 3 27B | $0.08 | $0.16 | 953ms | 15 |
| Qwen: Qwen3 32B | $0.08 | $0.28 | 276ms | 31 |
| Qwen: Qwen3 235B A22B Instruct 2507 | $0.09 | $0.10 | 341ms | 20 |
| StepFun: Step 3.5 Flash | $0.09 | $0.30 | 313ms | 56 |
| Qwen: Qwen3 Next 80B A3B Instruct | $0.09 | $1.10 | 321ms | 54 |
| NVIDIA: Nemotron 3 Super | $0.10 | $0.50 | 3648ms | 62 |
| Meta: Llama 3.3 70B Instruct | $0.10 | $0.32 | 516ms | 11 |
| Meta: Llama 4 Scout | $0.10 | $0.30 | 239ms | 59 |
| Qwen: Qwen3.5-9B | $0.10 | $0.15 | 577ms | 29 |
| DeepSeek: DeepSeek V4 Flash | $0.10 | $0.20 | 1664ms | 9 |
| Qwen: Qwen3 14B | $0.12 | $0.24 | 1662ms | 41 |
| Google: Gemma 4 31B | $0.12 | $0.37 | 651ms | 41 |
| Qwen: Qwen3 30B A3B | $0.12 | $0.50 | 334ms | 66 |
| Google: Gemma 4 31B | $0.13 | $0.38 | 648ms | 25 |
| Qwen: Qwen3.5-35B-A3B | $0.14 | $1.00 | 358ms | 56 |
| MiniMax: MiniMax M2.5 | $0.15 | $1.15 | 563ms | 65 |
| OpenAI: gpt-oss-120b | $0.15 | $0.60 | 385ms | 224 |
| Meta: Llama 4 Maverick | $0.15 | $0.60 | 282ms | 29 |
| Qwen: Qwen3 VL 30B A3B Instruct | $0.15 | $0.60 | 379ms | 27 |
| Meta: Llama Guard 4 12B | $0.18 | $0.18 | 634ms | 3 |
| DeepSeek: DeepSeek V3 0324 | $0.20 | $0.77 | 3369ms | 12 |
| AllenAI: Olmo 3.1 32B Instruct | $0.20 | $0.60 | 586ms | 37 |
| StepFun: Step 3.7 Flash | $0.20 | $1.15 | 803ms | 77 |
| NVIDIA: Nemotron Nano 12B 2 VL | $0.20 | $0.60 | 458ms | 48 |
| Qwen: Qwen2.5 VL 32B Instruct | $0.20 | $0.60 | 617ms | 26 |
| Qwen: Qwen3 VL 235B A22B Instruct | $0.20 | $0.88 | 4949ms | 16 |
| DeepSeek: DeepSeek V3.1 | $0.21 | $0.79 | 1127ms | 7 |
| DeepSeek: DeepSeek V3.1 Terminus (exacto) | $0.21 | $0.79 | 1372ms | 18 |
| Qwen: Qwen3 235B A22B Thinking 2507 | $0.23 | $2.30 | 344ms | 46 |
| MiniMax: MiniMax M2.7 | $0.25 | $1.00 | 789ms | 37 |
| Qwen: Qwen3.5-27B | $0.26 | $2.60 | 1168ms | 25 |
| DeepSeek: DeepSeek V3.2 | $0.26 | $0.38 | 676ms | 10 |
| DeepSeek: DeepSeek V3.1 Terminus | $0.27 | $0.95 | 674ms | 42 |
| Qwen: Qwen3 Coder 480B A35B | $0.30 | $1.00 | 496ms | 29 |
| DeepSeek: DeepSeek V3 | $0.32 | $0.89 | 427ms | 34 |
| Qwen: Qwen3.6 27B | $0.32 | $3.20 | 273ms | 62 |
| Meta: Llama 3.2 11B Vision Instruct | $0.35 | $0.35 | 679ms | 39 |
| Qwen2.5 72B Instruct | $0.36 | $0.40 | 1646ms | 7 |
| Z.ai: GLM 4.7 | $0.40 | $1.75 | 751ms | 35 |
| NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 | $0.40 | $0.40 | 274ms | 50 |
| MythoMax 13B | $0.40 | $0.40 | 251ms | 42 |
| Meta: Llama 3.1 70B Instruct | $0.40 | $0.40 | 310ms | 18 |
| Z.ai: GLM 4.6 | $0.43 | $1.74 | 546ms | 45 |
| MoonshotAI: Kimi K2.5 | $0.45 | $2.25 | 683ms | 72 |
| Qwen: Qwen3.5 397B A17B | $0.45 | $3.00 | 651ms | 34 |
| DeepSeek: R1 0528 | $0.50 | $2.15 | 648ms | 26 |
| NVIDIA: Nemotron 3 Ultra | $0.50 | $2.20 | 8756ms | 13 |
| MiniMax: MiniMax M2.7 | $0.50 | $2.25 | 1944ms | 92 |
| Mistral: Mixtral 8x7B Instruct | $0.54 | $0.54 | — | — |
| Z.ai: GLM 5 | $0.60 | $2.08 | 976ms | 40 |
| Nous: Hermes 3 70B Instruct | $0.70 | $0.70 | 452ms | 28 |
| MoonshotAI: Kimi K2.7 Code | $0.74 | $3.50 | 797ms | 22 |
| MoonshotAI: Kimi K2.6 | $0.75 | $3.50 | 678ms | 71 |
| Sao10K: Llama 3.1 Euryale 70B v2.2 | $0.85 | $0.85 | 300ms | 52 |
| Xiaomi: MiMo-V2.5-Pro | $1.00 | $3.00 | 1005ms | 45 |
| Nous: Hermes 3 405B Instruct | $1.00 | $1.00 | 398ms | 22 |
| Z.ai: GLM 5.1 | $1.05 | $3.50 | 763ms | 83 |
| NVIDIA: Llama 3.1 Nemotron 70B Instruct | $1.20 | $1.20 | — | — |
| Z.ai: GLM 5.2 | $1.20 | $4.20 | 1728ms | 24 |
| DeepSeek: DeepSeek V4 Pro | $1.30 | $2.60 | 1098ms | 27 |
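
As a rough illustration of how the price columns translate into per-request cost, the sketch below assumes the input and output prices are US dollars per one million tokens, billed separately for prompt and completion. The function name and the 2,000-in / 500-out workload are hypothetical; actual billing may differ (for example through caching or minimum charges).

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_m: float, output_per_m: float) -> float:
    """Cost of one request, assuming prices are USD per 1,000,000 tokens."""
    return (input_tokens / 1_000_000 * input_per_m
            + output_tokens / 1_000_000 * output_per_m)


# Hypothetical workload: 2,000 prompt tokens and 500 completion tokens.
IN_TOK, OUT_TOK = 2_000, 500

# Prices copied from the table rows for these two models.
llama_cost = request_cost_usd(IN_TOK, OUT_TOK, 0.10, 0.32)     # Meta: Llama 3.3 70B Instruct
deepseek_cost = request_cost_usd(IN_TOK, OUT_TOK, 1.30, 2.60)  # DeepSeek: DeepSeek V4 Pro

print(f"Llama 3.3 70B Instruct: ${llama_cost:.5f} per request")   # $0.00036
print(f"DeepSeek V4 Pro:        ${deepseek_cost:.5f} per request")  # $0.00390
```

Because output prices are often several times the input price, the expected reply length usually matters more to the total than the prompt length.
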
Community Reviews
4.5 ★★★★★ (2 reviews)
clouduser42
★★★★★ 2025-06-15
Reliable service, great API documentation.
mlresearcher
★★★★☆ 2025-06-10
Good performance but support could be faster.