DeepInfra
AGGREGATED · INFERENCE
Uptime: N/A
Rating: N/A
30-Day Uptime
100% (2026-03-23 to 2026-04-21)
Inference Latency
| Model | TTFT | TPS |
|---|---|---|
| Mistral: Mistral Nemo | 374ms | 23 |
| Meta: Llama 3.1 8B Instruct | 178ms | 39 |
| Meta: Llama 3 8B Instruct | 297ms | 25 |
| OpenAI: gpt-oss-20b | 296ms | 51 |
| OpenAI: gpt-oss-120b | 322ms | 36 |
| OpenAI: gpt-oss-120b (exacto) | 500ms | 64 |
| NVIDIA: Nemotron Nano 9B V2 | 209ms | 75 |
| Google: Gemma 3 12B | 529ms | 32 |
| Sao10K: Llama 3 8B Lunaris | 141ms | 88 |
| Google: Gemma 3 4B | 451ms | 11 |
| Mistral: Mistral Small 3 | 293ms | 63 |
| NVIDIA: Nemotron 3 Nano 30B A3B | 1561ms | 83 |
| Z.ai: GLM 4.7 Flash | 475ms | 70 |
| Microsoft: Phi 4 | 137ms | 87 |
| Qwen: Qwen3 235B A22B Instruct 2507 | 662ms | 6 |
| Mistral: Mistral Small 3.2 24B | 355ms | 61 |
| Google: Gemma 3 27B | 568ms | 28 |
| Qwen: Qwen3 30B A3B | 348ms | 49 |
| Meta: Llama 4 Scout | 339ms | 38 |
| Qwen: Qwen3 32B | 259ms | 55 |
Inference Models
| Model | Input ($/M tokens) | Output ($/M tokens) | TTFT | TPS |
|---|---|---|---|---|
| Mistral: Mistral Nemo | $0.02 | $0.04 | 374ms | 23 |
| Meta: Llama 3.1 8B Instruct | $0.02 | $0.05 | 178ms | 39 |
| Meta: Llama 3 8B Instruct | $0.03 | $0.04 | 297ms | 25 |
| OpenAI: gpt-oss-20b | $0.03 | $0.14 | 296ms | 51 |
| OpenAI: gpt-oss-120b | $0.04 | $0.19 | 322ms | 36 |
| OpenAI: gpt-oss-120b (exacto) | $0.04 | $0.19 | 500ms | 64 |
| NVIDIA: Nemotron Nano 9B V2 | $0.04 | $0.16 | 209ms | 75 |
| Google: Gemma 3 12B | $0.04 | $0.13 | 529ms | 32 |
| Sao10K: Llama 3 8B Lunaris | $0.04 | $0.05 | 141ms | 88 |
| Google: Gemma 3 4B | $0.04 | $0.08 | 451ms | 11 |
| Mistral: Mistral Small 3 | $0.05 | $0.08 | 293ms | 63 |
| NVIDIA: Nemotron 3 Nano 30B A3B | $0.05 | $0.20 | 1561ms | 83 |
| Z.ai: GLM 4.7 Flash | $0.06 | $0.40 | 475ms | 70 |
| Microsoft: Phi 4 | $0.07 | $0.14 | 137ms | 87 |
| Qwen: Qwen3 235B A22B Instruct 2507 | $0.07 | $0.10 | 662ms | 6 |
| Mistral: Mistral Small 3.2 24B | $0.08 | $0.20 | 355ms | 61 |
| Google: Gemma 3 27B | $0.08 | $0.16 | 568ms | 28 |
| Qwen: Qwen3 30B A3B | $0.08 | $0.28 | 348ms | 49 |
| Meta: Llama 4 Scout | $0.08 | $0.30 | 339ms | 38 |
| Qwen: Qwen3 32B | $0.08 | $0.28 | 259ms | 55 |
| Google: Gemma 4 26B A4B | $0.08 | $0.35 | 370ms | 41 |
| Qwen: Qwen3 Next 80B A3B Instruct | $0.09 | $1.10 | 523ms | 27 |
| NVIDIA: Nemotron 3 Super | $0.10 | $0.50 | 8953ms | 14 |
| NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 | $0.10 | $0.40 | 151ms | 50 |
| StepFun: Step 3.5 Flash | $0.10 | $0.30 | 477ms | 43 |
| Qwen2.5 72B Instruct | $0.12 | $0.39 | 401ms | 34 |
| Qwen: Qwen3 14B | $0.12 | $0.24 | 456ms | 45 |
| Google: Gemma 4 31B | $0.13 | $0.38 | 716ms | 15 |
| Qwen: Qwen3 VL 30B A3B Instruct | $0.15 | $0.60 | 302ms | 38 |
| Meta: Llama 4 Maverick | $0.15 | $0.60 | 387ms | 19 |
| OpenAI: gpt-oss-120b | $0.15 | $0.60 | 411ms | 77 |
| Meta: Llama Guard 4 12B | $0.18 | $0.18 | 199ms | 15 |
| Qwen: Qwen2.5 VL 32B Instruct | $0.20 | $0.60 | 617ms | 26 |
| DeepSeek: DeepSeek V3 0324 | $0.20 | $0.77 | 1022ms | 7 |
| AllenAI: Olmo 3.1 32B Instruct | $0.20 | $0.60 | 14435ms | 12 |
| Qwen: Qwen3 VL 235B A22B Instruct | $0.20 | $0.88 | 2111ms | 13 |
| NVIDIA: Nemotron Nano 12B 2 VL | $0.20 | $0.60 | 242ms | 67 |
| DeepSeek: DeepSeek V3.1 Terminus | $0.21 | $0.79 | 712ms | 15 |
| DeepSeek: DeepSeek V3.1 Terminus (exacto) | $0.21 | $0.79 | 1372ms | 18 |
| DeepSeek: DeepSeek V3.1 | $0.21 | $0.79 | 2326ms | 3 |
| Qwen: Qwen3 Coder 480B A35B | $0.22 | $1.00 | 342ms | 11 |
| Qwen: Qwen3 235B A22B Thinking 2507 | $0.23 | $2.30 | 513ms | 28 |
| Meta: Llama 3.2 11B Vision Instruct | $0.25 | $0.25 | 249ms | 32 |
| DeepSeek: DeepSeek V3.2 | $0.26 | $0.38 | 888ms | 3 |
| MiniMax: MiniMax M2.5 | $0.27 | $0.95 | 1026ms | 20 |
| Z.ai: GLM 4.6V | $0.30 | $0.90 | 397ms | 126 |
| Nous: Hermes 3 70B Instruct | $0.30 | $0.30 | 830ms | 20 |
| DeepSeek: DeepSeek V3 | $0.32 | $0.89 | 777ms | 9 |
| MythoMax 13B | $0.40 | $0.40 | 232ms | 43 |
| Qwen: Qwen3 Coder 480B A35B | $0.40 | $1.60 | 196ms | 13 |
| Meta: Llama 3.1 70B Instruct | $0.40 | $0.40 | 872ms | 5 |
| Meta: Llama 3.1 70B Instruct | $0.40 | $0.40 | 1173ms | 7 |
| Z.ai: GLM 4.7 | $0.40 | $1.75 | 522ms | 50 |
| Z.ai: GLM 4.6 | $0.43 | $1.74 | — | — |
| MoonshotAI: Kimi K2.5 | $0.45 | $2.25 | 686ms | 19 |
| DeepSeek: R1 0528 | $0.50 | $2.15 | 542ms | 36 |
| Mistral: Mixtral 8x7B Instruct | $0.54 | $0.54 | 214ms | 143 |
| DeepSeek: R1 Distill Llama 70B | $0.70 | $0.80 | 459ms | 41 |
| Z.ai: GLM 5 | $0.80 | $2.56 | 11228ms | 15 |
| Sao10K: Llama 3.1 Euryale 70B v2.2 | $0.85 | $0.85 | 359ms | 42 |
| Sao10K: Llama 3.3 Euryale 70B | $0.85 | $0.85 | 587ms | 33 |
| Nous: Hermes 3 405B Instruct | $1.00 | $1.00 | 427ms | 20 |
| NVIDIA: Llama 3.1 Nemotron 70B Instruct | $1.20 | $1.20 | 155ms | 19 |
| Z.ai: GLM 5.1 | $1.40 | $4.40 | — | — |
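
The figures in the table can be combined into rough per-request estimates. The following is a minimal sketch, not an official calculator. It assumes that "$/M" means US dollars per million tokens, that TPS is the steady output rate after the first token arrives, and that latency is roughly TTFT plus generation time. The function names and the example token counts are hypothetical.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate request cost, assuming prices are USD per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def estimate_latency_s(output_tokens: int, ttft_ms: float, tps: float) -> float:
    """Estimate end-to-end latency in seconds: TTFT plus generation time.

    Assumes throughput stays constant at `tps` tokens per second,
    which real deployments may not guarantee.
    """
    if tps <= 0:
        raise ValueError("tps must be positive")
    return ttft_ms / 1000 + output_tokens / tps


if __name__ == "__main__":
    # Values from the "Meta: Llama 3.1 8B Instruct" row:
    # $0.02 input, $0.05 output, 178ms TTFT, 39 TPS.
    # The 2000/500 token counts are made up for illustration.
    cost = estimate_cost_usd(2000, 500, 0.02, 0.05)
    latency = estimate_latency_s(500, 178, 39)
    print(f"cost: ${cost:.6f}, latency: {latency:.1f}s")
```

Rows that show no TTFT or TPS value cannot be used with the latency estimate, and measured figures like these tend to vary with load, so the results are best treated as order-of-magnitude guidance.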
Community Reviews
4.5 ★★★★★ (2 reviews)
clouduser42
★★★★★ 2025-06-15
Reliable service, great API documentation.
mlresearcher
★★★★☆ 2025-06-10
Good performance but support could be faster.