LIVE
Cheapest H100: $2.49/hr · Updated: 04:31 PM

DeepInfra

AGGREGATED · INFERENCE
Uptime: N/A
Rating: N/A

30-Day Uptime

100%
2026-05-22 to 2026-06-20

Inference Latency

Meta: Llama 3.1 8B Instruct | 294ms TTFT · 28 TPS
Meta: Llama 3.1 8B Instruct | 286ms TTFT · 21 TPS
Mistral: Mistral Nemo | 418ms TTFT · 31 TPS
OpenAI: gpt-oss-20b | 1322ms TTFT · 89 TPS
OpenAI: gpt-oss-120b (exacto) | 500ms TTFT · 64 TPS
OpenAI: gpt-oss-120b | 768ms TTFT · 55 TPS
NVIDIA: Nemotron Nano 9B V2 | 244ms TTFT · 119 TPS
Sao10K: Llama 3 8B Lunaris | 165ms TTFT · 70 TPS
Mistral: Mistral Small 3 | 264ms TTFT · 54 TPS
NVIDIA: Nemotron 3 Nano 30B A3B | 1289ms TTFT · 81 TPS
Google: Gemma 3 12B | 481ms TTFT · 37 TPS
Google: Gemma 3 4B | 608ms TTFT · 20 TPS
Z.ai: GLM 4.7 Flash | 819ms TTFT · 25 TPS
Microsoft: Phi 4 | 263ms TTFT · 62 TPS
Google: Gemma 4 26B A4B | 441ms TTFT · 30 TPS
Mistral: Mistral Small 3.2 24B | 262ms TTFT · 61 TPS
Google: Gemma 3 27B | 953ms TTFT · 15 TPS
Qwen: Qwen3 32B | 276ms TTFT · 31 TPS
Qwen: Qwen3 235B A22B Instruct 2507 | 341ms TTFT · 20 TPS
StepFun: Step 3.5 Flash | 313ms TTFT · 56 TPS

Inference Models

Model | Input $/M | Output $/M | TTFT | TPS
Meta: Llama 3.1 8B Instruct | $0.02 | $0.05 | 294ms | 28
Meta: Llama 3.1 8B Instruct | $0.02 | $0.03 | 286ms | 21
Mistral: Mistral Nemo | $0.02 | $0.04 | 418ms | 31
OpenAI: gpt-oss-20b | $0.03 | $0.14 | 1322ms | 89
OpenAI: gpt-oss-120b (exacto) | $0.04 | $0.19 | 500ms | 64
OpenAI: gpt-oss-120b | $0.04 | $0.19 | 768ms | 55
NVIDIA: Nemotron Nano 9B V2 | $0.04 | $0.16 | 244ms | 119
Sao10K: Llama 3 8B Lunaris | $0.04 | $0.05 | 165ms | 70
Mistral: Mistral Small 3 | $0.05 | $0.08 | 264ms | 54
NVIDIA: Nemotron 3 Nano 30B A3B | $0.05 | $0.20 | 1289ms | 81
Google: Gemma 3 12B | $0.05 | $0.15 | 481ms | 37
Google: Gemma 3 4B | $0.05 | $0.10 | 608ms | 20
Z.ai: GLM 4.7 Flash | $0.06 | $0.40 | 819ms | 25
Microsoft: Phi 4 | $0.07 | $0.14 | 263ms | 62
Google: Gemma 4 26B A4B | $0.07 | $0.34 | 441ms | 30
Mistral: Mistral Small 3.2 24B | $0.08 | $0.20 | 262ms | 61
Google: Gemma 3 27B | $0.08 | $0.16 | 953ms | 15
Qwen: Qwen3 32B | $0.08 | $0.28 | 276ms | 31
Qwen: Qwen3 235B A22B Instruct 2507 | $0.09 | $0.10 | 341ms | 20
StepFun: Step 3.5 Flash | $0.09 | $0.30 | 313ms | 56
Qwen: Qwen3 Next 80B A3B Instruct | $0.09 | $1.10 | 321ms | 54
NVIDIA: Nemotron 3 Super | $0.10 | $0.50 | 3648ms | 62
Meta: Llama 3.3 70B Instruct | $0.10 | $0.32 | 516ms | 11
Meta: Llama 4 Scout | $0.10 | $0.30 | 239ms | 59
Qwen: Qwen3.5-9B | $0.10 | $0.15 | 577ms | 29
DeepSeek: DeepSeek V4 Flash | $0.10 | $0.20 | 1664ms | 9
Qwen: Qwen3 14B | $0.12 | $0.24 | 1662ms | 41
Google: Gemma 4 31B | $0.12 | $0.37 | 651ms | 41
Qwen: Qwen3 30B A3B | $0.12 | $0.50 | 334ms | 66
Google: Gemma 4 31B | $0.13 | $0.38 | 648ms | 25
Qwen: Qwen3.5-35B-A3B | $0.14 | $1.00 | 358ms | 56
MiniMax: MiniMax M2.5 | $0.15 | $1.15 | 563ms | 65
OpenAI: gpt-oss-120b | $0.15 | $0.60 | 385ms | 224
Meta: Llama 4 Maverick | $0.15 | $0.60 | 282ms | 29
Qwen: Qwen3 VL 30B A3B Instruct | $0.15 | $0.60 | 379ms | 27
Meta: Llama Guard 4 12B | $0.18 | $0.18 | 634ms | 3
DeepSeek: DeepSeek V3 0324 | $0.20 | $0.77 | 3369ms | 12
AllenAI: Olmo 3.1 32B Instruct | $0.20 | $0.60 | 586ms | 37
StepFun: Step 3.7 Flash | $0.20 | $1.15 | 803ms | 77
NVIDIA: Nemotron Nano 12B 2 VL | $0.20 | $0.60 | 458ms | 48
Qwen: Qwen2.5 VL 32B Instruct | $0.20 | $0.60 | 617ms | 26
Qwen: Qwen3 VL 235B A22B Instruct | $0.20 | $0.88 | 4949ms | 16
DeepSeek: DeepSeek V3.1 | $0.21 | $0.79 | 1127ms | 7
DeepSeek: DeepSeek V3.1 Terminus (exacto) | $0.21 | $0.79 | 1372ms | 18
Qwen: Qwen3 235B A22B Thinking 2507 | $0.23 | $2.30 | 344ms | 46
MiniMax: MiniMax M2.7 | $0.25 | $1.00 | 789ms | 37
Qwen: Qwen3.5-27B | $0.26 | $2.60 | 1168ms | 25
DeepSeek: DeepSeek V3.2 | $0.26 | $0.38 | 676ms | 10
DeepSeek: DeepSeek V3.1 Terminus | $0.27 | $0.95 | 674ms | 42
Qwen: Qwen3 Coder 480B A35B | $0.30 | $1.00 | 496ms | 29
DeepSeek: DeepSeek V3 | $0.32 | $0.89 | 427ms | 34
Qwen: Qwen3.6 27B | $0.32 | $3.20 | 273ms | 62
Meta: Llama 3.2 11B Vision Instruct | $0.35 | $0.35 | 679ms | 39
Qwen2.5 72B Instruct | $0.36 | $0.40 | 1646ms | 7
Z.ai: GLM 4.7 | $0.40 | $1.75 | 751ms | 35
NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 | $0.40 | $0.40 | 274ms | 50
MythoMax 13B | $0.40 | $0.40 | 251ms | 42
Meta: Llama 3.1 70B Instruct | $0.40 | $0.40 | 310ms | 18
Z.ai: GLM 4.6 | $0.43 | $1.74 | 546ms | 45
MoonshotAI: Kimi K2.5 | $0.45 | $2.25 | 683ms | 72
Qwen: Qwen3.5 397B A17B | $0.45 | $3.00 | 651ms | 34
DeepSeek: R1 0528 | $0.50 | $2.15 | 648ms | 26
NVIDIA: Nemotron 3 Ultra | $0.50 | $2.20 | 8756ms | 13
MiniMax: MiniMax M2.7 | $0.50 | $2.25 | 1944ms | 92
Mistral: Mixtral 8x7B Instruct | $0.54 | $0.54 | |
Z.ai: GLM 5 | $0.60 | $2.08 | 976ms | 40
Nous: Hermes 3 70B Instruct | $0.70 | $0.70 | 452ms | 28
MoonshotAI: Kimi K2.7 Code | $0.74 | $3.50 | 797ms | 22
MoonshotAI: Kimi K2.6 | $0.75 | $3.50 | 678ms | 71
Sao10K: Llama 3.1 Euryale 70B v2.2 | $0.85 | $0.85 | 300ms | 52
Xiaomi: MiMo-V2.5-Pro | $1.00 | $3.00 | 1005ms | 45
Nous: Hermes 3 405B Instruct | $1.00 | $1.00 | 398ms | 22
Z.ai: GLM 5.1 | $1.05 | $3.50 | 763ms | 83
NVIDIA: Llama 3.1 Nemotron 70B Instruct | $1.20 | $1.20 | |
Z.ai: GLM 5.2 | $1.20 | $4.20 | 1728ms | 24
DeepSeek: DeepSeek V4 Pro | $1.30 | $2.60 | 1098ms | 27
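
The columns above can be combined into a rough per-request estimate. The sketch below is illustrative only and is not part of the marketplace or any DeepInfra API. It assumes that "$/M" means US dollars per million tokens, that TPS is output tokens per second once streaming has started, and that total latency is roughly TTFT plus generation time. The function names are hypothetical, and the sample figures are taken from the Microsoft: Phi 4 row.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_per_m: float, output_per_m: float) -> float:
    """Estimate request cost, assuming prices are USD per million tokens."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def estimate_latency_s(output_tokens: int, ttft_ms: float, tps: float) -> float:
    """Estimate end-to-end latency in seconds.

    Simplifying assumption: time to first token, then a steady
    output rate of `tps` tokens per second.
    """
    if tps <= 0:
        raise ValueError("tps must be positive")
    return ttft_ms / 1000 + output_tokens / tps


if __name__ == "__main__":
    # Microsoft: Phi 4 row: $0.07 in, $0.14 out, 263ms TTFT, 62 TPS
    cost = estimate_cost_usd(2000, 500, input_per_m=0.07, output_per_m=0.14)
    latency = estimate_latency_s(500, ttft_ms=263, tps=62)
    print(f"cost: ${cost:.6f}, latency: {latency:.2f}s")
```

Measured TTFT and TPS vary with load and prompt length, so treat the result as an order-of-magnitude guide rather than a guarantee. Rows with no TTFT or TPS listed cannot be used for the latency estimate.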

Community Reviews

4.5 ★★★★★ (2 reviews)
clouduser42
★★★★★ 2025-06-15

Reliable service, great API documentation.

mlresearcher
★★★★ 2025-06-10

Good performance but support could be faster.