LIVE
Models: —+ · Providers: —+ · Cheapest H100: $2.49/hr · Updated: 05:15 PM

DeepInfra

AGGREGATED · INFERENCE
Uptime: N/A
Rating: N/A

30-Day Uptime

100%
2026-03-23 to 2026-04-21

Inference Latency

Mistral: Mistral Nemo, 374ms TTFT · 23 TPS
Meta: Llama 3.1 8B Instruct, 178ms TTFT · 39 TPS
Meta: Llama 3 8B Instruct, 297ms TTFT · 25 TPS
OpenAI: gpt-oss-20b, 296ms TTFT · 51 TPS
OpenAI: gpt-oss-120b, 322ms TTFT · 36 TPS
OpenAI: gpt-oss-120b (exacto), 500ms TTFT · 64 TPS
NVIDIA: Nemotron Nano 9B V2, 209ms TTFT · 75 TPS
Google: Gemma 3 12B, 529ms TTFT · 32 TPS
Sao10K: Llama 3 8B Lunaris, 141ms TTFT · 88 TPS
Google: Gemma 3 4B, 451ms TTFT · 11 TPS
Mistral: Mistral Small 3, 293ms TTFT · 63 TPS
NVIDIA: Nemotron 3 Nano 30B A3B, 1561ms TTFT · 83 TPS
Z.ai: GLM 4.7 Flash, 475ms TTFT · 70 TPS
Microsoft: Phi 4, 137ms TTFT · 87 TPS
Qwen: Qwen3 235B A22B Instruct 2507, 662ms TTFT · 6 TPS
Mistral: Mistral Small 3.2 24B, 355ms TTFT · 61 TPS
Google: Gemma 3 27B, 568ms TTFT · 28 TPS
Qwen: Qwen3 30B A3B, 348ms TTFT · 49 TPS
Meta: Llama 4 Scout, 339ms TTFT · 38 TPS
Qwen: Qwen3 32B, 259ms TTFT · 55 TPS

Inference Models

| Model | Input $/M | Output $/M | TTFT | TPS |
|---|---|---|---|---|
| Mistral: Mistral Nemo | $0.02 | $0.04 | 374ms | 23 |
| Meta: Llama 3.1 8B Instruct | $0.02 | $0.05 | 178ms | 39 |
| Meta: Llama 3 8B Instruct | $0.03 | $0.04 | 297ms | 25 |
| OpenAI: gpt-oss-20b | $0.03 | $0.14 | 296ms | 51 |
| OpenAI: gpt-oss-120b | $0.04 | $0.19 | 322ms | 36 |
| OpenAI: gpt-oss-120b (exacto) | $0.04 | $0.19 | 500ms | 64 |
| NVIDIA: Nemotron Nano 9B V2 | $0.04 | $0.16 | 209ms | 75 |
| Google: Gemma 3 12B | $0.04 | $0.13 | 529ms | 32 |
| Sao10K: Llama 3 8B Lunaris | $0.04 | $0.05 | 141ms | 88 |
| Google: Gemma 3 4B | $0.04 | $0.08 | 451ms | 11 |
| Mistral: Mistral Small 3 | $0.05 | $0.08 | 293ms | 63 |
| NVIDIA: Nemotron 3 Nano 30B A3B | $0.05 | $0.20 | 1561ms | 83 |
| Z.ai: GLM 4.7 Flash | $0.06 | $0.40 | 475ms | 70 |
| Microsoft: Phi 4 | $0.07 | $0.14 | 137ms | 87 |
| Qwen: Qwen3 235B A22B Instruct 2507 | $0.07 | $0.10 | 662ms | 6 |
| Mistral: Mistral Small 3.2 24B | $0.08 | $0.20 | 355ms | 61 |
| Google: Gemma 3 27B | $0.08 | $0.16 | 568ms | 28 |
| Qwen: Qwen3 30B A3B | $0.08 | $0.28 | 348ms | 49 |
| Meta: Llama 4 Scout | $0.08 | $0.30 | 339ms | 38 |
| Qwen: Qwen3 32B | $0.08 | $0.28 | 259ms | 55 |
| Google: Gemma 4 26B A4B | $0.08 | $0.35 | 370ms | 41 |
| Qwen: Qwen3 Next 80B A3B Instruct | $0.09 | $1.10 | 523ms | 27 |
| NVIDIA: Nemotron 3 Super | $0.10 | $0.50 | 8953ms | 14 |
| NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 | $0.10 | $0.40 | 151ms | 50 |
| StepFun: Step 3.5 Flash | $0.10 | $0.30 | 477ms | 43 |
| Qwen2.5 72B Instruct | $0.12 | $0.39 | 401ms | 34 |
| Qwen: Qwen3 14B | $0.12 | $0.24 | 456ms | 45 |
| Google: Gemma 4 31B | $0.13 | $0.38 | 716ms | 15 |
| Qwen: Qwen3 VL 30B A3B Instruct | $0.15 | $0.60 | 302ms | 38 |
| Meta: Llama 4 Maverick | $0.15 | $0.60 | 387ms | 19 |
| OpenAI: gpt-oss-120b | $0.15 | $0.60 | 411ms | 77 |
| Meta: Llama Guard 4 12B | $0.18 | $0.18 | 199ms | 15 |
| Qwen: Qwen2.5 VL 32B Instruct | $0.20 | $0.60 | 617ms | 26 |
| DeepSeek: DeepSeek V3 0324 | $0.20 | $0.77 | 1022ms | 7 |
| AllenAI: Olmo 3.1 32B Instruct | $0.20 | $0.60 | 14435ms | 12 |
| Qwen: Qwen3 VL 235B A22B Instruct | $0.20 | $0.88 | 2111ms | 13 |
| NVIDIA: Nemotron Nano 12B 2 VL | $0.20 | $0.60 | 242ms | 67 |
| DeepSeek: DeepSeek V3.1 Terminus | $0.21 | $0.79 | 712ms | 15 |
| DeepSeek: DeepSeek V3.1 Terminus (exacto) | $0.21 | $0.79 | 1372ms | 18 |
| DeepSeek: DeepSeek V3.1 | $0.21 | $0.79 | 2326ms | 3 |
| Qwen: Qwen3 Coder 480B A35B | $0.22 | $1.00 | 342ms | 11 |
| Qwen: Qwen3 235B A22B Thinking 2507 | $0.23 | $2.30 | 513ms | 28 |
| Meta: Llama 3.2 11B Vision Instruct | $0.25 | $0.25 | 249ms | 32 |
| DeepSeek: DeepSeek V3.2 | $0.26 | $0.38 | 888ms | 3 |
| MiniMax: MiniMax M2.5 | $0.27 | $0.95 | 1026ms | 20 |
| Z.ai: GLM 4.6V | $0.30 | $0.90 | 397ms | 126 |
| Nous: Hermes 3 70B Instruct | $0.30 | $0.30 | 830ms | 20 |
| DeepSeek: DeepSeek V3 | $0.32 | $0.89 | 777ms | 9 |
| MythoMax 13B | $0.40 | $0.40 | 232ms | 43 |
| Qwen: Qwen3 Coder 480B A35B | $0.40 | $1.60 | 196ms | 13 |
| Meta: Llama 3.1 70B Instruct | $0.40 | $0.40 | 872ms | 5 |
| Meta: Llama 3.1 70B Instruct | $0.40 | $0.40 | 1173ms | 7 |
| Z.ai: GLM 4.7 | $0.40 | $1.75 | 522ms | 50 |
| Z.ai: GLM 4.6 | $0.43 | $1.74 | | |
| MoonshotAI: Kimi K2.5 | $0.45 | $2.25 | 686ms | 19 |
| DeepSeek: R1 0528 | $0.50 | $2.15 | 542ms | 36 |
| Mistral: Mixtral 8x7B Instruct | $0.54 | $0.54 | 214ms | 143 |
| DeepSeek: R1 Distill Llama 70B | $0.70 | $0.80 | 459ms | 41 |
| Z.ai: GLM 5 | $0.80 | $2.56 | 11228ms | 15 |
| Sao10K: Llama 3.1 Euryale 70B v2.2 | $0.85 | $0.85 | 359ms | 42 |
| Sao10K: Llama 3.3 Euryale 70B | $0.85 | $0.85 | 587ms | 33 |
| Nous: Hermes 3 405B Instruct | $1.00 | $1.00 | 427ms | 20 |
| NVIDIA: Llama 3.1 Nemotron 70B Instruct | $1.20 | $1.20 | 155ms | 19 |
| Z.ai: GLM 5.1 | $1.40 | $4.40 | | |
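
The columns above can be combined into a rough per-request estimate. The sketch below is illustrative only and rests on assumptions the page does not state: that "$/M" means US dollars per million tokens, that TTFT is the time to the first output token, and that TPS is the steady output rate in tokens per second after that first token. The function name `estimate_request` and the token counts are hypothetical; the prices and latency figures are taken from the Meta: Llama 3.1 8B Instruct row.

```python
def estimate_request(input_price_per_m, output_price_per_m, ttft_ms, tps,
                     input_tokens, output_tokens):
    """Return (cost in USD, approximate wall-clock seconds) for one request.

    Assumes prices are USD per million tokens, TTFT is in milliseconds,
    and TPS is output tokens per second once generation has started.
    """
    cost = (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000
    latency_s = ttft_ms / 1000 + output_tokens / tps
    return cost, latency_s


# Figures from the "Meta: Llama 3.1 8B Instruct" row; token counts are made up.
cost, latency_s = estimate_request(
    input_price_per_m=0.02,
    output_price_per_m=0.05,
    ttft_ms=178,
    tps=39,
    input_tokens=2000,
    output_tokens=500,
)
print(f"cost: ${cost:.6f}, latency: {latency_s:.1f}s")
```

Under these assumptions, a low price per token does not imply a fast response: a model with a low TPS can dominate total latency on long outputs even when its TTFT is small.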

Community Reviews

4.5 ★★★★★ (2 reviews)
clouduser42
★★★★★ 2025-06-15

Reliable service, great API documentation.

mlresearcher
★★★★ 2025-06-10

Good performance but support could be faster.