Nebius
AGGREGATED · INFERENCE
Uptime: N/A
Rating: N/A
30-Day Uptime
96.7% (2026-03-23 to 2026-04-21)
Inference Latency
Meta: Llama 3.1 8B Instruct, 341ms TTFT · 8 TPS
Qwen: Qwen2.5 Coder 7B Instruct, 238ms TTFT · 96 TPS
Google: Gemma 2 9B, 266ms TTFT · 36 TPS
Qwen: Qwen3 32B, 353ms TTFT · 8 TPS
Google: Gemma 3 27B, 527ms TTFT · 27 TPS
Qwen: Qwen3 30B A3B Instruct 2507, 232ms TTFT · 40 TPS
Nous: Hermes 4 70B, 482ms TTFT · 59 TPS
Meta: Llama 3.3 70B Instruct, 3834ms TTFT · 12 TPS
Qwen: Qwen3 Next 80B A3B Thinking, 410ms TTFT · 105 TPS
OpenAI: gpt-oss-120b, 390ms TTFT · 68 TPS
Prime Intellect: INTELLECT-3, 189ms TTFT · 128 TPS
Qwen: Qwen2.5 VL 72B Instruct, 818ms TTFT · 23 TPS
NVIDIA: Nemotron 3 Super, 1332ms TTFT · 49 TPS
NVIDIA: Llama 3.1 Nemotron Ultra 253B v1, 320ms TTFT · 38 TPS
Nous: Hermes 4 405B, 454ms TTFT · 26 TPS
Z.ai: GLM 5, 1368ms TTFT · 39 TPS
Inference Models
| Model | Input ($/M tokens) | Output ($/M tokens) | TTFT | TPS |
|---|---|---|---|---|
| Meta: Llama 3.1 8B Instruct | $0.02 | $0.06 | 341ms | 8 |
| Qwen: Qwen2.5 Coder 7B Instruct | $0.03 | $0.09 | 238ms | 96 |
| Google: Gemma 2 9B | $0.03 | $0.09 | 266ms | 36 |
| Qwen: Qwen3 32B | $0.10 | $0.30 | 353ms | 8 |
| Google: Gemma 3 27B | $0.10 | $0.30 | 527ms | 27 |
| Qwen: Qwen3 30B A3B Instruct 2507 | $0.10 | $0.30 | 232ms | 40 |
| Nous: Hermes 4 70B | $0.13 | $0.40 | 482ms | 59 |
| Meta: Llama 3.3 70B Instruct | $0.13 | $0.40 | 3834ms | 12 |
| Qwen: Qwen3 Next 80B A3B Thinking | $0.15 | $1.20 | 410ms | 105 |
| OpenAI: gpt-oss-120b | $0.15 | $0.60 | 390ms | 68 |
| Prime Intellect: INTELLECT-3 | $0.20 | $1.10 | 189ms | 128 |
| Qwen: Qwen2.5 VL 72B Instruct | $0.25 | $0.75 | 818ms | 23 |
| NVIDIA: Nemotron 3 Super | $0.30 | $0.90 | 1332ms | 49 |
| MiniMax: MiniMax M2.5 | $0.30 | $1.20 | — | — |
| NVIDIA: Llama 3.1 Nemotron Ultra 253B v1 | $0.60 | $1.80 | 320ms | 38 |
| Nous: Hermes 4 405B | $1.00 | $3.00 | 454ms | 26 |
| Z.ai: GLM 5 | $1.00 | $3.20 | 1368ms | 39 |
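The figures in the table can be turned into a rough per-request estimate. The sketch below assumes that "$/M" means US dollars per one million tokens, that TPS is output tokens per second, and that total response time is roughly TTFT plus generation time; none of these is stated on the page. The `MODELS` dictionary and the function names are hypothetical and only copy two rows from the table.

```python
# Rough cost and latency estimate from the listed figures.
# Assumptions (not confirmed by the page): prices are USD per 1M tokens,
# TPS is output tokens per second, latency is TTFT plus generation time.

MODELS = {
    "Qwen: Qwen2.5 Coder 7B Instruct": {
        "input_usd_per_m": 0.03, "output_usd_per_m": 0.09,
        "ttft_ms": 238, "tps": 96,
    },
    "OpenAI: gpt-oss-120b": {
        "input_usd_per_m": 0.15, "output_usd_per_m": 0.60,
        "ttft_ms": 390, "tps": 68,
    },
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated price of one request in USD."""
    row = MODELS[model]
    total = (input_tokens * row["input_usd_per_m"]
             + output_tokens * row["output_usd_per_m"])
    return total / 1_000_000


def estimate_latency_s(model: str, output_tokens: int) -> float:
    """Estimated wall-clock time in seconds: TTFT plus streaming time."""
    row = MODELS[model]
    return row["ttft_ms"] / 1000 + output_tokens / row["tps"]


if __name__ == "__main__":
    name = "OpenAI: gpt-oss-120b"
    print(f"{name}: ${estimate_cost_usd(name, 2000, 500):.6f}, "
          f"{estimate_latency_s(name, 500):.2f}s")
```

Measured TTFT and TPS vary with load and prompt length, so these numbers are best read as order-of-magnitude guidance when comparing rows.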
Community Reviews
4.5 ★★★★★ (2 reviews)
clouduser42
★★★★★ 2025-06-15
Reliable service, great API documentation.
mlresearcher
★★★★☆ 2025-06-10
Good performance but support could be faster.