API vs Self-Host: Cost Crossover

Find the request volume where self-hosted inference starts to beat per-token API pricing.

Workload
API pricing ($ / 1M tokens)

Frontier API: $2.50 in / $10 out. Mid-tier hosted open weights: ~$0.90 / $0.90. Small/efficient: $0.15 / $0.60.

Self-hosted inference unit

Throughput examples: 70B model on H100-class ≈ 30–50 tok/s. 7–13B model on RTX 4000 / A10G ≈ 100–180 tok/s.

API cost / month
Self-host cost / month
Crossover at
Units needed