API vs Self-Host: Cost Crossover

Find the request volume where self-hosted inference starts to beat per-token API pricing — an interactive calculator for AI cost crossover points.

Workload Input tokens / request 4,000 tok Output tokens / request 600 tok Requests / month 250k / mo

API pricing ($ / 1M tokens) Input price $2.5 Output price $10

Frontier API: $2.50 in / $10 out. Mid-tier hosted open weights: ~$0.90 / $0.90. Small/efficient: $0.15 / $0.60.

Self-hosted inference unit Monthly cost / unit ($) $1800 Throughput (tokens / sec) 60 tok/s Realistic utilisation (%) 60% Fixed ops overhead ($/mo) $1500

Throughput examples: 70B model on H100-class ≈ 30–50 tok/s. 7–13B model on RTX 4000 / A10G ≈ 100–180 tok/s.

API cost / month —

Self-host cost / month —

Crossover at —

Units needed —

—

Read the full AI Cost Curves insight →