AI Cost Curves
When API costs cross the self-hosting line — the economics that drive infrastructure decisions.
The economics explained
Pay-per-token
API Economics
"A taxi — cheap for short trips, ruinous for a daily commute"
API pricing is elegantly simple: you pay per token in, per token out. At low volumes, this is unbeatable — no infrastructure, no GPUs, no ops team. But the cost scales linearly with volume. Double your queries, double your bill. There's no economy of scale, no volume discount that matters at 100K+ queries per day.
At 1,000 queries/day with a frontier model, you're spending roughly $3K-10K/month. Manageable. At 100,000 queries/day, that's $300K-1M/month. At that point, you're not paying for AI — you're paying rent on someone else's GPUs.
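The linear scaling can be sketched in a few lines. The token counts and per-million-token prices below are illustrative assumptions chosen to land in the article's $3K-10K/month range at 1,000 queries/day, not any vendor's actual rates.

```python
def monthly_api_cost(queries_per_day, tokens_in=4_000, tokens_out=800,
                     price_in_per_m=15.0, price_out_per_m=75.0):
    """API spend scales linearly with volume: double the queries, double the bill.

    Prices are $/million tokens; all figures are assumed, not quoted rates.
    """
    per_query = (tokens_in * price_in_per_m + tokens_out * price_out_per_m) / 1e6
    return queries_per_day * 30 * per_query

print(f"1K/day:   ${monthly_api_cost(1_000):,.0f}/mo")    # ~$3,600/mo
print(f"100K/day: ${monthly_api_cost(100_000):,.0f}/mo")  # ~$360,000/mo
```

Note there is no term in the formula that shrinks with volume: that is the whole problem with pure API at scale.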
Own the hardware
Self-Hosted Economics
"Buying a car — expensive upfront, but the commute is nearly free"
Self-hosting means GPU servers: A100s, H100s, or their cloud equivalents. The fixed cost is significant — $10K-30K/month for a capable cluster. But once you're paying that fixed cost, the marginal cost per query is nearly zero. Your 100,000th query costs the same as your first.
The break-even depends on your model size and query volume. A quantised 7B model on a single A10G can handle 50K+ queries/day for under $2K/month. The same workload via API would cost 10-50x more.
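The amortisation arithmetic is worth making explicit. A minimal sketch, using the section's own figures (a quantised 7B on one A10G at roughly $2K/month handling 50K queries/day):

```python
def cost_per_query(fixed_monthly, queries_per_day):
    """Amortise a flat GPU bill over volume: the marginal query is ~free."""
    return fixed_monthly / (queries_per_day * 30)

# Quantised 7B on a single A10G, ~$2K/mo (figures from the text):
print(f"${cost_per_query(2_000, 50_000):.5f}/query")  # ~$0.0013/query
```

At a tenth of a cent per query, the comparison with a multi-cent API call is where the 10-50x gap comes from.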
The inflection point
The Crossover Point
"The moment buying a car becomes cheaper than taxis"
There's a specific volume where the lines cross: API cost rising linearly meets self-hosted cost sitting flat. Below the crossover, API wins on simplicity. Above it, self-hosting wins on economics. For most workloads, this crossover sits between 10,000 and 50,000 queries per day — but it depends heavily on model size, query complexity, and your ops maturity.
Don't guess — benchmark. Run your actual workload on a self-hosted model for a week. Compare cost, latency, and quality. The crossover point is different for every use case. And remember: self-hosting also means you own your data pipeline end-to-end.
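The crossover itself is a one-line solve: the volume where the flat self-hosted bill equals the linear API bill. The $15K/month cluster and $0.03/query API price below are assumed inputs for illustration; plug in your own benchmark numbers.

```python
def crossover_queries_per_day(self_host_fixed_monthly, api_cost_per_query):
    """Volume where the linear API cost line crosses the flat self-hosted line.

    Solves: queries/day * 30 * api_cost_per_query == self_host_fixed_monthly.
    """
    return self_host_fixed_monthly / (api_cost_per_query * 30)

# Assumed figures: $15K/mo cluster vs $0.03/query API.
print(f"{crossover_queries_per_day(15_000, 0.03):,.0f} queries/day")  # ~16,667
```

With these inputs the crossover lands around 17K queries/day, inside the 10K-50K range the text cites; cheaper clusters or pricier API calls pull it lower.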
Best of both
Hybrid Architecture
"Own a car for the commute, take a taxi to the airport"
The pragmatic answer is rarely pure API or pure self-hosted. Route by complexity: simple, high-volume queries go to a cheap self-hosted SLM. Complex, nuanced queries that need frontier reasoning go to the API. You get 90% of the cost savings with 100% of the quality ceiling.
A typical split: 80-90% of queries handled by a 7B-14B self-hosted model at near-zero marginal cost. The remaining 10-20% routed to a frontier API for hard cases. Total cost: a fraction of pure-API, with no quality compromise where it matters.
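The blended cost of that split is easy to model: a fixed SLM bill plus API spend on only the routed fraction. All numbers below (50K queries/day, 15% routed out, $0.03/query API, $2K/month SLM) are assumptions for illustration.

```python
def hybrid_monthly_cost(queries_per_day, api_fraction,
                        api_cost_per_query, slm_fixed_monthly):
    """Fixed self-hosted SLM cost plus API spend on the hard-case fraction only."""
    api_spend = queries_per_day * api_fraction * api_cost_per_query * 30
    return slm_fixed_monthly + api_spend

# 50K queries/day, 15% routed to a frontier API (assumed numbers):
hybrid = hybrid_monthly_cost(50_000, 0.15, 0.03, 2_000)
pure_api = 50_000 * 0.03 * 30
print(f"hybrid ${hybrid:,.0f}/mo vs pure API ${pure_api:,.0f}/mo")  # $8,750 vs $45,000
```

Under these assumptions the hybrid bill is roughly a fifth of pure API, consistent with the "fraction of pure-API" claim above.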
The cost ladder
As volume grows, the optimal infrastructure shifts — from API to hybrid to self-hosted.
Infrastructure maturity
Each rung represents a shift in where inference runs and what it costs
- $3-10K/mo
- $10-100K/mo
- $5-20K/mo
- $2-10K/mo
- <$1K/mo
The crossover point depends on your query volume, latency requirements, and team capability.
Decision framework
Under 1K queries/day?
API. The infrastructure overhead of self-hosting costs more than the tokens.
Between 1K-50K queries/day?
Benchmark. Run a 7B model on a single GPU for a week. Compare cost and quality.
Over 50K queries/day?
Self-host the bulk. Use the API for the small fraction of queries that need frontier reasoning.
Latency-sensitive (edge inference)?
Self-host with quantised models. API round-trips add 200-500ms you can't optimise away.
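The framework above can be codified as a routing rule of thumb. A sketch only: the thresholds are the article's heuristics, not hard laws, and every deployment should re-derive them from its own benchmark.

```python
def recommend(queries_per_day, latency_sensitive=False):
    """Map the decision framework's heuristics to a recommendation.

    Thresholds (1K, 50K queries/day) come from the text; treat them as
    starting points, not fixed breakpoints.
    """
    if latency_sensitive:
        return "self-host quantised models at the edge"
    if queries_per_day < 1_000:
        return "pure API"
    if queries_per_day <= 50_000:
        return "benchmark a self-hosted 7B against your API bill"
    return "self-host the bulk, route hard cases to a frontier API"

print(recommend(500))
print(recommend(200_000))
print(recommend(5_000, latency_sensitive=True))
```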