AI Cost Curves

When API costs cross the self-hosting line — the economics that drive infrastructure decisions.

The economics explained


Pay-per-token
API Economics
"A taxi — cheap for short trips, ruinous for a daily commute"
API pricing is elegantly simple: you pay per token in, per token out. At low volumes, this is unbeatable — no infrastructure, no GPUs, no ops team. But the cost scales linearly with volume. Double your queries, double your bill. There's no economy of scale, no volume discount that matters at 100K+ queries per day.
Figure: monthly API cost against query volume (queries per day). 1K: $5K; 10K: $50K; 50K: $250K; 100K: $500K+. Linear: 2x volume = 2x cost.
At 1,000 queries/day with a frontier model, you're spending roughly $3K-10K/month. Manageable. At 100,000 queries/day, that's $300K-1M/month. At that point, you're not paying for AI — you're paying rent on someone else's GPUs.
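The linear scaling can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a flat per-query cost and a 30-day month. The per-query range is back-calculated from the $3K-10K/month figure at 1,000 queries/day, not taken from any provider's price list, and the function name is illustrative.

```python
DAYS_PER_MONTH = 30  # assumption: flat 30-day billing month


def monthly_api_cost(queries_per_day: int, cost_per_query: float) -> float:
    """Monthly API spend under purely linear, per-query pricing."""
    return queries_per_day * DAYS_PER_MONTH * cost_per_query


# Implied per-query cost (assumption, derived from $3K and $10K per month
# at 1,000 queries/day; real pricing is per token and varies by model).
LOW_COST_PER_QUERY = 3_000 / (1_000 * DAYS_PER_MONTH)
HIGH_COST_PER_QUERY = 10_000 / (1_000 * DAYS_PER_MONTH)

for volume in (1_000, 10_000, 100_000):
    low = monthly_api_cost(volume, LOW_COST_PER_QUERY)
    high = monthly_api_cost(volume, HIGH_COST_PER_QUERY)
    print(f"{volume:>7,} queries/day: ${low:,.0f} to ${high:,.0f} per month")
```

Because nothing in the function is fixed, the bill has no floor and no ceiling: every extra query costs the same as the first.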

The cost ladder

As volume grows, the optimal infrastructure shifts — from API to hybrid to self-hosted.

Infrastructure maturity

Each rung represents a shift in where inference runs and what it costs

  • API model (low volume)
    Under 1K queries/day
    $3K-10K/mo
  • API model (medium volume)
    1K-10K queries/day
    $10K-100K/mo
  • Hybrid: API + self-hosted
    Route by complexity
    $5K-20K/mo
  • Self-hosted SLM
    7B-14B model on GPU
    $2K-10K/mo
  • Self-hosted + quantised
    4-bit on consumer GPU
    <$1K/mo
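
The hybrid rung can be costed as a fixed self-hosting bill plus a small linear API bill for the queries routed out. This is a rough sketch under stated assumptions: a 30-day month, a flat self-hosting cost, and example inputs (50,000 queries/day, 5% sent to the API, $0.10 per API query, $10K/month self-hosted) chosen for illustration rather than measured.

```python
DAYS_PER_MONTH = 30  # assumption: flat 30-day billing month


def hybrid_monthly_cost(queries_per_day: int,
                        api_fraction: float,
                        api_cost_per_query: float,
                        self_host_monthly_cost: float) -> float:
    """Fixed self-hosting cost plus linear API cost for the routed share."""
    if not 0.0 <= api_fraction <= 1.0:
        raise ValueError("api_fraction must be between 0 and 1")
    api_queries_per_month = queries_per_day * DAYS_PER_MONTH * api_fraction
    return self_host_monthly_cost + api_queries_per_month * api_cost_per_query


# Illustrative inputs only (all four numbers are assumptions).
cost = hybrid_monthly_cost(50_000, 0.05, 0.10, 10_000)
print(f"Hybrid estimate: ${cost:,.0f} per month")
```

The API term still grows linearly, but only on the routed share, so the routing fraction is the number to watch.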

The crossover point depends on your query volume, latency requirements, and team capability.
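
One way to locate the crossover is to set the linear API bill equal to a roughly fixed self-hosting bill and solve for volume. This is a sketch, not a full cost model: it treats self-hosting as a flat monthly cost, ignores engineering time and any quality gap, and assumes a 30-day month. The example inputs ($0.10 per query, $10K/month) are illustrative picks from the ranges above.

```python
import math

DAYS_PER_MONTH = 30  # assumption: flat 30-day billing month


def breakeven_queries_per_day(self_host_monthly_cost: float,
                              api_cost_per_query: float) -> int:
    """Smallest daily volume at which the API bill reaches the self-hosting bill."""
    if api_cost_per_query <= 0:
        raise ValueError("api_cost_per_query must be positive")
    monthly_cost_per_daily_query = api_cost_per_query * DAYS_PER_MONTH
    return math.ceil(self_host_monthly_cost / monthly_cost_per_daily_query)


# Illustrative inputs only (assumptions, not quoted prices).
print(breakeven_queries_per_day(10_000, 0.10), "queries/day")
```

With these inputs the break-even lands a little above 3,300 queries/day. Latency requirements and team capability do not appear in the formula; in practice they shift the self-hosting cost term up or down.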


Decision framework

Under 1K queries/day?
API. The infrastructure overhead of self-hosting costs more than the tokens.
Between 1K and 50K queries/day?
Benchmark. Run a 7B model on a single GPU for a week. Compare cost and quality.
Over 50K queries/day?
Self-host the bulk. Use API for the 5% of queries that need frontier reasoning.
Latency-sensitive (edge inference)?
Self-host with quantised models. API round-trips add 200-500ms you can't optimise away.
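
The four rules above can be written as a single function. This is a sketch: the function name and return strings are illustrative, and two choices are assumptions the framework does not state, namely that the latency rule takes precedence over volume and that volumes of exactly 1K and 50K fall in the benchmark band.

```python
def recommend_deployment(queries_per_day: int,
                         latency_sensitive: bool = False) -> str:
    """Map the decision framework's rules of thumb onto one lookup."""
    # Assumption: latency needs override the volume-based rules.
    if latency_sensitive:
        return "self-host with quantised models"
    if queries_per_day < 1_000:
        return "API"
    # Assumption: the boundaries 1K and 50K are included in this band.
    if queries_per_day <= 50_000:
        return "benchmark a 7B model on a single GPU for a week"
    return "self-host the bulk, API for the ~5% needing frontier reasoning"


for volume in (500, 20_000, 200_000):
    print(f"{volume:>7,} queries/day -> {recommend_deployment(volume)}")
print("edge inference ->", recommend_deployment(500, latency_sensitive=True))
```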