Compression Lab

Compose precision, pruning, distillation, and context length. Watch the footprint, VRAM bill, and hardware tier shift as you go.

01
Compose the stack

Compose techniques. Watch the footprint move.

The numbers below are first-order estimates — a rule-of-thumb model, not a benchmark. Real-world results depend on calibration set, activation outliers, and whether you fine-tune after pruning. Use this to build intuition, not to commit a procurement.

8B parameters
0.5B8B30B70B
INT4
GPTQ / AWQ regime · 4 bits per weight
0%
Fraction of weights set to zero. Above ~50% you typically need fine-tuning to recover.
1× smaller
Train a student with this fraction of the teacher's parameters.
8K tokens
KV-cache scales linearly with context length — it dominates VRAM at long contexts.
Weights on disk
4.0 GB
88% smaller
VRAM required
5.2 GB
incl. KV + activations
Quality retained
94.0%
noticeable drift
Hardware tier
Apple Silicon
8 GB
M-series unified memory, RTX 3060 8GB
Footprint vs FP32 baseline