pakka / compress / the diet

compress.

Fewer tokens, in and out. Without losing the load-bearing context.

What it does

Token diet, semantically aware.

Long sessions accrete weight: stale tool output, half-typed plans, repeated file dumps. Compress walks the conversation and rewrites it down to the parts the model actually needs to keep going.

in detail

Compress is a turn-level pass. It distinguishes load-bearing spans (decisions, current diff, open spec) from scaffolding (raw file dumps, tool retries, model thinking out loud). Scaffolding gets distilled. Decisions get preserved verbatim and indexed in recall.

A 14-turn session in super-ultra rides at 30–40% of the unmodified token cost, with no measurable drop in code quality.

/01

Lossless on decisions.

Anything that affected a diff is preserved verbatim and re-anchored into recall — never paraphrased away.

/02

Distilled on scaffolding.

Tool retries, raw file dumps, the model's thinking aloud — collapsed into a one-line summary the agent can still reference.

/03

Mode-driven aggressiveness.

lite trims, strict folds more aggressively, ultra condenses prior turns, super-ultra re-summarises and extracts decisions.

/04

Reversible, always.

The full transcript is kept on disk in .pakka/. Restore any turn from the raw JSONL store — useful for debugging or audit.

Modes

Four dials. Pick once.

Mode is a session-level knob. Default is super-ultra. Switch with /pakka:compress <level>.

/ 01 lite

Minimal compression. Trims obvious noise — repeated preambles, stale greetings. Good for very short sessions where you want near-zero behavioural change.

token reduction~27%
per 1M out~$4.05
quality+ within noise
/ 02 strict

Trims redundant tool output and stale file reads. The conservative default for short sessions where you want zero behavioural change.

token reduction~33%
per 1M out~$4.95
qualityidentical
/ 03 ultra

Adds turn-level summarization for prior tool calls and folds repeated file reads. Recommended for sessions over six turns.

token reduction~55%
per 1M out~$8.25
quality+ within noise
One example

Same turn. Different weight.

A real turn from the rate-limit engagement, captured before/after compress. Tool output trimmed; decisions preserved verbatim and indexed.

Without compressturn 6
Input tokens14,820
Output tokens3,142
File-read repeats5
Tool output keptall
Spend$0.064
With compresssuper-ultra
Input tokens5,210 ↓ 65%
Output tokens1,287
File-read repeats0 (folded)
Tool output keptscoped
Spend$0.022