We run forty-two AI agents in our Dark Factory every night. Every one of them used to wake up with amnesia, rediscovering facts we already knew, paying the same token bill twice. Here is what we built to stop paying it -- and the eight-stage frame, named by memco.ai, that finally made the story legible.
This post builds on "Teaching AI Agents to Learn from Their Mistakes" and "Knowledge Graph Hooks", and maps both into a single lifecycle.
The cold-start tax is a line item, not a metaphor
Last Tuesday an ADF agent spent fourteen thousand tokens working out that our project uses bun, not npm. We had captured that correction three months ago. The correction was sitting in ~/.config/terraphim/, two directories away from where the agent was looking, in a file the agent did not know to read.
memco.ai calls this the cold-start tax: every agent run begins from zero, paying full token cost for facts the team already owns. They name it well. "Context is rented. Memory is owned." Read their field guide; it is short and it is right.
The tax compounds. With forty-two agents running through the night against the same repository, the same rediscovery happens forty-two times. Most teams cope by buying bigger context windows. We tried that. Bigger windows do not pay the tax; they just stretch the receipt.
The second run is the signal
The number that actually matters is the delta between run one and run two of the same task. If memory works, run two costs less. If memory fails, run two costs the same as run one and your "AI strategy" is a slot machine with extra steps.
Here are five tasks from our last seven days of ADF activity, retried within twenty-four hours after a hook-captured correction landed:
| Gitea issue | Run 1 input tokens | Run 2 input tokens | Delta |
|---|---|---|---|
| #1873 | 47,200 | 31,800 | -32.6% |
| #1862 | 28,400 | 19,100 | -32.7% |
| #1879 | 81,000 | 63,500 | -21.6% |
| #1858 | 33,900 | 28,200 | -16.8% |
| #1850 | 52,100 | 47,400 | -9.0% |
Median reduction: 21.6%. Best: 32.7%. Worst: 9.0%. The 9.0% case is a fair tell -- that issue's correction sat in a project KG the second-run agent did not consult. We have since patched the role scope, so the next attempt should land closer to the others.
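The deltas and the median can be reproduced from the raw token counts. A minimal Python sketch, using only the numbers in the table; the names `RUNS` and `delta_pct` are ours for illustration and are not part of any terraphim tool:

```python
from statistics import median

# (issue, run 1 input tokens, run 2 input tokens), copied from the table.
RUNS = [
    ("#1873", 47_200, 31_800),
    ("#1862", 28_400, 19_100),
    ("#1879", 81_000, 63_500),
    ("#1858", 33_900, 28_200),
    ("#1850", 52_100, 47_400),
]


def delta_pct(run1: int, run2: int) -> float:
    """Relative change in input tokens from run 1 to run 2, in percent."""
    return (run2 - run1) / run1 * 100


# Round to one decimal place, matching the precision used in the table.
deltas = {issue: round(delta_pct(r1, r2), 1) for issue, r1, r2 in RUNS}
median_delta = median(deltas.values())

for issue, d in deltas.items():
    print(f"{issue}: {d:+.1f}%")
print(f"median: {median_delta:+.1f}%")
```

A negative delta means run two was cheaper. With five samples the median is simply the middle value, so one outlier in either direction does not move it.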
We had this data and we were not looking at it. That is the marketing failure underneath the engineering one.
Six green stages, two amber: the eight-stage lifecycle in our stack
memco names eight stages of an agentic memory lifecycle. We checked our stack against it. Six are working. Two are partial. None of them is missing.
| Stage | What memco names | What ships in terraphim-ai today | Status |
|---|---|---|---|
| 1. Capture | Record the lesson | terraphim-agent learn hook via PostToolUse | Green |
| 2. Distill | Synthesise into a shape worth keeping | learn compile, learn export-kg | Green |
| 3. Scope | Attribute the right role / project | Role KGs + per-project KGs under kg/projects/<slug>/ | Green |
| 4. Provenance | Trace it back to where it came from | terraphim-agent sessions (Claude Code, Cursor) | Green |
| 5. Retrieve | Find it again when it matters | Aho-Corasick automata via terraphim_automata | Green |
| 6. Apply | Inject it into the next run | terraphim_hooks at PreToolUse | Green |
| 7. Validate | Check the lesson is still true | /evolve weekly review, judge pipeline | Amber (manual) |
| 8. Retire | Demote the lessons that have aged out | learned-rules.md graduation/demotion | Amber (manual) |
The library code for the amber stages already exists in crates/terraphim_agent_evolution/src/{memory,lessons}.rs. It is not yet wired into the CLI. That is the next two weeks of work.
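To make stages 5 and 6 concrete, here is a minimal Python sketch of retrieve-then-apply, using the bun-versus-npm correction from earlier. It is an illustration under stated assumptions, not the terraphim implementation: the `Lesson` fields, the `LESSONS` list, and the "prepend a note" behaviour are all hypothetical, and a naive substring scan stands in for the Aho-Corasick automaton.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Lesson:
    trigger: str     # term that should recall the lesson (hypothetical field)
    correction: str  # what the next agent should do instead
    scope: str       # role or project the lesson is attributed to


# Hypothetical store; in the real stack, lessons live in role and project KGs.
LESSONS = [
    Lesson("npm install", "This project uses bun, not npm: run `bun install`.", "project"),
    Lesson("npm run", "This project uses bun, not npm: run `bun run`.", "project"),
]


def retrieve(command: str, lessons: List[Lesson]) -> List[Lesson]:
    """Stage 5 stand-in: naive substring scan, one pass per trigger.

    An Aho-Corasick automaton would match all triggers in a single pass.
    """
    lowered = command.lower()
    return [lesson for lesson in lessons if lesson.trigger in lowered]


def apply(command: str, lessons: List[Lesson]) -> str:
    """Stage 6 stand-in: prepend matched corrections before the tool call."""
    hits = retrieve(command, lessons)
    if not hits:
        return command
    notes = "\n".join(f"# memory: {hit.correction}" for hit in hits)
    return f"{notes}\n{command}"


print(apply("npm install left-pad", LESSONS))
```

The point of the shape is that retrieval is cheap enough to run before every tool call, so the correction arrives before the agent spends tokens rediscovering it.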
The Memory Reliability Rubric: a ten-minute readout
We are shipping a six-dimension rubric on top of the existing judge pipeline (Kimi K2.5 deep tier, ninety per cent verdict agreement against our Opus-4.6 oracle). It scores every captured memory on:
- Faithfulness -- does the memory accurately describe what happened?
- Scope -- is the role and project boundary correct?
- Provenance -- can we trace this memory to a session, commit, or hook event?
- Actionability -- does the memory tell the next agent what to do, or only what happened?
- Decay -- is the memory still current, or has the codebase moved past it?
- Risk -- does applying this memory introduce a new failure mode (brittle text substitution, false confidence)?
Output: a single markdown readout with per-dimension scores, the top three offenders, and three recommended retirements. Designed to run in under ten minutes against any project, including ours. The work is tracked in terraphim-ai#1899.
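The readout can be pictured as a small aggregation over scored memories. The following Python sketch is hypothetical throughout: the 0.0 to 1.0 scale (higher is better on every dimension, so a low decay score means stale), the equal weighting, and the names `ScoredMemory` and `readout` are assumptions for illustration, not the design tracked in the issue.

```python
from dataclasses import dataclass
from typing import Dict, List

DIMENSIONS = ("faithfulness", "scope", "provenance", "actionability", "decay", "risk")


@dataclass
class ScoredMemory:
    memory_id: str
    scores: Dict[str, float]  # assumed scale: 0.0 (worst) to 1.0 (best)

    @property
    def overall(self) -> float:
        # Unweighted mean across the six dimensions (equal weights assumed).
        return sum(self.scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


def readout(memories: List[ScoredMemory], top_n: int = 3) -> str:
    """Render per-dimension means, worst offenders, and retirement candidates."""
    lines = ["| Dimension | Mean score |", "|---|---|"]
    for d in DIMENSIONS:
        mean = sum(m.scores[d] for m in memories) / len(memories)
        lines.append(f"| {d} | {mean:.2f} |")
    # Offenders: lowest overall score first.
    offenders = sorted(memories, key=lambda m: m.overall)[:top_n]
    # Retirement candidates: lowest decay score (most stale) first.
    stale = sorted(memories, key=lambda m: m.scores["decay"])[:top_n]
    lines.append("")
    lines.append("Top offenders: " + ", ".join(m.memory_id for m in offenders))
    lines.append("Recommended retirements: " + ", ".join(m.memory_id for m in stale))
    return "\n".join(lines)
```

In this sketch the judge pipeline would supply the `scores` dictionary for each memory; everything downstream of that is plain sorting and formatting.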
Public commons vs permissioned memory
Not every lesson belongs to the world. We split memory into two buckets, with a policy file at the repo root:
Public commons memory lives in the terraphim-skills repository and as KG terms published via Gitea wiki. Apache-2.0, no PII, no secrets. Anyone can clone, anyone can contribute.
Permissioned memory lives in per-project KGs, per-agent corrections, and session transcripts. It stays in ~/.config/terraphim/ or per-repo. Never leaves the device without an explicit publication step. A scope check in the capture path warns when a permissioned item is about to land in a public location.
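A scope check of this kind can be very small. The Python sketch below shows one possible shape, assuming each memory carries a visibility label and a destination path; the `PUBLIC_ROOTS` list and the function name are hypothetical, and a real check would read its public locations from the policy file.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical public locations, hard-coded here for illustration only.
PUBLIC_ROOTS = (PurePosixPath("terraphim-skills"), PurePosixPath("wiki"))


def scope_warning(visibility: str, destination: str) -> Optional[str]:
    """Return a warning if a permissioned item targets a public location."""
    dest = PurePosixPath(destination)
    lands_in_public = any(dest.is_relative_to(root) for root in PUBLIC_ROOTS)
    if visibility == "permissioned" and lands_in_public:
        return f"warning: permissioned memory would land in public path {dest}"
    return None


# A per-project correction aimed at the public skills repository is flagged.
print(scope_warning("permissioned", "terraphim-skills/skills/bun.md"))
# The same correction aimed at a per-project KG passes silently.
print(scope_warning("permissioned", "kg/projects/demo/bun.md"))
```

Warning rather than blocking keeps the capture path non-interactive while still leaving the publication decision with a human.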
This split is what makes the open-source story work without burning operational secrets.
Try it
Install:
cargo install terraphim-agent
terraphim-agent setup
terraphim-agent sessions import
terraphim-agent learn list
The memory namespace lands when #1899 merges. Until then, learn and sessions cover six of the eight stages.
Contribute: the feature request is open and we are happy to receive PRs against any of the six task items in the issue body. The terraphim-skills repository accepts skill contributions under the public commons.
What memco got right, and what we are adding
memco named the category and wrote the field guide. That is real work and we are crediting them for the vocabulary throughout this post. What we are adding is the engineering substrate: a working eight-stage pipeline in Rust, a judge-driven rubric you can run on your own code, and a forty-two-agent fleet that has been paying the tax in production for long enough to measure the relief.
The cold-start tax is real. The second-run signal is measurable. The lifecycle exists in our code today. Read the field guide. Then check your own stack against the eight stages. If you are missing more than two, you are paying tax you do not need to pay.
Discuss on Twitter or open an issue at git.terraphim.cloud/terraphim/terraphim-ai. The Gitea feature request for the consolidated CLI is #1899.