// incident-2024-02-03 cost: $43.17 files_modified: 900 merged: 0 root_cause: no_budget_ceiling resolution: never_again
The Road
to Punk.
A repo-tracked chronology of the experiments, failures, and systems that led to Punk. Not marketing copy. Public memory.
⬡ EXPERIMENT #001 · 2024-01-14 First multi-agent run
Spawned 4 agents against a single codebase with no handoff protocol. Three wrote the same function. One deleted it. No diff was mergeable.
✕ FAILURE #002 · 2024-02-03 The runaway loop incident
$43 in API calls. 900 files touched across 6 hours. Zero merged. The agent had no budget. It had no stop condition. It had opinions.
◎ INSIGHT #003 · 2024-02-20 Agents need contracts, not prompts
Every failure traced back to ambiguity. The agent did exactly what was asked. The ask was wrong. The prompt was a vibe. Conclusion: the spec is the unit of work, not the message.
⬡ EXPERIMENT #004 · 2024-04-08 LLM-native IDE prototype
Replaced the file tree with a chat surface. No editor, no diff, no history. Felt fast for exactly 2 days. Code review became impossible. The diff was a lie.
◈ ARTIFACT #005 · 2024-05-11 README: why vibes fail at scale
3-page internal note. Core argument: “vibe” is a failure mode dressed as speed. Coherence decays without a fixed source of truth. The LLM is not the source of truth. The spec is.
✕ FAILURE #006 · 2024-06-02 Agent autonomy without stewardship
Granted an agent write access to main. It passed all tests. It rewrote the auth layer. It was technically correct and completely unreviewed. Stewardship entered the vocabulary that day.
§ SYSTEM #007 · 2024-07-18 Signum v0.1: signed event log
Every agent action produces a signed, content-addressed event. Any run is reproducible from its spec + seed. This became the audit trail you see in Punk today.
⬡ EXPERIMENT #008 · 2024-09-04 Steward as a named role
First time “steward” appeared in a spec file. A human who signs grants — not a reviewer, not an approver. The word carried a different weight. It stuck.
✕ FAILURE #009 · 2024-10-29 Multi-party signing was the wrong abstraction
Built distributed stewardship — multiple humans co-signing grants. Correct in theory. Coordination overhead destroyed the UX. Reverted to single-steward, local-first. Simplicity won.
◈ ARTIFACT #010 · 2024-11-15 Grant taxonomy v0.2
fs:rw, fs:ro, net:out, exec — these exact labels appear in Punk today. First stable grant vocabulary. Survived two full rewrites without modification.
§ SYSTEM #011 · 2025-01-07 spec.md as the single source of truth
Append-only markdown. Versioned. Re-read every tick by every agent. Same spec + same seed = same outcome. The core loop. Still the core loop.
⬡ EXPERIMENT #012 · 2025-02-14 Module marketplace concept
Prototype for community-authored shareable spec modules. Looked compelling. Added enormous surface area. Tabled for v1+. Modules are later. Core is now.
◈ ARTIFACT #013 · 2025-03-01 The porcupine mascot
Spines = grants. Every quill is a permission boundary. The animal does not attack. It makes the cost of bad contact obvious. A local-first creature. Does not need the cloud.
◎ INSIGHT #014 · 2025-03-28 “Punk” as a technical stance
Not aesthetic. Not retro. DIY ownership: you hold the keys, you read the diff before merging, you deny by default. Suspicious of platforms. The name stopped being a joke.
§ SYSTEM #015 · 2025-07-03 Scope decision: core first
Dropped: module marketplace. Dropped: cloud sync. Dropped: team dashboard. Dropped: multi-steward. Kept: spec runtime, steward grants, local-only. One machine. One steward.
✕ FAILURE #016 · 2025-08-11 Over-engineered grant DSL
Built a full grammar for grant expressions. Nested, composable, extensible. It was expressive and unreadable in a spec file. Replaced with the six flat labels from Signum v0.2.
◈ ARTIFACT #017 · 2025-09-20 v0.3 internal README
“Punk is not production-ready. It is a bounded runtime for experimenters who want to understand their tools. You should read the source before you run it.” — still the product framing.
§ SYSTEM #018 · 2025-10-15 v0.4 runtime loop
spec init → spec run → steward review → merge. Four commands. Local. No network at rest. The entire product in one shell session on one machine.
⬡ EXPERIMENT #019 · 2025-11-30 Budget enforcer at spawn
Per-agent token and cost ceiling enforced at spawn time — not post-hoc in billing. The $43 incident from February 2024 cannot happen. The ceiling is structural.
◎ INSIGHT #020 · 2026-01-08 Runtime executes. Human writes.
Punk does not write specs. It runs them. The human writes the spec. The agent executes it. The steward signs grants. Roles are fixed and non-negotiable. The runtime stays in its lane.
From experiment to principle.
Source fragments.
These documents, notes, and READMEs exist in the repo. They are not summaries. They are receipts.
# why-vibes-fail.md > Coherence decays without a > fixed source of truth. > > The LLM is not the source. > The spec is.
// signum-event v0.1 spec_hash: sha256:3f9a... agent_id: pid/4221 action: fs:rw path: src/auth/** signed_by: @mira timestamp: 2024-07-18T14:03:11Z
## grants fs:rw on src/auth/** fs:ro on docs/** net:out to api.stripe.com exec — denied // still the same in punk v0.4
// mascot-rationale.md Spines = grants. The porcupine does not attack. It makes the cost of bad contact obvious. // deny-by-default
# punk v0.3 (internal) Not production-ready. Bounded runtime for experimenters. Read the source first. Then run it. // modules: later // core: now
This is a working runtime,
not a finished product.
The journal is public memory. The runtime is open source. If you experiment, write a spec, or find a failure worth logging — that belongs here too.
// stay local
// read the diff