Hall of Pain

Real, documented failures of unsupervised AI coding agents. The point of this page is not to dunk on the underlying tools — they are improving fast — but to make the category of failure that aidokit exists to mitigate concrete, named, and searchable.

If you've experienced one of these failure modes in your own work, you are the target user for aidokit. The Standard and Strict tiers exist specifically to prevent the ones below.

Coverage status (v1.x)

v1.x now addresses all 25 points of the analysis behind this page. The table below maps each pain category to its coverage status and to the mechanisms that mitigate it.

Category	Status	Mechanism
Context window limits	🟢 Strong-at-Standard	Role separation, sub-agents, per-tier memory file
Context poisoning / drift	✅ Strong	`scratchpad-hygiene` + `reset-context` skills (always-on); `doctor --hygiene`
Hallucinated APIs / symbols	✅ Strong	Context7 MCP, Read-before-edit skill, byte-compare gate, zero-LLM scaffolder
Token cost explosion	✅ Strong	`aidokit verify --budget`
Stale documentation	✅ Strong	`aidokit doctor --drift` with semantic reference-graph check
Tech debt from AI sprawl	🟢 Strong-at-Standard	Memory-file anti-patterns; scope discipline
Non-determinism	✅ Strong	Zero-LLM scaffolder; hermetic CI; `AIDOKIT_DETERMINISTIC=1`
Verification gap	🟢 Strong-at-Standard	Tester-Reviewer role; `aidokit verify`; change-summary
Security & supply-chain	✅ Strong	`.aidokit/capabilities.json`; signed packages; five verify facets
Workflow fragmentation	✅ Strong	Three first-party adapters; multi-adapter projects
Prompt injection (MCP)	✅ Strong	`untrustedOutput` flag; `quarantining-untrusted-output` skill; `doctor` check; threat T18
Scope creep	🟢 Strong-at-Standard	Immutable task briefs; memory-file rules
Architectural intent loss	✅ Strong	First-class ADRs (0001–0020)
Onboarding cost	✅ Strong	Per-tier memory file; shared `docs/` skeleton
Skill atrophy	✅ Strong	SP9 hygiene gates (`doctor --hygiene`)
Multi-agent chaos	🟢 Strong-at-Standard	Fixed role contracts; single Maintainer merge
Evaluation difficulty	✅ Strong	`aidokit eval` runs `## Acceptance criteria`
License / IP contamination	✅ Strong	`aidokit verify --license`
Secrets leakage	✅ Strong	`aidokit verify --secrets`; reinforced by `.aidoignore`
Reproducibility across OSes	✅ Strong	CI matrix Win/macOS/Linux; deterministic emission
Model / CLI version drift	✅ Strong	`.aidokit/model.lock`; `doctor --model-drift`
Agent runaway loops	✅ Strong	`aidokit verify --loop-cap`
Code confidentiality	✅ Strong	`.aidoignore`; `@aidokit/core/aidoignore` matcher
Dependency bloat	✅ Strong	`aidokit verify --deps`
AI-code provenance	✅ Strong	Beads task IDs; change-summary; ADRs

Total: 20 ✅ Strong / 5 🟢 Strong-at-Standard / 0 partial / 0 unaddressed.

The 🟢 entries are tier-gated: strongest at the Standard and Strict tiers, present only as convention at the Minimum tier.

How to submit a story

We collect failures here so that future readers (and our own roadmap) see the real shape of the problem. Submit a story in one of three ways:

GitHub issue: open one tagged `hall-of-pain` and fill in the template below. The maintainers triage submissions and add them to this page within a week.
Email: feedback@aidokit.dev, subject line "Hall of Pain: …".
Pull request: edit `wiki/hall-of-pain/index.md` directly.

Submission template (paste into a GitHub issue, then redact):

Title:        <one-line summary>
Date:         <YYYY-MM-DD>
Tool:         <Claude Code | Cursor | Aider | Copilot | Codex | other>
Failure kind: <scope-leak | context-loss | tool-soup | runaway-loop | other>
Cost:         <hours lost / dollars / story-points / "trust">
What happened:
  <2–4 sentences. Public link if available.>
What `aidokit` would have done:
  <Honest answer. If "nothing useful", say so. We learn from those too.>
Permission:   <Public name OK / Anonymous / Redacted>

We attribute by name only with explicit permission. Anonymous stories are welcome and given equal prominence.
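For anyone who wants to check a story against the template before submitting, the sketch below parses the `Field: value` layout and flags missing fields, a malformed date, or an unknown failure kind. It is only an illustration: the function names `parse_submission` and `validate_submission` are hypothetical, nothing here is part of aidokit, and treating every field as required is an assumption of this example.

```python
import re

# Field names and failure kinds are copied from the template above.
REQUIRED_FIELDS = (
    "Title", "Date", "Tool", "Failure kind", "Cost",
    "What happened", "What `aidokit` would have done", "Permission",
)
FAILURE_KINDS = {"scope-leak", "context-loss", "tool-soup", "runaway-loop", "other"}


def parse_submission(text: str) -> dict[str, str]:
    """Parse 'Field: value' lines; indented lines continue the previous field."""
    fields: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and current is not None:
            # Continuation line, e.g. the body under "What happened:".
            fields[current] = (fields[current] + " " + line.strip()).strip()
            continue
        name, sep, value = line.partition(":")
        if sep:
            current = name.strip()
            fields[current] = value.strip()
    return fields


def validate_submission(fields: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not fields.get(name)]
    date = fields.get("Date", "")
    if date and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date):
        problems.append("Date must be YYYY-MM-DD")
    kind = fields.get("Failure kind", "")
    if kind and kind not in FAILURE_KINDS:
        problems.append(f"unknown failure kind: {kind}")
    return problems


# Made-up story, used only to exercise the functions.
example = """\
Title: Agent reformatted unrelated files
Date: 2025-01-15
Tool: other
Failure kind: scope-leak
Cost: 3 hours
What happened:
  The agent rewrote formatting in files outside the task.
What `aidokit` would have done:
  Refused the out-of-scope writes.
Permission: Anonymous
"""
print(validate_submission(parse_submission(example)))
```

The date check only confirms the `YYYY-MM-DD` shape, not that the date exists on a calendar.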

Failure taxonomy

The stories below are organised by the kind of failure, not by the tool. The same tool can produce different failures in different conditions; the same failure can appear across multiple tools.

Scope leak: agent edits files it wasn't asked to touch

Symptom: a feature branch contains unrelated formatting, refactoring, or "I noticed this was wrong" changes in files outside the task's stated scope. PR review takes 5× longer, and reverting is risky because the unrelated edits are entangled with the intended ones.

What aidokit does about it: the watchdog hook on file write checks the task's declared scope and refuses writes outside it. The agent receives a machine-readable refusal and can either re-plan or escalate the scope explicitly.
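To make the idea of a scope check with a machine-readable refusal concrete, here is a minimal Python sketch. It is not aidokit's implementation: the `TaskScope` type, the `check_write` function, the glob-based scope format, and the field names in the returned decision are all assumptions made for this example.

```python
import json
from dataclasses import dataclass
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


@dataclass(frozen=True)
class TaskScope:
    """Hypothetical declared scope: glob patterns a task may write to."""
    task_id: str
    allowed_globs: tuple[str, ...]


def check_write(scope: TaskScope, path: str) -> dict:
    """Return a machine-readable decision for a proposed file write."""
    # Normalise separators so the same patterns work on every OS.
    candidate = PurePosixPath(path.replace("\\", "/"))

    # Refuse anything that could escape the project root.
    if candidate.is_absolute() or ".." in candidate.parts:
        return {"allowed": False, "task": scope.task_id,
                "path": path, "reason": "path-escapes-root"}

    relative = candidate.as_posix()
    for pattern in scope.allowed_globs:
        # Note: fnmatch's "*" also matches "/" so "src/auth/*" covers subfolders.
        if fnmatchcase(relative, pattern):
            return {"allowed": True, "task": scope.task_id,
                    "path": relative, "matched": pattern}

    # The refusal lists the declared scope so the agent can re-plan or
    # ask for the scope to be widened explicitly.
    return {"allowed": False, "task": scope.task_id, "path": relative,
            "reason": "outside-declared-scope",
            "allowed_globs": list(scope.allowed_globs)}


# Made-up task and paths, for illustration only.
scope = TaskScope("task-123", ("src/auth/*", "tests/auth/*"))
print(json.dumps(check_write(scope, "src/auth/login.py")))
print(json.dumps(check_write(scope, "src/billing/invoice.py")))
```

Returning a structured decision rather than raising an error lets the caller serialise the refusal and hand it back to the agent, which matches the re-plan or escalate choice described above.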