2026-05-26 — All 25 pain points closed by v1.x

Thirteen feature branches merged to main this week. The 25-point pain analysis that drove the v1.x roadmap is now fully addressed in code. This post is the ledger — what landed, where to look, what's still deferred, and what we are not claiming.

The 25-point analysis, briefly

Earlier this cycle we went through the Hall of Pain taxonomy and our accumulated field reports and produced a flat list of 25 distinct failure modes that unsupervised AI coding agents reliably trip over: context window limits, context poisoning, hallucinated symbols, token cost explosions, stale docs, runaway loops, secrets leakage, license contamination, prompt injection via MCPs, and so on. Each was assigned an owning mechanism. Most were already covered at Standard or Strict tier; the rest are what shipped this week.

What shipped — sub-project ledger

Thirteen branches, all merged to main and reflected in CHANGELOG.md [Unreleased]:

P1 — aidokit verify umbrella + secrets facet + four stubs. A pluggable facet runner under one command, sharing .aidokit/policy.json. It ships with the --secrets facet fully implemented (11 bundled patterns + Shannon-entropy fallback + path/inline allowlist) and stubs for the other four facets. Designed via ADR-0018. Closes #19 (secrets leakage); foundation for #4, #18, #22, #24.
SP6 — MCP untrustedOutput flag + quarantine skill. New optional untrustedOutput: boolean on MCPDef; CLI emits the quarantining-untrusted-output skill when at least one configured MCP is flagged; aidokit doctor emits MCP_QUARANTINE_SKILL_MISSING when the flag is set but the skill is missing. Designed via ADR-0019. Also formalised as threat T18 in security-model.md §7.13. Closes #11 (prompt injection via MCP output).
SP7 — .aidoignore + redaction. A project-level ignore file (with a matcher in @aidokit/core/aidoignore) that is honored everywhere project content is read for agent context. Reinforces the secrets facet and threat T19 (context-poisoning confidentiality boundary, security-model.md §7.14). Closes #23 (code confidentiality); reinforces #19.
SP8 — .aidokit/model.lock + doctor --model-drift. The lock file records the adapter cliVersion (and related fields) at init; doctor --model-drift compares them with the currently declared values and emits MODEL_LOCK_MISSING or MODEL_DRIFT_DETECTED. Closes #21 (model / CLI version drift).
SP9 — scratchpad-hygiene + reset-context skills + doctor --hygiene + task iterations foundation. Two always-on skills (ADR-0020) teach agents how to manage scratchpad context. doctor --hygiene checks scratchpad freshness and brief↔commit drift. The iterations counter in agent-artifacts/<task-id>/state.json is laid down here so SP4 (loop-cap) can read it. Closes #2 (context poisoning); reinforces #15 (skill atrophy).
SP1 — --budget facet implemented. Reads agent-artifacts/metrics/events.jsonl token ledger; converts via the policy usdPerMillionTokens table; compares to maxUsdPerTask. Emits BUDGET_EXCEEDED, BUDGET_NEAR_LIMIT, BUDGET_LEDGER_MISSING. Closes #4 (token cost explosion).
SP2 — --deps facet implemented. Diffs package.json against HEAD; flags new dependencies not in policy.deps.allowedScopes and not justified in a change-summary ## Dependencies section. Closes #24 (dependency bloat).
SP3 — --license facet implemented. SPDX header detection (when requireHeaders: true) plus node_modules license lookup. No network calls. Closes #18 (license / IP contamination).
SP4 — --loop-cap facet implemented. Reads the iterations counter SP9 laid down; emits LOOP_CAP_EXCEEDED and LOOP_CAP_NEAR_LIMIT. Closes #22 (agent runaway loops).
SP5 — aidokit eval command. Runs the ## Acceptance criteria block from a task brief. Manual criteria report MANUAL; shell criteria (- [ ] > <argv-split command>) execute and report PASS/FAIL by exit code. Exit codes: 0 all pass, 1 any fail, 2 no criteria. Closes #17 (evaluation difficulty).
SP10 — semantic doc-drift in doctor --drift. Existing --drift now adds a semantic pass over the doc graph: broken file references, missing symbol references. Opt out with --no-semantic-drift. Promotes #5 (stale docs) from lexical to semantic coverage.
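
To make the Shannon-entropy fallback mentioned under P1 concrete, here is a minimal sketch of how per-character entropy can flag secret-looking strings that no bundled pattern matches. The function names, the 20-character minimum, and the 4.0 bits-per-character threshold are assumptions chosen for illustration; they are not taken from the aidokit source.

```typescript
// Illustrative only: names and thresholds are assumptions, not aidokit's API.

/** Shannon entropy of a string in bits per character (counts UTF-16 code units). */
function shannonEntropy(text: string): number {
  if (text.length === 0) return 0;

  // Tally how often each character occurs.
  const counts: Record<string, number> = {};
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }

  // H = -sum(p * log2(p)) over the observed characters.
  let entropy = 0;
  for (const ch of Object.keys(counts)) {
    const p = counts[ch] / text.length;
    entropy -= p * (Math.log(p) / Math.LN2);
  }
  return entropy;
}

/** Hypothetical fallback: long, high-entropy tokens are treated as suspicious. */
function looksLikeSecret(
  token: string,
  minLength: number = 20, // assumed cutoff
  minEntropy: number = 4.0 // assumed threshold, bits per character
): boolean {
  return token.length >= minLength && shannonEntropy(token) >= minEntropy;
}
```

A repeated-character string scores zero bits and passes cleanly, while a long token drawn from many distinct characters scores high and would be flagged. In practice the threshold trades false positives (hashes, minified assets) against misses, which is presumably why the facet pairs this with a path/inline allowlist.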

Adapter cross-cutting

All of the above works identically across the three first-party adapters (claude-code, codex, copilot). The byte-compare dogfood gate runs for all three: packages/cli/test/fixtures/claude-code-reference/, packages/cli/test/fixtures/codex-reference/, and packages/cli/test/fixtures/copilot-reference/ are all updated together from the same reference-context.json.

Three skills are now always-on per ADR-0020, emitted at every conformance tier regardless of the adapter:

scratchpad-hygiene
reset-context
quarantining-untrusted-output

computeFilePlan dedupes the always-on set against the normal tier-gated BASE_SKILLS list by id, so an adapter that already emits one of these at its declared tier does not get a duplicate.
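
The dedupe-by-id behaviour described above can be sketched roughly as follows. This is not the actual computeFilePlan code: the SkillDef shape, the mergeSkills helper, the "writing-briefs" tier skill, and the choice to keep the tier-gated entry when ids collide are all assumptions made for the example. Only the three always-on skill ids come from this post.

```typescript
// Hypothetical shape; the real skill definition in aidokit may differ.
interface SkillDef {
  id: string;
  source: "tier" | "always-on";
}

// The three always-on skill ids listed above.
const ALWAYS_ON_SKILLS: SkillDef[] = [
  { id: "scratchpad-hygiene", source: "always-on" },
  { id: "reset-context", source: "always-on" },
  { id: "quarantining-untrusted-output", source: "always-on" },
];

/**
 * Merge tier-gated skills with the always-on set, keeping the first
 * occurrence of each id so no skill is emitted twice.
 * Assumption: tier-gated entries come first and therefore win on collision.
 */
function mergeSkills(tierSkills: SkillDef[], alwaysOn: SkillDef[]): SkillDef[] {
  const seen: Record<string, boolean> = {};
  const merged: SkillDef[] = [];
  for (const skill of tierSkills.concat(alwaysOn)) {
    if (seen[skill.id]) continue; // already planned under this id
    seen[skill.id] = true;
    merged.push(skill);
  }
  return merged;
}

// Example: an adapter whose tier already emits reset-context.
const plannedSkills = mergeSkills(
  [
    { id: "writing-briefs", source: "tier" }, // hypothetical tier skill
    { id: "reset-context", source: "tier" },
  ],
  ALWAYS_ON_SKILLS
);
```

Here plannedSkills contains four entries rather than five, because reset-context appears in both inputs and is kept only once.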

What is deferred

Wave 0 (refactor/doctor-check-registry) — a refactor of packages/cli/src/commands/doctor.ts into a check-registry pattern. It was queued first but not merged, because SP6, SP8, SP9, and SP10 each added inline doctor checks against the pre-refactor shape and we did not want to re-architect under live patches. Planned follow-up: consolidate the inline checks (MCP_QUARANTINE_SKILL_MISSING, the --model-drift group, the --hygiene group, and the semantic doc-drift pass) into the registry pattern in a single PR.
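
For readers unfamiliar with the term, a check-registry pattern generally means that each doctor check is a self-describing object registered in a list, and the command simply iterates over the checks that apply. The sketch below shows that general shape only. The DoctorCheck, DoctorContext, and Finding types and the registerCheck and runChecks helpers are hypothetical and do not describe the planned refactor; only the finding codes and the --model-drift flag name come from this post.

```typescript
// Illustrative sketch only. Every type and helper here is hypothetical.

interface Finding {
  code: string;
  message: string;
}

// Assumed, simplified view of the project state a check inspects.
interface DoctorContext {
  flaggedMcpCount: number;
  quarantineSkillPresent: boolean;
  lockedCliVersion?: string; // undefined when no lock file exists
  declaredCliVersion: string;
}

interface DoctorCheck {
  id: string;
  flag?: string; // when set, the check runs only if this flag was passed
  run(ctx: DoctorContext): Finding[];
}

const registry: DoctorCheck[] = [];

function registerCheck(check: DoctorCheck): void {
  registry.push(check);
}

// A check with no flag runs on every doctor invocation.
registerCheck({
  id: "mcp-quarantine-skill",
  run(ctx: DoctorContext): Finding[] {
    if (ctx.flaggedMcpCount > 0 && !ctx.quarantineSkillPresent) {
      return [{ code: "MCP_QUARANTINE_SKILL_MISSING", message: "flagged MCP without quarantine skill" }];
    }
    return [];
  },
});

// A flag-gated check runs only when its flag is enabled.
registerCheck({
  id: "model-drift",
  flag: "--model-drift",
  run(ctx: DoctorContext): Finding[] {
    if (ctx.lockedCliVersion === undefined) {
      return [{ code: "MODEL_LOCK_MISSING", message: "no model lock found" }];
    }
    if (ctx.lockedCliVersion !== ctx.declaredCliVersion) {
      return [{ code: "MODEL_DRIFT_DETECTED", message: "cliVersion differs from lock" }];
    }
    return [];
  },
});

/** Run every registered check whose flag is unset or enabled. */
function runChecks(ctx: DoctorContext, enabledFlags: string[]): Finding[] {
  const findings: Finding[] = [];
  for (const check of registry) {
    if (check.flag !== undefined && enabledFlags.indexOf(check.flag) === -1) continue;
    findings.push(...check.run(ctx));
  }
  return findings;
}
```

The appeal of this shape is that adding a check becomes a single registerCheck call instead of another inline branch in the command body, which is the kind of consolidation the follow-up PR is meant to achieve.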

What now exists in the wiki

Hall of Pain now carries a coverage table mapping each of the 25 categories to its mitigation mechanism.
ADR index lists the three new ADRs (0018, 0019, 0020).
Error codes documents every new finding code introduced by the facets and doctor checks.
CLI reference documents the verify facet flags, aidokit eval, and the new doctor flags (--hygiene, --model-drift, the semantic-drift opt-out).

What is gating GA (unchanged)

The in-scope engineering work is now done; what remains is human work outside the codebase:

Design-partner recruitment (three target accounts identified, none signed).
Community-maintainer recruitment for the non-canary stack packs (python, react, go).
Published cost-study case with real before/after token spend.
Auditor validation of the SOC2 and EU AI Act mappings produced by aidokit audit export.
Marketing surface (the 90-second video, analyst outreach, the launch-day post).

We will not ship v1.0 GA until those land — but for the first time since the analysis was written, no engineering blocker stands in front of them.