2026-05-26 — All 25 pain points closed by v1.x
Thirteen feature branches merged to main this week. The 25-point pain analysis that drove the v1.x roadmap is now fully addressed in code. This post is the ledger — what landed, where to look, what's still deferred, and what we are not claiming.
The 25-point analysis, briefly
Earlier this cycle we sat down with the Hall of Pain taxonomy and the field reports we have accumulated and produced a flat list of 25 distinct failure modes that unsupervised AI coding agents reliably trip over: context window limits, context poisoning, hallucinated symbols, token cost explosions, stale docs, runaway loops, secrets leakage, license contamination, prompt injection via MCPs, and so on. Each got an owner mechanism. Most were already covered at Standard or Strict tier; the rest are what shipped this week.
What shipped: sub-project ledger
Thirteen branches, all merged to main and reflected in
CHANGELOG.md [Unreleased]:
- P1: `aidokit verify` umbrella + secrets facet + four stubs. Pluggable facet runner under one command, sharing `.aidokit/policy.json`. Ships the `--secrets` facet implemented (11 bundled patterns + Shannon-entropy fallback + path/inline allowlist) and stubs for the other four. Designed via ADR-0018. Closes #19 (secrets leakage); foundation for #4, #18, #22, #24.
- SP6: MCP `untrustedOutput` flag + quarantine skill. New optional `untrustedOutput: boolean` on `MCPDef`; CLI emits the `quarantining-untrusted-output` skill when at least one configured MCP is flagged; `aidokit doctor` emits `MCP_QUARANTINE_SKILL_MISSING` when the flag is set but the skill is missing. Designed via ADR-0019. Also formalised as threat T18 in security-model.md §7.13. Closes #11 (prompt injection via MCP output).
- SP7: `.aidoignore` + redaction. Project-level ignore file (and matcher in `@aidokit/core/aidoignore`) consumed by every place that reads project content for agent context. Reinforces the secrets facet and threat T19 (context-poisoning confidentiality boundary, security-model.md §7.14). Closes #23 (code confidentiality); reinforces #19.
- SP8: `.aidokit/model.lock` + `doctor --model-drift`. Lock file records the adapter `cliVersion` (and friends) at init; `doctor --model-drift` compares to the currently-declared values and emits `MODEL_LOCK_MISSING` / `MODEL_DRIFT_DETECTED`. Closes #21 (model / CLI version drift).
- SP9: `scratchpad-hygiene` + `reset-context` skills + `doctor --hygiene` + task `iterations` foundation. Two always-on skills (ADR-0020) teach agents how to manage scratchpad context. `doctor --hygiene` checks scratchpad freshness and brief↔commit drift. The `iterations` counter in `agent-artifacts/<task-id>/state.json` is laid down here so SP4 (loop-cap) can read it. Closes #2 (context poisoning); reinforces #15 (skill atrophy).
- SP1: `--budget` facet implemented. Reads the `agent-artifacts/metrics/events.jsonl` token ledger; converts via the policy `usdPerMillionTokens` table; compares to `maxUsdPerTask`. Emits `BUDGET_EXCEEDED`, `BUDGET_NEAR_LIMIT`, `BUDGET_LEDGER_MISSING`. Closes #4 (token cost explosion).
- SP2: `--deps` facet implemented. Diffs `package.json` against `HEAD`; flags new dependencies not in `policy.deps.allowedScopes` and not justified in a change-summary `## Dependencies` section. Closes #24 (dependency bloat).
- SP3: `--license` facet implemented. SPDX header detection (when `requireHeaders: true`) plus `node_modules` license lookup. No network calls. Closes #18 (license / IP contamination).
- SP4: `--loop-cap` facet implemented. Reads the `iterations` counter SP9 laid down; emits `LOOP_CAP_EXCEEDED` and `LOOP_CAP_NEAR_LIMIT`. Closes #22 (agent runaway loops).
- SP5: `aidokit eval` command. Runs the `## Acceptance criteria` block from a task brief. Manual criteria report `MANUAL`; shell criteria (`- [ ] > <argv-split command>`) execute and report PASS/FAIL by exit code. Exit codes: 0 all pass, 1 any fail, 2 no criteria. Closes #17 (evaluation difficulty).
- SP10: semantic doc-drift in `doctor --drift`. Existing `--drift` now adds a semantic pass over the doc graph: broken file references, missing symbol references. Opt out with `--no-semantic-drift`. Promotes #5 (stale docs) from lexical to semantic coverage.
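To make the `--budget` arithmetic concrete, here is a minimal TypeScript sketch of the ledger-to-finding conversion. It is an illustration under assumptions, not the shipped facet: the event shape (`taskId`, `model`, `tokens`), the `nearLimitRatio` default of 0.8, and the function names `parseLedger`, `taskCostUsd`, and `checkBudget` are all hypothetical. Only `usdPerMillionTokens`, `maxUsdPerTask`, and the three finding codes come from the description above.

```typescript
// Hypothetical ledger event; the real events.jsonl schema may differ.
interface LedgerEvent {
  taskId: string;
  model: string;
  tokens: number;
}

// usdPerMillionTokens and maxUsdPerTask are named in the post;
// nearLimitRatio is an assumption made for this sketch.
interface BudgetPolicy {
  usdPerMillionTokens: Record<string, number>;
  maxUsdPerTask: number;
  nearLimitRatio?: number;
}

type BudgetFinding =
  | "BUDGET_EXCEEDED"
  | "BUDGET_NEAR_LIMIT"
  | "BUDGET_LEDGER_MISSING";

// Parse JSONL text: one JSON object per non-empty line.
function parseLedger(jsonl: string): LedgerEvent[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as LedgerEvent);
}

// Sum the USD cost of one task's events using the per-model price table.
function taskCostUsd(
  events: LedgerEvent[],
  taskId: string,
  policy: BudgetPolicy,
): number {
  let usd = 0;
  for (const event of events) {
    if (event.taskId !== taskId) continue;
    // Models missing from the table are priced at 0 in this sketch.
    const rate = policy.usdPerMillionTokens[event.model] ?? 0;
    usd += (event.tokens / 1e6) * rate;
  }
  return usd;
}

// Returns a finding code, or null when the task is under the near-limit line.
// A null ledger stands in for a missing events.jsonl file.
function checkBudget(
  jsonl: string | null,
  taskId: string,
  policy: BudgetPolicy,
): BudgetFinding | null {
  if (jsonl === null) return "BUDGET_LEDGER_MISSING";
  const usd = taskCostUsd(parseLedger(jsonl), taskId, policy);
  const ratio = policy.nearLimitRatio ?? 0.8;
  if (usd > policy.maxUsdPerTask) return "BUDGET_EXCEEDED";
  if (usd >= policy.maxUsdPerTask * ratio) return "BUDGET_NEAR_LIMIT";
  return null;
}
```

Keeping the check a pure function over the ledger text means the same logic can be exercised in tests without touching the filesystem; reading `events.jsonl` and `policy.json` would sit in a thin wrapper around it.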
Adapter cross-cutting
All of the above works identically across the three first-party
adapters (claude-code, codex, copilot). The byte-compare
dogfood gate runs for all three —
packages/cli/test/fixtures/v4-reference/,
packages/cli/test/fixtures/codex-reference/, and
packages/cli/test/fixtures/copilot-reference/ all updated together
from the same reference-context.json.
Three skills are now always-on per ADR-0020, emitted at every conformance tier regardless of the adapter:
- `scratchpad-hygiene`
- `reset-context`
- `quarantining-untrusted-output`
computeFilePlan dedupes the always-on set against the normal
tier-gated BASE_SKILLS list by id, so an adapter that already
emits one of these at its declared tier does not get a duplicate.
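The dedupe-by-id step can be sketched roughly as follows. This is a minimal illustration, not the actual `computeFilePlan` code: the `SkillDef` shape and the `mergeSkills` helper are hypothetical, and only the three always-on ids are taken from the post.

```typescript
// Hypothetical skill shape; only `id` matters for deduplication.
interface SkillDef {
  id: string;
  source: "tier" | "always-on";
}

// The three always-on skill ids named in the post.
const ALWAYS_ON_IDS: string[] = [
  "scratchpad-hygiene",
  "reset-context",
  "quarantining-untrusted-output",
];

// Merge tier-gated skills with the always-on set, keeping the first
// occurrence of each id so a skill is never emitted twice.
function mergeSkills(tierSkills: SkillDef[], alwaysOnIds: string[]): SkillDef[] {
  const seen = new Set<string>();
  const merged: SkillDef[] = [];
  for (const skill of tierSkills) {
    if (seen.has(skill.id)) continue;
    seen.add(skill.id);
    merged.push(skill);
  }
  for (const id of alwaysOnIds) {
    if (seen.has(id)) continue; // already emitted at the declared tier
    seen.add(id);
    merged.push({ id, source: "always-on" });
  }
  return merged;
}
```

Processing the tier-gated list first is a design choice of this sketch: when an id appears in both sets, the tier-gated definition wins and the always-on entry is skipped.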
What is deferred
Wave 0 (refactor/doctor-check-registry) — a refactor of
packages/cli/src/commands/doctor.ts into a check-registry pattern.
It was queued first but not merged, because SP6, SP8, SP9, and SP10
each added inline doctor checks against the pre-refactor shape and
we did not want to re-architect under live patches. Planned
follow-up: consolidate the inline checks (MCP_QUARANTINE_SKILL_MISSING,
the --model-drift group, the --hygiene group, and the semantic
doc-drift pass) into the registry pattern in a single PR.
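For readers unfamiliar with the pattern, a check registry might look roughly like the sketch below. Everything structural here is an assumption: `DoctorContext`, `DoctorCheck`, `registerCheck`, and `runChecks` are hypothetical names, and the Wave 0 branch may be shaped quite differently. The finding codes and the `--model-drift` flag are the ones mentioned in this post.

```typescript
// Hypothetical context handed to every check; the real command likely carries more.
interface DoctorContext {
  flags: Set<string>; // CLI flags passed to doctor, e.g. "--model-drift"
  hasModelLock: boolean;
  mcpFlaggedUntrusted: boolean;
  quarantineSkillPresent: boolean;
}

interface Finding {
  code: string;
  message: string;
}

interface DoctorCheck {
  id: string;
  flag?: string; // when set, the check runs only if this flag was passed
  run(ctx: DoctorContext): Finding[];
}

const registry: DoctorCheck[] = [];

function registerCheck(check: DoctorCheck): void {
  registry.push(check);
}

// Run every registered check whose flag gate (if any) is satisfied.
function runChecks(ctx: DoctorContext): Finding[] {
  const findings: Finding[] = [];
  for (const check of registry) {
    if (check.flag !== undefined && !ctx.flags.has(check.flag)) continue;
    findings.push(...check.run(ctx));
  }
  return findings;
}

// Always-run check: MCP flagged as untrusted but the quarantine skill is absent.
registerCheck({
  id: "mcp-quarantine-skill",
  run: (ctx) =>
    ctx.mcpFlaggedUntrusted && !ctx.quarantineSkillPresent
      ? [{ code: "MCP_QUARANTINE_SKILL_MISSING", message: "quarantine skill not found" }]
      : [],
});

// Flag-gated check: runs only under --model-drift.
registerCheck({
  id: "model-lock-present",
  flag: "--model-drift",
  run: (ctx) =>
    ctx.hasModelLock
      ? []
      : [{ code: "MODEL_LOCK_MISSING", message: "model lock file not found" }],
});
```

The appeal of this shape is that each inline check becomes a self-contained entry, so adding a check means registering one object instead of editing the body of the command.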
What now exists in the wiki
- Hall of Pain now carries a coverage table mapping each of the 25 categories to its mitigation mechanism.
- ADR index lists the three new ADRs (0018, 0019, 0020).
- Error codes documents every new finding code introduced by the facets and doctor checks.
- CLI reference documents the `verify` facet flags, `aidokit eval`, and the new `doctor` flags (`--hygiene`, `--model-drift`, the semantic-drift opt-out).
What is gating GA (unchanged)
The in-scope engineering work is now done; what remains is human work outside the codebase:
- Design-partner recruitment (three target accounts identified, none signed).
- Community-maintainer recruitment for the non-canary stack packs (`python`, `react`, `go`).
- Published cost-study case with real before/after token spend.
- Auditor validation of the SOC2 and EU AI Act mappings produced by `aidokit audit export`.
- Marketing surface (the 90-second video, analyst outreach, the launch-day post).
We will not ship v1.0 GA until those land — but for the first time since the analysis was written, no engineering blocker stands in front of them.