Testing Tiers

Purpose #

Explain the four tiers of tests aidokit runs, what each tier owns, and the coverage targets each package must hit.

Big Picture #

From .docs/ARCHITECTURE.md §18 and AGENTS.md §7.4.

| Tier | What it tests | Location | Speed |
| --- | --- | --- | --- |
| Unit (Vitest) | Pure logic: interpolation, schemas, manifest parsing | `src/foo.test.ts` next to source | Fast |
| Integration | Real `aidokit <cmd>` in temp dirs; assert emitted tree | `packages/*/test/integration/` | Medium |
| E2E | Real Claude Code install; `claude /intake` end-to-end | `e2e/` (TBD) | Slow, gated by `pnpm test:e2e` |
| Byte-compare reference | Adapter output vs hand-built `.claude/` reference | Integration tier (dogfood gate) | Medium |

Coverage targets (do not relax without an ADR per AGENTS.md §7.4):

- @aidokit/core: 90%+ (pure logic, easy to test)
- Adapters: 80%+ (some shell/exec branches harder to test)
- @aidokit/cli: 70%+ (interactive paths partially tested)
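
One way a floor like these could be enforced is through Vitest's coverage thresholds. The fragment below is a sketch only: the source does not say how the targets are wired up, so the file location, the `v8` provider (which needs the `@vitest/coverage-v8` package), and applying the same number to all four metrics are assumptions. The `coverage.thresholds` key is the form used by Vitest 1.0 and later.

```typescript
// Hypothetical vitest.config.ts for a package with a 90% floor (e.g. @aidokit/core).
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    coverage: {
      // Assumption: the v8 provider; the repo may use a different one.
      provider: "v8",
      // Assumption: the floor applies equally to every metric.
      // The run fails if any metric drops below its threshold.
      thresholds: { lines: 90, functions: 90, branches: 90, statements: 90 },
    },
  },
});
```

Under the same assumptions, an adapter package would use 80 and the CLI package 70.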

How It Works #

Unit tests #

Co-located: packages/<pkg>/src/foo.ts ↔ packages/<pkg>/src/foo.test.ts. Pure logic only — no filesystem, no network, no shell. Vitest is the runner (ADR-0001).

Examples already in the repo (see CHANGELOG.md [0.1.0-alpha]):

- @aidokit/core: AidoError serialiser, interpolate helper, schema validators.
- @aidokit/mcp-catalog: trigger evaluation, install resolver, catalog parity (every entry passes MCPDefSchema).
- @aidokit/prereq-check: semver helpers, formatter.
- @aidokit/adapter-claude-code: per-emit-method output assertions, MCP install/remove/list, determinism.
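
A minimal sketch of the kind of pure-logic code this tier covers. The `interpolate` signature and the `{{key}}` placeholder syntax are assumptions made for illustration, not the actual @aidokit/core helper. In the repo the assertions would sit inside a Vitest `it(...)` block in the co-located test file; plain `node:assert` is used here only to keep the snippet self-contained.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical pure helper: replaces {{key}} placeholders with values.
// It touches no filesystem, network, or shell, so it fits the unit tier.
function interpolate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, key: string) => {
    if (!Object.prototype.hasOwnProperty.call(vars, key)) {
      throw new Error(`Missing value for placeholder "${key}"`);
    }
    return vars[key];
  });
}

// Happy path: the placeholder is substituted.
assert.equal(interpolate("Hello, {{ name }}!", { name: "aidokit" }), "Hello, aidokit!");

// Error path: an unknown placeholder is rejected instead of silently kept.
assert.throws(() => interpolate("{{missing}}", {}), /Missing value/);
```

Because the function depends only on its arguments, the test needs no setup or teardown, which is what keeps this tier fast.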

Integration tests #

Spin up a temp directory, run a real command (aidokit init …) against it, and assert on the emitted file tree, manifest content, and exit code. No network; mock CLI tools where needed (Claude Code's claude mcp add is stubbed in v0.1).

These tests live at packages/*/test/integration/. The flagship example is packages/cli/test/integration/init-emit.test.ts, which exercises the real cross-package pipeline (adapter + stack pack + base-skills + shared-docs composed by computeFilePlan, written by writeFilesStaged to a temp dir). It asserts the emitted .claude/ tree structure (7 agents, 3 commands, 19 hook scripts, 6 output styles, 8 schemas, ≥18 skills), the docs/ and agent-artifacts/ skeletons, and plan determinism.
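
The temp-directory pattern and the structural assertions can be sketched as follows. `fakeEmit` is a hypothetical stand-in for running the real command, the file names are placeholders, and the counts (2 and 1) are toy values rather than the real tree's. Only Node built-ins are used so the snippet runs on its own; the real test would use Vitest's `expect`.

```typescript
import { strict as assert } from "node:assert";
import { mkdtempSync, mkdirSync, writeFileSync, readdirSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical stand-in for the real emit step; the actual test drives the
// real pipeline instead of writing placeholder files by hand.
function fakeEmit(root: string): void {
  const files = [
    ".claude/agents/agent-a.md",
    ".claude/agents/agent-b.md",
    ".claude/commands/command-a.md",
  ];
  for (const rel of files) {
    const abs = join(root, rel);
    mkdirSync(dirname(abs), { recursive: true });
    writeFileSync(abs, "placeholder\n");
  }
}

// Count regular files directly inside one directory of the emitted tree.
function countFiles(root: string, relDir: string): number {
  return readdirSync(join(root, relDir), { withFileTypes: true })
    .filter((entry) => entry.isFile()).length;
}

const tempRoot = mkdtempSync(join(tmpdir(), "aidokit-it-"));
try {
  fakeEmit(tempRoot);
  // Assert on structure (counts and paths), not on file contents.
  assert.equal(countFiles(tempRoot, ".claude/agents"), 2);
  assert.equal(countFiles(tempRoot, ".claude/commands"), 1);
} finally {
  // Always clean up so repeated runs start from an empty directory.
  rmSync(tempRoot, { recursive: true, force: true });
}
```

Creating a fresh directory per test and removing it in `finally` keeps tests isolated from each other and from the developer's working tree.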

E2E tests #

E2E tests run only in CI, on the e2e job. They use a real Claude Code install (matrix: macOS, Linux), bootstrap a temp project, run claude /intake "test", and assert that intake artifacts are produced. They are slow and gated behind pnpm test:e2e.

TODO: Confirm with maintainer the exact location and current status of the E2E lane. The e2e/ directory is TBD per ARCHITECTURE §18.3.

Byte-compare reference tests #

The dogfood gate (see dogfood-gate.md). Integration tests diff the emitted tree against the hand-built .claude/ reference. Intentional changes update the reference; unintentional drift fails CI.
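
A byte-for-byte tree comparison of this kind could look roughly like the sketch below. `listFiles` and `diffTrees` are hypothetical names, not the helpers the dogfood gate actually uses, and the two temp trees are toy data standing in for the emitted output and the reference.

```typescript
import { strict as assert } from "node:assert";
import { mkdtempSync, mkdirSync, writeFileSync, readdirSync, readFileSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, relative } from "node:path";

// Recursively list every file under `dir` as sorted paths relative to `base`.
function listFiles(dir: string, base: string = dir): string[] {
  const out: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const abs = join(dir, entry.name);
    if (entry.isDirectory()) out.push(...listFiles(abs, base));
    else out.push(relative(base, abs));
  }
  return out.sort();
}

// Read a file's bytes, or return null when it does not exist on this side.
function readOrNull(path: string): Buffer | null {
  try {
    return readFileSync(path);
  } catch {
    return null;
  }
}

// Return relative paths that are missing on one side or differ in bytes.
function diffTrees(emitted: string, reference: string): string[] {
  const all = new Set(listFiles(emitted).concat(listFiles(reference)));
  const drift: string[] = [];
  for (const rel of Array.from(all).sort()) {
    const a = readOrNull(join(emitted, rel));
    const b = readOrNull(join(reference, rel));
    if (a === null || b === null || !a.equals(b)) drift.push(rel);
  }
  return drift;
}

// Toy demo: two trees that agree on one file and differ on another.
const emittedDir = mkdtempSync(join(tmpdir(), "emitted-"));
const referenceDir = mkdtempSync(join(tmpdir(), "reference-"));
for (const root of [emittedDir, referenceDir]) {
  mkdirSync(join(root, "agents"), { recursive: true });
  writeFileSync(join(root, "agents", "a.md"), "same\n");
}
writeFileSync(join(emittedDir, "settings.json"), "{}\n");
writeFileSync(join(referenceDir, "settings.json"), "{ }\n");

assert.deepEqual(diffTrees(emittedDir, referenceDir), ["settings.json"]);

rmSync(emittedDir, { recursive: true, force: true });
rmSync(referenceDir, { recursive: true, force: true });
```

Returning the list of drifting paths, rather than a boolean, lets a failing CI run name the files that need either a fix or a deliberate reference update.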

Conformance tests #

A subset of integration tests: every adapter and stack pack runs runAdapterConformance / runStackPackConformance in CI. See concepts/conformance-levels.md.

Example: running tests #

# All packages, all tiers except E2E
pnpm test

# A single package
pnpm --filter @aidokit/adapter-claude-code test

# Just the integration test for the CLI
pnpm --filter aidokit test test/integration/init-emit.test.ts

# E2E (slow, gated)
pnpm test:e2e

Turborepo orders tests after each package's build; cache hits make repeat runs fast.

Diagram #

%%{init: {
  "theme": "base",
  "themeVariables": {
    "fontFamily": "ui-sans-serif, system-ui, -apple-system, Segoe UI, sans-serif",
    "fontSize": "14px",
    "primaryColor": "#eff6ff",
    "primaryTextColor": "#0f172a",
    "primaryBorderColor": "#2563eb",
    "lineColor": "#475569",
    "secondaryColor": "#f1f5f9",
    "tertiaryColor": "#ffffff",
    "clusterBkg": "#f8fafc",
    "clusterBorder": "#cbd5e1"
  }
}}%%
flowchart TD
  classDef actor    fill:#ede9fe,stroke:#6d28d9,color:#1e1b4b,stroke-width:1.2px;
  classDef cli      fill:#dbeafe,stroke:#1d4ed8,color:#0c1f4a,stroke-width:1.4px;
  classDef adapter  fill:#cffafe,stroke:#0e7490,color:#083344;
  classDef pack     fill:#dcfce7,stroke:#15803d,color:#052e16;
  classDef core     fill:#fef9c3,stroke:#a16207,color:#422006;
  classDef artifact fill:#f1f5f9,stroke:#475569,color:#0f172a;
  classDef stop     fill:#fee2e2,stroke:#b91c1c,color:#7f1d1d,stroke-dasharray:4 3;
  classDef ok       fill:#ecfdf5,stroke:#047857,color:#064e3b;
  classDef external fill:#fff7ed,stroke:#c2410c,color:#431407;

  src["src/*.ts"]:::artifact --> unit["Unit (Vitest)<br/>co-located *.test.ts"]:::artifact
  src --> build["pnpm build"]:::cli
  build --> integ["Integration<br/>packages/*/test/integration/<br/>real aidokit init in temp dir"]:::artifact
  build --> conf["Conformance<br/>runAdapterConformance / runStackPackConformance"]:::artifact
  integ --> dog["Byte-compare<br/>diff temp/.claude vs aidokit/.claude"]:::artifact
  build --> e2e["E2E (gated)<br/>real Claude Code install"]:::artifact
  unit --> ci
  integ --> ci
  conf --> ci
  dog --> ci
  e2e --> ci{{"CI gate"}}:::external

Common Mistakes #

- Adding filesystem I/O to a unit test. Promote to integration; unit tests are pure-logic only.
- Asserting on full file contents in integration tests. That is the byte-compare's job. Integration tests assert on structure (counts, paths, manifest fields).
- Skipping conformance in a new adapter or pack package. Conformance failure blocks publish (.docs/docs/specs/conformance-levels.md §9.3).
- Reducing the coverage floor without an ADR. The 90/80/70 targets are spec-locked; lowering them needs justification.
- Re-running E2E for every PR. It's slow and gated; reserve for the e2e CI job.

Checklist for a new feature #

- [ ] Unit tests for the pure logic (Vitest, co-located).
- [ ] Integration test if the feature changes emitted output.
- [ ] Conformance tests pass. Add a new check id if your feature is contract-relevant; note that adding checks is a minor spec bump, while removing or renaming them is breaking.
- [ ] Dogfood byte-compare clean if you touched @aidokit/adapter-claude-code or @aidokit/base-skills.
- [ ] Coverage floor met or exceeded.
