Testing Tiers
Purpose
Explain the four tiers of tests aidokit runs, what each tier owns, and the coverage targets each package must hit.
Big Picture
From .docs/ARCHITECTURE.md §18 and CLAUDE.md §7.4.
| Tier | What it tests | Location | Speed |
|---|---|---|---|
| Unit (Vitest) | Pure logic: interpolation, schemas, manifest parsing | `src/foo.test.ts`, next to source | Fast |
| Integration | Real `aidokit <cmd>` in temp dirs; assert on the emitted tree | `packages/*/test/integration/` | Medium |
| E2E | Real Claude Code install; `claude /intake` end-to-end | `e2e/` (TBD) | Slow; gated by `pnpm test:e2e` |
| Byte-compare reference | Adapter output vs the hand-built `.claude/` reference | Integration tier (dogfood gate) | Medium |
Coverage targets (do not relax without an ADR, per CLAUDE.md §7.4):
- @aidokit/core: 90%+ (pure logic, easy to test)
- Adapters: 80%+ (some shell/exec branches are harder to test)
- @aidokit/cli: 70%+ (interactive paths are only partially tested)
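As a rough sketch, the floors can be read as a lookup from package name to minimum coverage percentage. Everything in it is hypothetical: `floorFor` and `meetsFloor` are illustrative names, not aidokit code. The `@aidokit/adapter-*` naming rule is an assumption, and so is the choice to let packages without a listed floor pass.

```typescript
// Hypothetical encoding of the spec-locked coverage floors, in percent.
// Illustrative only: aidokit does not necessarily contain this code.
type Floor = { match: (pkg: string) => boolean; minPercent: number };

const COVERAGE_FLOORS: Floor[] = [
  { match: (pkg) => pkg === "@aidokit/core", minPercent: 90 },
  // Assumption: adapter packages follow the @aidokit/adapter-* naming.
  { match: (pkg) => pkg.startsWith("@aidokit/adapter-"), minPercent: 80 },
  { match: (pkg) => pkg === "@aidokit/cli", minPercent: 70 },
];

// Returns the floor for a package, or undefined when none is specified.
function floorFor(pkg: string): number | undefined {
  const rule = COVERAGE_FLOORS.find((f) => f.match(pkg));
  return rule === undefined ? undefined : rule.minPercent;
}

// True when measured coverage meets or exceeds the floor.
// Assumption: packages without a listed floor always pass.
function meetsFloor(pkg: string, measuredPercent: number): boolean {
  const floor = floorFor(pkg);
  return floor === undefined || measuredPercent >= floor;
}

console.log(meetsFloor("@aidokit/core", 91.2)); // true
console.log(meetsFloor("@aidokit/adapter-claude-code", 78)); // false
console.log(meetsFloor("@aidokit/cli", 70)); // true
```

Keeping the floors as data rather than scattered conditionals makes any change to them easy to spot in review. That suits targets that should only move with an ADR.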
How It Works
Unit tests
Co-located: `packages/<pkg>/src/foo.ts` ↔ `packages/<pkg>/src/foo.test.ts`. Pure logic only: no filesystem, no network, no shell. Vitest is the runner (ADR-0001).
Examples already in the repo (see CHANGELOG.md [0.1.0-alpha]):
- @aidokit/core: AidoError serialiser, interpolate helper, schema validators.
- @aidokit/mcp-catalog: trigger evaluation, install resolver, catalog parity (every entry passes MCPDefSchema).
- @aidokit/prereq-check: semver helpers, formatter.
- @aidokit/adapter-claude-code: per-emit-method output assertions, MCP install/remove/list, determinism.
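A unit test in this tier exercises a pure function with in-memory inputs only. The sketch below uses a toy `interpolate` that is hypothetical: it is not `@aidokit/core`'s actual helper, and its `{{key}}` syntax is an assumption. It uses `node:assert` so it runs standalone. In the repo, the same checks would live in a co-located `*.test.ts` file as Vitest `it(...)` blocks.

```typescript
import { strict as assert } from "node:assert";

// Toy stand-in for an interpolation helper: replaces {{key}} with vars[key].
// Hypothetical; the real @aidokit/core helper may use a different syntax and API.
function interpolate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) =>
    // Only own properties count, so "{{constructor}}" is not resolved via the prototype.
    Object.prototype.hasOwnProperty.call(vars, key) ? (vars[key] as string) : match,
  );
}

// Pure-logic assertions: in-memory inputs only, no filesystem, network, or shell.
assert.equal(interpolate("Hello, {{name}}!", { name: "aidokit" }), "Hello, aidokit!");
assert.equal(interpolate("{{ name }}", { name: "x" }), "x"); // whitespace inside braces is allowed
assert.equal(interpolate("{{missing}}", {}), "{{missing}}"); // unknown keys are left as-is
console.log("interpolate: all assertions passed");
```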
Integration tests
Spin up a temp directory, run a real command (aidokit init …) against it, and assert on the emitted file tree, the manifest content, and the exit code. No network; mock CLI tools where needed (Claude Code's `claude mcp add` is stubbed in v0.1).
These tests live in `packages/*/test/integration/`. The flagship example is `packages/cli/test/integration/init-emit.test.ts`. It exercises the real cross-package pipeline (adapter + stack pack + base-skills + shared-docs, composed by `computeFilePlan` and written by `writeFilesStaged` to a temp dir). It then asserts on the emitted `.claude/` tree structure (7 agents, 3 commands, 19 hook scripts, 5 output styles, 8 schemas, ≥18 skills), the `docs/` and `agent-artifacts/` skeletons, and plan determinism.
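The integration pattern (temp dir in, file tree out, assert on structure) can be sketched with Node built-ins alone. This is a hypothetical illustration, not the real test. `fakeEmit` stands in for the actual `computeFilePlan` / `writeFilesStaged` pipeline, and its two agent files are invented. `countFiles` is an illustrative helper.

```typescript
import { mkdtempSync, mkdirSync, writeFileSync, readdirSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical stand-in for running `aidokit init` in a temp dir.
// The real test drives computeFilePlan + writeFilesStaged instead.
function fakeEmit(root: string): void {
  const agents = join(root, ".claude", "agents");
  mkdirSync(agents, { recursive: true });
  for (const name of ["planner", "reviewer"]) {
    writeFileSync(join(agents, `${name}.md`), `# ${name}\n`);
  }
}

// Counts regular files directly inside `dir` (non-recursive).
function countFiles(dir: string): number {
  return readdirSync(dir, { withFileTypes: true }).filter((entry) => entry.isFile()).length;
}

const root = mkdtempSync(join(tmpdir(), "aidokit-integ-"));
try {
  fakeEmit(root);
  // Assert on structure (counts and paths), not on full file contents:
  // byte-level comparison belongs to the dogfood gate.
  const agentCount = countFiles(join(root, ".claude", "agents"));
  if (agentCount !== 2) throw new Error(`expected 2 agents, got ${agentCount}`);
  console.log(`agents emitted: ${agentCount}`); // agents emitted: 2
} finally {
  // Remove the temp dir even when an assertion throws.
  rmSync(root, { recursive: true, force: true });
}
```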
E2E tests
These run only in CI, on the e2e job, against a real Claude Code install (matrix: macOS, Linux). Each run bootstraps a temp project, runs `claude /intake "test"`, and asserts that the intake artifacts are produced. They are slow and gated behind `pnpm test:e2e`.
Confirm with maintainer: the exact location and current status of the E2E lane. The `e2e/` directory is TBD per ARCHITECTURE §18.3.
Byte-compare reference tests
This tier is the dogfood gate (see dogfood-gate.md). Integration tests diff the emitted tree against the hand-built `.claude/` reference. Intentional changes update the reference; unintentional drift fails CI.
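At its core, a byte-compare walks two trees and compares each file's bytes. The sketch below is a hypothetical helper built on Node built-ins: `diffTrees` and `listFiles` are illustrative names, not the dogfood gate's actual implementation. It reports files that were added, removed, or changed relative to the reference.

```typescript
import { mkdtempSync, mkdirSync, readdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, relative, sep } from "node:path";

// Recursively lists files under `root` as sorted, "/"-separated relative paths.
function listFiles(root: string, dir: string = root): string[] {
  const out: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) out.push(...listFiles(root, full));
    else if (entry.isFile()) out.push(relative(root, full).split(sep).join("/"));
  }
  return out.sort();
}

type TreeDiff = { added: string[]; removed: string[]; changed: string[] };

// Hypothetical helper: compares an emitted tree with a reference tree byte for byte.
function diffTrees(emitted: string, reference: string): TreeDiff {
  const emittedFiles = listFiles(emitted);
  const referenceFiles = listFiles(reference);
  const emittedSet = new Set(emittedFiles);
  const referenceSet = new Set(referenceFiles);
  const diff: TreeDiff = { added: [], removed: [], changed: [] };
  for (const f of emittedFiles) {
    if (!referenceSet.has(f)) diff.added.push(f);
    else if (!readFileSync(join(emitted, f)).equals(readFileSync(join(reference, f)))) diff.changed.push(f);
  }
  for (const f of referenceFiles) if (!emittedSet.has(f)) diff.removed.push(f);
  return diff;
}

// Demo: a.md is identical in both trees, b.md has drifted.
const base = mkdtempSync(join(tmpdir(), "aidokit-bytecmp-"));
try {
  const reference = join(base, "reference");
  const emitted = join(base, "emitted");
  for (const d of [reference, emitted]) mkdirSync(join(d, "agents"), { recursive: true });
  writeFileSync(join(reference, "agents", "a.md"), "same\n");
  writeFileSync(join(emitted, "agents", "a.md"), "same\n");
  writeFileSync(join(reference, "agents", "b.md"), "v1\n");
  writeFileSync(join(emitted, "agents", "b.md"), "v2\n");
  console.log(JSON.stringify(diffTrees(emitted, reference)));
  // {"added":[],"removed":[],"changed":["agents/b.md"]}
} finally {
  rmSync(base, { recursive: true, force: true });
}
```

In this sketch, any non-empty list would be the signal to fail. That matches the rule above: drift fails unless the reference is updated alongside the intentional change.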
Conformance tests
A subset of integration tests: every adapter and stack pack runs runAdapterConformance / runStackPackConformance in CI. See concepts/conformance-levels.md.
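The feature checklist states the versioning rule for conformance check ids: adding a check is a minor spec bump, while removing or renaming one is breaking. That rule can be expressed as a small function. This is a hypothetical sketch, not part of `runAdapterConformance` or `runStackPackConformance`, and the check ids in the demo are invented. Because only ids are compared, a rename shows up as a removal plus an addition.

```typescript
type SpecBump = "none" | "minor" | "major";

// Classifies the spec bump implied by a change in conformance check ids.
// A rename appears as one id removed and one added, so it counts as breaking.
function classifyCheckChange(before: readonly string[], after: readonly string[]): SpecBump {
  const prev = new Set(before);
  const next = new Set(after);
  const removed = before.some((id) => !next.has(id));
  const added = after.some((id) => !prev.has(id));
  if (removed) return "major"; // removing or renaming a check id is breaking
  if (added) return "minor"; // adding new check ids is a minor bump
  return "none";
}

console.log(classifyCheckChange(["emit.agents"], ["emit.agents", "emit.hooks"])); // minor
console.log(classifyCheckChange(["emit.agents"], ["emit.agent"])); // major
```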
Example: running tests
```bash
# All packages, all tiers except E2E
pnpm test

# A single package
pnpm --filter @aidokit/adapter-claude-code test

# Just the integration test for the CLI
pnpm --filter aidokit test test/integration/init-emit.test.ts

# E2E (slow, gated)
pnpm test:e2e
```
Turborepo orders tests after each package's build; cache hits make repeat runs fast.
Diagram
```mermaid
%%{init: {
"theme": "base",
"themeVariables": {
"fontFamily": "ui-sans-serif, system-ui, -apple-system, Segoe UI, sans-serif",
"fontSize": "14px",
"primaryColor": "#eff6ff",
"primaryTextColor": "#0f172a",
"primaryBorderColor": "#2563eb",
"lineColor": "#475569",
"secondaryColor": "#f1f5f9",
"tertiaryColor": "#ffffff",
"clusterBkg": "#f8fafc",
"clusterBorder": "#cbd5e1"
}
}}%%
flowchart TD
classDef actor fill:#ede9fe,stroke:#6d28d9,color:#1e1b4b,stroke-width:1.2px;
classDef cli fill:#dbeafe,stroke:#1d4ed8,color:#0c1f4a,stroke-width:1.4px;
classDef adapter fill:#cffafe,stroke:#0e7490,color:#083344;
classDef pack fill:#dcfce7,stroke:#15803d,color:#052e16;
classDef core fill:#fef9c3,stroke:#a16207,color:#422006;
classDef artifact fill:#f1f5f9,stroke:#475569,color:#0f172a;
classDef stop fill:#fee2e2,stroke:#b91c1c,color:#7f1d1d,stroke-dasharray:4 3;
classDef ok fill:#ecfdf5,stroke:#047857,color:#064e3b;
classDef external fill:#fff7ed,stroke:#c2410c,color:#431407;
src["src/*.ts"]:::artifact --> unit["Unit (Vitest)
co-located *.test.ts"]:::artifact
src --> build["pnpm build"]:::cli
build --> integ["Integration
packages/*/test/integration/
real aidokit init in temp dir"]:::artifact
build --> conf["Conformance
runAdapterConformance / runStackPackConformance"]:::artifact
integ --> dog["Byte-compare
diff temp/.claude vs aidokit/.claude"]:::artifact
build --> e2e["E2E (gated)
real Claude Code install"]:::artifact
unit --> ci
integ --> ci
conf --> ci
dog --> ci
e2e --> ci{{"CI gate"}}:::external
```
Common Mistakes
- Adding filesystem I/O to a unit test. Promote the test to the integration tier; unit tests are pure logic only.
- Asserting on full file contents in integration tests. That is the byte-compare's job. Integration tests assert on structure (counts, paths, manifest fields).
- Skipping conformance in a new adapter or pack package. Conformance failure blocks publish (.docs/docs/specs/conformance-levels.md §9.3).
- Reducing the coverage floor without an ADR. The 90/80/70 targets are spec-locked; lowering them requires an ADR that records the justification.
- Running E2E for every PR. It is slow and gated; reserve it for the e2e CI job.
Checklist for a new feature
- [ ] Unit tests for the pure logic (Vitest, co-located).
- [ ] Integration test if the feature changes emitted output.
- [ ] Conformance tests pass (add a new check id if your feature is contract-relevant; adding checks is a minor spec bump, while removing or renaming one is breaking).
- [ ] Dogfood byte-compare is clean if you touched @aidokit/adapter-claude-code or @aidokit/base-skills.
- [ ] Coverage floor met or exceeded.