Testing Guidelines
Purpose
How to write and structure tests in this repo. Companion to concepts/testing-tiers.md and how-to-guides/how-to-run-tests-and-byte-compare.md.
Tiers (recap)
| Tier | What | Where |
|---|---|---|
| Unit | Pure logic | packages/<pkg>/src/foo.test.ts (co-located) |
| Integration | Real aidokit <cmd> in temp dirs | packages/<pkg>/test/integration/ |
| E2E | Real Claude Code install | e2e/ (gated) |
| Conformance | Harness against the package's exported interface | packages/<pkg>/test/conformance.test.ts |
| Byte-compare | Dogfood .claude/ reference | packages/cli/test/integration/init-emit.test.ts (+ separate compare step) |
Runner: Vitest (ADR-0001).
Unit-test rules
- Co-located with source: src/foo.ts ↔ src/foo.test.ts.
- No filesystem, no network, no shell. Pure logic only.
- Test names: present-tense behaviour ("interpolates {{varName}} from vars bag").
- One assertion per test where reasonable; group related assertions in describe.
- Mock at the boundary; never mock @aidokit/core types.
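To make the rules above concrete, here is a minimal sketch of a pure-logic unit and the kind of behaviour a co-located test would pin down. The `interpolate` helper is hypothetical, named after the example test title above rather than taken from the repo, and leaving unknown placeholders untouched is an assumption. The inline checks stand in for Vitest `test(...)` bodies so the block runs on its own.

```typescript
// Hypothetical pure helper: replaces {{name}} placeholders from a vars bag.
// It touches no filesystem, network, or shell, so it needs no mocks.
function interpolate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
    // Assumption: unknown placeholders are left as-is rather than throwing.
    Object.prototype.hasOwnProperty.call(vars, key) ? (vars[key] as string) : match,
  );
}

// One behaviour per check, mirroring the "one assertion per test" rule.
const greeting = interpolate('Hello, {{user}}!', { user: 'Ada' });
const untouched = interpolate('{{missing}} stays', {});

console.log(greeting);  // Hello, Ada!
console.log(untouched); // {{missing}} stays
```

In a real test file each check would become its own `test('interpolates {{varName}} from vars bag', ...)` case inside a `describe` block.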
Integration-test rules
- Spin up a temp directory; run a real command.
- Assert on emitted tree structure (counts, paths, presence) and manifest contents.
- Do not assert on full file contents — that's the byte-compare's job.
- Mock external tools where needed (e.g. claude mcp add is stubbed in v0.1).
- No network calls.
Pattern (from packages/cli/test/integration/init-emit.test.ts):
import { test, expect } from 'vitest';
import { computeFilePlan, writeFilesStaged } from '../../src/init/...';
// makeFakeProjectContext, mkdtemp, and glob are helpers whose imports are not shown in this excerpt.

test('init emits the v4-shaped tree', async () => {
  // Build a synthetic project context and compute the files init would write.
  const ctx = await makeFakeProjectContext({ adapter: 'claude-code', stack: ['node-ts'] });
  const plan = await computeFilePlan(ctx);

  // Write the plan into a temp directory.
  const tmp = await mkdtemp();
  await writeFilesStaged(plan, tmp);

  // Assert on structure (counts per directory), not on file contents.
  expect(await glob(`${tmp}/.claude/agents/*.md`)).toHaveLength(7);
  expect(await glob(`${tmp}/.claude/commands/*.md`)).toHaveLength(3);
  expect(await glob(`${tmp}/.claude/scripts/*.mjs`)).toHaveLength(19);
  // ...
});
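The temp-directory lifecycle behind that pattern can be sketched with Node's standard library alone. Everything named here is illustrative: `emitFakeTree` is a hypothetical stand-in for the real command, and the file names are made up. The synchronous `fs` calls are a simplification to keep the sketch short; the repo's tests use async helpers.

```typescript
import { mkdtempSync, mkdirSync, writeFileSync, readdirSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical stand-in for the command under test: writes a tiny tree.
function emitFakeTree(root: string): void {
  const agents = join(root, '.claude', 'agents');
  mkdirSync(agents, { recursive: true });
  for (const file of ['reviewer.md', 'planner.md', 'notes.txt']) {
    writeFileSync(join(agents, file), `# ${file}\n`);
  }
}

// Lists entries with a given extension, sorted so assertions do not
// depend on the order the filesystem happens to return.
function listByExtension(dir: string, ext: string): string[] {
  return readdirSync(dir).filter((entry) => entry.endsWith(ext)).sort();
}

// mkdtemp requires a prefix; random characters are appended to make it unique.
const tmp = mkdtempSync(join(tmpdir(), 'aidokit-test-'));
let agentFiles: string[] = [];
try {
  emitFakeTree(tmp);
  agentFiles = listByExtension(join(tmp, '.claude', 'agents'), '.md');
} finally {
  // Always clean up, even when the emit step throws.
  rmSync(tmp, { recursive: true, force: true });
}

console.log(agentFiles); // [ 'planner.md', 'reviewer.md' ]
```

The assertions stay structural (which paths exist, how many), which leaves full-content checks to the byte-compare step.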
Conformance tests
Every adapter and stack-pack package ships test/conformance.test.ts:
import { test, expect } from 'vitest';
import { runAdapterConformance } from '@aidokit/core';
import { adapter } from '../src/index.js';

test(`${adapter.manifest.name} conforms to declared level`, async () => {
  const report = await runAdapterConformance(adapter, adapter.manifest.conformance);
  if (report.overall !== 'pass') {
    // Print only the failing checks so the CI log stays readable.
    console.error(JSON.stringify(report.results.filter(r => r.status === 'fail'), null, 2));
  }
  expect(report.overall).toBe('pass');
});
Same pattern with runStackPackConformance for packs. The harness is in @aidokit/core; check ids in reference/conformance-reference.md.
For source-level capability cross-check (AD-STD-CAP-01), pass packagePath:
await runAdapterConformance(adapter, 'standard', { packagePath: path.join(__dirname, '..') });
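When a conformance run fails, a compact summary can be easier to scan than the raw JSON dump. The sketch below assumes a report shape inferred only from the fields the test above reads (`overall`, `results[].status`); the `id` and `message` fields, the `summarizeFailures` helper, and the sample data are hypothetical and may not match the real types in @aidokit/core.

```typescript
// Assumed report shape; only `overall` and `results[].status` appear in the
// test above. `id` and `message` are hypothetical fields for illustration.
type CheckStatus = 'pass' | 'fail';
interface CheckResult { id: string; status: CheckStatus; message?: string }
interface ConformanceReport { overall: CheckStatus; results: CheckResult[] }

// Builds one line per failing check, sorted by id so the output is stable.
function summarizeFailures(report: ConformanceReport): string {
  return report.results
    .filter((r) => r.status === 'fail')
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .map((r) => `${r.id}: ${r.message ?? 'no message'}`)
    .join('\n');
}

// Made-up sample report, not real harness output.
const sample: ConformanceReport = {
  overall: 'fail',
  results: [
    { id: 'check.b', status: 'fail', message: 'output differed on second run' },
    { id: 'check.c', status: 'pass' },
    { id: 'check.a', status: 'fail' },
  ],
};

console.log(summarizeFailures(sample));
// check.a: no message
// check.b: output differed on second run
```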
Dogfood byte-compare
The live CI test mirrors the local check from how-to-guides/how-to-run-tests-and-byte-compare.md:
diff -r aidokit/.claude /tmp/dog/.claude
If you touched @aidokit/adapter-claude-code or @aidokit/base-skills, you must keep both sides in sync.
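For readers who want the same check from inside a test rather than from the shell, the following is a rough TypeScript equivalent of a recursive byte-for-byte comparison. It is a sketch, not the repo's compare step: `listFiles` and `diffTrees` are hypothetical helpers, the two trees are built in a temp directory for demonstration, and only regular files and directories are considered (permissions and symlink targets are ignored).

```typescript
import { mkdtempSync, mkdirSync, writeFileSync, readdirSync, readFileSync, statSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Recursively collects file paths relative to root, using '/' separators.
function listFiles(root: string, rel = ''): string[] {
  const out: string[] = [];
  for (const entry of readdirSync(join(root, rel))) {
    const child = rel === '' ? entry : `${rel}/${entry}`;
    if (statSync(join(root, child)).isDirectory()) out.push(...listFiles(root, child));
    else out.push(child);
  }
  return out;
}

// Returns the sorted relative paths that exist on one side only or whose bytes differ.
function diffTrees(left: string, right: string): string[] {
  const leftFiles = new Set(listFiles(left));
  const rightFiles = new Set(listFiles(right));
  const all = Array.from(new Set(Array.from(leftFiles).concat(Array.from(rightFiles)))).sort();
  return all.filter((p) => {
    if (!leftFiles.has(p) || !rightFiles.has(p)) return true; // present on one side only
    return !readFileSync(join(left, p)).equals(readFileSync(join(right, p)));
  });
}

// Demo: two small trees with one changed file and one extra file.
const base = mkdtempSync(join(tmpdir(), 'byte-compare-'));
const reference = join(base, 'reference');
const emitted = join(base, 'emitted');
for (const root of [reference, emitted]) {
  mkdirSync(join(root, 'agents'), { recursive: true });
  writeFileSync(join(root, 'agents', 'a.md'), 'same\n');
}
writeFileSync(join(reference, 'settings.json'), '{"v":1}\n');
writeFileSync(join(emitted, 'settings.json'), '{"v":2}\n');
writeFileSync(join(emitted, 'extra.md'), 'only emitted\n');

const drift = diffTrees(reference, emitted);
rmSync(base, { recursive: true, force: true });

console.log(drift); // [ 'extra.md', 'settings.json' ]
```

An empty result corresponds to `diff -r` printing nothing.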
Determinism
Every test depends on deterministic output. Sources of non-determinism to avoid:
- Date.now() / new Date() in emitted content (use an injected clock if you need a timestamp anywhere).
- crypto.randomUUID() / Math.random() in emitted content.
- Set / Map iteration order without an explicit sort step. Iteration follows insertion order, so it is only as stable as the order in which entries were added.
- File-system enumeration order: always .sort() paths before assertions.
The harness includes adapter.determinism.repeated-invocation-stable and stack-pack.determinism.repeated-invocation-stable. They catch most leaks.
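Two of the fixes listed above, an injected clock and an explicit sort before serialising a Map, can be sketched together. `Clock`, `renderManifest`, and the manifest shape are hypothetical names chosen for this illustration, not APIs from the repo.

```typescript
// A clock is a function returning epoch milliseconds; tests pass a fixed one.
type Clock = () => number;

interface Entry { path: string; bytes: number }

// Hypothetical emitter: the Map's insertion order may vary between runs,
// so entries are sorted by path before being serialised.
function renderManifest(entries: Map<string, number>, clock: Clock): string {
  const files: Entry[] = Array.from(entries, ([path, bytes]) => ({ path, bytes }))
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
  return JSON.stringify({ generatedAt: new Date(clock()).toISOString(), files });
}

// Fixed at the Unix epoch for tests; production code would pass Date.now.
const fixedClock: Clock = () => 0;

// Same content inserted in different orders, as unordered enumeration might produce.
const runA = renderManifest(new Map<string, number>([['b.md', 2], ['a.md', 1]]), fixedClock);
const runB = renderManifest(new Map<string, number>([['a.md', 1], ['b.md', 2]]), fixedClock);

console.log(runA === runB); // true
console.log(runA);
```

Comparing two invocations like this is the same idea the repeated-invocation-stable checks apply to whole adapters and packs.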
Coverage
Targets (CLAUDE.md §7.4):
- @aidokit/core ≥ 90%
- Adapters ≥ 80%
- @aidokit/cli ≥ 70%
Run pnpm --filter @aidokit/<pkg> test --coverage. Vitest produces coverage/. Lowering a floor needs an ADR.
Fixture conventions
- Synthetic ProjectContext and DetectContext come from @aidokit/core/test-utils (makeProjectContext, makeDetectContext; see CHANGELOG [0.5.0]).
- Inline fixtures over separate files when they fit in ~20 lines.
- Named fixtures in conformance-levels spec Appendix A: greenfield-node-ts, brownfield-node-ts, multi-stack-fullstack, python-django, go-cli, monorepo-mixed, minimum-conformance, strict-conformance.
Common pitfalls
- fs.promises.readdir order assumed to be alphabetic: it isn't on all platforms. Always sort.
- Path separators: use node:path.join. Forward-slash literals work on POSIX but fail on Windows CI.
- Async test leak: every promise needs to settle inside the test(…) callback; trailing promises trip Vitest's strict mode.
- Conformance test running before build: Turborepo orders correctly; if you bypass it, build first.
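The first two pitfalls can be avoided with a pair of small helpers, sketched below. `toPortable` and `sortedCopy` are hypothetical names, and the paths are made up; the idea is to normalise separators and order once, so the expected strings in assertions are identical on POSIX and Windows.

```typescript
import { join, relative, sep } from 'node:path';

// Converts a platform-specific path into a '/'-separated path relative to root.
function toPortable(root: string, target: string): string {
  return relative(root, target).split(sep).join('/');
}

// Sorts a copy, because directory enumeration order is not guaranteed to be alphabetic.
function sortedCopy(paths: readonly string[]): string[] {
  return paths.slice().sort();
}

// Paths are built with join, never with hand-written separators.
const outRoot = join('project', 'out');
const emittedPaths = [join(outRoot, 'scripts', 'b.mjs'), join(outRoot, 'agents', 'a.md')];

const portable = sortedCopy(emittedPaths.map((p) => toPortable(outRoot, p)));
console.log(portable); // [ 'agents/a.md', 'scripts/b.mjs' ]
```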