Testing Guidelines

Purpose

How to write and structure tests in this repo. Companion to concepts/testing-tiers.md and how-to-guides/how-to-run-tests-and-byte-compare.md.

Tiers (recap)

| Tier | What | Where |
| --- | --- | --- |
| Unit | Pure logic | `packages/<pkg>/src/foo.test.ts` (co-located) |
| Integration | Real `aidokit <cmd>` in temp dirs | `packages/<pkg>/test/integration/` |
| E2E | Real Claude Code install | `e2e/` (gated) |
| Conformance | Harness against the package's exported interface | `packages/<pkg>/test/conformance.test.ts` |
| Byte-compare | Dogfood `.claude/` reference | `packages/cli/test/integration/init-emit.test.ts` (+ separate compare step) |

Runner: Vitest (ADR-0001).

Unit-test rules

Co-located with source: src/foo.ts ↔ src/foo.test.ts.
No filesystem, no network, no shell. Pure logic only.
Test names: present-tense behaviour ("interpolates {{varName}} from vars bag").
One assertion per test where reasonable; group related assertions in describe.
Mock at the boundary — never mock @aidokit/core types.
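
As a rough illustration of the rules above, here is a minimal sketch of the kind of pure function that suits a co-located unit test. The `interpolate` function and its file name are hypothetical, chosen only to match the example test name; they are not taken from the repo.

```typescript
// Hypothetical src/interpolate.ts: pure logic, no filesystem, network, or shell.
export function interpolate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (placeholder: string, name: string) => {
    const known = Object.prototype.hasOwnProperty.call(vars, name);
    // Leave unknown placeholders untouched rather than guessing a value.
    return known ? String(vars[name]) : placeholder;
  });
}

// The matching case in a hypothetical src/interpolate.test.ts might read:
//   test('interpolates {{varName}} from vars bag', () => {
//     expect(interpolate('Hi {{name}}', { name: 'Ada' })).toBe('Hi Ada');
//   });
```

Because the function takes everything it needs as arguments, the test needs no mocks at all.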

Integration-test rules

Spin up a temp directory; run a real command.
Assert on emitted tree structure (counts, paths, presence) and manifest contents.
Do not assert on full file contents — that's the byte-compare's job.
Mock external tools where needed (e.g. claude mcp add is stubbed in v0.1).
No network calls.

Pattern (from packages/cli/test/integration/init-emit.test.ts):

import { test, expect } from 'vitest';
import { computeFilePlan, writeFilesStaged } from '../../src/init/...';
// makeFakeProjectContext, mkdtemp, and glob are helpers whose imports are not shown in this excerpt.

test('init emits the v4-shaped tree', async () => {
  // Build a synthetic project context and compute what init would write.
  const ctx = await makeFakeProjectContext({ adapter: 'claude-code', stack: ['node-ts'] });
  const plan = await computeFilePlan(ctx);

  // Emit the plan into a fresh temp directory.
  const tmp = await mkdtemp();
  await writeFilesStaged(plan, tmp);

  // Assert on structure (file counts), not on file contents.
  expect(await glob(`${tmp}/.claude/agents/*.md`)).toHaveLength(7);
  expect(await glob(`${tmp}/.claude/commands/*.md`)).toHaveLength(3);
  expect(await glob(`${tmp}/.claude/scripts/*.mjs`)).toHaveLength(19);
  // ...
});
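
The temp-directory setup and the structural assertions described in the rules above can be sketched with Node built-ins alone. `withTempDir` and `listTree` are hypothetical helpers, not part of the repo, and the sketch is synchronous for brevity; an async variant would need to await the callback before cleaning up.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper: runs `fn` against a fresh temp directory and always cleans up.
export function withTempDir<T>(fn: (dir: string) => T): T {
  const dir = mkdtempSync(join(tmpdir(), 'aidokit-test-'));
  try {
    return fn(dir);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

// Hypothetical helper: lists files under `root` as sorted, forward-slash relative paths.
export function listTree(root: string, rel = ''): string[] {
  const out: string[] = [];
  for (const entry of readdirSync(join(root, rel), { withFileTypes: true })) {
    const child = rel ? `${rel}/${entry.name}` : entry.name;
    if (entry.isDirectory()) out.push(...listTree(root, child));
    else out.push(child);
  }
  // Sort so assertions never depend on filesystem enumeration order.
  return out.sort();
}

// Usage: emit into the temp dir, then assert on structure only.
const tree = withTempDir((dir) => {
  mkdirSync(join(dir, '.claude', 'agents'), { recursive: true });
  writeFileSync(join(dir, '.claude', 'agents', 'b.md'), '');
  writeFileSync(join(dir, '.claude', 'agents', 'a.md'), '');
  return listTree(dir);
});
```

Returning sorted, forward-slash paths keeps path and count assertions identical across platforms.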

Conformance tests

Every adapter and stack-pack package ships test/conformance.test.ts:

import { test, expect } from 'vitest';
import { runAdapterConformance } from '@aidokit/core';
import { adapter } from '../src/index.js';

test(`${adapter.manifest.name} conforms to declared level`, async () => {
  const report = await runAdapterConformance(adapter, adapter.manifest.conformance);
  if (report.overall !== 'pass') {
    // Log only the failing checks so the CI output stays readable.
    console.error(JSON.stringify(report.results.filter(r => r.status === 'fail'), null, 2));
  }
  expect(report.overall).toBe('pass');
});

Same pattern with runStackPackConformance for packs. The harness is in @aidokit/core; check ids in reference/conformance-reference.md.

For source-level capability cross-check (AD-STD-CAP-01), pass packagePath:

await runAdapterConformance(adapter, 'standard', { packagePath: path.join(__dirname, '..') });

Dogfood byte-compare

The live test in CI mirrors the local check described in how-to-guides/how-to-run-tests-and-byte-compare.md:

diff -r aidokit/.claude /tmp/dog/.claude

If you touch @aidokit/adapter-claude-code or @aidokit/base-skills, you must keep both sides of the diff in sync.

Determinism

Every test depends on deterministic output. Sources of non-determinism to avoid:

Date.now() / new Date() in emitted content (use an injected clock if you need a timestamp anywhere).
crypto.randomUUID() / Math.random() in emitted content.
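Set / Map iteration order: both iterate in insertion order, so output is only as stable as the order in which entries were added. Sort explicitly before emitting.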
File-system enumeration order — always .sort() paths before assertions.

The harness includes adapter.determinism.repeated-invocation-stable and stack-pack.determinism.repeated-invocation-stable. They catch most leaks.
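
A minimal sketch of two of the remedies listed above, an injected clock and an explicit sort before emitting. `renderManifest`, `Clock`, and the output format are hypothetical and exist only for illustration; they are not the repo's emitter.

```typescript
// Hypothetical clock type: production code would pass () => new Date().
type Clock = () => Date;

// Hypothetical emitter that stays byte-stable across repeated invocations.
export function renderManifest(entries: Map<string, string>, clock: Clock): string {
  // Sort keys explicitly so output does not depend on insertion order.
  const keys = [...entries.keys()].sort();
  const lines = keys.map((key) => `${key}=${entries.get(key)}`);
  // The timestamp comes from the injected clock, never from Date.now().
  return [`generated=${clock().toISOString()}`, ...lines].join('\n');
}

// In a test, freeze the clock so two runs produce identical bytes.
const fixedClock: Clock = () => new Date('2024-01-01T00:00:00.000Z');
const first = renderManifest(new Map([['b', '2'], ['a', '1']]), fixedClock);
const second = renderManifest(new Map([['a', '1'], ['b', '2']]), fixedClock);
```

Here `first` and `second` are equal even though the maps were populated in different orders.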

Coverage

Targets (AGENTS.md §7.4):

@aidokit/core ≥ 90%
Adapters ≥ 80%
@aidokit/cli ≥ 70%

Run pnpm --filter @aidokit/<pkg> test --coverage. Vitest writes the report to coverage/. Lowering a floor needs an ADR.
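
One way a floor could be enforced rather than only reported is through Vitest's coverage thresholds. The fragment below is a sketch, not the repo's actual configuration: it assumes Vitest 1.0 or later with the `@vitest/coverage-v8` provider installed, and it assumes the target refers to line coverage, which the targets above do not specify.

```typescript
// Hypothetical packages/core/vitest.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      // Matches the coverage/ output directory mentioned above.
      reportsDirectory: './coverage',
      // Assumed metric: lines. 90 is the @aidokit/core floor; adapters would use 80, the CLI 70.
      thresholds: { lines: 90 },
    },
  },
});
```

With a threshold set, a coverage run exits non-zero when the package drops below its floor.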

Fixture conventions

Synthetic ProjectContext and DetectContext come from @aidokit/core/test-utils (makeProjectContext, makeDetectContext — see CHANGELOG [0.5.0]).
Inline fixtures over separate files when they fit in ~20 lines.
Named fixtures in conformance-levels spec Appendix A: greenfield-node-ts, brownfield-node-ts, multi-stack-fullstack, python-django, go-cli, monorepo-mixed, minimum-conformance, strict-conformance.

Common pitfalls

fs.promises.readdir order assumed to be alphabetic: it isn't on all platforms. Always sort.
Path separators: build paths with path.join from node:path. Hard-coded forward-slash literals work on POSIX but can fail on Windows CI when compared against paths that Node returns with backslashes.
Async test leak: every promise must be awaited and settle inside the test(…) callback; trailing promises trip Vitest's strict mode.
Conformance test running before build: Turborepo orders the tasks correctly; if you bypass it, run the build first.
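
The path-separator pitfall can be sketched as follows. `toPosix` is a hypothetical helper, not a repo utility; the example uses `path.win32` so that the Windows behaviour is reproducible on any platform.

```typescript
import { sep, win32 } from 'node:path';

// Hypothetical helper: converts a platform path to forward slashes for comparisons.
export function toPosix(p: string, separator: string = sep): string {
  return p.split(separator).join('/');
}

// win32.join yields backslashes on every platform, mimicking what Windows CI returns.
const windowsStyle = win32.join('.claude', 'agents', 'a.md');

// Comparing windowsStyle against a forward-slash literal would fail; normalizing first does not.
const normalized = toPosix(windowsStyle, win32.sep);
```

Normalizing once at the boundary lets the rest of a test compare against plain forward-slash strings.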