Corpus and Regression Testing
The corpus is the set of inputs that the fuzzer has found useful. Understanding how it works will help you get the most out of Vitiate.
Corpus Locations
Vitiate loads inputs from two locations. Both use a hash directory name: a base32-encoded hash followed by the test name, with spaces and hyphens replaced by underscores (e.g., vxr4kpqyb12fza1gv81bjj8k3i64mlqn-parse_url for a test named "parse-url").
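The directory naming can be sketched as follows. This is not Vitiate's code. The sanitization rule (every character outside `[A-Za-z0-9]` becomes `_`) is an assumption inferred from the two examples on this page. The helper names `sanitizeTestName` and `hashDirName` are hypothetical. This page does not say how the hash is computed, so the sketch takes the hash as an input rather than deriving it.

```typescript
// Hypothetical helpers: compose a hash directory name from a known hash.
// Assumption: the sanitization rule is inferred from the documented examples
// ("parse-url" -> "parse_url", "parse does not crash" -> "parse_does_not_crash").
function sanitizeTestName(testName: string): string {
  // Replace every character that is not an ASCII letter or digit with "_".
  return testName.replace(/[^A-Za-z0-9]/g, "_");
}

function hashDirName(hash: string, testName: string): string {
  // The hash is taken as given; this sketch does not compute it.
  return `${hash}-${sanitizeTestName(testName)}`;
}

console.log(hashDirName("vxr4kpqyb12fza1gv81bjj8k3i64mlqn", "parse-url"));
// vxr4kpqyb12fza1gv81bjj8k3i64mlqn-parse_url
```

In practice, run npx vitiate init or list .vitiate/testdata/ rather than reconstructing names by hand.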
Seed Corpus
.vitiate/testdata/<hashdir>/seeds/
For example, a test named "parse does not crash" would have its seed corpus at .vitiate/testdata/vxr4kpqyb12fza1gv81bjj8k3i64mlqn-parse_does_not_crash/seeds/.
These are files you create manually to give the fuzzer a starting point. Good seeds exercise different code paths in your target. Crash and timeout artifacts are stored in sibling directories (crashes/ and timeouts/).
The easiest way to discover the directory path is to run npx vitiate init, which creates seed directories for all discovered fuzz tests. You can also list existing directories with ls .vitiate/testdata/ (see the Tutorial for a walkthrough).
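Once you know the seeds directory, you can write seed files by hand or with a short script. The script below is a sketch. It assumes Vitiate reads every file in seeds/ regardless of its name, and the file names are arbitrary. The `writeSeeds` helper and the URL-parser seeds are hypothetical. The demo writes to a temporary directory, so running it does not touch your project.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: write each named seed into seedsDir, creating it if needed.
// Assumption: Vitiate reads every file in seeds/ regardless of its name.
function writeSeeds(seedsDir: string, seeds: Record<string, string | Uint8Array>): string[] {
  mkdirSync(seedsDir, { recursive: true });
  const written: string[] = [];
  for (const [name, contents] of Object.entries(seeds)) {
    const path = join(seedsDir, name);
    writeFileSync(path, contents);
    written.push(path);
  }
  return written;
}

// Seeds for a hypothetical URL parser, each aimed at a different code path.
const urlSeeds = {
  basic: "https://example.com/",
  "with-credentials": "http://user:pass@host:8080/path?q=1#frag",
  empty: "",
  "not-a-url": "::not a url::",
};

// Demo: a temporary directory stands in for .vitiate/testdata/<hashdir>/seeds/.
const demoDir = join(mkdtempSync(join(tmpdir(), "vitiate-seeds-")), "seeds");
writeSeeds(demoDir, urlSeeds);
console.log(readdirSync(demoDir).sort());
```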
Cached Corpus
.vitiate/corpus/<hashdir>/
For example, a test named "parse does not crash" would have its cached corpus at .vitiate/corpus/vxr4kpqyb12fza1gv81bjj8k3i64mlqn-parse_does_not_crash/.
The cached corpus is generated automatically during fuzzing. Each file is named by the SHA-256 hash of its contents, so duplicate inputs are stored only once. This directory grows as the fuzzer discovers new coverage and can be deleted safely; the fuzzer will rebuild it. Add .vitiate/corpus/ to your .gitignore.
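Content-addressed naming is what makes deduplication free: the same bytes always map to the same file name. The snippet below is a minimal sketch of that idea. It assumes lowercase hex encoding of the digest, which this page does not specify. `corpusFileName` and `saveToCorpus` are hypothetical names, not Vitiate APIs.

```typescript
import { createHash } from "node:crypto";
import { existsSync, mkdirSync, mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Name an entry by the SHA-256 of its bytes (hex encoding is an assumption).
function corpusFileName(input: Uint8Array): string {
  return createHash("sha256").update(input).digest("hex");
}

// Returns true if a new file was written, false if an identical entry already existed.
function saveToCorpus(corpusDir: string, input: Uint8Array): boolean {
  mkdirSync(corpusDir, { recursive: true });
  const path = join(corpusDir, corpusFileName(input));
  if (existsSync(path)) return false; // same bytes, same name: duplicate
  writeFileSync(path, input);
  return true;
}

const dir = mkdtempSync(join(tmpdir(), "corpus-demo-"));
const entry = new TextEncoder().encode("http://a/");
console.log(saveToCorpus(dir, entry)); // true
console.log(saveToCorpus(dir, entry)); // false
console.log(readdirSync(dir).length); // 1
```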
Two Modes
Vitiate’s fuzz() function behaves differently depending on the environment. The same test file serves both modes with no code changes.
Fuzzing Mode
Activated by npx vitiate fuzz or by setting the VITIATE_FUZZ environment variable:
```sh
npx vitiate fuzz test/parser.fuzz.ts
```
In fuzzing mode, each fuzz() call becomes a supervisor that spawns a child Vitest process and enters the fuzz loop. The fuzzer:
- Loads all seed and cached corpus entries
- Evaluates each seed to establish initial coverage
- Mutates corpus entries to generate new inputs
- Saves interesting new inputs to the cached corpus
- Saves crash artifacts to the crashes/ directory alongside the seed corpus (.vitiate/testdata/<hashdir>/crashes/)
If a crash is found, the test fails with the crash details and artifact path.
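The loop above can be sketched as a toy coverage-guided fuzzer. This is not Vitiate's implementation. It assumes the target reports coverage as a set of numeric edge IDs and signals a crash by throwing. Mutation and scheduling are deliberately naive, and all names are hypothetical. Like the default described in the next section, it keeps going after a crash instead of stopping.

```typescript
// Toy model of the fuzzing-mode loop. NOT Vitiate's implementation.
// Assumptions: coverage is a set of numeric edge IDs; a crash is a thrown error.
type Target = (input: Uint8Array) => Set<number>;

interface FuzzResult {
  corpus: Uint8Array[]; // inputs that reached new edges ("interesting")
  crashes: Uint8Array[]; // inputs that threw
}

// mulberry32: a tiny seeded PRNG so runs are reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Apply one random mutation: insert, overwrite, or delete a single byte.
function mutate(input: Uint8Array, rand: () => number): Uint8Array {
  const bytes = Array.from(input);
  const op = Math.floor(rand() * 3);
  const pos = Math.floor(rand() * (bytes.length + 1));
  const value = Math.floor(rand() * 256);
  if (op === 1 || bytes.length === 0) bytes.splice(pos, 0, value); // insert
  else if (op === 0) bytes[Math.min(pos, bytes.length - 1)] = value; // overwrite
  else bytes.splice(Math.min(pos, bytes.length - 1), 1); // delete
  return Uint8Array.from(bytes);
}

// Run the target; null means it threw (a crash).
function tryRun(target: Target, input: Uint8Array): Set<number> | null {
  try {
    return target(input);
  } catch {
    return null;
  }
}

function fuzzLoop(target: Target, seeds: Uint8Array[], iterations: number, rngSeed = 1): FuzzResult {
  const rand = mulberry32(rngSeed);
  const covered = new Set<number>();
  const result: FuzzResult = { corpus: [], crashes: [] };

  // Keep an input if it crashes or reaches at least one new edge.
  const evaluate = (input: Uint8Array): void => {
    const edges = tryRun(target, input);
    if (edges === null) {
      result.crashes.push(input); // a real fuzzer would minimize and write an artifact here
      return;
    }
    let isNew = false;
    for (const edge of edges) {
      if (!covered.has(edge)) {
        covered.add(edge);
        isNew = true;
      }
    }
    if (isNew) result.corpus.push(input);
  };

  for (const seed of seeds) evaluate(seed); // establish initial coverage
  if (result.corpus.length === 0) result.corpus.push(new Uint8Array(0)); // something to mutate

  for (let i = 0; i < iterations; i++) {
    const parent = result.corpus[Math.floor(rand() * result.corpus.length)];
    evaluate(mutate(parent, rand));
  }
  return result;
}

// Demo target: coverage grows with each matching byte of "FUZ"; a full match throws.
const toyTarget: Target = (input) => {
  const edges = new Set<number>([0]);
  const magic = [0x46, 0x55, 0x5a]; // "FUZ"
  for (let i = 0; i < magic.length; i++) {
    if (input[i] !== magic[i]) return edges;
    edges.add(i + 1);
  }
  throw new Error("toy bug reached");
};

const demo = fuzzLoop(toyTarget, [new TextEncoder().encode("F")], 20000);
console.log(`corpus: ${demo.corpus.length}, crashes: ${demo.crashes.length}`);
```

The key design point is the same one the list describes: inputs survive only if they add coverage, so mutation effort concentrates on inputs that got further into the target.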
Regression Mode
Activated by running your tests normally (the default when VITIATE_FUZZ is not set):
```sh
npx vitest run
```
Each fuzz() call:
- Loads all seed and cached corpus entries
- Runs the target once per entry, in order
- Fails the test if any entry throws an unexpected error
This means every crash artifact and every interesting input the fuzzer has ever found is replayed as a test case. Crash artifacts are permanent regression guards: if someone reintroduces the bug, the test fails.
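Regression mode amounts to a deterministic replay loop. The sketch below illustrates that loop; it is not Vitiate's code. Sorting files by name to get a stable order is an assumption, since the page says "in order" without defining the order. The caller chooses which directories to replay; the demo includes a crashes/ directory because crash artifacts are replayed too. `loadEntries` and `replay` are hypothetical names.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readdirSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface Entry {
  path: string;
  data: Uint8Array;
}

// Read every regular file in each directory, sorted by name; missing directories are skipped.
function loadEntries(dirs: string[]): Entry[] {
  const entries: Entry[] = [];
  for (const dir of dirs) {
    if (!existsSync(dir)) continue;
    const names = readdirSync(dir, { withFileTypes: true })
      .filter((d) => d.isFile())
      .map((d) => d.name)
      .sort();
    for (const name of names) {
      const path = join(dir, name);
      entries.push({ path, data: readFileSync(path) });
    }
  }
  return entries;
}

// Run the target once per entry, in order; the first unexpected error fails the run.
function replay(target: (input: Uint8Array) => void, entries: Entry[]): void {
  for (const { path, data } of entries) {
    try {
      target(data);
    } catch (err) {
      throw new Error(`regression: ${path} failed: ${String(err)}`);
    }
  }
}

// Demo on a throwaway layout with hypothetical file names.
const root = mkdtempSync(join(tmpdir(), "vitiate-replay-"));
const seedsDir = join(root, "seeds");
mkdirSync(seedsDir);
writeFileSync(join(seedsDir, "basic"), "https://example.com/");
writeFileSync(join(seedsDir, "empty"), "");
const parseTarget = (input: Uint8Array): void => {
  if (new TextDecoder().decode(input).includes("bad")) throw new Error("parser bug");
};
replay(parseTarget, loadEntries([seedsDir, join(root, "crashes")]));
console.log("all entries passed");
```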
Crash Artifacts
When the fuzzer finds a crashing input, it:
- Minimizes the input (removes bytes while preserving the crash)
- Writes the minimized input to .vitiate/testdata/<hashdir>/crashes/crash-<sha256>
- Continues fuzzing for more crashes (configurable via stopOnCrash)
Commit crash artifacts to version control. They are small, deterministic, and serve as documentation of bugs that were found and fixed.
Timeout artifacts follow the same pattern: .vitiate/testdata/<hashdir>/timeouts/timeout-<sha256>.
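Minimization and artifact naming can be sketched as follows. This is a greedy chunk-deletion minimizer in the spirit of delta debugging, not necessarily Vitiate's algorithm. It tries removing chunks of bytes, halves the chunk size each pass, and keeps any deletion that still crashes. The result is smaller but not guaranteed to be minimal. The hex encoding in `artifactName` is an assumption. All function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

type Target = (input: Uint8Array) => void;

// True if running the target on this input throws.
function crashes(target: Target, input: Uint8Array): boolean {
  try {
    target(input);
    return false;
  } catch {
    return true;
  }
}

// Copy of buf with `count` bytes removed starting at `start`.
function without(buf: Uint8Array, start: number, count: number): Uint8Array {
  const head = buf.subarray(0, start);
  const tail = buf.subarray(Math.min(buf.length, start + count));
  const out = new Uint8Array(head.length + tail.length);
  out.set(head, 0);
  out.set(tail, head.length);
  return out;
}

// Greedy minimizer: delete chunks, halving the chunk size each pass,
// and keep any deletion that still reproduces the crash.
function minimize(target: Target, input: Uint8Array): Uint8Array {
  if (!crashes(target, input)) throw new Error("input does not crash");
  let current = input;
  for (let chunk = Math.max(1, current.length >> 1); chunk >= 1; chunk >>= 1) {
    let i = 0;
    while (i < current.length) {
      const candidate = without(current, i, chunk);
      if (crashes(target, candidate)) {
        current = candidate; // keep the smaller input; retry at the same offset
      } else {
        i += chunk;
      }
    }
  }
  return current;
}

// File name in the documented <kind>-<sha256> form (hex encoding assumed).
function artifactName(kind: "crash" | "timeout", input: Uint8Array): string {
  return `${kind}-${createHash("sha256").update(input).digest("hex")}`;
}

const containsMarker: Target = (input) => {
  if (new TextDecoder().decode(input).includes("<<")) throw new Error("parser bug");
};
const minimized = minimize(containsMarker, new TextEncoder().encode("hello <<world>>"));
console.log(new TextDecoder().decode(minimized)); // <<
console.log(artifactName("crash", minimized));
```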
Corpus Minimization
Over time the cached corpus grows as the fuzzer discovers new coverage. Many of these entries become redundant: later inputs may cover the same edges as earlier ones. Minimization finds the smallest subset that maintains the same total coverage.
Run optimize mode to minimize the cached corpus in place:
```sh
npx vitiate optimize
```
Optimize mode works as follows for each fuzz() test:
- Replays seed corpus entries (.vitiate/testdata/<hashdir>/seeds/) to establish a baseline of pre-covered edges
- Replays cached corpus entries (.vitiate/corpus/<hashdir>/) and records the edges each one covers
- Runs set cover over cached entries only, treating seed edges as already covered; cached entries that are fully redundant with seeds or other survivors are eliminated
- Deletes non-surviving cached entries in place from .vitiate/corpus/
Seed corpus entries are never deleted - they serve as the coverage baseline. Only cached entries are subject to minimization.
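The set-cover step can be sketched with the classic greedy heuristic. This page does not say which set-cover algorithm Vitiate uses. Greedy is an approximation: it is fast but does not always find the smallest possible subset. The sketch shows how seed edges are pre-covered so that cached entries redundant with seeds are dropped. `selectSurvivors` is a hypothetical name.

```typescript
type EdgeSet = ReadonlySet<number>;

// Greedy set cover over cached entries, with seed edges counted as already covered.
function selectSurvivors(seedEdges: EdgeSet[], cached: Map<string, EdgeSet>): Set<string> {
  const covered = new Set<number>();
  for (const edges of seedEdges) for (const edge of edges) covered.add(edge);

  const survivors = new Set<string>();
  const remaining = new Map(cached);
  for (;;) {
    // Pick the entry that adds the most not-yet-covered edges.
    let best: string | undefined;
    let bestGain = 0;
    for (const [name, edges] of remaining) {
      let gain = 0;
      for (const edge of edges) if (!covered.has(edge)) gain++;
      if (gain > bestGain) {
        best = name;
        bestGain = gain;
      }
    }
    if (best === undefined) break; // every remaining entry is redundant
    survivors.add(best);
    for (const edge of remaining.get(best)!) covered.add(edge);
    remaining.delete(best);
  }
  return survivors;
}

const seedEdges: EdgeSet[] = [new Set([1, 2])];
const cachedEntries = new Map<string, EdgeSet>([
  ["a", new Set([1, 2])], // fully redundant with the seed
  ["b", new Set([3, 4])],
  ["c", new Set([4])],
  ["d", new Set([3, 4, 5])],
]);
const kept = selectSurvivors(seedEdges, cachedEntries);
console.log(Array.from(kept)); // [ 'd' ]
// Entries not in `kept` would be deleted from the cached corpus directory.
```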
Minimize periodically, especially after long fuzzing sessions. A smaller corpus means faster regression test runs and faster seed evaluation at the start of the next fuzzing session.
For libFuzzer-compatible corpus minimization across arbitrary directories, see the standalone CLI’s merge mode.
Checkpointing Fuzzer Progress
After a long fuzzing session, you can checkpoint the fuzzer’s progress by promoting surviving cached entries to the seed corpus. This makes the coverage gains permanent: they survive even if .vitiate/corpus/ is deleted or the project is cloned fresh.
```sh
# 1. Optimize to keep only the minimal covering set
npx vitiate optimize

# 2. Copy surviving cached entries to the seed corpus
cp .vitiate/corpus/<hashdir>/* \
  .vitiate/testdata/<hashdir>/seeds/

# 3. Commit the new seeds
git add .vitiate/testdata/
git commit -m "chore: checkpoint fuzzer corpus"
```
After checkpointing, you can safely delete .vitiate/corpus/ and start the next fuzzing session from the enriched seed corpus.
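If you prefer a cross-platform script to cp, step 2 can be sketched in Node. This is a hypothetical helper, not part of the Vitiate CLI. It never overwrites existing seeds, and it relies on the fact stated above that cached files are named by their SHA-256 hash, so an identical name means identical contents. The demo uses temporary directories; in a project you would pass .vitiate/corpus/<hashdir> and .vitiate/testdata/<hashdir>/seeds.

```typescript
import { copyFileSync, existsSync, mkdirSync, mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: copy cached entries into the seed corpus, skipping names that already exist.
function promoteToSeeds(cachedDir: string, seedsDir: string): string[] {
  mkdirSync(seedsDir, { recursive: true });
  const copied: string[] = [];
  for (const entry of readdirSync(cachedDir, { withFileTypes: true })) {
    if (!entry.isFile()) continue;
    const dest = join(seedsDir, entry.name);
    if (existsSync(dest)) continue; // same SHA-256 name, same contents
    copyFileSync(join(cachedDir, entry.name), dest);
    copied.push(entry.name);
  }
  return copied;
}

// Demo on throwaway directories with placeholder file names.
const base = mkdtempSync(join(tmpdir(), "vitiate-checkpoint-"));
const cachedDir = join(base, "corpus");
const seedsDir = join(base, "seeds");
mkdirSync(cachedDir);
mkdirSync(seedsDir);
writeFileSync(join(cachedDir, "aaa"), "first");
writeFileSync(join(cachedDir, "bbb"), "second");
writeFileSync(join(seedsDir, "aaa"), "first");
console.log(promoteToSeeds(cachedDir, seedsDir)); // [ 'bbb' ]
```

Run it after npx vitiate optimize so that only the surviving entries are promoted.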
- Commit seed corpus and crash artifacts. They are small and valuable. Other developers get the fuzzer’s accumulated knowledge when they clone the repository.
- Do not commit the cached corpus. It can be large and is regenerated automatically. Add .vitiate/corpus/ to .gitignore.
- Checkpoint after long sessions. Run npx vitiate optimize, then copy surviving cached entries to the seed corpus and commit them. This preserves coverage gains permanently.
- Integrate fuzzing into CI. Run regression tests on every PR and long fuzzing sessions nightly on main. See CI Fuzzing for setup details.