Testing and Validation

Operator baseline for crash/recovery validation and related checks. Canonical source: this runbook in angarabook/src/operations/.

Goal

Verify that the recovery path:

  • does not allow silent corruption;
  • preserves the durability-mode contract;
  • remains idempotent across repeated restarts.

Core invariants

  • No silent corruption: either clean startup or explicit failure with diagnostics.
  • Idempotent recovery: repeated restarts do not change the oracle outcome.
  • Visibility safety: uncommitted changes do not become visible after recovery.
  • Durability semantics, by mode:
      • strict: ack-commit must survive crash;
      • group_commit: ack semantics strictly match the stated contract;
      • relaxed: some ack-commits may be lost within the contract, but without integrity violations.
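
The invariants above can be read as set checks over transaction ids. The following is a minimal sketch, not the runner's actual oracle: the function name, its arguments, and the `tolerated_loss` parameter are all hypothetical. Because this page does not spell out the group_commit contract, the sketch leaves it to the caller to say which acknowledged commits that contract permits losing. Integrity checks for relaxed mode are out of scope here.

```python
KNOWN_MODES = ("strict", "group_commit", "relaxed")


def check_invariants(mode, acked, never_committed, visible,
                     tolerated_loss=frozenset()):
    """Return a list of violation strings; an empty list means a pass.

    All arguments are hypothetical inputs collected by a test harness:
      acked            txn ids whose commit was acknowledged before the crash
      never_committed  txn ids that never issued a commit
      visible          txn ids observable after recovery
      tolerated_loss   acked txn ids the group_commit contract allows losing
    """
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown durability mode: {mode!r}")

    violations = []

    # Visibility safety applies in every mode.
    leaked = set(visible) & set(never_committed)
    if leaked:
        violations.append(
            f"visibility: uncommitted txns visible {sorted(leaked)}")

    # Durability: which acknowledged commits did not survive the crash?
    lost = set(acked) - set(visible)
    if mode == "strict":
        unexpected = lost                        # no loss allowed
    elif mode == "group_commit":
        unexpected = lost - set(tolerated_loss)  # loss only per contract
    else:
        unexpected = set()                       # relaxed: loss is allowed
    if unexpected:
        violations.append(
            f"durability[{mode}]: acked commits lost {sorted(unexpected)}")

    return violations
```

Returning a list of violations rather than a boolean keeps the explicit-failure-with-diagnostics property: a failing run says which invariant broke and for which transactions.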

Minimal runner

Pinned runner:

tools/storage_poc/crash_loop.sh

Key profiles:

  • --nightly
  • --dirty-pressure
  • --double-restart
  • --durability strict|group_commit|relaxed
  • --corrupt-txlog / --corrupt-storage (fail-fast checks)
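
For CI wrappers it can help to build the runner's command line from a checked list instead of free-form strings. The sketch below uses only the flags and the runner path listed above. The helper `build_argv` is hypothetical. The sketch also assumes that `--durability` takes its value as a separate argument and that flags may be combined, and neither assumption is confirmed on this page.

```python
RUNNER = "tools/storage_poc/crash_loop.sh"

# Flags copied from the profile list; nothing else is accepted.
PROFILE_FLAGS = frozenset({
    "--nightly",
    "--dirty-pressure",
    "--double-restart",
    "--corrupt-txlog",
    "--corrupt-storage",
})
DURABILITY_MODES = frozenset({"strict", "group_commit", "relaxed"})


def build_argv(flags=(), durability=None):
    """Build an argv list for the runner, rejecting unknown flags early."""
    argv = [RUNNER]
    for flag in flags:
        if flag not in PROFILE_FLAGS:
            raise ValueError(f"unknown runner flag: {flag!r}")
        argv.append(flag)
    if durability is not None:
        if durability not in DURABILITY_MODES:
            raise ValueError(f"unknown durability mode: {durability!r}")
        # Assumption: the mode is passed as a separate argument.
        argv += ["--durability", durability]
    return argv
```

The resulting list can be passed to `subprocess.run(argv)` from the repository root. Rejecting unknown flags before launch avoids a nightly job that silently runs the wrong profile.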

Minimum scenarios

  1. SIGKILL during commit storm (tail handling).
  2. SIGKILL around checkpoint markers.
  3. Double restart idempotence.
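
Scenario 3 amounts to comparing the recovery outcome of the first restart with that of the second. The following is a minimal sketch under two assumptions: both summaries have already been loaded as JSON-like Python objects, and some fields legitimately differ between runs. The key names in `VOLATILE_KEYS` are hypothetical placeholders, not fields documented for the runner.

```python
# Hypothetical names of fields expected to differ between restarts.
VOLATILE_KEYS = frozenset({"timestamp", "duration_ms"})


def normalize(value, volatile=VOLATILE_KEYS):
    """Recursively drop volatile keys from dicts nested at any depth."""
    if isinstance(value, dict):
        return {k: normalize(v, volatile)
                for k, v in value.items() if k not in volatile}
    if isinstance(value, list):
        return [normalize(v, volatile) for v in value]
    return value


def is_idempotent(first, second, volatile=VOLATILE_KEYS):
    """True if two recovery summaries agree once volatile fields are ignored."""
    return normalize(first, volatile) == normalize(second, volatile)
```

Keeping the ignore list explicit and small means any new field that differs between restarts fails the check by default, so it has to be reviewed before it can be ignored.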

Required artifacts

  • txlog_scan_*.json
  • txlog_replay_*/*.json
  • recovery_summary.json (+ restart2 summary for idempotence)
  • machine-readable pass/fail summary for CI/nightly.
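
A presence-and-validity check over the artifact patterns above could look like the sketch below. It assumes all artifacts land under one directory with the relative layout the patterns imply, and it treats "valid" as "parses as JSON". Both are assumptions, as is the shape of the returned summary. The restart2 summary is left out because its file name is not given here.

```python
import json
from pathlib import Path

# Glob patterns taken from the required-artifacts list.
REQUIRED_PATTERNS = (
    "txlog_scan_*.json",
    "txlog_replay_*/*.json",
    "recovery_summary.json",
)


def check_artifacts(artifact_dir, patterns=REQUIRED_PATTERNS):
    """Return a pass/fail dict listing missing or unparsable artifacts."""
    root = Path(artifact_dir)
    problems = []
    for pattern in patterns:
        matches = sorted(root.glob(pattern))
        if not matches:
            problems.append(f"missing: {pattern}")
            continue
        for path in matches:
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except (OSError, ValueError) as exc:
                # json.JSONDecodeError is a subclass of ValueError.
                problems.append(
                    f"invalid JSON: {path.relative_to(root)}: {exc}")
    return {"pass": not problems, "problems": problems}
```

The returned dict can be written out with `json.dump` to serve as a machine-readable summary for CI or nightly jobs.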

Exit criteria (operator gate)

  • All required scenarios pass.
  • Artifacts are valid and available for triage.
  • No visibility/durability invariant violations.

Related documents

  • src/operations/backup-restore.md
  • src/operations/disaster-recovery.md
  • src/operations/diagnostics-bundle.md
