Testing and Validation
Operator baseline for crash/recovery validation and related checks.
Canonical source: this runbook in angarabook/src/operations/.
Goal
Verify that the recovery path:
- does not allow silent corruption;
- preserves the durability-mode contract;
- remains idempotent across repeated restarts.
Core invariants
- No silent corruption: either clean startup or explicit failure with diagnostics.
- Idempotent recovery: repeated restarts do not change the oracle outcome.
- Visibility safety: uncommitted changes do not become visible after recovery.
- Durability semantics:
  - strict: ack-commit must survive crash;
  - group_commit: ack semantics strictly match the stated contract;
  - relaxed: some ack-commits may be lost within the contract, but without integrity violations.
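The visibility and durability invariants above can be sketched as a small oracle check. This is a minimal illustration, not the runner's actual logic: the `RecoveryObservation` shape and the `check_invariants` name are hypothetical. Because the group_commit and relaxed contracts are not spelled out here, the sketch only enforces the strict rule for lost acknowledged commits.

```python
from dataclasses import dataclass

MODES = ("strict", "group_commit", "relaxed")


@dataclass
class RecoveryObservation:
    """What a test oracle recorded for one crash/restart cycle (hypothetical shape)."""
    acked: set[str]        # commit ids acknowledged to the client before the crash
    uncommitted: set[str]  # ids written but never committed before the crash
    visible: set[str]      # ids readable after recovery


def check_invariants(obs: RecoveryObservation, mode: str) -> list[str]:
    """Return violation messages; an empty list means the cycle passed."""
    if mode not in MODES:
        raise ValueError(f"unknown durability mode: {mode!r}")
    violations = []
    # Visibility safety: nothing uncommitted may be readable after recovery.
    leaked = obs.uncommitted & obs.visible
    if leaked:
        violations.append(f"visibility: uncommitted ids visible: {sorted(leaked)}")
    # Durability: in strict mode every acknowledged commit must survive.
    lost = obs.acked - obs.visible
    if mode == "strict" and lost:
        violations.append(f"durability(strict): acked commits lost: {sorted(lost)}")
    # relaxed / group_commit: the allowed loss window depends on the stated
    # contract, which this sketch does not model, so `lost` is not judged here.
    return violations
```

A real check would also need the contract bounds for group_commit and relaxed before it could decide whether a given loss is acceptable.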
Minimal runner
Pinned runner:
tools/storage_poc/crash_loop.sh
Key profiles:
- --nightly
- --dirty-pressure
- --double-restart
- --durability strict|group_commit|relaxed
- --corrupt-txlog / --corrupt-storage (fail-fast checks)
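For driving the runner from a scheduler, the profiles above could be assembled into argument vectors as sketched below. The flag names and runner path come from this page. Everything else is an assumption: that flags may be combined, the argument order, and the `runner_argv` and `nightly_matrix` helpers. The code only builds the lists and does not execute anything.

```python
RUNNER = "tools/storage_poc/crash_loop.sh"
DURABILITY_MODES = ("strict", "group_commit", "relaxed")
PROFILE_FLAGS = frozenset({
    "--nightly", "--dirty-pressure", "--double-restart",
    "--corrupt-txlog", "--corrupt-storage",
})


def runner_argv(durability: str, *profiles: str) -> list[str]:
    """Build an argument vector for the runner (hypothetical helper)."""
    if durability not in DURABILITY_MODES:
        raise ValueError(f"unknown durability mode: {durability!r}")
    unknown = [p for p in profiles if p not in PROFILE_FLAGS]
    if unknown:
        raise ValueError(f"unknown profile flags: {unknown}")
    # Assumption: flag order does not matter to the runner.
    return [RUNNER, "--durability", durability, *profiles]


def nightly_matrix() -> list[list[str]]:
    """One invocation per durability mode, each with double restart enabled."""
    return [runner_argv(mode, "--nightly", "--double-restart")
            for mode in DURABILITY_MODES]
```

Each list can then be passed to `subprocess.run(argv, check=False)`, with the return code recorded per mode.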
Minimum scenarios
- SIGKILL during commit storm (tail handling).
- SIGKILL around checkpoint markers.
- Double restart idempotence.
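The double-restart scenario amounts to comparing the outcome of the first recovery with the outcome of the second. A minimal sketch, assuming both summaries are already loaded as dictionaries and that some fields legitimately differ between runs; the names in `VOLATILE_KEYS` are hypothetical and would need to match the real summary schema.

```python
# Hypothetical field names that may differ between two otherwise identical runs.
VOLATILE_KEYS = frozenset({"timestamp", "duration_ms", "pid"})


def stable_view(value):
    """Recursively drop volatile keys so two recovery summaries can be compared."""
    if isinstance(value, dict):
        return {k: stable_view(v) for k, v in value.items() if k not in VOLATILE_KEYS}
    if isinstance(value, list):
        return [stable_view(v) for v in value]
    return value


def is_idempotent(first: dict, second: dict) -> bool:
    """True when the second restart reports the same stable outcome as the first."""
    return stable_view(first) == stable_view(second)
```

Dropping volatile fields keeps the comparison strict on the oracle outcome while tolerating run-specific noise.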
Required artifacts
- txlog_scan_*.json
- txlog_replay_*/*.json
- recovery_summary.json (+ restart2 summary for idempotence)
- machine-readable pass/fail summary for CI/nightly.
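Artifact presence and basic validity can be checked before triage. The sketch below uses the glob patterns listed above and only verifies that each matched file parses as JSON. The `check_artifacts` helper and the single artifact directory layout are assumptions, and the restart2 summary is left out because its file name is not given here.

```python
import json
from pathlib import Path

REQUIRED_PATTERNS = (
    "txlog_scan_*.json",
    "txlog_replay_*/*.json",
    "recovery_summary.json",
)


def check_artifacts(artifact_dir: Path) -> dict[str, list[str]]:
    """Map each required pattern to its problems; an empty list means OK."""
    problems = {}
    for pattern in REQUIRED_PATTERNS:
        issues = []
        matches = sorted(artifact_dir.glob(pattern))
        if not matches:
            issues.append("no files matched")
        for path in matches:
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except (OSError, ValueError) as exc:
                # json.JSONDecodeError is a subclass of ValueError.
                issues.append(f"{path.name}: {exc.__class__.__name__}")
        problems[pattern] = issues
    return problems
```

Schema validation of the JSON contents is out of scope for this sketch.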
Exit criteria (operator gate)
- All required scenarios pass.
- Artifacts are valid and available for triage.
- No visibility/durability invariant violations.
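The three exit criteria can be folded into one machine-readable verdict for CI. This is a sketch under stated assumptions: the scenario names in `REQUIRED_SCENARIOS`, the summary field names, and the `operator_gate` helper are all hypothetical, not an existing schema.

```python
import json

# Hypothetical identifiers for the three minimum scenarios.
REQUIRED_SCENARIOS = ("commit_storm_sigkill", "checkpoint_sigkill", "double_restart")


def operator_gate(scenarios: dict[str, bool], artifacts_ok: bool,
                  violations: list[str]) -> tuple[int, str]:
    """Return (exit_code, json_summary); exit code 0 means the gate passed."""
    # A required scenario that is missing from the results counts as failed.
    failed = sorted(name for name in REQUIRED_SCENARIOS
                    if not scenarios.get(name, False))
    passed = not failed and artifacts_ok and not violations
    summary = {
        "result": "pass" if passed else "fail",
        "failed_scenarios": failed,
        "artifacts_ok": artifacts_ok,
        "invariant_violations": list(violations),
    }
    return (0 if passed else 1), json.dumps(summary, sort_keys=True)
```

Treating a missing scenario as a failure prevents the gate from passing when a scenario was skipped rather than run.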
Related operations references
- src/operations/backup-restore.md
- src/operations/disaster-recovery.md
- src/operations/diagnostics-bundle.md
Next
- Golden dataset management — which data validation scenarios run on.
- CI reproducibility contract — reproducibility guarantees for the validation pipeline.
- Operational policies baseline — which policies must have validation coverage.