Disaster Recovery Playbook

Basic DR playbook for cases where the normal recovery path does not resolve the incident. Canonical source: this runbook in angarabook/src/operations/.

Scope

Covers the minimal scenarios:

  • WAL corruption;
  • data directory loss;
  • emergency modes with deliberate risk.

1) Corrupted WAL

Symptoms

  • ChecksumMismatch or InvalidRecord at startup.

Actions

  1. If the corruption is in the tail of the WAL, expect the normal truncate/recovery path to handle it.
  2. If the corruption is in the middle of the WAL:
     • prefer restoring from a valid backup (see Backup and restore);
     • use emergency truncate only as a last resort, because every transaction after the truncation point may be lost.
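
The tail-versus-middle decision above can be sketched as a small classifier. This is an illustration only: the record layout (a payload paired with a stored CRC32) and the name classify_corruption are assumptions made for the example, not the engine's actual WAL format or API.

```python
import zlib
from typing import List, Tuple

# Hypothetical record layout for illustration: (payload, stored CRC32).
Record = Tuple[bytes, int]


def classify_corruption(records: List[Record]) -> str:
    """Return 'clean', 'tail', or 'middle' for a sequence of WAL records."""
    # A record is valid when its payload still matches the stored checksum.
    valid = [zlib.crc32(payload) == crc for payload, crc in records]
    if all(valid):
        return "clean"
    first_bad = valid.index(False)
    # A valid record after the first bad one means the damage is not
    # confined to the tail, so truncation would discard good records.
    if any(valid[first_bad + 1:]):
        return "middle"
    return "tail"
```

A "tail" result corresponds to step 1 (the normal truncate/recovery path), while "middle" corresponds to step 2, where restoring from backup is preferred. In a real log the framing after a damaged record may itself be unreadable, so a scanner would need a resynchronization strategy that this sketch does not model.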

2) Lost data directory

Actions

  1. Restore data_directory from a full backup (see Backup and restore for the procedure).
  2. Verify that the WAL contains a contiguous chain of records from the backup point onward.
  3. Run replay, then run consistency checks to confirm the restored state.
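
The contiguity check in step 2 can be sketched as follows. It assumes, purely for illustration, that each WAL segment is described by a half-open LSN range (start_lsn, end_lsn) and that the backup records the LSN up to which it is complete. Neither the segment description nor the name first_gap comes from the product.

```python
from typing import List, Optional, Tuple

# Hypothetical segment description: (start_lsn, end_lsn), end exclusive.
Segment = Tuple[int, int]


def first_gap(segments: List[Segment], backup_lsn: int) -> Optional[int]:
    """Return the LSN at which the chain breaks, or None if there is no hole."""
    covered = backup_lsn  # everything below this LSN is already in the backup
    for start, end in sorted(segments):
        if end <= covered:
            continue  # segment lies entirely before the covered point
        if start > covered:
            return covered  # hole between the covered point and this segment
        covered = end  # segment extends the contiguous chain
    return None
```

For example, first_gap([(0, 100), (200, 300)], 50) returns 100, which means replay could proceed only up to LSN 100. An empty segment list also yields None, since there is nothing to replay beyond the backup, so callers that require some WAL after the backup should check for that case separately.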

3) Emergency modes (high risk)

  • Ignoring or weakening integrity checks is allowed only as a break-glass measure.
  • Any such startup requires explicit incident evidence and post-incident restoration to normal mode.
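
One way to enforce the evidence requirement is to refuse an emergency start unless an incident record is supplied. The sketch below is hypothetical: BreakGlassRecord, authorize_break_glass, and their fields are illustrative names, not part of any existing tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class BreakGlassRecord:
    """Evidence captured before an emergency start (hypothetical schema)."""
    incident_id: str
    operator: str
    reason: str
    started_at: str


def authorize_break_glass(incident_id: str, operator: str, reason: str) -> BreakGlassRecord:
    """Return an evidence record, or raise if any required field is blank."""
    fields = (("incident_id", incident_id), ("operator", operator), ("reason", reason))
    for name, value in fields:
        if not value.strip():
            raise ValueError(f"break-glass start refused: {name} is required")
    # Timestamp in UTC so the record can be correlated with incident logs.
    started_at = datetime.now(timezone.utc).isoformat()
    return BreakGlassRecord(incident_id, operator, reason, started_at)
```

The returned record would be stored with the incident and used afterwards to confirm that the node was returned to normal mode.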

4) Prevention baseline

  • Regular verified backup/restore rehearsal.
  • Atomic data+txlog snapshots when using a snapshot strategy.
  • Pinned evidence for recent DR exercises.
