Disaster Recovery Playbook
Basic DR playbook for cases where the normal recovery path does not resolve the incident.
Canonical source: this runbook in angarabook/src/operations/.
Scope
Covers the minimal scenarios:
- WAL corruption;
- data directory loss;
- emergency modes with deliberate risk.
1) Corrupted WAL
Symptoms
ChecksumMismatch or InvalidRecord at startup.
Actions
- If the corruption is in the tail of the log, expect the normal recovery path to handle it by truncating the damaged tail.
- If the corruption is in the middle of the log:
  - prefer restoring from a valid backup (see Backup and restore);
  - emergency truncation is allowed only as a last resort, and it carries a risk of losing transactions.
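The tail versus middle distinction can be made concrete with a small scanner. The sketch below is illustrative only: the record framing (a 4-byte length and a 4-byte CRC32 followed by the payload) and the names encode_record, read_record, and classify_wal are assumptions made for the example, not the actual on-disk WAL format or any real tool. It reports the offset of the first bad record and treats the damage as "middle" if any valid record can still be parsed after that point.

```python
import struct
import zlib

# Hypothetical framing, assumed for this sketch only:
# little-endian payload length, CRC32 of the payload, then the payload.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    """Frame one payload in the assumed record format."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_record(buf: bytes, offset: int):
    """Return the offset just past a valid record at `offset`, else None."""
    if offset + HEADER.size > len(buf):
        return None
    length, crc = HEADER.unpack_from(buf, offset)
    end = offset + HEADER.size + length
    if length == 0 or end > len(buf):
        return None
    if zlib.crc32(buf[offset + HEADER.size:end]) != crc:
        return None
    return end


def classify_wal(buf: bytes):
    """Return (status, offset); status is 'clean', 'tail' or 'middle'.

    `offset` is where the first unreadable record starts
    (or the end of the log when it is clean).
    """
    offset = 0
    while offset < len(buf):
        nxt = read_record(buf, offset)
        if nxt is None:
            break
        offset = nxt
    if offset == len(buf):
        return ("clean", offset)
    # Byte-by-byte resync: any valid record after the damage means that
    # truncating at `offset` would discard readable data.
    for probe in range(offset + 1, len(buf)):
        if read_record(buf, probe) is not None:
            return ("middle", offset)
    return ("tail", offset)


if __name__ == "__main__":
    log = b"".join(encode_record(p) for p in (b"alpha", b"beta", b"gamma"))
    print(classify_wal(log))        # intact log
    print(classify_wal(log[:-2]))   # last record cut short
```

The resync probe is quadratic in the worst case and can in principle be fooled by a chance CRC match, so a "middle" result should be read as a signal to stop and go to backup restore, not as proof of how much data is recoverable.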
2) Lost data directory
Actions
- Restore data_directory from a full backup (procedure: Backup and restore).
- Verify that the WAL contains a contiguous chain after the backup point.
- Run replay, then run consistency checks to confirm the result.
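The contiguity check in the second step can be sketched as follows. This is a minimal illustration under stated assumptions: the Segment type, its start_lsn/end_lsn fields (half-open ranges), and find_chain_gaps are hypothetical names, and how segment boundaries are actually obtained from the WAL files is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical description of one WAL segment (assumed for this sketch)."""
    name: str
    start_lsn: int  # inclusive
    end_lsn: int    # exclusive


def find_chain_gaps(segments, backup_lsn):
    """Return a list of (expected_lsn, found_lsn) gaps after the backup point.

    An empty list means every position from `backup_lsn` up to the end of
    the last relevant segment is covered by some segment.
    """
    relevant = sorted(
        (s for s in segments if s.end_lsn > backup_lsn),
        key=lambda s: s.start_lsn,
    )
    gaps = []
    expected = backup_lsn
    for seg in relevant:
        if seg.start_lsn > expected:
            gaps.append((expected, seg.start_lsn))
        expected = max(expected, seg.end_lsn)
    return gaps


if __name__ == "__main__":
    segments = [
        Segment("seg-0001", 0, 100),
        Segment("seg-0002", 100, 200),
        Segment("seg-0004", 250, 300),  # seg-0003 is missing
    ]
    print(find_chain_gaps(segments, backup_lsn=150))
```

Any reported gap means replay cannot safely proceed past the first missing position, so the recoverable point is the start of the first gap, not the end of the last segment.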
3) Emergency modes (high risk)
- Ignoring or weakening integrity checks is allowed only as a break-glass measure.
- Any such startup requires explicit incident evidence and post-incident restoration to normal mode.
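One way to enforce the evidence requirement is a guard that refuses to proceed without it and leaves an audit record behind. The sketch below is an assumption-laden illustration: authorize_break_glass, its parameters, and the normal_mode_restored field are hypothetical and do not correspond to any existing startup flag or tool.

```python
import json
from datetime import datetime, timezone


def authorize_break_glass(incident_id: str, approver: str, reason: str) -> str:
    """Return a JSON audit record, or raise if any evidence field is empty.

    All field names are hypothetical and chosen for this sketch.
    """
    fields = {"incident_id": incident_id, "approver": approver, "reason": reason}
    missing = [name for name, value in fields.items() if not value or not value.strip()]
    if missing:
        raise PermissionError("break-glass refused, missing: " + ", ".join(missing))
    record = dict(fields)
    record["authorized_at"] = datetime.now(timezone.utc).isoformat()
    # Stays False until the post-incident return to normal mode is confirmed.
    record["normal_mode_restored"] = False
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(authorize_break_glass("INC-1", "oncall-lead", "mid-log WAL corruption"))
```

Keeping normal_mode_restored as an explicit open item makes the post-incident restoration something that can be tracked and closed rather than remembered.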
4) Prevention baseline
- Regular verified backup/restore rehearsal.
- Atomic data+txlog snapshots when using a snapshot strategy.
- Pinned evidence for recent DR exercises.
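A restore rehearsal counts as verified only if the restored files are compared against what was backed up. A minimal sketch of such a comparison, assuming a plain file-level backup and using SHA-256 digests, is shown below; build_manifest and diff_manifests are hypothetical helper names, and the resulting diff could be stored as the pinned evidence for the exercise.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict:
    """Map each file's path relative to `root` to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(expected: dict, actual: dict) -> dict:
    """Report files that are missing, unexpected, or changed after restore."""
    shared = set(expected) & set(actual)
    return {
        "missing": sorted(set(expected) - set(actual)),
        "extra": sorted(set(actual) - set(expected)),
        "changed": sorted(k for k in shared if expected[k] != actual[k]),
    }


def rehearsal_passed(diff: dict) -> bool:
    """A rehearsal passes only when all three difference lists are empty."""
    return not any(diff.values())
```

Reading whole files into memory is acceptable for a sketch; large data files would need chunked hashing instead.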
Next
- Backup and restore (operator-level) — which preliminary snapshots are required for DR scenarios.
- Upgrade and migration — overlap with DR during cross-version migration.
- Replication v2 operations guide — how DR is built on top of logical replication.
- Troubleshooting guide — if the DR procedure gets stuck in a specific phase.