AngaraReplica v2 Operations Guide
Short operator guide for streaming replication v2.
Canonical source: this runbook in angarabook/src/operations/.
Topology and scope
- 1 primary + up to 8 standby nodes (async replication).
- Standby works in read-only mode (SQLSTATE 25006 on write).
- Promote is performed manually (auto-failover is planned for the next major line).
Configuration baseline
Primary:
- [replication].role = "primary"
- listen_addr
- wal_retention_segments
Standby:
- [replication].role = "standby"
- primary_addr
- slot_name
- wal_path
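The baseline can be checked mechanically before a node is started. The sketch below is illustrative only: it assumes every listed key lives in the [replication] table and that the table has already been parsed into a dict. The function name and the sample values are hypothetical, not part of AngaraReplica.

```python
# Required keys per role, taken from the configuration baseline.
# Assumption: all of them sit inside the [replication] table.
REQUIRED_KEYS = {
    "primary": ("listen_addr", "wal_retention_segments"),
    "standby": ("primary_addr", "slot_name", "wal_path"),
}


def missing_replication_keys(replication: dict) -> list[str]:
    """Return the baseline keys absent from a parsed [replication] table."""
    role = replication.get("role")
    if role not in REQUIRED_KEYS:
        raise ValueError(f"unknown replication role: {role!r}")
    return [key for key in REQUIRED_KEYS[role] if key not in replication]


if __name__ == "__main__":
    # Hypothetical standby settings with wal_path left out on purpose.
    standby = {
        "role": "standby",
        "primary_addr": "10.0.0.1:5433",
        "slot_name": "standby_1",
    }
    print(missing_replication_keys(standby))  # ['wal_path']
```

An empty result means the baseline keys are present; it says nothing about whether their values are valid.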
Operations flow
- Start primary.
- Start standby and check lag metrics.
- Monitor replication lag / reconnects / slots.
Promote (manual failover)
- Promote must complete through the sync-checkpoint handshake.
- Promote timeout fails closed (standby does not accept writes if the handshake did not complete).
- Lease-based fencing reduces split-brain risk, but does not fully replace STONITH/Raft.
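The fail-closed rule can be illustrated with a toy model. This is not the AngaraReplica implementation: the Standby class, the promote signature, and the use of a threading.Event to stand in for the sync-checkpoint handshake are all assumptions made for the sketch. The point is only that a timeout leaves the node read-only.

```python
import threading


class Standby:
    """Toy standby that stays read-only unless promote fully succeeds."""

    def __init__(self) -> None:
        self.accepts_writes = False

    def promote(self, handshake_done: threading.Event, timeout_s: float) -> bool:
        # Event.wait returns False if the timeout expires before the event is set.
        completed = handshake_done.wait(timeout=timeout_s)
        if not completed:
            # Fail closed: keep rejecting writes instead of guessing the outcome.
            return False
        self.accepts_writes = True
        return True


if __name__ == "__main__":
    node = Standby()
    never_completes = threading.Event()  # handshake that never finishes
    print(node.promote(never_completes, timeout_s=0.05), node.accepts_writes)  # False False
```

Writes are enabled on exactly one path, after the handshake has completed, so any error or timeout before that point leaves the standby in its safe state.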
Key monitoring signals
- angara_node_is_standby
- angara_replication_lag_bytes
- angara_replication_lag_ms
- angara_replication_reconnects_total
- angara_promote_total
- angara_promote_duration_ms_last
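A lag check over these signals might look like the sketch below. It assumes the metrics are exposed as unlabeled samples in a Prometheus-style text format, which this guide does not state. The function names and the thresholds are hypothetical and should be replaced with values from your own SLA.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse unlabeled 'name value' samples, skipping comments and blank lines."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        samples[parts[0]] = float(parts[1])
    return samples


def lag_alerts(samples: dict[str, float], limits: dict[str, float]) -> list[str]:
    """Report each watched metric that is missing or above its limit."""
    alerts = []
    for name, limit in limits.items():
        value = samples.get(name)
        if value is None:
            # A missing metric is treated as a problem, not as zero lag.
            alerts.append(f"{name}: missing")
        elif value > limit:
            alerts.append(f"{name}: {value} exceeds {limit}")
    return alerts


if __name__ == "__main__":
    # Hypothetical scrape output and thresholds.
    scrape = "angara_replication_lag_bytes 1024\nangara_replication_lag_ms 7000\n"
    limits = {"angara_replication_lag_bytes": 64 * 1024 * 1024,
              "angara_replication_lag_ms": 5000}
    for alert in lag_alerts(parse_metrics(scrape), limits):
        print(alert)
```

Reporting a missing metric as an alert is deliberate: a standby that has stopped exporting lag should not look healthy.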
Typical incidents
- Standby does not connect: address/port/firewall/reconnects.
- WAL segment gone: base backup and standby restart are required.
- Promote timeout: check network and WAL write path on primary.
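For the first incident, a quick TCP probe from the standby host helps separate address, port, and firewall problems from replication-level ones. The helper below is a generic sketch using only the Python standard library. The function name and the sample address are hypothetical, and a successful connection only shows that the port is reachable, not that replication is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical address; substitute the host and port from primary_addr.
    print(can_connect("10.0.0.1", 5433, timeout_s=1.0))
```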
Next
- Disaster recovery playbook: DR scenarios on top of replication.
- Backup and restore (operator-level): how replication complements (does not replace) backup.
- Operational policies baseline: SLA/RTO/RPO agreements within which replication v2 operates.