
AngaraReplica v2 Operations Guide

A short operator guide for AngaraReplica v2 streaming replication. The canonical source is this runbook in angarabook/src/operations/.

Topology and scope

  • One primary and up to 8 standby nodes, replicated asynchronously.
  • Standbys are read-only: any write fails with SQLSTATE 25006 (read-only SQL transaction).
  • Promotion is manual; automatic failover is planned for the next major release line.
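A minimal Python sketch of a pre-deployment check for these limits, assuming each node is described by a name and a role string. `Node`, `validate_topology`, and `is_read_only_error` are hypothetical helpers, not part of AngaraReplica; a client could use the last one to recognise a write that reached a standby and retry it on the primary.

```python
from dataclasses import dataclass

MAX_STANDBYS = 8              # limit stated in "Topology and scope"
READ_ONLY_SQLSTATE = "25006"  # error a standby returns for writes


@dataclass
class Node:
    name: str
    role: str  # "primary" or "standby" (hypothetical model)


def validate_topology(nodes: list[Node]) -> list[str]:
    """Return the problems found; an empty list means the layout is in scope."""
    problems = []
    unknown = [n.name for n in nodes if n.role not in ("primary", "standby")]
    if unknown:
        problems.append(f"unknown role on: {', '.join(unknown)}")
    primaries = sum(1 for n in nodes if n.role == "primary")
    standbys = sum(1 for n in nodes if n.role == "standby")
    if primaries != 1:
        problems.append(f"expected exactly 1 primary, found {primaries}")
    if standbys > MAX_STANDBYS:
        problems.append(f"at most {MAX_STANDBYS} standbys, found {standbys}")
    return problems


def is_read_only_error(sqlstate: str) -> bool:
    # True when a write reached a standby; the client should retry on the primary.
    return sqlstate == READ_ONLY_SQLSTATE
```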

Configuration baseline

Primary:

  • [replication].role = "primary"
  • listen_addr
  • wal_retention_segments

Standby:

  • [replication].role = "standby"
  • primary_addr
  • slot_name
  • wal_path
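A sketch of a baseline check on an already-parsed config (for example, the dict that Python's `tomllib.loads` returns). It assumes every key above lives in the `[replication]` table next to `role`; the guide does not say where `listen_addr` and the other keys are placed, and `missing_keys` is a hypothetical helper.

```python
# Baseline keys per role, as listed in "Configuration baseline".
REQUIRED_KEYS = {
    "primary": {"listen_addr", "wal_retention_segments"},
    "standby": {"primary_addr", "slot_name", "wal_path"},
}


def missing_keys(config: dict) -> set[str]:
    """Return baseline keys absent from a parsed config.

    Assumes all keys live in the [replication] table alongside `role`.
    """
    repl = config.get("replication", {})
    role = repl.get("role")
    if role not in REQUIRED_KEYS:
        raise ValueError(f"[replication].role must be 'primary' or 'standby', got {role!r}")
    return REQUIRED_KEYS[role] - set(repl)
```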

Operations flow

  1. Start the primary.
  2. Start each standby and check its lag metrics.
  3. Monitor replication lag, reconnects, and replication slots.
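Step 2 can be scripted as a bounded wait. This sketch assumes a hypothetical `read_lag_bytes` callable that returns the current `angara_replication_lag_bytes` value for one standby; how that value is scraped, and which threshold counts as caught up, are deployment choices the guide does not fix.

```python
import time
from typing import Callable


def wait_for_catch_up(read_lag_bytes: Callable[[], float], max_lag_bytes: float,
                      timeout_s: float, poll_interval_s: float = 1.0) -> bool:
    """Poll the lag until it drops to max_lag_bytes or timeout_s elapses.

    Returns True if the standby caught up in time, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if read_lag_bytes() <= max_lag_bytes:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

A False result is the cue to move on to the "Typical incidents" checks rather than keep waiting.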

Promote (manual failover)

  • A promote must complete the sync-checkpoint handshake.
  • A promote timeout fails closed: if the handshake does not complete, the standby does not accept writes.
  • Lease-based fencing reduces split-brain risk but is not a full replacement for STONITH or Raft-based consensus.
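The fail-closed rule and lease fencing can be illustrated with a toy model. `Standby`, `promote`, `handshake_done`, and `may_accept_writes` are hypothetical names; this is not the AngaraReplica implementation, only the general pattern the bullets describe.

```python
import time
from typing import Callable


class Standby:
    """Toy standby node with only the state needed for the sketch."""

    def __init__(self) -> None:
        self.writable = False  # read-only until a promote succeeds

    def promote(self, handshake_done: Callable[[], bool], timeout_s: float,
                poll_interval_s: float = 0.05) -> bool:
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            if handshake_done():
                self.writable = True
                return True
            time.sleep(poll_interval_s)
        # Fail closed: the handshake did not finish in time, so stay read-only.
        self.writable = False
        return False


def may_accept_writes(lease_expires_at: float, now: float) -> bool:
    """Lease fencing: the lease holder writes only while its lease is valid."""
    return now < lease_expires_at
```

A lease only fences correctly if clock drift stays within the lease margin and a paused process notices expiry before it writes again, which is why it does not replace STONITH or Raft.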

Key monitoring signals

  • angara_node_is_standby
  • angara_replication_lag_bytes
  • angara_replication_lag_ms
  • angara_replication_reconnects_total
  • angara_promote_total
  • angara_promote_duration_ms_last
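A sketch of turning these signals into alerts. It assumes they are exported in the Prometheus text exposition format, that `angara_node_is_standby` is a 0/1 gauge, and that thresholds are chosen by the operator; `parse_metrics`, `counter_increase`, and `replication_alerts` are hypothetical helpers. Because `angara_replication_reconnects_total` is a counter, the sketch alerts on its increase between two scrapes rather than on its absolute value.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse unlabelled samples from Prometheus text exposition format.

    Comment lines and labelled samples are skipped to keep the sketch short.
    """
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "{" in line:
            continue
        name, value = line.split()[:2]  # ignore an optional timestamp field
        samples[name] = float(value)
    return samples


def counter_increase(prev: float, cur: float) -> float:
    # A counter that went down was reset (e.g. process restart): count from zero.
    return cur if cur < prev else cur - prev


def replication_alerts(prev: dict[str, float], cur: dict[str, float],
                       max_lag_ms: float, max_lag_bytes: float,
                       max_new_reconnects: float) -> list[str]:
    """Return human-readable alerts for one node between two scrapes."""
    if cur.get("angara_node_is_standby") != 1:
        return []  # lag and reconnect signals only matter on a standby
    alerts = []
    lag_ms = cur.get("angara_replication_lag_ms", 0.0)
    if lag_ms > max_lag_ms:
        alerts.append(f"lag {lag_ms:.0f} ms > {max_lag_ms:.0f} ms")
    lag_bytes = cur.get("angara_replication_lag_bytes", 0.0)
    if lag_bytes > max_lag_bytes:
        alerts.append(f"lag {lag_bytes:.0f} bytes > {max_lag_bytes:.0f} bytes")
    reconnects = counter_increase(
        prev.get("angara_replication_reconnects_total", 0.0),
        cur.get("angara_replication_reconnects_total", 0.0),
    )
    if reconnects > max_new_reconnects:
        alerts.append(f"{reconnects:.0f} reconnects since last scrape")
    return alerts
```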

Typical incidents

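  • Standby does not connect: check the address and port (primary_addr / listen_addr), firewall rules, and the reconnect counter.
  • Required WAL segment is gone from the primary: take a new base backup and restart the standby.
  • Promote timeout: check the network and the WAL write path on the primary.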
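For the missing-WAL case, a quick way to decide between resuming streaming and re-seeding the standby. This assumes WAL segments carry monotonically increasing integer numbers and that the primary keeps exactly the newest `wal_retention_segments` of them; `oldest_retained_segment` and `needs_base_backup` are hypothetical helpers.

```python
def oldest_retained_segment(newest: int, wal_retention_segments: int) -> int:
    # Assumed retention model: the primary keeps segments newest-N+1 .. newest.
    return newest - wal_retention_segments + 1


def needs_base_backup(requested: int, newest: int, wal_retention_segments: int) -> bool:
    """True if the segment a standby asks for has already been recycled on the primary."""
    return requested < oldest_retained_segment(newest, wal_retention_segments)
```

Raising `wal_retention_segments` widens this window at the cost of disk space on the primary.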
