AngaraReplica v2 Operations Guide

A short operator guide for streaming replication v2. Canonical source: this runbook in angarabook/src/operations/.

Topology and scope

  • 1 primary + up to 8 standbys (async replication).
  • A standby runs in read-only mode (writes fail with SQLSTATE 25006).
  • Promote is performed manually (auto-failover is planned for the next major line).

Configuration baseline

Primary:

  • [replication].role = "primary"
  • listen_addr
  • wal_retention_segments

Standby:

  • [replication].role = "standby"
  • primary_addr
  • slot_name
  • wal_path

Operations flow

  1. Start the primary.
  2. Start the standby and check the lag metrics.
  3. Monitor replication lag, reconnects, and slots.

Promote (manual failover)

  • Promote must complete through the sync-checkpoint handshake.
  • The promote timeout is fail-closed: the standby does not accept writes if the handshake has not completed.
  • Lease-based fencing reduces the risk of split-brain but is not a full replacement for STONITH/Raft.
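
The fail-closed rule above can be illustrated with a small model: writes are enabled only if the handshake is observed before the timeout, and a timeout leaves the node read-only. This is a sketch of the idea, not the server's implementation. PromoteGate, handshake_completed, and the timeout values are hypothetical names chosen for illustration.

```python
import threading


class PromoteGate:
    """Toy model of a fail-closed promote (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._handshake_done = threading.Event()
        self.writes_enabled = False  # a standby starts read-only

    def handshake_completed(self) -> None:
        # Called when the sync-checkpoint handshake finishes.
        self._handshake_done.set()

    def promote(self, timeout_s: float) -> bool:
        # Enable writes only if the handshake finished within the timeout.
        # On timeout nothing changes, so the node stays read-only.
        if self._handshake_done.wait(timeout_s):
            self.writes_enabled = True
        return self.writes_enabled


if __name__ == "__main__":
    gate = PromoteGate()
    print("promoted:", gate.promote(timeout_s=0.05))  # handshake never arrived
    gate.handshake_completed()
    print("promoted:", gate.promote(timeout_s=0.05))
```

The point of the design is that the default state is the safe one: no code path turns writes on unless the handshake has been positively confirmed.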

Key monitoring signals

  • angara_node_is_standby
  • angara_replication_lag_bytes
  • angara_replication_lag_ms
  • angara_replication_reconnects_total
  • angara_promote_total
  • angara_promote_duration_ms_last

Typical incidents

  • Standby does not connect: check address, port, firewall, and reconnects.
  • WAL segment gone: a base backup and a standby restart are required.
  • Promote timeout: check the network and the WAL write path on the primary.

Next

Security Context Propagation

Starting with RM-0.6.7.0, the security context (including tenant_id) is automatically propagated through the WAL replication stream.

Key Features

  • Tenant Isolation: The tenant_id is embedded in WAL records, ensuring that standby nodes maintain the same multi-tenancy boundaries as the primary.
  • Integrity Verification: Replication tokens are protected by CRC32C checksums.
  • Fail-Closed Security: If a tampered or invalid token is detected during replication, the connection is immediately terminated to prevent unauthorized data access.
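
The checksum check and the fail-closed reaction can be sketched as follows. The token layout used here (payload followed by a 4-byte little-endian CRC32C) and the names seal, open_token, and InvalidToken are assumptions made for illustration; the actual wire format is not described in this guide. CRC32C itself is the standard Castagnoli CRC, implemented bitwise here to keep the example dependency-free.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


class InvalidToken(Exception):
    """Raised when a token fails verification; the caller should disconnect."""


def seal(payload: bytes) -> bytes:
    # Hypothetical layout: payload followed by its little-endian CRC32C.
    return payload + struct.pack("<I", crc32c(payload))


def open_token(token: bytes) -> bytes:
    """Return the payload, or raise InvalidToken on any mismatch."""
    if len(token) < 4:
        raise InvalidToken("token too short")
    payload = token[:-4]
    (expected,) = struct.unpack("<I", token[-4:])
    if crc32c(payload) != expected:
        raise InvalidToken("checksum mismatch")
    return payload


if __name__ == "__main__":
    token = seal(b"tenant_id=42")
    print(open_token(token))
    corrupted = bytes([token[0] ^ 0x01]) + token[1:]
    try:
        open_token(corrupted)
    except InvalidToken as exc:
        print("closing connection:", exc)
```

Raising an exception rather than returning a flag keeps the failure path explicit: a caller cannot continue with the payload unless verification succeeded.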