Alert Runbooks
Operator-facing runbooks for each alert rule from
tools/observability/alerts/angarabase_alerts.yaml (RM-0.6.3.8 S7).
Each alert contains annotations.runbook_url with a link to one of
the pages below — this is the binding between the observability surface and the operator
remediation path.
Repo-reproducibility contract (G2-FIX cycle 2 / F-DOC-1): for each
runbook_url in the alert YAML there is a backing markdown file in this directory. Verifier:

```sh
python3 - <<'PY'
import re, pathlib
rules = pathlib.Path("tools/observability/alerts/angarabase_alerts.yaml").read_text()
slugs = re.findall(r"runbooks/([a-z0-9-]+)", rules)
root = pathlib.Path("angarabook/src/operations/runbooks")
missing = [s for s in slugs if not (root / f"{s}.md").exists()]
print("OK" if not missing else f"MISSING: {missing}")
PY
```
By Alert Rule
| Alert | Severity | Runbook |
|---|---|---|
| AngarabaseDown | critical | angarabase-down.md |
| HighP99Latency | warning | high-p99-latency.md |
| HighSlowQueryRatio | warning | high-slow-query-ratio.md |
| BufferPoolPressure | warning | buffer-pool-pressure.md |
| WALFsyncSlow | warning | wal-fsync-slow.md |
| DeadlockSpike | critical | deadlock-spike.md |
| LongTransaction | warning | long-transaction.md |
| GCBloatHigh | warning | gc-bloat-high.md |
| ReplicationLag | warning | replication-lag.md |
| IndexRoutingLegacyFallback | warning | index-routing-legacy-fallback.md |
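
The slugs in the table look like the kebab-case form of each alert name, with acronyms such as WAL and GC kept together. A minimal sketch of that conversion, assuming the pattern holds for every rule (the helper name `alert_slug` is hypothetical and not part of the repository):

```python
import re

# Insert a hyphen at lowercase/digit -> uppercase boundaries, and before the
# last capital of an acronym run (e.g. "WALFsync" -> "WAL-Fsync").
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def alert_slug(alert_name: str) -> str:
    """Hypothetical helper: derive a runbook slug from an alert name."""
    return _BOUNDARY.sub("-", alert_name).lower()


# Expected pairs are copied from the table above.
TABLE = {
    "AngarabaseDown": "angarabase-down",
    "HighP99Latency": "high-p99-latency",
    "HighSlowQueryRatio": "high-slow-query-ratio",
    "BufferPoolPressure": "buffer-pool-pressure",
    "WALFsyncSlow": "wal-fsync-slow",
    "DeadlockSpike": "deadlock-spike",
    "LongTransaction": "long-transaction",
    "GCBloatHigh": "gc-bloat-high",
    "ReplicationLag": "replication-lag",
    "IndexRoutingLegacyFallback": "index-routing-legacy-fallback",
}

for name, slug in TABLE.items():
    assert alert_slug(name) == slug, (name, alert_slug(name))
```

A check like this could flag a new alert whose runbook file name drifts from the convention, but the alert YAML remains the authority on the actual URL.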
URL Convention
The production angarabook deployment maps /operations/runbooks/<slug> →
angarabook/src/operations/runbooks/<slug>.md. If your build
uses a different layout, update runbook_url in the alert YAML
accordingly (the source of truth is the alert file, not the runbooks themselves).
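
To illustrate the mapping, the sketch below turns a runbook_url into the path of its backing file. It assumes the production layout described above; the function name `runbook_source_path` and the example host are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

URL_PREFIX = "/operations/runbooks/"
SOURCE_ROOT = PurePosixPath("angarabook/src/operations/runbooks")


def runbook_source_path(runbook_url: str) -> PurePosixPath:
    """Hypothetical helper: map a runbook URL to its markdown source file."""
    path = urlparse(runbook_url).path
    if not path.startswith(URL_PREFIX):
        raise ValueError(f"not a runbook URL: {runbook_url!r}")
    slug = path[len(URL_PREFIX):].strip("/")
    if not slug or "/" in slug:
        raise ValueError(f"unexpected runbook slug in {runbook_url!r}")
    return SOURCE_ROOT / f"{slug}.md"


# docs.example.invalid is a placeholder host, not the real deployment.
print(runbook_source_path(
    "https://docs.example.invalid/operations/runbooks/wal-fsync-slow"
))
```

A build with a different layout would change `URL_PREFIX` or `SOURCE_ROOT` here, in step with the runbook_url values in the alert YAML.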
New Runbook Page Template
Each runbook page contains:
- What it means (required) — short explanation of alert semantics + PromQL link.
- Severity — critical / warning / info.
- Initial response (≤ 5 minutes) — what to do right now.
- Diagnostics — concrete commands (curl, psql, iostat, …).
- Mitigation — “symptom → action” table.
- Escalation — when and how to escalate.
- Related — links to adjacent runbooks and reference docs.
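
A page can be checked against this template mechanically. The sketch below assumes each section appears as a markdown heading (a line starting with `#`) whose text begins with the section name; that heading convention and the helper name `missing_sections` are assumptions, not something the template specifies.

```python
import re

REQUIRED_SECTIONS = [
    "What it means",
    "Severity",
    "Initial response",
    "Diagnostics",
    "Mitigation",
    "Escalation",
    "Related",
]


def missing_sections(markdown_text: str) -> list[str]:
    """Return template sections that have no matching heading in the page."""
    headings = [
        m.group(1).strip().lower()
        for m in re.finditer(r"^#+\s+(.*)$", markdown_text, flags=re.MULTILINE)
    ]
    return [
        section
        for section in REQUIRED_SECTIONS
        if not any(h.startswith(section.lower()) for h in headings)
    ]


# A deliberately incomplete page used only for illustration.
sample = """# WALFsyncSlow
## What it means
## Severity
## Initial response (≤ 5 minutes)
## Diagnostics
"""
print(missing_sections(sample))  # → ['Mitigation', 'Escalation', 'Related']
```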
Related
- Runbooks index — general catalog of operator runbooks.
- Observability metrics checklist — minimal metric set.