Runbook: GCBloatHigh
Source of truth: tools/observability/alerts/angarabase_alerts.yaml. Backed by: RM-0.6.3.8 S7.
What It Means
angarabase_gc_tuning_bloat_ratio_percent > 50: for every “live” version there is
more than one “dead” version that AngaraGC cannot remove. Most often this is a symptom
of a long-running transaction that blocks GC (see LongTransaction).
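To make the threshold concrete, here is a minimal arithmetic sketch. It assumes the metric is computed as dead / (dead + live) * 100, which is inferred from the description above and not confirmed by the alert definition. The helper name bloat_pct is hypothetical.

```shell
# bloat_pct DEAD LIVE: percentage of dead versions among all versions.
# Assumption: ratio = dead / (dead + live) * 100 (inferred, not confirmed).
bloat_pct() {
  awk -v dead="$1" -v live="$2" 'BEGIN {
    total = dead + live
    if (total == 0) { print "0.00"; exit }   # empty table: no bloat
    printf "%.2f\n", 100 * dead / total
  }'
}

bloat_pct 1000 1000   # 50.00, exactly one dead version per live one
bloat_pct 1200 1000   # 54.55, above the alert threshold
```

Under this assumption, the 80% level mentioned under Severity corresponds to four dead versions per live one.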
Severity
warning. At 80%+ bloat, the buffer pool hit ratio drops.
Initial response
- Grafana Overview v2 → row “GC / MVCC”.
- Check the LongTransaction alert; the root cause is usually there.
- Check gc_tuning_state to see whether auto-tuning is already reacting on its own.
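If Grafana is unavailable, the same values can be read straight from the metrics endpoint. The sketch below is a hypothetical helper (metric_value is not part of any AngaraBase tooling) that pulls one metric out of a Prometheus-style text dump. It assumes the endpoint serves the standard text exposition format.

```shell
# metric_value NAME: read Prometheus text exposition on stdin and print the
# value of every sample whose metric name is exactly NAME.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                              # skip HELP/TYPE comments
    {
      metric = $0
      sub(/[{ ].*/, "", metric)                # name ends at first "{" or space
      if (metric != name) next
      line = $0
      sub(/^[^{ ]+([{].*[}])?[ \t]+/, "", line) # drop name and optional labels
      split(line, f, /[ \t]+/)                 # f[1] = value, f[2] = timestamp
      print f[1]
    }'
}

# Usage against the local endpoint from this runbook:
#   curl -sf http://127.0.0.1:9898/metrics \
#     | metric_value angarabase_gc_tuning_bloat_ratio_percent
```

Matching on the exact metric name avoids picking up similarly prefixed series, which a plain substring search would include.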
Diagnostics
# GC and MVCC metrics from the local metrics endpoint
curl -sf http://127.0.0.1:9898/metrics | rg gc_
curl -sf http://127.0.0.1:9898/metrics | rg mvcc_
# Top tables by bloat
# Note: bloat_pct here is dead/live, so it can exceed 100.
psql -c "SELECT schemaname, relname, n_dead_tup, n_live_tup,
                round(100.0 * n_dead_tup / NULLIF(n_live_tup, 0), 2) AS bloat_pct
         FROM pg_stat_user_tables
         WHERE n_dead_tup > 1000
         ORDER BY bloat_pct DESC NULLS LAST
         LIMIT 10;"
Mitigation
- Close long transactions — see LongTransaction.
- Run vacuum on hot tables.
- Tune GC — increase auto-tuning aggressiveness (see mvcc-gc.md §Knobs).
- Full rebuild (downtime) if bloat > 70% and vacuum does not help.
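For the vacuum step, a small generator can turn the list of hot tables into statements. This is a sketch only: it assumes AngaraBase accepts PostgreSQL-style VACUUM (VERBOSE, ANALYZE) syntax, and the helper name vacuum_statements is hypothetical. Review the generated statements before running them.

```shell
# vacuum_statements: read "schema<TAB>table" lines on stdin and print one
# VACUUM statement per table, with identifiers double-quoted.
# Assumption: PostgreSQL-compatible VACUUM syntax is supported.
vacuum_statements() {
  awk -F '\t' 'NF == 2 {
    gsub(/"/, "\"\"", $1); gsub(/"/, "\"\"", $2)   # escape embedded quotes
    printf "VACUUM (VERBOSE, ANALYZE) \"%s\".\"%s\";\n", $1, $2
  }'
}

# Usage (tab-separated, tuples-only psql output feeds the generator):
#   psql -At -F "$(printf '\t')" -c "SELECT schemaname, relname
#       FROM pg_stat_user_tables WHERE n_dead_tup > 1000" \
#     | vacuum_statements
```

Vacuum cannot reclaim versions still visible to an open transaction, so close long transactions first, as the first mitigation step says.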
Escalation
If bloat > 70% and does not fall after vacuum + closing long txns, collect diagnostics and escalate (service downtime may be needed).
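One possible way to collect diagnostics before escalating is to snapshot the metrics into a timestamped directory. The helper below is a hypothetical sketch: the name save_snapshot, the directory layout, and the file names are assumptions, not an established convention.

```shell
# save_snapshot DIR: read a /metrics dump on stdin, store the full dump and
# the GC/MVCC samples under a UTC-timestamped subdirectory of DIR, and print
# the path of that subdirectory.
save_snapshot() {
  out="$1/gcbloat-$(date -u +%Y%m%dT%H%M%SZ)"
  mkdir -p "$out" || return 1
  # Keep the raw dump; filter GC and MVCC series into a second file.
  tee "$out/metrics.txt" | grep -E 'gc_|mvcc_' > "$out/gc_mvcc.txt" || true
  echo "$out"
}

# Usage:
#   curl -sf http://127.0.0.1:9898/metrics | save_snapshot /tmp/diag
```

Taking one snapshot before and one after the vacuum run makes it easy to show in the escalation that bloat did not fall.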