Runbook: HighSlowQueryRatio
Source of truth:
tools/observability/alerts/angarabase_alerts.yaml. Backed by: RM-0.6.3.8 S7. Renamed fromHighErrorRatein G2-FIX cycle 2 (F-S7-1, 2026-04-19) to reflect the semantics accurately.
What It Means
The share of slow queries exceeds 1% of total queries over the last 5 minutes:
rate(angarabase_slow_query_total[5m])
/ clamp_min(rate(angarabase_query_exec_total[5m]), 1)
> 0.01
Important: this is NOT a true error rate. AngaraBase does not yet split
angarabase_query_exec_totalinto_ok / _errcounters (Design Gap DG-1, moved to RM-0.6.6.0). Slow-query ratio is a best-effort proxy for client-perceived degradation. A trueHighErrorRatewill appear after the counters are split.
Severity
warning. Degradation signal, not an outage.
Initial response
- Open Grafana Overview v2 → row “Query Performance” → panel “Slow queries / Total queries ratio”.
- Drill down into the Query Store dashboard → top-N slow queries.
- Check correlation with
BufferPoolPressure,LongTransaction,WALFsyncSlow.
Diagnostics
curl -sf http://127.0.0.1:9898/metrics | rg -E '^angarabase_(slow_query|query_exec)_total'
psql -c "SELECT * FROM angara_stat_statements ORDER BY total_time DESC LIMIT 10;"
Mitigation
| Symptom | Action |
|---|---|
| Specific query | EXPLAIN ANALYZE → recreate the index / rewrite the query |
runtime_facts.spill_bytes > 0 | Not enough memory for the operator. See Performance tuning (increase memory limit / work_mem) |
seq scan chosen: low cardinality / low selectivity | Expected with thresholds in [execution]. First run ANALYZE and check distinct_estimate. Then, if needed, adjust index_cardinality_threshold / index_scan_selectivity_threshold in angarabase.conf (or env before startup) and restart; SET in Simple Query does not apply. See Statistics, Performance tuning |
| All queries are slower | See HighP99Latency — check system signals first |
| Growing after deploy | Roll back the release; check the query plan |
| Correlates with GC | See GCBloatHigh |
Escalation
If the ratio does not drop for more than 30 minutes → diagnostics bundle + escalation.
Related
- Performance tuning guide
- HighP99Latency
- DG-1 split
_ok/_errcounters (RM-0.6.6.0 Specs Backlog)