Runbook: HighSlowQueryRatio

Source of truth: tools/observability/alerts/angarabase_alerts.yaml. Backed by: RM-0.6.3.8 S7. Renamed from HighErrorRate in G2-FIX cycle 2 (F-S7-1, 2026-04-19) to reflect the semantics accurately.

What It Means

The share of slow queries exceeds 1% of total queries over the last 5 minutes:

rate(angarabase_slow_query_total[5m])
  / clamp_min(rate(angarabase_query_exec_total[5m]), 1)
  > 0.01
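
The effect of clamp_min on the ratio is easiest to see with numbers. The sketch below reproduces the arithmetic of the expression in shell; the function name slow_query_ratio and the sample rates are illustrative assumptions, not part of AngaraBase or the alert rules.

```shell
# Hypothetical helper mirroring the arithmetic of the alert expression.
# $1 = slow-query rate, $2 = total query rate, both in queries per second.
slow_query_ratio() {
  LC_ALL=C awk -v slow="$1" -v total="$2" 'BEGIN {
    denom = (total < 1) ? 1 : total   # clamp_min(total, 1)
    printf "%.4f\n", slow / denom
  }'
}

slow_query_ratio 3 200      # busy instance: 3 / 200
slow_query_ratio 0.005 0.2  # near-idle instance: denominator clamped to 1
```

The first call prints 0.0150, which is above the 0.01 threshold. The second prints 0.0050 even though 2.5% of the queries were slow: below 1 query/s the clamp makes the expression report less than the true share, so near-idle instances are less likely to fire the alert.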

Important: this is NOT a true error rate. AngaraBase does not yet split angarabase_query_exec_total into _ok / _err counters (Design Gap DG-1, moved to RM-0.6.6.0). Slow-query ratio is a best-effort proxy for client-perceived degradation. A true HighErrorRate will appear after the counters are split.

Severity

warning. Degradation signal, not an outage.

Initial response

  1. Open Grafana Overview v2 → row “Query Performance” → panel “Slow queries / Total queries ratio”.
  2. Drill down into the Query Store dashboard → top-N slow queries.
  3. Check correlation with BufferPoolPressure, LongTransaction, WALFsyncSlow.

Diagnostics

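# Raw counters behind the alert expression (rg uses regex by default; -E would set the encoding)
curl -sf http://127.0.0.1:9898/metrics | rg '^angarabase_(slow_query|query_exec)_total'
# Top 10 statements by cumulative execution time
psql -c "SELECT * FROM angara_stat_statements ORDER BY total_time DESC LIMIT 10;"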
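
When Grafana is unavailable, the ratio can be approximated by hand from two scrapes of the metrics endpoint taken some time apart. The following is a minimal sketch, not an official tool: the function name window_slow_ratio and the snapshot file names are hypothetical, it assumes the endpoint emits Prometheus text format without trailing timestamps, and it ignores counter resets and the clamp_min step of the alert expression.

```shell
# Hypothetical helper: slow-query share between two metric snapshots.
# $1 = earlier snapshot file, $2 = later snapshot file (must be different paths).
window_slow_ratio() {
  LC_ALL=C awk '
    /^#/ { next }                              # skip HELP/TYPE comments
    {
      metric = $1
      sub(/[{].*/, "", metric)                 # strip the label set, if any
      sign = (FILENAME == ARGV[1]) ? -1 : 1    # later snapshot minus earlier
      # Assumes the sample value is the last field (no timestamp column).
      if (metric == "angarabase_slow_query_total") slow += sign * $NF
      if (metric == "angarabase_query_exec_total") total += sign * $NF
    }
    END {
      if (total > 0) printf "%.4f\n", slow / total
      else print "n/a"                         # no queries ran in the window
    }
  ' "$1" "$2"
}
```

A possible use: save `curl -sf http://127.0.0.1:9898/metrics` to before.txt, wait a few minutes, save it again to after.txt, then run `window_slow_ratio before.txt after.txt`. A result above 0.0100 corresponds to the alert threshold for instances serving at least 1 query/s.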

Mitigation

| Symptom | Action |
| --- | --- |
| Specific query | EXPLAIN ANALYZE → recreate the index / rewrite the query |
| runtime_facts.spill_bytes > 0 | Not enough memory for the operator. See Performance tuning (increase memory limit / work_mem) |
| seq scan chosen: low cardinality / low selectivity | Expected with thresholds in [execution]. First run ANALYZE and check distinct_estimate. Then, if needed, adjust index_cardinality_threshold / index_scan_selectivity_threshold in angarabase.conf (or env before startup) and restart; SET in Simple Query does not apply. See Statistics, Performance tuning |
| All queries are slower | See HighP99Latency; check system signals first |
| Growing after deploy | Roll back the release; check the query plan |
| Correlates with GC | See GCBloatHigh |

Escalation

If the ratio stays above the threshold for more than 30 minutes, collect a diagnostics bundle and escalate.