Runbook: `BufferPoolPressure`

Source of truth: tools/observability/alerts/angarabase_alerts.yaml. Backed by: RM-0.6.3.8 S7.

What It Means

The buffer pool hit ratio has stayed below 90% (metric angarabase_buffer_pool_hit_ratio_milli < 900) for 10 minutes. A growing share of page reads is being served from disk instead of memory.
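
The milli-scaled threshold can be sanity-checked by hand from the hit and miss counters listed under Diagnostics. The sketch below assumes the ratio is hits / (hits + misses) scaled by 1000; the runbook does not state how the engine derives or windows the gauge, and the helper name hit_ratio_milli is hypothetical.

```shell
# hit_ratio_milli HITS MISSES
# Prints hits / (hits + misses) scaled to 0..1000, rounded down.
# Assumption: this mirrors angarabase_buffer_pool_hit_ratio_milli; the engine
# may compute the gauge over a time window rather than over lifetime counters.
hit_ratio_milli() {
  hits=$1
  misses=$2
  total=$((hits + misses))
  if [ "$total" -eq 0 ]; then
    echo "no page reads recorded" >&2
    return 1
  fi
  echo $((hits * 1000 / total))
}

# Counters from the example output in Diagnostics:
hit_ratio_milli 4120000 380000   # prints 915, just above the 900 threshold
```

Passing the difference between two scrapes instead of the raw totals gives the ratio for that interval, which is likely closer to what the alert evaluates than the lifetime figure.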

Severity

warning. Read performance is degrading; not yet critical.

Initial response

Open Grafana Overview v2 → row “Buffer Pool & Memory”.
Compare the pages_loaded rate with the pages_evicted rate to see whether the pool is churning.
Check angarabase_jemalloc_resident_bytes to see whether RSS is growing (a hint of a leak).
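
If the dashboard is unavailable, the churn comparison can be approximated from two readings of the pages_loaded and pages_evicted counters taken some time apart. This is a rough sketch: the helper name evictions_per_1000_loads is hypothetical, and the sample numbers are made up for illustration.

```shell
# evictions_per_1000_loads LOADED_BEFORE LOADED_AFTER EVICTED_BEFORE EVICTED_AFTER
# Prints how many pages were evicted per 1000 pages loaded between two scrapes.
evictions_per_1000_loads() {
  loaded=$(($2 - $1))
  evicted=$(($4 - $3))
  if [ "$loaded" -le 0 ]; then
    echo 0          # nothing was loaded in the interval, so no churn to report
    return 0
  fi
  echo $((evicted * 1000 / loaded))
}

# Illustrative readings: 60000 pages loaded, 58000 evicted in the interval.
evictions_per_1000_loads 1000000 1060000 900000 958000   # prints 966
```

A value close to 1000 suggests that almost every load displaces a resident page, which is one plausible reading of churn. A low value with a falling hit ratio points more toward a pool that is still filling.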

Diagnostics

# Current buffer-pool capacity and load (RM-0.6.6.3 S6-D2)
curl -sf http://127.0.0.1:9898/metrics | rg "buffer_pool_capacity|buffer_pool_hit|buffer_pool_miss"
# Example output:
#   angarabase_buffer_pool_capacity_pages 195797   ← auto-detected: 25% of MemAvailable (≈ 3.0 GiB)
#   angarabase_buffer_pool_hit_total 4120000
#   angarabase_buffer_pool_miss_total 380000

# Full buffer-pool and jemalloc metric sets
curl -sf http://127.0.0.1:9898/metrics | rg buffer_pool
curl -sf http://127.0.0.1:9898/metrics | rg jemalloc

# Top tables by reads
psql -c "SELECT relname, heap_blks_read, heap_blks_hit \
         FROM pg_statio_user_tables ORDER BY heap_blks_read DESC LIMIT 10;"

Mitigation

Auto-sizing (RM-0.6.6.3): starting with this release, the engine automatically determines the buffer-pool size at startup: 25% of MemAvailable from /proc/meminfo, clamped to [1.6 GiB, 32 GiB]. Because the size is computed only at startup, freeing memory on the host and then restarting often resolves the problem without a config change.
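
To estimate what a restart would yield on the current host, the sizing rule can be reproduced in shell. This is a sketch of the rule as described here, not the engine's code: the function name autosize_pages is hypothetical, the 16 KiB page size is taken from this runbook, and the engine's exact rounding is an assumption.

```shell
# autosize_pages MEM_AVAILABLE_KIB
# Prints the pool size in 16 KiB pages: 25% of MemAvailable,
# clamped to [1.6 GiB, 32 GiB]. Rounding down is an assumption.
autosize_pages() {
  mem_available_kib=$1
  page_kib=16
  min_kib=$((16 * 1024 * 1024 / 10))   # 1.6 GiB expressed in KiB, rounded down
  max_kib=$((32 * 1024 * 1024))        # 32 GiB expressed in KiB
  target_kib=$((mem_available_kib / 4))
  if [ "$target_kib" -lt "$min_kib" ]; then
    target_kib=$min_kib
  fi
  if [ "$target_kib" -gt "$max_kib" ]; then
    target_kib=$max_kib
  fi
  echo $((target_kib / page_kib))
}

# On a Linux host, feed it the live value (the kernel reports it in kB):
#   autosize_pages "$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)"
autosize_pages 12531008   # prints 195797
```

Comparing the result with angarabase_buffer_pool_capacity_pages shows whether a restart is likely to give a larger pool than the one currently running.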

For a forced value: export ANGARABASE_STORAGE_MAX_CACHED_PAGES=<N> before startup, where N = number of 16 KiB pages (for example, 200000 ≈ 3.1 GiB).
Working set > RAM: consider partitioning or archiving old data.
GC churn: check GCBloatHigh — bloat increases the working set.
Memory leak: see jemalloc-profiling.md.
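
When forcing a value through ANGARABASE_STORAGE_MAX_CACHED_PAGES, it helps to convert between memory and page counts rather than guess N. The helpers below are hypothetical names and assume only the 16 KiB page size stated above.

```shell
PAGE_KIB=16   # page size stated in this runbook

# pages_for_mib MIB: number of 16 KiB pages that fit in MIB mebibytes.
pages_for_mib() {
  echo $(($1 * 1024 / PAGE_KIB))
}

# mib_for_pages PAGES: memory in MiB (rounded down) occupied by PAGES pages.
mib_for_pages() {
  echo $(($1 * PAGE_KIB / 1024))
}

mib_for_pages 200000   # prints 3125 (MiB), about 3.05 GiB
pages_for_mib 4096     # prints 262144 (pages for a 4 GiB pool)

# Example: force a 4 GiB pool before starting the engine.
#   export ANGARABASE_STORAGE_MAX_CACHED_PAGES="$(pages_for_mib 4096)"
```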

Escalation

If the hit ratio does not recover after a config change / restart, collect a diagnostics bundle.

Performance tuning guide
Configuration schema reference
jemalloc profiling
Backpressure Coordinator (RM-0.6.3.9 §S5+§S9) — unified pool/WAL/uncommitted-pages backpressure decisions, including the pool_wait_timeout_ms knob, angarabase_buffer_pool_over_capacity_pages, angarabase_buffer_pool_evict_failed_total, angarabase_buffer_pool_waiter_wait_seconds histogram, and the BufferPoolError::WaitTimeout SQL error path (RM-0.6.3.9 §S2+§S8 capacity waiter).
Resource Advisors v0 (RM-0.6.3.9 §S10) — angarabase_memory_pressure_ratio correlates with sustained BufferPoolPressure events when working-set growth, not churn, is the cause.
