Runbook: BufferPoolPressure
Source of truth: tools/observability/alerts/angarabase_alerts.yaml. Backed by: RM-0.6.3.8 S7.
What It Means
The buffer pool hit ratio has stayed below 90% (metric angarabase_buffer_pool_hit_ratio_milli < 900)
for 10 minutes. A growing share of page reads is being served from disk instead of memory.
Severity
warning. Read performance is degrading; not yet critical.
Initial response
- Grafana Overview v2 → row “Buffer Pool & Memory”.
- Compare the pages_loaded rate with the pages_evicted rate to see whether there is churn.
- Check angarabase_jemalloc_resident_bytes to see whether RSS is growing (hint of a leak).
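One way to make the churn comparison concrete is to read both counters twice and compare the deltas. The sketch below is illustrative only: the helper name churn_percent is hypothetical, the sample numbers are made up, and the counter values are passed in by hand because this runbook does not give the full metric names behind pages_loaded and pages_evicted.

```shell
# churn_percent LOADED_T0 LOADED_T1 EVICTED_T0 EVICTED_T1
# Prints pages evicted as a percentage of pages loaded between two samples.
# A value near 100 means almost every loaded page displaced another page.
churn_percent() {
  loaded=$(( $2 - $1 ))
  evicted=$(( $4 - $3 ))
  if [ "$loaded" -le 0 ]; then
    echo 0   # no loads in the window, so there is nothing to compare
    return 0
  fi
  echo $(( evicted * 100 / loaded ))
}

# Made-up counter readings taken 60 s apart:
# 12000 pages loaded and 11400 pages evicted in the window.
churn_percent 500000 512000 480000 491400
```

With these sample values the helper prints 95, which would suggest that loads and evictions are moving together (churn) rather than the pool simply warming up.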
Diagnostics
# Current buffer-pool capacity and load (RM-0.6.6.3 S6-D2)
curl -sf http://127.0.0.1:9898/metrics | rg "buffer_pool_capacity|buffer_pool_hit|buffer_pool_miss"
# Example output:
# angarabase_buffer_pool_capacity_pages 195797 ← auto-detect 25% AvailRAM (3.0 GiB)
# angarabase_buffer_pool_hit_total 4120000
# angarabase_buffer_pool_miss_total 380000
# Full dump of buffer-pool metrics, then allocator (jemalloc) metrics
curl -sf http://127.0.0.1:9898/metrics | rg buffer_pool
curl -sf http://127.0.0.1:9898/metrics | rg jemalloc
# Top tables by reads
psql -c "SELECT relname, heap_blks_read, heap_blks_hit \
FROM pg_statio_user_tables ORDER BY heap_blks_read DESC LIMIT 10;"
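The hit and miss counters from the example output can be turned into a hit ratio in the same milli scale the alert uses. This is a minimal sketch under stated assumptions: the helper names metric_value and hit_ratio_milli are hypothetical, the samples are assumed to be unlabeled lines in the Prometheus text format, and the result is the ratio since the counters started, which may differ from the window the alert metric is computed over.

```shell
# metric_value NAME: read Prometheus text format on stdin and print the
# value of the first unlabeled sample whose metric name is exactly NAME.
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# hit_ratio_milli HITS MISSES: hits / (hits + misses) in thousandths,
# using integer arithmetic (the result is rounded down).
hit_ratio_milli() {
  total=$(( $1 + $2 ))
  if [ "$total" -eq 0 ]; then
    echo 1000   # assumption: treat "no reads yet" as a perfect ratio
    return 0
  fi
  echo $(( $1 * 1000 / total ))
}

# Sample text copied from the example output in this runbook. On a live
# node, replace the here-document with:
#   curl -sf http://127.0.0.1:9898/metrics
metrics=$(cat <<'EOF'
angarabase_buffer_pool_capacity_pages 195797
angarabase_buffer_pool_hit_total 4120000
angarabase_buffer_pool_miss_total 380000
EOF
)

hits=$(printf '%s\n' "$metrics" | metric_value angarabase_buffer_pool_hit_total)
misses=$(printf '%s\n' "$metrics" | metric_value angarabase_buffer_pool_miss_total)
hit_ratio_milli "$hits" "$misses"
```

For the example counters this prints 915, that is, a lifetime hit ratio of about 91.5%, just above the 900 threshold.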
Mitigation
- Auto-sizing (RM-0.6.6.3): starting with this release, the engine automatically determines the buffer-pool size at startup: 25% of MemAvailable from /proc/meminfo, clamped to [1.6 GiB, 32 GiB]. Restarting after freeing memory on the host often solves the problem without changing config. For a forced value: export ANGARABASE_STORAGE_MAX_CACHED_PAGES=<N> before startup, where N = number of 16 KiB pages (for example, 200000 ≈ 3.1 GiB).
- Working set > RAM: consider partitioning or archiving old data.
- GC churn: check GCBloatHigh; bloat increases the working set.
- Memory leak: see jemalloc-profiling.md.
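The sizing arithmetic behind the auto-sizing rule and the forced page count can be sketched as follows. This is a rough approximation of the rule as described here, not the engine's code: the helper names pages_for_kib and auto_pool_kib are hypothetical, and the engine's exact rounding is an assumption (this sketch rounds down everywhere).

```shell
PAGE_KIB=16   # page size stated in this runbook

# pages_for_kib KIB: number of whole 16 KiB pages that fit in KIB kibibytes.
pages_for_kib() {
  echo $(( $1 / PAGE_KIB ))
}

# auto_pool_kib MEM_AVAILABLE_KIB: 25% of MemAvailable, clamped to
# [1.6 GiB, 32 GiB], expressed in KiB.
auto_pool_kib() {
  min_kib=$(( 16 * 1024 * 1024 / 10 ))   # 1.6 GiB, rounded down to whole KiB
  max_kib=$(( 32 * 1024 * 1024 ))        # 32 GiB
  kib=$(( $1 / 4 ))
  if [ "$kib" -lt "$min_kib" ]; then kib=$min_kib; fi
  if [ "$kib" -gt "$max_kib" ]; then kib=$max_kib; fi
  echo "$kib"
}

# Example: a host reporting 12 GiB of MemAvailable (12582912 KiB).
# On a live host the input could come from:
#   awk '/^MemAvailable:/ { print $2 }' /proc/meminfo
pages_for_kib "$(auto_pool_kib 12582912)"
```

For the 12 GiB example this prints 196608 pages (3 GiB), close to the 195797 pages shown in the Diagnostics example output. The same pages_for_kib helper gives a starting value of N for ANGARABASE_STORAGE_MAX_CACHED_PAGES when a target size in KiB is known.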
Escalation
If the hit ratio does not recover after a config change / restart, collect a diagnostics bundle.
Related
- Performance tuning guide
- Configuration schema reference
- jemalloc profiling
- Backpressure Coordinator (RM-0.6.3.9 §S5+§S9): unified pool/WAL/uncommitted-pages backpressure decisions, including the pool_wait_timeout_ms knob, angarabase_buffer_pool_over_capacity_pages, angarabase_buffer_pool_evict_failed_total, the angarabase_buffer_pool_waiter_wait_seconds histogram, and the BufferPoolError::WaitTimeout SQL error path (RM-0.6.3.9 §S2+§S8 capacity waiter).
- Resource Advisors v0 (RM-0.6.3.9 §S10): angarabase_memory_pressure_ratio correlates with sustained BufferPoolPressure events when working-set growth, not churn, is the cause.