
Runbook: BufferPoolPressure

Source of truth: tools/observability/alerts/angarabase_alerts.yaml. Backed by: RM-0.6.3.8 S7.

What It Means

The buffer pool hit ratio has been below 90% (metric angarabase_buffer_pool_hit_ratio_milli < 900, in per-mille) for 10 minutes. A growing share of page reads is served from disk instead of from memory.

Severity

warning. Read performance is degrading; not yet critical.

Initial response

  1. Open Grafana Overview v2, row “Buffer Pool & Memory”.
  2. Compare the pages_loaded rate with the pages_evicted rate to see whether the pool is churning (pages evicted soon after being loaded).
  3. Check angarabase_jemalloc_resident_bytes to see whether RSS keeps growing (a possible sign of a leak).
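Steps 2 and 3 can also be checked from a shell when Grafana is unavailable. The following is a minimal sketch, not a supported tool. It assumes the counters are exported without labels as angarabase_buffer_pool_pages_loaded_total and angarabase_buffer_pool_pages_evicted_total. Those two names are hypothetical, so confirm the real ones with `rg buffer_pool`. It also assumes the metrics endpoint is the one used under Diagnostics. The script scrapes twice and prints per-second rates. An eviction rate that closely tracks a high load rate points to churn.

```shell
#!/usr/bin/env bash
# Sketch: per-second rates of buffer-pool metrics between two scrapes.
# ASSUMPTION: the *_pages_loaded_total / *_pages_evicted_total names are
# hypothetical; check the real names with: curl -sf "$METRICS_URL" | rg buffer_pool
METRICS_URL=${METRICS_URL:-http://127.0.0.1:9898/metrics}

# metric_value NAME: read Prometheus text format on stdin and print the value
# of the first unlabelled sample called NAME (comment lines never match).
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# rate_per_sec OLD NEW SECONDS: (NEW - OLD) / SECONDS, one decimal place.
rate_per_sec() {
  awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) / s }'
}

# sample_churn [SECONDS]: scrape twice, SECONDS apart (default 60), print rates.
sample_churn() {
  local interval=${1:-60} s1 s2 name old new
  s1=$(curl -sf "$METRICS_URL") || return 1
  sleep "$interval"
  s2=$(curl -sf "$METRICS_URL") || return 1
  for name in angarabase_buffer_pool_pages_loaded_total \
              angarabase_buffer_pool_pages_evicted_total \
              angarabase_jemalloc_resident_bytes; do
    old=$(metric_value "$name" <<<"$s1")
    new=$(metric_value "$name" <<<"$s2")
    if [ -z "$old" ] || [ -z "$new" ]; then
      echo "$name: not found" >&2
      continue
    fi
    echo "$name: $(rate_per_sec "$old" "$new" "$interval")/s"
  done
}
```

Usage: source the file, then run `sample_churn 60`. For the jemalloc gauge, the printed value is RSS growth in bytes per second. A value that stays positive across several runs supports the leak hypothesis.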

Diagnostics

# Current buffer-pool capacity and load (RM-0.6.6.3 S6-D2)
curl -sf http://127.0.0.1:9898/metrics | rg "buffer_pool_capacity|buffer_pool_hit|buffer_pool_miss"
# Example output:
#   angarabase_buffer_pool_capacity_pages 195797   ← auto-sized: 25% of MemAvailable (195797 × 16 KiB ≈ 3.0 GiB)
#   angarabase_buffer_pool_hit_total 4120000
#   angarabase_buffer_pool_miss_total 380000

curl -sf http://127.0.0.1:9898/metrics | rg buffer_pool
curl -sf http://127.0.0.1:9898/metrics | rg jemalloc

# Top tables by reads
psql -c "SELECT relname, heap_blks_read, heap_blks_hit \
         FROM pg_statio_user_tables ORDER BY heap_blks_read DESC LIMIT 10;"
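The hit/miss counters above are cumulative since startup. Their ratio in the example output, 4120000 / (4120000 + 380000) ≈ 91.6%, can therefore hide a recent drop. The following is a minimal sketch of a windowed check on the same per-mille scale as the alert. It assumes the ratio is computed as hits / (hits + misses); the exact formula behind angarabase_buffer_pool_hit_ratio_milli is not documented here. It also assumes both counters are exported without labels.

```shell
#!/usr/bin/env bash
# Sketch: buffer-pool hit ratio over a time window, in per-mille like the alert.
# ASSUMPTION: ratio = hits / (hits + misses); counters exported without labels.
METRICS_URL=${METRICS_URL:-http://127.0.0.1:9898/metrics}

# counter NAME: print the value of the unlabelled sample NAME read from stdin.
counter() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# window_hit_ratio_milli H1 M1 H2 M2: per-mille hit ratio between two scrapes,
# truncated to an integer; prints n/a when there were no lookups in the window.
window_hit_ratio_milli() {
  awk -v h1="$1" -v m1="$2" -v h2="$3" -v m2="$4" 'BEGIN {
    dh = h2 - h1; dm = m2 - m1
    if (dh + dm <= 0) { print "n/a"; exit }
    printf "%d\n", 1000 * dh / (dh + dm)
  }'
}

# check_hit_ratio [SECONDS]: scrape twice and print the windowed ratio (default 60 s).
check_hit_ratio() {
  local interval=${1:-60} s1 s2
  s1=$(curl -sf "$METRICS_URL") || return 1
  sleep "$interval"
  s2=$(curl -sf "$METRICS_URL") || return 1
  window_hit_ratio_milli \
    "$(counter angarabase_buffer_pool_hit_total <<<"$s1")" \
    "$(counter angarabase_buffer_pool_miss_total <<<"$s1")" \
    "$(counter angarabase_buffer_pool_hit_total <<<"$s2")" \
    "$(counter angarabase_buffer_pool_miss_total <<<"$s2")"
}
```

Running `check_hit_ratio 600` covers the same 10-minute span as the alert. A result below 900 matches the alert condition under the assumptions above.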

Mitigation

  • Auto-sizing (RM-0.6.6.3): starting with this release, the engine sizes the buffer pool automatically at startup: 25% of MemAvailable from /proc/meminfo, clamped to [1.6 GiB, 32 GiB]. Because the size is computed at startup, freeing memory on the host and then restarting often resolves the alert without a config change.

    To force a value, export ANGARABASE_STORAGE_MAX_CACHED_PAGES=<N> before startup, where N is the number of 16 KiB pages (for example, 200000 pages ≈ 3.05 GiB).

  • Working set > RAM: consider partitioning or archiving old data.

  • GC churn: check GCBloatHigh; bloat inflates the working set.

  • Memory leak: see jemalloc-profiling.md.
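Before restarting, you can reproduce the auto-sizing rule from /proc/meminfo to check whether a restart would actually yield a larger pool. The following is a minimal sketch. It assumes the engine takes 25% of MemAvailable, clamps the result to [1.6 GiB, 32 GiB], and divides by the 16 KiB page size with truncation. The truncation step is an assumption. The sketch also converts between GiB and a page count for ANGARABASE_STORAGE_MAX_CACHED_PAGES.

```shell
#!/usr/bin/env bash
# Sketch: expected buffer-pool size under the auto-sizing rule, in 16 KiB pages.
# ASSUMPTION: truncation on the final division; the engine may round differently.

# auto_cache_pages MEMAVAILABLE_KIB: 25% of MemAvailable, clamped to
# [1.6 GiB, 32 GiB], expressed in 16 KiB pages.
auto_cache_pages() {
  awk -v kib="$1" 'BEGIN {
    bytes = kib * 1024 * 0.25
    lo = 1.6 * 1024 * 1024 * 1024
    hi = 32 * 1024 * 1024 * 1024
    if (bytes < lo) bytes = lo
    if (bytes > hi) bytes = hi
    printf "%d\n", bytes / 16384
  }'
}

# pages_for_gib GIB: number of 16 KiB pages in GIB gibibytes.
pages_for_gib() {
  awk -v g="$1" 'BEGIN { printf "%d\n", g * 1024 * 1024 / 16 }'
}

# pages_to_gib N: size of N 16 KiB pages in GiB, two decimals.
pages_to_gib() {
  awk -v n="$1" 'BEGIN { printf "%.2f\n", n * 16 / (1024 * 1024) }'
}

# current_auto_pages: what auto-sizing would pick if the engine started now.
# /proc/meminfo reports MemAvailable in kB (KiB).
current_auto_pages() {
  auto_cache_pages "$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)"
}
```

Under these assumptions, the 195797-page capacity in the Diagnostics example corresponds to a MemAvailable of 12531008 kB (about 12 GiB) at startup. To pin a 4 GiB pool instead, run `export ANGARABASE_STORAGE_MAX_CACHED_PAGES=$(pages_for_gib 4)` before startup, which sets 262144 pages.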

Escalation

If the hit ratio does not recover after a config change or restart, collect a diagnostics bundle.