Backpressure Coordinator

AngaraBase 0.6.3.9 §S5+§S9 — unified backpressure surface.

This page documents how AngaraBase decides to slow down (or refuse) write work when one of its internal queues is at risk of overflowing, and the single Prometheus surface operators should use to investigate it.

Why a coordinator

Up to v0.6.3.8 the storage layer carried three independent backpressure mechanisms:

  1. The buffer_pool.uncommitted_pages_ratio_hard threshold (write transactions blocked until the writeback worker drains the uncommitted-pages set).
  2. The high-priority I/O queue depth (low-priority prefetch dropped when OLTP demand reads saturate the I/O scheduler).
  3. The buffer pool capacity waiter introduced in §S2+§S8 (writers blocked when no frame can be evicted).

Each mechanism had its own metric and ad-hoc decision logic. There was no single answer to the operator question:

Why is the database refusing my write right now?

Starting with RM-0.6.3.9 the same three mechanisms remain (each as an isolated BackpressureSource), but they are evaluated through one BackpressureCoordinator façade and report through one unified metric family.

Decision model

Each source returns one of three decisions on every coordinator evaluation:

| Decision | Meaning | Typical caller reaction |
|----------|---------|-------------------------|
| pass | Source reports no pressure; the request may proceed without delay. | Continue. |
| throttle | Source reports elevated pressure; the caller should slow down. | Block on a WaitEventGuard for BackpressureThrottle. |
| reject | Source reports critical pressure; the request must be rejected. | Surface a 53400 INSUFFICIENT_RESOURCES error. |

The coordinator’s combined decision uses a strict dominance rule:

reject  >  throttle  >  pass

That is, any source reporting reject wins immediately, and any source reporting throttle wins over a passing source. This mirrors the fail-fast / block semantics that already existed for the buffer_pool.backpressure.mode knob (see runtime_settings.md).
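The dominance rule can be sketched in a few lines of Rust. This is a minimal illustration, not the coordinator's actual code: the `Decision` enum and the `combine` function are hypothetical names, and treating an empty source list as `Pass` is an assumption of the sketch. Unlike the "wins immediately" wording above, the sketch does not short-circuit on the first reject; it simply takes the maximum.

```rust
/// Decision a source can report. Variant order encodes dominance
/// (Pass < Throttle < Reject), so the derived `Ord` does the ranking.
/// Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Decision {
    Pass,
    Throttle,
    Reject,
}

/// Combine per-source decisions: the most severe one wins.
/// An empty slice yields `Pass` (an assumption of this sketch).
pub fn combine(decisions: &[Decision]) -> Decision {
    decisions.iter().copied().max().unwrap_or(Decision::Pass)
}
```

Encoding the order in the enum keeps the rule in one place: adding a new decision level would only require placing the variant at the right position.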

Sources

| Source label | Signal | Tunable knob |
|--------------|--------|--------------|
| uncommitted_pages | Fraction of buffer-pool frames carrying uncommitted page-image deltas. | buffer_pool.uncommitted_pages_ratio_hard (default 0.30). |
| wal_queue | High-priority I/O queue depth (OLTP demand reads above the saturation watermark). | (internal, default threshold 4) |
| buffer_pool | Buffer-pool capacity waiter (max_cached_pages exhausted, no evictable frame). | buffer_pool.pool_wait_timeout_ms (default 5000). |

The buffer_pool source reports reject when the pool is over capacity (an eviction failed because every frame is currently pinned) and throttle when a writer is parked on the capacity condition variable. The two conditions are tracked independently of each other.
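As a rough sketch of how the buffer_pool source might map its two conditions to a decision: the `BufferPoolState` struct, its fields, and `buffer_pool_decision` are hypothetical names, not AngaraBase APIs. The sketch also assumes that reject takes precedence when both conditions hold, which is consistent with the dominance rule but is not stated explicitly for this source.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Pass,
    Throttle,
    Reject,
}

/// Hypothetical snapshot of the two independently tracked conditions.
pub struct BufferPoolState {
    /// True when an eviction failed because every frame is pinned.
    pub over_capacity: bool,
    /// Number of writers currently parked waiting for capacity.
    pub parked_writers: usize,
}

/// Map the snapshot to a decision. Checking `over_capacity` first
/// makes reject win when both conditions hold (an assumption).
pub fn buffer_pool_decision(state: &BufferPoolState) -> Decision {
    if state.over_capacity {
        Decision::Reject
    } else if state.parked_writers > 0 {
        Decision::Throttle
    } else {
        Decision::Pass
    }
}
```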

BREAKING (RM-0.6.3.9 §S5+§S9, decision #5): the buffer_pool.uncommitted_pages_ratio_hard knob was previously named uncommitted_dirty_ratio_hard. The legacy identifier is removed without a compatibility alias — operators upgrading from v0.6.3.8 or earlier must rename the key in their config files. The release entry for RM-0.6.3.9 in docs/planning/releases/v0/RELEASE_NOTES.md contains the migration note.

OPERATOR-UX hardening (2026-04-20, RM-0.6.3.10, closes F-UX-1 + OQ-2026-054 + TD-2026-0175): the parser is now fail-closed on the legacy key. A config that still contains [buffer_pool] uncommitted_dirty_ratio_hard = … will refuse to start with exit 78 (EX_CONFIG) and an operator-facing message naming the renamed key. Unknown keys (typos, future-feature backports) emit a structured tracing::warn!(target = "config", section, key) and increment the counter angarabase_config_unknown_keys_total — recommended alert: > 0 after a fresh deploy. Soft-deprecated aliases ([server] host/port, [storage] wal_directory) remain silently recognized for compatibility.
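The three key classes described above (fail-closed legacy key, counted unknown keys, silently accepted aliases) could be modelled roughly as follows. This is a sketch under stated assumptions: `KeyCheck`, `check_key`, and `validate` are hypothetical names, the list of known keys is deliberately partial, and the real parser's structure is not documented here. Only the key names, the exit code 78, and the three behaviours come from the text.

```rust
/// Outcome of checking one `[section] key` pair (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum KeyCheck {
    /// Recognized current key.
    Known,
    /// Removed legacy key: startup must fail. Carries the renamed key
    /// so the operator-facing message can name it.
    LegacyRenamed(&'static str),
    /// Soft-deprecated alias: silently accepted.
    SoftAlias,
    /// Unknown key: warn and count, but keep starting.
    Unknown,
}

/// Exit code for configuration errors (EX_CONFIG).
pub const EX_CONFIG: i32 = 78;

/// Classify a key. The `Known` arm lists only keys named on this page.
pub fn check_key(section: &str, key: &str) -> KeyCheck {
    match (section, key) {
        ("buffer_pool", "uncommitted_dirty_ratio_hard") => {
            KeyCheck::LegacyRenamed("uncommitted_pages_ratio_hard")
        }
        ("buffer_pool", "uncommitted_pages_ratio_hard")
        | ("buffer_pool", "pool_wait_timeout_ms") => KeyCheck::Known,
        ("server", "host") | ("server", "port") | ("storage", "wal_directory") => {
            KeyCheck::SoftAlias
        }
        _ => KeyCheck::Unknown,
    }
}

/// Returns `Err(EX_CONFIG)` on the first legacy key; otherwise the
/// number of unknown keys, which a real implementation would add to
/// angarabase_config_unknown_keys_total after logging a warning.
pub fn validate(keys: &[(&str, &str)]) -> Result<u64, i32> {
    let mut unknown = 0u64;
    for (section, key) in keys {
        match check_key(section, key) {
            KeyCheck::LegacyRenamed(_) => return Err(EX_CONFIG),
            KeyCheck::Unknown => unknown += 1,
            KeyCheck::Known | KeyCheck::SoftAlias => {}
        }
    }
    Ok(unknown)
}
```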

Metrics

All metrics are emitted on the standard /metrics endpoint (see observability.md).

| Metric | Type | Labels | Meaning |
|--------|------|--------|---------|
| angarabase_backpressure_throttle_decisions_total | counter | source, decision | Per (source × decision) counter incremented on every coordinator evaluation. |
| angarabase_backpressure_active_sources | gauge | (none) | Number of sources currently reporting non-pass (snapshot, refreshed on every evaluation). |

Label sets are stable across releases:

  • source ∈ {uncommitted_pages, wal_queue, buffer_pool}
  • decision ∈ {pass, throttle, reject}
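To make the counter and gauge semantics concrete, here is a small in-memory model of one coordinator evaluation. It is an illustrative sketch, not the real metrics code: `BackpressureMetrics`, `record`, and `label` are hypothetical names, and it assumes that each evaluation bumps exactly one counter per source.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Pass,
    Throttle,
    Reject,
}

impl Decision {
    /// Value used for the `decision` label.
    pub fn label(self) -> &'static str {
        match self {
            Decision::Pass => "pass",
            Decision::Throttle => "throttle",
            Decision::Reject => "reject",
        }
    }
}

/// Hypothetical in-memory stand-in for the two metrics.
#[derive(Default)]
pub struct BackpressureMetrics {
    /// Models angarabase_backpressure_throttle_decisions_total{source, decision}.
    pub decisions_total: BTreeMap<(&'static str, &'static str), u64>,
    /// Models angarabase_backpressure_active_sources.
    pub active_sources: u64,
}

impl BackpressureMetrics {
    /// Record one evaluation: one counter increment per source, and the
    /// gauge is overwritten with the current number of non-pass sources.
    pub fn record(&mut self, evaluation: &[(&'static str, Decision)]) {
        let mut active = 0;
        for &(source, decision) in evaluation {
            *self
                .decisions_total
                .entry((source, decision.label()))
                .or_insert(0) += 1;
            if decision != Decision::Pass {
                active += 1;
            }
        }
        self.active_sources = active;
    }
}
```

The counters only ever grow, which is why the recipes below wrap them in `rate(...)`, while the gauge reflects only the most recent evaluation.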

PromQL recipes

Detect any active backpressure right now:

angarabase_backpressure_active_sources > 0

Non-pass decision rate (throttle plus reject) by source over the last 5 minutes:

sum by (source) (
  rate(angarabase_backpressure_throttle_decisions_total{decision!="pass"}[5m])
)

Reject rate (the signal worth paging an operator for):

sum(rate(
  angarabase_backpressure_throttle_decisions_total{decision="reject"}[5m]
))

Operator playbooks

| Symptom | First check | Next step |
|---------|-------------|-----------|
| angarabase_backpressure_active_sources >= 1 for >30 s | Which source label dominates the decision counters? | Follow the matching per-source row below. |
| source="uncommitted_pages",decision="throttle" rate climbing | buffer_pool_uncommitted_dirty_ratio near buffer_pool.uncommitted_pages_ratio_hard? | Increase buffer pool size, or shrink concurrent write batch sizes. |
| source="wal_queue",decision="throttle" rate climbing | angarabase_io_advisor_current_batch_size shrinking? (correlated) | Investigate disk saturation; throttle prefetch / background warmup. |
| source="buffer_pool",decision="reject" non-zero | angarabase_buffer_pool_over_capacity_pages > 0? | Pinned-page leak is suspected: capture a diagnostics bundle and open an incident. |

Compatibility contract

  • The (source, decision) label sets above are part of the public Prometheus contract and will only change in a major release.
  • Adding a new source or decision is backward-compatible; removing or renaming requires a deprecation cycle documented in CHANGELOG.md.
  • Coordinator dominance order (reject > throttle > pass) is part of the contract: alerts may rely on it.