Backpressure Coordinator
AngaraBase 0.6.3.9 §S5+§S9 — unified backpressure surface.
This page documents how AngaraBase decides to slow down (or refuse) write work when one of its internal queues is at risk of overflowing, and the single Prometheus surface operators should use to investigate it.
Why a coordinator
Up to v0.6.3.8 the storage layer carried three independent backpressure mechanisms:
- The buffer_pool.uncommitted_pages_ratio_hard threshold (write transactions blocked until the writeback worker drains the uncommitted-pages set).
- The high-priority I/O queue depth (low-priority prefetch dropped when OLTP demand reads saturate the I/O scheduler).
- The buffer pool capacity waiter introduced in §S2+§S8 (writers blocked when no frame can be evicted).
Each mechanism had its own metric and ad-hoc decision logic. There was no single answer to the operator question:
Why is the database refusing my write right now?
Starting with RM-0.6.3.9 the same three mechanisms remain (each as an
isolated BackpressureSource), but they are evaluated through one
BackpressureCoordinator façade and report through one unified
metric family.
Decision model
Each source returns one of three decisions on every coordinator evaluation:
| Decision | Meaning | Typical caller reaction |
|---|---|---|
| pass | Source reports no pressure; the request may proceed without delay. | Continue. |
| throttle | Source reports elevated pressure; the caller should slow down. | Block on a WaitEventGuard for BackpressureThrottle. |
| reject | Source reports critical pressure; the request must be rejected. | Surface a 53400 INSUFFICIENT_RESOURCES error. |
The coordinator’s combined decision uses a strict dominance rule:
reject > throttle > pass
That is, any source reporting reject wins immediately, and any source
reporting throttle wins over a passing source. This mirrors the
fail-fast / block semantics that already existed for the
buffer_pool.backpressure.mode knob (see
runtime_settings.md).
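The dominance rule can be sketched in a few lines of Rust. This is an illustration only: the `Decision` enum and the `combine` function are hypothetical names, not the actual AngaraBase types, and treating an empty source list as pass is an assumption of the sketch.

```rust
/// Hypothetical decision type. Variant order encodes dominance:
/// Pass < Throttle < Reject under the derived `Ord`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Decision {
    Pass,
    Throttle,
    Reject,
}

/// Combine per-source decisions so that the most severe one wins.
/// Assumption: with no sources at all, the result is `Pass`.
pub fn combine<I: IntoIterator<Item = Decision>>(decisions: I) -> Decision {
    let mut combined = Decision::Pass;
    for d in decisions {
        if d == Decision::Reject {
            // Nothing outranks reject, so stop evaluating immediately.
            return Decision::Reject;
        }
        // Throttle replaces Pass; Pass never replaces Throttle.
        combined = combined.max(d);
    }
    combined
}
```

With this ordering, `combine(vec![Decision::Pass, Decision::Throttle])` yields `Decision::Throttle`, and any input containing `Decision::Reject` yields `Decision::Reject`.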
Sources
| Source label | Signal | Tunable knob |
|---|---|---|
| uncommitted_pages | Fraction of buffer-pool frames carrying uncommitted page-image deltas. | buffer_pool.uncommitted_pages_ratio_hard (default 0.30). |
| wal_queue | High-priority I/O queue depth (OLTP demand reads above the saturation watermark). | (internal, default threshold 4) |
| buffer_pool | Buffer-pool capacity waiter (max_cached_pages exhausted, no evictable frame). | buffer_pool.pool_wait_timeout_ms (default 5000). |
The buffer_pool source reports reject when the pool is over capacity
(an eviction failed because every frame is currently pinned) and
throttle when a writer is parked on the capacity condition variable.
The two conditions are tracked independently of each other.
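As a rough sketch of how the buffer_pool source might map its two conditions to a decision: the struct, field, and function names below are hypothetical. The sketch also assumes that reject takes precedence when both conditions hold at once, which follows the coordinator's dominance order but is not stated for a single source.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Pass,
    Throttle,
    Reject,
}

/// Hypothetical snapshot of the two independently tracked conditions.
#[derive(Debug, Clone, Copy, Default)]
pub struct BufferPoolState {
    /// Pages over capacity: an eviction failed because every frame is pinned.
    pub over_capacity_pages: u64,
    /// Writers currently parked on the capacity condition variable.
    pub parked_writers: u64,
}

/// Map the snapshot to a decision.
/// Assumption: over-capacity is checked first, so reject wins over throttle.
pub fn evaluate_buffer_pool(state: BufferPoolState) -> Decision {
    if state.over_capacity_pages > 0 {
        Decision::Reject
    } else if state.parked_writers > 0 {
        Decision::Throttle
    } else {
        Decision::Pass
    }
}
```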
BREAKING (RM-0.6.3.9 §S5+§S9, decision #5): the buffer_pool.uncommitted_pages_ratio_hard knob was previously named uncommitted_dirty_ratio_hard. The legacy identifier is removed without a compatibility alias, so operators upgrading from v0.6.3.8 or earlier must rename the key in their config files. The release entry for RM-0.6.3.9 in docs/planning/releases/v0/RELEASE_NOTES.md contains the migration note.

OPERATOR-UX hardening (2026-04-20, RM-0.6.3.10, closes F-UX-1 + OQ-2026-054 + TD-2026-0175): the parser is now fail-closed on the legacy key. A config that still contains [buffer_pool] uncommitted_dirty_ratio_hard = … will refuse to start with exit 78 (EX_CONFIG) and an operator-facing message naming the renamed key. Unknown keys (typos, future-feature backports) emit a structured tracing::warn!(target = "config", section, key) and increment the counter angarabase_config_unknown_keys_total; the recommended alert is > 0 after a fresh deploy. Soft-deprecated aliases ([server] host/port, [storage] wal_directory) remain silently recognized for compatibility.
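The fail-closed behaviour described above can be illustrated with a small key classifier. This is a sketch under stated assumptions, not the real parser: `KeyStatus`, `StartupRefused`, `classify_key`, and `validate_keys` are hypothetical names, the list of known keys covers only keys mentioned on this page, and the error message text is invented for illustration. The real implementation additionally emits a tracing warning per unknown key; here the returned count stands in for the angarabase_config_unknown_keys_total increment.

```rust
/// Exit status for configuration errors (EX_CONFIG in sysexits.h).
pub const EX_CONFIG: i32 = 78;

#[derive(Debug, PartialEq, Eq)]
pub enum KeyStatus {
    Known,
    RemovedLegacy { renamed_to: &'static str },
    Unknown,
}

/// Classify one (section, key) pair. The known-key list is illustrative only.
pub fn classify_key(section: &str, key: &str) -> KeyStatus {
    match (section, key) {
        ("buffer_pool", "uncommitted_dirty_ratio_hard") => KeyStatus::RemovedLegacy {
            renamed_to: "uncommitted_pages_ratio_hard",
        },
        ("buffer_pool", "uncommitted_pages_ratio_hard")
        | ("buffer_pool", "pool_wait_timeout_ms")
        // Soft-deprecated aliases stay silently recognized.
        | ("server", "host")
        | ("server", "port")
        | ("storage", "wal_directory") => KeyStatus::Known,
        _ => KeyStatus::Unknown,
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct StartupRefused {
    pub exit_code: i32,
    pub message: String,
}

/// Returns the number of unknown keys seen, or refuses startup
/// as soon as a removed legacy key is encountered.
pub fn validate_keys(keys: &[(&str, &str)]) -> Result<u64, StartupRefused> {
    let mut unknown_keys = 0u64;
    for &(section, key) in keys {
        match classify_key(section, key) {
            KeyStatus::Known => {}
            KeyStatus::RemovedLegacy { renamed_to } => {
                return Err(StartupRefused {
                    exit_code: EX_CONFIG,
                    // Hypothetical wording; the real message may differ.
                    message: format!("[{}] {} was renamed to {}", section, key, renamed_to),
                });
            }
            KeyStatus::Unknown => unknown_keys += 1,
        }
    }
    Ok(unknown_keys)
}
```

A caller would typically print `message` to stderr and pass `exit_code` to `std::process::exit` when `validate_keys` returns an error.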
Metrics
All metrics are emitted on the standard /metrics endpoint
(see observability.md).
| Metric | Type | Labels | Meaning |
|---|---|---|---|
| angarabase_backpressure_throttle_decisions_total | counter | source, decision | Per (source × decision) counter incremented on every coordinator evaluation. |
| angarabase_backpressure_active_sources | gauge | (none) | Number of sources currently reporting non-pass (snapshot, refreshed on every evaluation). |
Label sets are stable across releases:
- source ∈ {uncommitted_pages, wal_queue, buffer_pool}
- decision ∈ {pass, throttle, reject}
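To make the relationship between the counter family and the gauge concrete, here is a minimal in-memory model. It is an assumption-laden sketch: `BackpressureMetrics` and its methods are hypothetical, it uses a plain `HashMap` rather than a Prometheus client, and it assumes each evaluation records exactly one decision per source.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the two metrics.
#[derive(Default)]
pub struct BackpressureMetrics {
    /// Models angarabase_backpressure_throttle_decisions_total{source, decision}.
    decisions_total: HashMap<(String, String), u64>,
    /// Models angarabase_backpressure_active_sources.
    active_sources: u64,
}

impl BackpressureMetrics {
    /// Record one coordinator evaluation given one (source, decision) pair per source.
    pub fn record_evaluation(&mut self, results: &[(&str, &str)]) {
        let mut active = 0u64;
        for &(source, decision) in results {
            *self
                .decisions_total
                .entry((source.to_string(), decision.to_string()))
                .or_insert(0) += 1;
            if decision != "pass" {
                active += 1;
            }
        }
        // The gauge is a snapshot: it is overwritten, not accumulated.
        self.active_sources = active;
    }

    /// Current counter value for a label pair (0 if never incremented).
    pub fn counter(&self, source: &str, decision: &str) -> u64 {
        self.decisions_total
            .get(&(source.to_string(), decision.to_string()))
            .copied()
            .unwrap_or(0)
    }

    /// Current gauge value.
    pub fn active_sources(&self) -> u64 {
        self.active_sources
    }
}
```

The counters only ever grow, which is why the recipes on this page wrap them in `rate(...)`, while the gauge can be compared against a threshold directly.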
PromQL recipes
Detect any active backpressure right now:
angarabase_backpressure_active_sources > 0
Non-pass decision rate (throttle plus reject) by source over the last 5 minutes:
sum by (source) (
rate(angarabase_backpressure_throttle_decisions_total{decision!="pass"}[5m])
)
Reject rate (the operator pager-worthy signal):
sum(rate(
angarabase_backpressure_throttle_decisions_total{decision="reject"}[5m]
))
Operator playbooks
| Symptom | First check | Next step |
|---|---|---|
| angarabase_backpressure_active_sources >= 1 for >30 s | Which source label dominates the decision counters? | Follow per-source playbook below. |
| source="uncommitted_pages",decision="throttle" rate climbing | buffer_pool_uncommitted_dirty_ratio near buffer_pool.uncommitted_pages_ratio_hard? | Increase buffer pool size, or shrink concurrent write batch sizes. |
| source="wal_queue",decision="throttle" rate climbing | angarabase_io_advisor_current_batch_size shrinking? (correlated) | Investigate disk saturation; throttle prefetch / background warmup. |
| source="buffer_pool",decision="reject" non-zero | angarabase_buffer_pool_over_capacity_pages > 0? | Pinned-page leak is suspected: capture a diagnostics bundle and open an incident. |
Compatibility contract
- The (source, decision) label sets above are part of the public Prometheus contract and will only change in a major release.
- Adding a new source or decision is backward-compatible; removing or renaming requires a deprecation cycle documented in CHANGELOG.md.
- Coordinator dominance order (reject > throttle > pass) is part of the contract: alerts may rely on it.