Wait Events
AngaraBase 0.6.3.9 §S11: baseline wait events model. RM-0.6.4.10 adds QoS scheduler events. RM-0.6.4.19 Track C C1 adds per-session counters and per-session queries against
angara_stat_wait_events.
This page describes the WaitEvent taxonomy that AngaraBase uses to
classify blocking operations. For operators, wait events answer the question:
“what is the cluster waiting on right now?” without strace, manual stack trace analysis, or
instrumenting every call site.
Why the wait events model is needed
The model is similar to pg_stat_activity.wait_event in PostgreSQL or
sys.dm_os_wait_stats in SQL Server:
- every blocking code section gets a specific wait reason;
- current wait is visible at the session/activity level;
- aggregated metrics provide rate, active count, and latency distribution for each event;
- dashboards compare different wait classes through a unified label
event=<variant_snake_case>.
Two observability layers:
- Current session wait: angara_stat_activity.wait_event_type and angara_stat_activity.wait_event.
- Aggregated Prometheus metrics: counters, gauges, and histograms for each wait event.
Events
WaitEvent is a stable public API. Adding a variant is non-breaking:
dashboards will see the new event=... label after upgrade. Deleting, renumbering,
or changing the as_str() value is considered a breaking change.
| Variant | Label | Wait type | When it fires |
|---|---|---|---|
| RowLock | row_lock | Lock | Waiting for tuple-level lock. |
| PageLock | page_lock | Lock | Waiting for page-level latch. |
| TableLock | table_lock | Lock | Waiting for relation-level lock for DDL/lock manager. |
| TransactionLock | transaction_lock | Lock | Waiting for another transaction to commit/finish. |
| PredicateLockAcquire | predicate_lock_acquire | Lock | Waiting to acquire predicate lock for SSI foundation. |
| PredicateConflictCheck | predicate_conflict_check | Lock | Waiting to check predicate conflict graph. |
| PageRead | page_read | IO | Reading heap/index page on cache miss. |
| PageWrite | page_write | IO | Page write-back during checkpoint or eviction. |
| WalFlush | wal_flush | IO | WAL flush / fsync path. |
| Fsync | fsync | IO | Other fsync paths: catalog, FPI, and related operations. |
| WalSync | wal_sync | IO | Strict WAL sync wait in durability path. |
| WalGroupCommit | wal_group_commit | IO | Waiting for group commit batch. |
| ColumnarCompaction | columnar_compaction | IO | Background compactor waits for disk I/O or manifest append mutex in compact_l0_to_l1(). |
| ClientRead | client_read | Net | Reading from client socket. |
| ClientWrite | client_write | Net | Writing to client socket. |
| ReplicaRead | replica_read | Net | Reading from replica connection. |
| ReplicaWrite | replica_write | Net | Writing to replica connection. |
| NetRead | net_read | Net | Generic network read. |
| NetWrite | net_write | Net | Generic network write. |
| CpuRun | cpu_run | CPU | Session is running on CPU; this is not blocking. |
| PageDecompression | page_decompression | CPU | CPU time for page decompression on buffer-pool miss. |
| PageCompression | page_compression | CPU | CPU time for dirty page compression before flush. |
| AdmissionQueue | admission_queue | Scheduler | Waiting for admission control queue. |
| IoSchedulerQueue | io_scheduler_queue | Scheduler | Waiting for I/O scheduler queue. |
| MemoryGrantQueue | memory_grant_queue | Scheduler | Waiting for memory grant. |
| BufferPoolEviction | buffer_pool_eviction | Scheduler | Session waits for a free or evictable buffer-pool slot. |
| BackpressureThrottle | backpressure_throttle | Scheduler | Unified backpressure coordinator throttles caller. |
| DiskRestartHarness | disk_restart_harness | Scheduler | Test harness waits for on-disk state re-hydration in disk-restart test. |
| QosQueue | qos_queue | Scheduler | Async task is in per-shard DRR queue of the QoS scheduler before dispatch. |
| QosBlocking | qos_blocking | Scheduler | Blocking task waits for dispatch through the QoS blocking path. |
QoS Events (RM-0.6.4.10)
qos_queue means the task has already been classified by service level
(critical, interactive, background) and is waiting for dispatch in the scheduler queue.
Growth in this wait usually indicates scheduler saturation or a load burst.
qos_blocking means the task entered the blocking path of the QoS scheduler.
Watch it together with gauges angarabase_qos_blocking_inflight and
angarabase_spawn_blocking_max: if blocking wait grows and inflight is close to the
limit, cluster pressure is in the runtime/blocking pool, not SQL locks.
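The reasoning above can be sketched as a small decision function. This is only an illustration, not code from AngaraBase: the function name, the Diagnosis type, and the 0.9 "close to the limit" ratio are assumptions, and it assumes that angarabase_spawn_blocking_max is the limit that angarabase_qos_blocking_inflight is compared against.

```rust
/// Outcome of the check in this sketch (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Diagnosis {
    /// Pressure is in the runtime/blocking pool, not in SQL locks.
    RuntimeBlockingPool,
    /// The two signals do not agree; look at other wait events.
    Inconclusive,
}

/// Classifies two samples of the qos_blocking active-wait gauge together
/// with the inflight and max gauges. `near_limit_ratio` is an assumed
/// tuning knob, for example 0.9 for "within 10% of the limit".
pub fn diagnose_qos_blocking(
    active_before: u64,
    active_now: u64,
    inflight: u64,
    max_blocking: u64,
    near_limit_ratio: f64,
) -> Diagnosis {
    let wait_is_growing = active_now > active_before;
    let near_limit = max_blocking > 0
        && (inflight as f64) >= (max_blocking as f64) * near_limit_ratio;
    if wait_is_growing && near_limit {
        Diagnosis::RuntimeBlockingPool
    } else {
        Diagnosis::Inconclusive
    }
}
```

With 95 of 100 blocking slots in use and the active wait count rising, the sketch reports RuntimeBlockingPool; with 10 of 100 in use it stays Inconclusive.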
In Sprint 2A, service-level granularity is intentionally coarse: there are no separate
qos_queue_critical, qos_queue_interactive, or qos_queue_background events.
For a per-service-level breakdown, use the QoS counters
angarabase_qos_queued_*_total and angarabase_qos_rejected_*_total.
Ordinals and compatibility
Ordinals are append-only and pinned in WaitEvent::ordinal():
QosQueue has ordinal 28; QosBlocking has ordinal 29.
The WaitEvent::ALL array is used to render all label values in metrics.
The fixed metrics array size is defined by N_WAIT_EVENT_VARIANTS.
Compatibility rules:
- adding a variant: non-breaking;
- deleting a variant: breaking;
- renumbering an ordinal: breaking;
- renaming a label value returned by as_str(): breaking for dashboards and alerts.
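A minimal sketch of how pinned ordinals and stable labels might look in code follows. It is not the contents of wait_events.rs: only four of the variants are shown, the ordinals 0 and 6 are placeholders invented for the sketch (only 28 and 29 come from this page), and the uniqueness helper is a hypothetical check of the kind a unit test could run.

```rust
use std::collections::HashSet;

/// Illustrative subset; the real enum has one variant per table row.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WaitEvent {
    RowLock,
    PageRead,
    QosQueue,
    QosBlocking,
}

/// Size of the fixed metrics array in this sketch (the real value is larger).
pub const N_WAIT_EVENT_VARIANTS: usize = 4;

impl WaitEvent {
    /// Every variant, used to render all label values in metrics.
    pub const ALL: [WaitEvent; N_WAIT_EVENT_VARIANTS] = [
        WaitEvent::RowLock,
        WaitEvent::PageRead,
        WaitEvent::QosQueue,
        WaitEvent::QosBlocking,
    ];

    /// Label value for `event=...`; renaming one breaks dashboards.
    pub const fn as_str(self) -> &'static str {
        match self {
            WaitEvent::RowLock => "row_lock",
            WaitEvent::PageRead => "page_read",
            WaitEvent::QosQueue => "qos_queue",
            WaitEvent::QosBlocking => "qos_blocking",
        }
    }

    /// Pinned ordinal. 28 and 29 are documented above; 0 and 6 are
    /// placeholders chosen for this sketch only.
    pub const fn ordinal(self) -> u8 {
        match self {
            WaitEvent::RowLock => 0,
            WaitEvent::PageRead => 6,
            WaitEvent::QosQueue => 28,
            WaitEvent::QosBlocking => 29,
        }
    }
}

/// Returns true when no two variants share an ordinal or a label.
pub fn ordinals_and_labels_are_unique() -> bool {
    let mut ordinals = HashSet::new();
    let mut labels = HashSet::new();
    WaitEvent::ALL
        .iter()
        .all(|e| ordinals.insert(e.ordinal()) && labels.insert(e.as_str()))
}
```

Writing each ordinal out in an explicit match, rather than deriving it from declaration order, means that inserting a variant in the middle of the enum cannot silently renumber the existing ones.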
Per-session wait events (RM-0.6.4.19 Track C C1)
Starting with RM-0.6.4.19, angara_stat_wait_events supports per-session mode:
-- Process-wide aggregates (as before):
SELECT * FROM angara_stat_wait_events;
-- Per-session counters of the current session:
SELECT * FROM angara_stat_wait_events WHERE session_id = current_session();
In per-session mode:
- total: total number of entries into this wait event for the current session since it started.
- active and total_duration_us: always 0 in phase 1 (the per-session histogram is deferred to phase 2).
- Counters are incremented via WaitEventGuard::enter and stored in AtomicWaitState::event_counts (per-session registry, indexed by session_id).
If the session has not entered any wait event, total is 0 for every event (an empty wait state returns zeros).
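The phase 1 behaviour described above can be sketched as follows. The names WaitEventGuard::enter and AtomicWaitState::event_counts come from this page, but the field layout, the three-variant SketchEvent enum, the SessionRegistry type, and the row helper are assumptions made for illustration; the real types may differ.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical three-variant stand-in for the real event enum.
#[derive(Clone, Copy)]
pub enum SketchEvent {
    RowLock = 0,
    PageRead = 1,
    QosQueue = 2,
}

const N_SKETCH_EVENTS: usize = 3;

/// Per-session wait state: one entry counter per event.
pub struct AtomicWaitState {
    event_counts: [AtomicU64; N_SKETCH_EVENTS],
}

impl AtomicWaitState {
    pub fn new() -> Self {
        Self { event_counts: std::array::from_fn(|_| AtomicU64::new(0)) }
    }

    /// One result row as (total, active, total_duration_us).
    /// In phase 1 the last two columns are always 0.
    pub fn row(&self, event: SketchEvent) -> (u64, u64, u64) {
        let total = self.event_counts[event as usize].load(Ordering::Relaxed);
        (total, 0, 0)
    }
}

/// Guard created when code enters a wait; entering bumps the counter.
pub struct WaitEventGuard<'a> {
    _state: &'a AtomicWaitState,
}

impl<'a> WaitEventGuard<'a> {
    pub fn enter(state: &'a AtomicWaitState, event: SketchEvent) -> Self {
        state.event_counts[event as usize].fetch_add(1, Ordering::Relaxed);
        WaitEventGuard { _state: state }
    }
}

/// Registry indexed by session_id; unknown sessions read as all zeros.
pub struct SessionRegistry {
    pub sessions: HashMap<u64, AtomicWaitState>,
}

impl SessionRegistry {
    pub fn row(&self, session_id: u64, event: SketchEvent) -> (u64, u64, u64) {
        self.sessions
            .get(&session_id)
            .map_or((0, 0, 0), |state| state.row(event))
    }
}
```

Relaxed ordering is used here because each counter is an independent statistic and readers do not need to synchronize with any other memory.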
Metrics
For each event, three Prometheus series are exported with label
event=<variant_snake_case>:
| Metric | Type | Meaning |
|---|---|---|
| angarabase_wait_events_total | counter | How many times code entered this wait type. |
| angarabase_wait_events_active | gauge | How many waits of this type are active right now. |
| angarabase_wait_event_duration_seconds | histogram | Wait duration distribution. |
Histogram buckets in seconds: 0.001, 0.005, 0.01, 0.05, 0.1,
0.5, 1, 5, +Inf.
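To make the bucket layout concrete, here is a sketch of how one observation maps onto these bounds and how the cumulative `le` series that Prometheus expects are derived. The WaitHistogram type and its methods are hypothetical and are not the code in metrics/core.rs; only the bucket bounds come from this page.

```rust
/// Finite upper bounds in seconds, as listed above; +Inf is implicit.
pub const BUCKET_BOUNDS_SECS: [f64; 8] = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0];

/// Eight finite buckets plus the +Inf overflow bucket.
pub const N_BUCKETS: usize = 9;

pub struct WaitHistogram {
    /// Non-cumulative count per bucket; the last slot is +Inf.
    counts: [u64; N_BUCKETS],
    /// Running sum of observed durations (the `_sum` series).
    sum_secs: f64,
}

impl WaitHistogram {
    pub fn new() -> Self {
        Self { counts: [0; N_BUCKETS], sum_secs: 0.0 }
    }

    /// Records one wait: it lands in the first bucket whose upper bound
    /// is >= the duration, or in +Inf when it exceeds every bound.
    pub fn observe(&mut self, secs: f64) {
        let idx = BUCKET_BOUNDS_SECS
            .iter()
            .position(|&le| secs <= le)
            .unwrap_or(BUCKET_BOUNDS_SECS.len());
        self.counts[idx] += 1;
        self.sum_secs += secs;
    }

    /// Cumulative counts, matching the exported `_bucket{le=...}` series.
    pub fn cumulative(&self) -> [u64; N_BUCKETS] {
        let mut out = [0u64; N_BUCKETS];
        let mut running = 0u64;
        for (i, c) in self.counts.iter().enumerate() {
            running += *c;
            out[i] = running;
        }
        out
    }

    pub fn sum_secs(&self) -> f64 {
        self.sum_secs
    }
}
```

Because quantiles are estimated from these buckets, a reported p99 can only be as precise as the bucket that contains it; a threshold such as 0.5 s lines up with a bucket bound, while a threshold of 0.3 s would fall inside the 0.1 to 0.5 bucket and be interpolated.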
PromQL Examples
Top-N wait classes by accumulated time over 5 minutes:
topk(
5,
rate(angarabase_wait_event_duration_seconds_sum[5m])
)
Active waits right now:
sum by (event) (angarabase_wait_events_active)
p99 latency for buffer-pool eviction:
histogram_quantile(
0.99,
rate(angarabase_wait_event_duration_seconds_bucket{event="buffer_pool_eviction"}[5m])
)
Backpressure throttle rate:
rate(angarabase_wait_events_total{event="backpressure_throttle"}[1m])
QoS queue wait rate:
rate(angarabase_wait_events_total{event="qos_queue"}[5m])
p95 waits in QoS queue:
histogram_quantile(
0.95,
rate(angarabase_wait_event_duration_seconds_bucket{event="qos_queue"}[5m])
)
Active QoS blocking waits:
angarabase_wait_events_active{event="qos_blocking"}
Alert for long QoS queue:
histogram_quantile(
0.99,
rate(angarabase_wait_event_duration_seconds_bucket{event="qos_queue"}[5m])
) > 0.5
Alert for blocking pool pressure:
angarabase_wait_events_active{event="qos_blocking"} > 0
and on (instance, job)
angarabase_qos_blocking_inflight > 0
Correlation of QoS waits with rejections:
rate(angarabase_wait_events_total{event="qos_queue"}[5m])
and on ()
(sum(rate({__name__=~"angarabase_qos_rejected_.*_total"}[5m])) > 0)
Operator playbook
BufferPoolEviction is growing:
- buffer pool is smaller than the working set;
- max_cached_pages has been reached;
- check the buffer-pool-pressure runbook.
BackpressureThrottle is growing:
- WAL queue or buffer pool exhaustion slows clients;
- check angarabase_buffer_pool_uncommitted_pages_ratio;
- correlate with WAL group-commit latency.
WalFlush or WalSync p99 is above 100 ms:
- fsync regression or storage stall is likely;
- use the wal-fsync-slow runbook.
RowLock has high duration:
- look for lock contention and long transactions;
- use the deadlock-spike runbook.
QosQueue is growing:
- check angara_stat_qos_queues;
- watch angarabase_qos_rejected_*_total;
- reduce batch job concurrency;
- move heavy jobs to the background level with SET service_level = 'background';
- review ANGARABASE_QOS_WEIGHTS and ANGARABASE_QOS_MAX_QUEUED.
QosBlocking is growing:
- check angarabase_qos_blocking_inflight;
- check angarabase_spawn_blocking_max;
- look for blocking workload that displaces runtime capacity;
- do not try to fix this by raising the SQL lock timeout: the wait is in the scheduler/runtime path.
Source of truth
- Code: crates/angarabase/src/observability/wait_events.rs
- Per-session dispatch: crates/angarabase/src/virtual_catalog.rs + virtual_catalog/shared_catalog.rs
- Metrics: crates/angarabase/src/metrics/core.rs
- Render: crates/angarabase/src/metrics/render.rs
- QoS scheduler: crates/angarabase/src/qos_manager.rs
- RM: docs/planning/v0.6/RM-0.6.3.9.md §S11, docs/planning/v0.6/RM-0.6.4.10.md, docs/planning/v0.6/RM-0.6.4.19.md Track C C1