Wait Events

AngaraBase 0.6.3.9 §S11 — baseline wait events model. RM-0.6.4.10 adds QoS scheduler events. RM-0.6.4.19 Track C C1 adds per-session counters and per-session query angara_stat_wait_events.

This page describes the WaitEvent taxonomy that AngaraBase uses to classify blocking operations. For operators, wait events answer the question: “what is the cluster waiting on right now?” without strace, manual stack trace analysis, or instrumenting every call site.

Why the wait events model is needed

The model is similar to pg_stat_activity.wait_event in PostgreSQL or sys.dm_os_wait_stats in SQL Server:

  • every blocking code section gets a specific wait reason;
  • current wait is visible at the session/activity level;
  • aggregated metrics provide rate, active count, and latency distribution for each event;
  • dashboards compare different wait classes through a unified label event=<variant_snake_case>.

Two observability layers:

  1. Current session wait: angara_stat_activity.wait_event_type and angara_stat_activity.wait_event.
  2. Aggregated Prometheus metrics: counters, gauges, and histograms for each wait event.

Events

WaitEvent is a stable public API. Adding a variant is non-breaking: dashboards will see the new event=... label after upgrade. Deleting, renumbering, or changing the as_str() value is considered a breaking change.

| Variant | Label | Wait type | When it fires |
|---|---|---|---|
| RowLock | row_lock | Lock | Waiting for tuple-level lock. |
| PageLock | page_lock | Lock | Waiting for page-level latch. |
| TableLock | table_lock | Lock | Waiting for relation-level lock for DDL/lock manager. |
| TransactionLock | transaction_lock | Lock | Waiting for another transaction to commit/finish. |
| PredicateLockAcquire | predicate_lock_acquire | Lock | Waiting to acquire predicate lock for SSI foundation. |
| PredicateConflictCheck | predicate_conflict_check | Lock | Waiting to check predicate conflict graph. |
| PageRead | page_read | IO | Reading heap/index page on cache miss. |
| PageWrite | page_write | IO | Page write-back during checkpoint or eviction. |
| WalFlush | wal_flush | IO | WAL flush / fsync path. |
| Fsync | fsync | IO | Other fsync paths: catalog, FPI, and related operations. |
| WalSync | wal_sync | IO | Strict WAL sync wait in durability path. |
| WalGroupCommit | wal_group_commit | IO | Waiting for group commit batch. |
| ColumnarCompaction | columnar_compaction | IO | Background compactor waits for disk I/O or manifest append mutex in compact_l0_to_l1(). |
| ClientRead | client_read | Net | Reading from client socket. |
| ClientWrite | client_write | Net | Writing to client socket. |
| ReplicaRead | replica_read | Net | Reading from replica connection. |
| ReplicaWrite | replica_write | Net | Writing to replica connection. |
| NetRead | net_read | Net | Generic network read. |
| NetWrite | net_write | Net | Generic network write. |
| CpuRun | cpu_run | CPU | Session is running on CPU; this is not blocking. |
| PageDecompression | page_decompression | CPU | CPU time for page decompression on buffer-pool miss. |
| PageCompression | page_compression | CPU | CPU time for dirty page compression before flush. |
| AdmissionQueue | admission_queue | Scheduler | Waiting for admission control queue. |
| IoSchedulerQueue | io_scheduler_queue | Scheduler | Waiting for I/O scheduler queue. |
| MemoryGrantQueue | memory_grant_queue | Scheduler | Waiting for memory grant. |
| BufferPoolEviction | buffer_pool_eviction | Scheduler | Session waits for a free or evictable buffer-pool slot. |
| BackpressureThrottle | backpressure_throttle | Scheduler | Unified backpressure coordinator throttles caller. |
| DiskRestartHarness | disk_restart_harness | Scheduler | Test harness waits for on-disk state re-hydration in disk-restart test. |
| QosQueue | qos_queue | Scheduler | Async task is in per-shard DRR queue of the QoS scheduler before dispatch. |
| QosBlocking | qos_blocking | Scheduler | Blocking task waits for dispatch through the QoS blocking path. |
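
The variant-to-label mapping in the table can be pictured as a plain Rust enum. The sketch below covers only a handful of variants and is not the actual definition in wait_events.rs: `WaitEvent` and `as_str()` are named on this page, while the `WaitType` enum and the `wait_type()` method are hypothetical names introduced for illustration.

```rust
/// Wait classes from the "Wait type" column (hypothetical enum name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WaitType {
    Lock,
    Io,
    Net,
    Cpu,
    Scheduler,
}

/// A small subset of the variants listed in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WaitEvent {
    RowLock,
    WalFlush,
    ClientRead,
    CpuRun,
    QosQueue,
    QosBlocking,
}

impl WaitEvent {
    /// Label value exported as `event=<variant_snake_case>`.
    pub fn as_str(self) -> &'static str {
        match self {
            WaitEvent::RowLock => "row_lock",
            WaitEvent::WalFlush => "wal_flush",
            WaitEvent::ClientRead => "client_read",
            WaitEvent::CpuRun => "cpu_run",
            WaitEvent::QosQueue => "qos_queue",
            WaitEvent::QosBlocking => "qos_blocking",
        }
    }

    /// Wait class of the event, following the table (hypothetical method).
    pub fn wait_type(self) -> WaitType {
        match self {
            WaitEvent::RowLock => WaitType::Lock,
            WaitEvent::WalFlush => WaitType::Io,
            WaitEvent::ClientRead => WaitType::Net,
            WaitEvent::CpuRun => WaitType::Cpu,
            WaitEvent::QosQueue | WaitEvent::QosBlocking => WaitType::Scheduler,
        }
    }
}
```

Because both `match` expressions are exhaustive, adding a variant to such an enum forces the author to choose its label and wait type at compile time.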

QoS events (RM-0.6.4.10)

qos_queue means the task has already been classified by service level (critical, interactive, background) and is waiting for dispatch in the scheduler queue. Growth in this wait usually indicates scheduler saturation or a load burst.

qos_blocking means the task entered the blocking path of the QoS scheduler. Watch it together with gauges angarabase_qos_blocking_inflight and angarabase_spawn_blocking_max: if blocking wait grows and inflight is close to the limit, cluster pressure is in the runtime/blocking pool, not SQL locks.
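
The triage rule above can be written down as a small predicate. This is an illustrative sketch only: the function name and the 0.9 saturation ratio are assumptions, not values taken from AngaraBase. The three inputs stand for the current readings of angarabase_wait_events_active{event="qos_blocking"}, angarabase_qos_blocking_inflight, and angarabase_spawn_blocking_max.

```rust
/// Returns true when qos_blocking waits exist and the blocking pool is
/// close to its limit. The 0.9 ratio is an illustrative threshold.
pub fn blocking_pool_saturated(active_waits: u64, inflight: u64, max_blocking: u64) -> bool {
    if max_blocking == 0 {
        // No limit reported: saturation cannot be judged.
        return false;
    }
    let ratio = inflight as f64 / max_blocking as f64;
    active_waits > 0 && ratio >= 0.9
}
```

For example, 3 active waits with 95 of 100 blocking slots in use points at the runtime/blocking pool, while 3 active waits with 10 of 100 slots in use does not.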

In Sprint 2A, service-level granularity is intentionally coarse: there are no separate qos_queue_critical, qos_queue_interactive, qos_queue_background. For service level, use QoS counters: angarabase_qos_queued_*_total and angarabase_qos_rejected_*_total.

Ordinals and compatibility

Ordinals are append-only and pinned in WaitEvent::ordinal():

  • QosQueue has ordinal 28;
  • QosBlocking has ordinal 29.

The WaitEvent::ALL array is used to render all label values in metrics. The fixed metrics array size is defined by N_WAIT_EVENT_VARIANTS.

Compatibility rules:

  • adding a variant — non-breaking;
  • deleting a variant — breaking;
  • renumbering ordinal — breaking;
  • renaming a label value from as_str() — breaking for dashboards and alerts.
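
One way to keep ordinals append-only is to write each ordinal out explicitly instead of deriving it from declaration order, and to check uniqueness in a test. The sketch below uses a two-variant stand-in for the real enum. Only the ordinals 28 and 29 come from this page; the `ordinals_are_unique` helper is hypothetical, and the real `ALL` array has N_WAIT_EVENT_VARIANTS entries rather than 2.

```rust
use std::collections::HashSet;

/// Two-variant stand-in for the real enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WaitEvent {
    QosQueue,
    QosBlocking,
}

impl WaitEvent {
    /// Every variant, used to iterate label values (subset in this sketch).
    pub const ALL: [WaitEvent; 2] = [WaitEvent::QosQueue, WaitEvent::QosBlocking];

    /// Pinned ordinal: spelled out per variant, so reordering the enum
    /// declaration cannot silently renumber anything.
    pub const fn ordinal(self) -> usize {
        match self {
            WaitEvent::QosQueue => 28,
            WaitEvent::QosBlocking => 29,
        }
    }

    /// Label value used in `event=...`.
    pub const fn as_str(self) -> &'static str {
        match self {
            WaitEvent::QosQueue => "qos_queue",
            WaitEvent::QosBlocking => "qos_blocking",
        }
    }
}

/// Returns true if no two variants in `ALL` share an ordinal
/// (hypothetical helper for a compatibility test).
pub fn ordinals_are_unique() -> bool {
    let mut seen = HashSet::new();
    WaitEvent::ALL.iter().all(|ev| seen.insert(ev.ordinal()))
}
```

A test that asserts the concrete (ordinal, label) pairs then turns any breaking change from the list above into a failing build instead of a silently broken dashboard.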

Per-session wait events (RM-0.6.4.19 Track C C1)

Starting with RM-0.6.4.19, angara_stat_wait_events supports per-session mode:

-- Process-wide aggregates (as before):
SELECT * FROM angara_stat_wait_events;

-- Per-session counters of the current session:
SELECT * FROM angara_stat_wait_events WHERE session_id = current_session();

In per-session mode:

  • total — total number of entries into this wait event for the current session since it started.
  • active and total_duration_us — always 0 in phase 1 (per-session histogram deferred to phase 2).
  • Counters are incremented via WaitEventGuard::enter and stored in AtomicWaitState::event_counts (per-session registry, indexed by session_id).

If the session has not entered any wait event yet, every row reports total = 0 (an empty wait state returns zeros).
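
The phase 1 behavior can be sketched as an array of atomic counters that is bumped on entry. The names `WaitEventGuard::enter`, `AtomicWaitState`, and `event_counts` come from this page; everything else is assumed for illustration: the three-variant enum, its sketch-local indices (not the real ordinals), the `total()` accessor, and the guard holding a reference to the state.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Number of variants in this sketch; the real array size is
/// N_WAIT_EVENT_VARIANTS.
const N_VARIANTS: usize = 3;

/// Sketch-local subset; the discriminants are NOT the real ordinals.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WaitEvent {
    RowLock = 0,
    WalFlush = 1,
    QosQueue = 2,
}

/// Per-session wait state: one entry counter per wait event.
pub struct AtomicWaitState {
    pub event_counts: [AtomicU64; N_VARIANTS],
}

impl AtomicWaitState {
    /// A fresh session starts with all counters at zero.
    pub fn new() -> Self {
        Self {
            event_counts: std::array::from_fn(|_| AtomicU64::new(0)),
        }
    }

    /// Value reported as `total` for one event (hypothetical accessor).
    pub fn total(&self, ev: WaitEvent) -> u64 {
        self.event_counts[ev as usize].load(Ordering::Relaxed)
    }
}

/// Guard created when a session enters a wait. In phase 1 nothing is
/// recorded on exit, so this sketch has no Drop implementation.
pub struct WaitEventGuard<'a> {
    _state: &'a AtomicWaitState,
}

impl<'a> WaitEventGuard<'a> {
    /// Count one entry into `ev` for this session.
    pub fn enter(state: &'a AtomicWaitState, ev: WaitEvent) -> Self {
        state.event_counts[ev as usize].fetch_add(1, Ordering::Relaxed);
        WaitEventGuard { _state: state }
    }
}
```

Entering RowLock twice on a fresh state leaves total at 2 for RowLock and at 0 for every other event, which matches the zero-filled result described above.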

Metrics

For each event, three Prometheus series are exported with label event=<variant_snake_case>:

| Metric | Type | Meaning |
|---|---|---|
| angarabase_wait_events_total | counter | How many times code entered this wait type. |
| angarabase_wait_events_active | gauge | How many waits of this type are active right now. |
| angarabase_wait_event_duration_seconds | histogram | Wait duration distribution. |

Histogram buckets in seconds: 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, +Inf.
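
Prometheus histogram buckets are cumulative: an observation is counted in every bucket whose upper bound is greater than or equal to the observed value. The sketch below shows how a few wait durations would land in the buckets listed above. The function name is hypothetical and this is not the code in metrics/core.rs.

```rust
/// Bucket upper bounds in seconds, as listed above; +Inf is last.
pub const BUCKETS: [f64; 9] = [
    0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0, f64::INFINITY,
];

/// Cumulative bucket counts for a set of wait durations in seconds.
pub fn cumulative_counts(durations: &[f64]) -> [u64; 9] {
    let mut counts = [0u64; 9];
    for &d in durations {
        for (i, &upper) in BUCKETS.iter().enumerate() {
            // Each observation increments every bucket that can hold it.
            if d <= upper {
                counts[i] += 1;
            }
        }
    }
    counts
}
```

For the durations 0.0004 s, 0.02 s, 0.3 s, and 7 s, the counts are [1, 1, 1, 2, 2, 3, 3, 3, 4]. The 7 s wait appears only in the +Inf bucket, so waits longer than 5 s cannot be told apart from each other.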

PromQL Examples

Top-N wait classes by accumulated time over 5 minutes:

topk(
  5,
  rate(angarabase_wait_event_duration_seconds_sum[5m])
)

Active waits right now:

sum by (event) (angarabase_wait_events_active)

p99 latency for buffer-pool eviction:

histogram_quantile(
  0.99,
  rate(angarabase_wait_event_duration_seconds_bucket{event="buffer_pool_eviction"}[5m])
)

Backpressure throttle rate:

rate(angarabase_wait_events_total{event="backpressure_throttle"}[1m])

QoS queue wait rate:

rate(angarabase_wait_events_total{event="qos_queue"}[5m])

p95 waits in QoS queue:

histogram_quantile(
  0.95,
  rate(angarabase_wait_event_duration_seconds_bucket{event="qos_queue"}[5m])
)

Active QoS blocking waits:

angarabase_wait_events_active{event="qos_blocking"}

Alert for long QoS queue:

histogram_quantile(
  0.99,
  rate(angarabase_wait_event_duration_seconds_bucket{event="qos_queue"}[5m])
) > 0.5
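
The 0.5 threshold coincides with a bucket bound. histogram_quantile estimates a quantile by linear interpolation inside the bucket that contains the target rank, so the result can never be more precise than the bucket layout. The sketch below is a simplified version of that estimation, written for illustration. It skips edge cases that Prometheus handles (for example non-positive bounds and NaN inputs), and the function name is hypothetical.

```rust
/// Simplified quantile estimate over cumulative bucket counts.
/// `bounds[i]` is the upper bound of bucket i; the last one must be +Inf.
pub fn estimate_quantile(q: f64, bounds: &[f64], cumulative: &[f64]) -> Option<f64> {
    let n = bounds.len();
    if n < 2 || cumulative.len() != n || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let total = cumulative[n - 1];
    if total <= 0.0 {
        return None;
    }
    let rank = q * total;
    // First bucket whose cumulative count reaches the target rank.
    let i = cumulative.iter().position(|&c| c >= rank)?;
    if i == n - 1 {
        // Rank falls into +Inf: report the highest finite bound.
        return Some(bounds[n - 2]);
    }
    // The first bucket is assumed to start at 0.
    let (lower, below) = if i == 0 {
        (0.0, 0.0)
    } else {
        (bounds[i - 1], cumulative[i - 1])
    };
    let in_bucket = cumulative[i] - below;
    if in_bucket <= 0.0 {
        return Some(bounds[i]);
    }
    // Linear interpolation between the bucket's lower and upper bound.
    Some(lower + (bounds[i] - lower) * (rank - below) / in_bucket)
}
```

With the page's bucket bounds and cumulative counts [1, 1, 1, 2, 2, 3, 3, 3, 4], the median estimate is about 0.05 s, while q = 0.99 falls into the +Inf bucket and is reported as 5 s. A p99 alert therefore cannot distinguish a 6 s wait from a 60 s wait.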

Alert for blocking pool pressure:

angarabase_wait_events_active{event="qos_blocking"} > 0
and ignoring (event)
angarabase_qos_blocking_inflight > 0

Correlation of QoS waits with rejections:

rate(angarabase_wait_events_total{event="qos_queue"}[5m])
and on ()
sum(rate({__name__=~"angarabase_qos_rejected_.*_total"}[5m])) > 0

Operator playbook

BufferPoolEviction is growing:

BackpressureThrottle is growing:

  • WAL queue or buffer pool exhaustion slows clients;
  • check angarabase_buffer_pool_uncommitted_pages_ratio;
  • correlate with WAL group-commit latency.

WalFlush or WalSync p99 is above 100 ms:

RowLock has high duration:

QosQueue is growing:

  • check angara_stat_qos_queues;
  • watch angarabase_qos_rejected_*_total;
  • reduce batch job concurrency;
  • move heavy jobs to SET service_level = 'background';
  • review ANGARABASE_QOS_WEIGHTS and ANGARABASE_QOS_MAX_QUEUED.

QosBlocking is growing:

  • check angarabase_qos_blocking_inflight;
  • check angarabase_spawn_blocking_max;
  • look for blocking workload that displaces runtime capacity;
  • do not try to fix this by raising the SQL lock timeout: the wait is in the scheduler/runtime path.

Source of truth

  • Code: crates/angarabase/src/observability/wait_events.rs
  • Per-session dispatch: crates/angarabase/src/virtual_catalog.rs + virtual_catalog/shared_catalog.rs
  • Metrics: crates/angarabase/src/metrics/core.rs
  • Render: crates/angarabase/src/metrics/render.rs
  • QoS scheduler: crates/angarabase/src/qos_manager.rs
  • RM: docs/planning/v0.6/RM-0.6.3.9.md §S11, docs/planning/v0.6/RM-0.6.4.10.md, docs/planning/v0.6/RM-0.6.4.19.md Track C C1