Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Runbook: LongTransaction

Source of truth: tools/observability/alerts/angarabase_alerts.yaml. Backed by: RM-0.6.3.8 S7, RM-0.6.4.4 (SSI).

What It Means

angarabase_txn_oldest_snapshot_age_seconds > 300 — the oldest open transaction has been alive for more than 5 minutes. This blocks MVCC GC and leads to bloat.

For SERIALIZABLE transactions: holding a transaction for a long time also blocks cleanup (GC) of SIREAD locks and the SSI conflict graph, which can increase false positive aborts (40001) for new transactions because of lock escalation.

Severity

warning. At 30+ minutes it becomes a real GC blocker. For SSI workloads, it is critical for throughput because of aborts.

Initial response

  1. Grafana Overview v2 → row “GC / MVCC”.
  2. Find the transaction PID:
SELECT pid, age(now(), xact_start) AS age, state, query
FROM angara_stat_activity
WHERE state IN ('idle in transaction', 'active')
ORDER BY xact_start ASC LIMIT 5;

Diagnostics

curl -sf http://127.0.0.1:9898/metrics | rg -E 'txn_(oldest_snapshot|active|idle)'
curl -sf http://127.0.0.1:9898/metrics | rg gc_

Mitigation

CauseAction
Client stuck in idle in transactionEnable idle_in_transaction_session_timeout
Long analytical queryMove to a read replica; split into batches
Pgbouncer poolCheck server_idle_timeout, restart the pool
Application bugFix on the client side (transaction scope)

Forced abort (last resort):

SELECT pg_terminate_backend(<pid>);

Escalation

If the transaction is older than 1 hour and blocks GC until bloat > 30%, consider terminate + escalation and document the incident.