MVCC and GC Operator Minimum

Minimal operator contract for triaging GC/MVCC behavior.

Goal

Make GC predictable:

  • see lag and stalls;
  • bound the pause budget;
  • understand which knobs to adjust first.

Metrics to watch

  • Watermark:
      • angarabase_gc_watermark_snapshot
  • Slice latency:
      • angarabase_gc_compact_slice_duration_ms_*
  • GC progress:
      • angarabase_gc_compact_slices_total
      • angarabase_gc_compact_tables_scanned_total
      • angarabase_gc_compact_versions_removed_total
      • angarabase_gc_compact_tables_removed_total
  • Long snapshot risk:
      • txn_oldest_snapshot_age_seconds
      • txn_long_snapshot_warn_total
      • txn_long_snapshot_hard_total

Core knobs

  • ANGARABASE_GC_BUDGET_TABLES
  • ANGARABASE_GC_BUDGET_MS
  • ANGARABASE_GC_BUDGET_VERSIONS
  • ANGARABASE_GC_BURST_SLICES
  • ANGARABASE_GC_BURST_MAX_MS
  • ANGARABASE_GC_CURSOR_FILE (best-effort persisted cursor)

Full settings: src/operations/config-schema.md.
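
To make the budget knobs more concrete, here is a minimal Python sketch of a budgeted compaction slice. It assumes, without confirmation from the settings reference, that a slice stops as soon as any one of the table, time, or version budgets is exhausted. The names read_budget and run_slice, the placeholder defaults, and the per-table work model are hypothetical and only illustrate the idea of a bounded pause.

```python
import os
import time


def read_budget(name, default):
    """Read an integer budget from the environment (defaults are placeholders)."""
    return int(os.environ.get(name, default))


def run_slice(tables, budget_tables, budget_ms, budget_versions, clock=time.monotonic):
    """Process tables until any one budget is exhausted (assumed semantics).

    `tables` is a list of (table_name, removable_versions) pairs.
    Returns (tables_scanned, versions_removed).
    """
    start = clock()
    scanned = 0
    removed = 0
    for _name, removable in tables:
        elapsed_ms = (clock() - start) * 1000.0
        if scanned >= budget_tables or elapsed_ms >= budget_ms or removed >= budget_versions:
            break
        # Remove no more than the remaining version budget allows.
        removed += min(removable, budget_versions - removed)
        scanned += 1
    return scanned, removed


if __name__ == "__main__":
    # Hypothetical defaults; the real ones are in the settings reference.
    result = run_slice(
        [("orders", 5), ("users", 5), ("events", 5)],
        budget_tables=read_budget("ANGARABASE_GC_BUDGET_TABLES", 2),
        budget_ms=read_budget("ANGARABASE_GC_BUDGET_MS", 10),
        budget_versions=read_budget("ANGARABASE_GC_BUDGET_VERSIONS", 100),
    )
    print(result)
```

Under this model, lowering any single budget shortens the slice, which is why the slice latency metric is the first place to look after changing a knob.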

Triage: “GC not keeping up”

  1. Check txn_oldest_snapshot_age_seconds: by contract, the watermark cannot advance past the oldest open snapshot, so a large age holds GC back.
  2. Check the tail of angarabase_gc_compact_slice_duration_ms_*: if it grows, reduce the slice budget (see Core knobs).
  3. Check the trend of *_versions_removed_total and *_tables_scanned_total: if there is no progress, look for a long snapshot and environment issues through a diagnostics bundle.
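
The three checks above can be sketched as a small Python function over metric samples that have already been collected. The thresholds (max_snapshot_age_s, tail_growth_ratio), the function names, and the finding labels are illustrative assumptions, not values defined by the product; "no progress" is interpreted here as both counters staying flat across the window.

```python
def counter_delta(samples):
    """Increase of a monotonic counter across the window, or None with < 2 samples."""
    if len(samples) < 2:
        return None
    return samples[-1] - samples[0]


def triage_gc(oldest_snapshot_age_s, slice_tail_ms, versions_removed, tables_scanned,
              max_snapshot_age_s=300.0, tail_growth_ratio=1.5):
    """Return findings in the same order as the triage steps.

    slice_tail_ms, versions_removed and tables_scanned are lists of samples,
    oldest first. Thresholds are hypothetical and should be tuned per deployment.
    """
    findings = []
    # Step 1: an old snapshot holds the watermark back.
    if oldest_snapshot_age_s > max_snapshot_age_s:
        findings.append("long-snapshot")
    # Step 2: the slice latency tail is growing across the window.
    if (len(slice_tail_ms) >= 2 and slice_tail_ms[0] > 0
            and slice_tail_ms[-1] / slice_tail_ms[0] >= tail_growth_ratio):
        findings.append("reduce-slice-budget")
    # Step 3: neither progress counter moved.
    if counter_delta(versions_removed) == 0 and counter_delta(tables_scanned) == 0:
        findings.append("no-progress")
    return findings
```

For example, triage_gc(900.0, [4.0, 5.0, 12.0], [100, 100, 100], [7, 7, 7]) reports all three findings, while a healthy window with advancing counters reports none.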

UndoStore GC (RM-0.6.5.20)

RM-0.6.5.20 introduced epoch-based GC for the UNDO log.

How It Works

  • UndoGcWorker starts as a background thread at server startup
  • Every ~60 seconds (configurable interval), gc_watermark is computed for each DB
  • UndoStore::gc_purge_older_than(gc_watermark) removes records older than the watermark
  • Watermark = committed_epoch minus safety margin (protects active read-only transactions)
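
The watermark rule can be illustrated with a toy in-memory model in Python. This is a sketch under stated assumptions, not the server's implementation: UndoStoreModel, compute_gc_watermark, and the list-of-epochs representation are hypothetical, and "older than" is taken to mean strictly less than the watermark.

```python
from dataclasses import dataclass, field


def compute_gc_watermark(committed_epoch, safety_margin):
    """Watermark = committed epoch minus a safety margin, never below zero."""
    return max(0, committed_epoch - safety_margin)


@dataclass
class UndoStoreModel:
    """Toy stand-in for an UNDO store: one epoch number per record."""
    record_epochs: list = field(default_factory=list)
    purged_total: int = 0

    def gc_purge_older_than(self, gc_watermark):
        """Drop records with epoch < gc_watermark; return how many were purged."""
        kept = [e for e in self.record_epochs if e >= gc_watermark]
        purged = len(self.record_epochs) - len(kept)
        self.record_epochs = kept
        self.purged_total += purged
        return purged


if __name__ == "__main__":
    store = UndoStoreModel(record_epochs=[1, 2, 3, 8, 9, 10])
    watermark = compute_gc_watermark(committed_epoch=10, safety_margin=3)
    store.gc_purge_older_than(watermark)
    print(watermark, store.record_epochs, store.purged_total)
```

In this model the safety margin keeps the most recent epochs intact, so records that an in-flight reader may still need are not purged even though they are already committed.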

Metric

angarabase_undo_purged_records_total: gauge showing UNDO record cleanup progress. It is updated while GC is active.

Diagnostics

SELECT * FROM sys.metrics WHERE name LIKE '%undo%';
-- Expected: angarabase_undo_purged_records_total > 0 under write load

Troubleshooting (UNDO GC not working): If angarabase_undo_purged_records_total stays at 0 for a long time under sustained UPDATE/DELETE load:

  1. Check txn_oldest_snapshot_age_seconds — long (stuck) transactions block gc_watermark advancement.
  2. Find and terminate stuck transactions (kill).
  3. Check server logs for UndoGcWorker errors (for example, I/O errors with .aud files).
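
To turn "stays at 0 for a long time" into an explicit check, the following Python sketch evaluates a series of samples of angarabase_undo_purged_records_total. The function name undo_gc_stalled and the 600-second default window are assumptions chosen for illustration; the check is only meaningful if the samples were taken during active UPDATE/DELETE load.

```python
def undo_gc_stalled(samples, min_duration_s=600.0):
    """Return True if the metric stayed at 0 for at least min_duration_s.

    `samples` is a list of (unix_time_s, metric_value) pairs, oldest first.
    With fewer than two samples there is no window to judge, so return False.
    """
    if len(samples) < 2:
        return False
    observed_s = samples[-1][0] - samples[0][0]
    all_zero = all(value == 0 for _ts, value in samples)
    return observed_s >= min_duration_s and all_zero
```

A True result is the cue to walk through the steps above, starting with txn_oldest_snapshot_age_seconds.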

Manual heap-file compaction

# one-shot compact for a specific DB:
bash tools/golden_db/manage.sh compact <db_name>

Use this after a bulk DELETE or many UPDATEs if the .adb file is unexpectedly large.

See also:

  • src/operations/diagnostics-bundle.md
  • src/operations/performance-tuning.md