MVCC and GC Operator Minimum
Minimal operator contract for triaging GC/MVCC behavior.
Goal
Make GC predictable:
- see lag and stalls;
- bound the pause budget;
- understand which knobs to adjust first.
Metrics to watch
- Watermark: angarabase_gc_watermark_snapshot
- Slice latency: angarabase_gc_compact_slice_duration_ms_*
- GC progress: angarabase_gc_compact_slices_total, angarabase_gc_compact_tables_scanned_total, angarabase_gc_compact_versions_removed_total, angarabase_gc_compact_tables_removed_total
- Long snapshot risk: txn_oldest_snapshot_age_seconds, txn_long_snapshot_warn_total, txn_long_snapshot_hard_total
Core knobs
- ANGARABASE_GC_BUDGET_TABLES
- ANGARABASE_GC_BUDGET_MS
- ANGARABASE_GC_BUDGET_VERSIONS
- ANGARABASE_GC_BURST_SLICES
- ANGARABASE_GC_BURST_MAX_MS
- ANGARABASE_GC_CURSOR_FILE (best-effort persisted cursor)
Full settings: src/operations/config-schema.md.
Triage: “GC not keeping up”
- Check txn_oldest_snapshot_age_seconds: large age limits the watermark by contract.
- Check the tail of gc_compact_slice_duration_ms_*: if it grows, reduce the slice budget.
- Check the trend of *_versions_removed_total and *_tables_scanned_total: if there is no progress, look for a long snapshot and environment issues through a diagnostics bundle.
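The three checks above can be sketched as a small decision function over two consecutive metric samples. This is an illustration only: the GcSample type, the triage function, and both thresholds (max_snapshot_age_s, slice_growth_ratio) are hypothetical and not part of AngaraBase; tune them to your own baseline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GcSample:
    """One scrape of the metrics used in the triage list (hypothetical container)."""
    oldest_snapshot_age_s: float   # txn_oldest_snapshot_age_seconds
    slice_tail_ms: float           # tail (e.g. p99) of gc_compact_slice_duration_ms_*
    versions_removed_total: int    # *_versions_removed_total
    tables_scanned_total: int      # *_tables_scanned_total


def triage(prev: GcSample, curr: GcSample,
           max_snapshot_age_s: float = 300.0,      # assumed threshold
           slice_growth_ratio: float = 1.5) -> List[str]:  # assumed threshold
    """Return the triage findings that apply, in the order of the checklist."""
    findings = []
    # Check 1: an old snapshot limits the watermark by contract.
    if curr.oldest_snapshot_age_s > max_snapshot_age_s:
        findings.append("long snapshot limits the watermark")
    # Check 2: a growing slice latency tail suggests reducing the slice budget.
    if prev.slice_tail_ms > 0 and curr.slice_tail_ms / prev.slice_tail_ms >= slice_growth_ratio:
        findings.append("slice latency tail is growing: reduce the slice budget")
    # Check 3: both progress counters unchanged means no GC progress.
    if (curr.versions_removed_total == prev.versions_removed_total
            and curr.tables_scanned_total == prev.tables_scanned_total):
        findings.append("no GC progress: look for a long snapshot, collect a diagnostics bundle")
    return findings
```

Feeding it one sample from before and one from after the suspected slowdown yields the list of checks that fired; an empty list means none of the three conditions matched under the assumed thresholds.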
UndoStore GC (RM-0.6.5.20)
RM-0.6.5.20 introduced epoch-based UNDO log GC.
How It Works
- UndoGcWorker starts as a background thread at server startup.
- Every ~60 seconds (configurable interval), gc_watermark is computed for each DB.
- UndoStore::gc_purge_older_than(gc_watermark) removes records older than the watermark.
- Watermark = committed_epoch minus safety margin (protects active read-only transactions).
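The watermark rule and the purge step can be modelled with a toy in-memory store. This is a sketch under stated assumptions, not the server's implementation: the function names, the dict-based store, clamping the watermark at zero, measuring the safety margin in epochs, and treating "older than" as strictly less than the watermark are all illustrative choices.

```python
from typing import Dict


def gc_watermark(committed_epoch: int, safety_margin: int) -> int:
    """Watermark = committed_epoch minus safety margin (clamped at 0 by assumption)."""
    return max(0, committed_epoch - safety_margin)


def purge_older_than(records: Dict[int, str], watermark: int) -> int:
    """Remove records whose epoch is strictly below the watermark.

    `records` maps epoch -> undo payload (a stand-in for the real UNDO log).
    Returns the number of purged records.
    """
    stale = [epoch for epoch in records if epoch < watermark]
    for epoch in stale:
        del records[epoch]
    return len(stale)
```

With committed_epoch = 100 and a margin of 10, the watermark is 90, so records at epochs 85 and 89 are purged while 90 and 95 stay readable for transactions that may still need them. A larger margin keeps more history at the cost of a larger UNDO log.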
Metric
angarabase_undo_purged_records_total — gauge showing UNDO record cleanup progress. Updated when GC is active.
Diagnostics
SELECT * FROM sys.metrics WHERE name LIKE '%undo%';
-- Expected: angarabase_undo_purged_records_total > 0 under write load
Troubleshooting (UNDO GC not working):
If angarabase_undo_purged_records_total stays at 0 for a long time during active UPDATE/DELETE:
- Check txn_oldest_snapshot_age_seconds: long (stuck) transactions block gc_watermark advancement.
- Find and terminate stuck transactions (kill).
- Check server logs for UndoGcWorker errors (for example, I/O errors with .aud files).
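To decide when the troubleshooting steps above apply, the "stays at 0 for a long time" condition can be checked mechanically over a series of readings of angarabase_undo_purged_records_total. The helper below is a hypothetical sketch: the function name and the 600-second window are assumptions, and it generalizes "stays at 0" to "did not change over the window". It is only meaningful while UPDATE/DELETE load is active.

```python
from typing import List, Tuple


def undo_gc_stalled(samples: List[Tuple[float, int]],
                    min_window_s: float = 600.0) -> bool:
    """Return True if the metric did not change over at least min_window_s.

    `samples` is a list of (unix_time_seconds, metric_value), oldest first.
    """
    if len(samples) < 2:
        return False  # not enough data to judge
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    window_long_enough = (t_last - t_first) >= min_window_s
    return window_long_enough and v_last == v_first
```

A window shorter than a few GC intervals (about 60 seconds each by default) would produce false alarms, which is why the assumed default spans roughly ten intervals.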
Manual heap-file compaction
# one-shot compact for a specific DB:
bash tools/golden_db/manage.sh compact <db_name>
Use after bulk DELETE / many UPDATEs if the .adb file is suspiciously large.
Related runbooks
- src/operations/diagnostics-bundle.md
- src/operations/performance-tuning.md