Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Transactions and MVCC

AngaraBase ensures concurrent access to data via MVCC (Multi-Version Concurrency Control). Transactions guarantee atomicity of changes, while MVCC allows readers and writers to operate simultaneously without mutually blocking each other.

Transaction Basics

Transaction Management

BEGIN; -- start an explicit transaction
SAVEPOINT sp1; -- create a savepoint
ROLLBACK TO SAVEPOINT sp1; -- rollback to the savepoint
COMMIT; -- commit the transaction
ROLLBACK; -- rollback the entire transaction

Autocommit

By default, AngaraBase operates in autocommit mode: each individual SQL statement executes as a standalone transaction. If the statement completes successfully, the result is committed automatically; on error, it is rolled back.

For operations affecting multiple rows or tables, use explicit transactions (BEGIN / COMMIT) to group changes into a single atomic unit.

MVCC: Row Versioning

The core idea of MVCC is that readers do not block writers, and writers do not block readers. This is achieved by storing multiple versions of each row.

Version Metadata

Each row version contains two system fields:

FieldPurpose
created_commitEpoch (commit timestamp) when the version was created
deleted_commitEpoch when the version was marked as deleted ( for active versions)

Visibility Rule

A row version is visible to a transaction with snapshot S if both conditions are met:

  1. created_commit <= S — the version was created before or at the time of the snapshot
  2. The version is not deleted, or deleted_commit > S — the deletion occurred after the snapshot

Write Operations

  • INSERT — creates a new row version with created_commit = current epoch
  • UPDATE — does not modify the row in-place. Instead, it marks the current version as deleted (deleted_commit = current epoch) and creates a new version with the updated data
  • DELETE — marks the version as deleted (deleted_commit = current epoch)

Isolation Levels

Each transaction receives a snapshot — a fixed view of the data at a specific point in time.

READ COMMITTED (Default)

The snapshot is updated before each statement. The transaction sees all data committed before the start of the current statement. This is the recommended isolation level for most workloads.

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

REPEATABLE READ

The snapshot is fixed at the time of BEGIN and remains unchanged until the end of the transaction. All statements within the transaction see the same state of the data.

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

SERIALIZABLE

As of version 0.6.4.4, AngaraBase implements full SERIALIZABLE (SSI) isolation level. In SERIALIZABLE mode, write skew and phantoms anomalies are prevented through SIREAD locks and tracking read-write anti-dependencies. Transactions violating serializability are aborted with code 40001.

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

Locks

Read Operations

Reading uses the MVCC snapshot and does not require locks. A reader never waits for a writer, and vice versa.

Write Operations (Lock-Free DML)

Writers use atomic operations (Compare-And-Swap) for row-level version installation, ensuring lock-free and conflict-free modifications. Traditional write locks are no longer held.

DDL Operations

Schema change operations (CREATE TABLE, ALTER TABLE, DROP TABLE) acquire table-level locks for the duration of the execution.

Deadlock Detection

AngaraBase detects deadlocks using:

  • Timeout — if a transaction waits for a lock longer than a configured threshold, it is aborted
  • Victim selection — upon detecting a cycle, the system chooses a victim transaction to rollback
  • Deterministic lock ordering — an internal strategy for ordering locks to reduce the likelihood of deadlocks

Garbage Collection (AngaraGC)

Over time, the storage accumulates old row versions that are no longer visible to any active transaction. The AngaraGC subsystem is responsible for cleaning them up.

How It Works

  1. GC watermark — calculated as the minimum snapshot among all active transactions: min(active_snapshots)
  2. Row versions with deleted_commit < watermark are safe to delete — no active transaction can see them
  3. Cleanup is performed by a background process without pausing query execution
  4. Bounded slices — GC processes data in fixed-size chunks to avoid latency spikes
  5. Epoch Reaper — a background worker (since version 0.6.5.24) that prevents the GC watermark from stalling due to abruptly disconnected or hung sessions.

Difference from PostgreSQL

AngaraBase does not have autovacuum in the traditional sense. It uses a hybrid design with an epoch-based watermark (similar to Oracle/InnoDB), allowing for more precise control over the cleanup timing.

Recommendations

  • Use READ COMMITTED (the default level) for most workloads
  • Avoid long-running transactions — they hold back the GC watermark and prevent the cleanup of old row versions, which increases disk space consumption
  • When using REPEATABLE READ, be aware of potential write skew. If strict serializability is needed, use explicit locks (SELECT ... FOR UPDATE)

MVCC State Upon Crash Recovery

Upon restarting after a crash, AngaraBase restores the MVCC state from the transaction log (WAL).

What Is Restored

  1. Committed transactions — transactions that managed to write a COMMIT to the WAL
  2. Aborted transactions — incomplete transactions are marked as aborted
  3. MVCC visibility — information about which row versions are visible for each commit epoch
  4. Transaction counters — current commit epoch and other counters

Recovery Process

  1. WAL scan — scanning the transaction log files in chronological order
  2. MVCC replay — restoring in-memory MVCC structures from WAL records
  3. Cleanup — marking uncompleted transactions as aborted

Limitations

  • Backend requirement: MVCC recovery only works with transaction_log.backend = "file_bin"
  • Memory rebuild: The MVCC state is rebuilt in memory, which can take time for large WAL volumes
  • Read-your-writes: Immediately after restart, uncompleted transactions are invisible (marked as aborted)

Monitoring Recovery

-- Check the recovery mode
SELECT recovery_mode FROM sys.identity;

-- Check system health after recovery
SELECT txn_commit_epoch_current FROM sys.health;

Possible recovery_mode values:

  • "normal" — normal start without recovery
  • "crash_recovery" — recovery after a crash
  • "forced_takeover" — forced instance lease takeover