Crash Recovery and Storage Portability

This guide covers crash recovery scenarios and how to safely restart AngaraBase instances on existing data files, including migration to different hosts.

Overview

AngaraBase includes an Instance Lease system that prevents dual-write corruption while enabling safe crash recovery and storage portability. The system works on any filesystem, including NFS and SAN where traditional file locking is unreliable.

What Happens During Crash Recovery

When AngaraBase starts on existing data files, it performs these recovery phases:

Phase A: WAL Recovery (file_bin backend only)

  • Scans transaction log files for incomplete entries
  • Truncates partial tail records to maintain consistency
  • Replays committed page deltas (redo operations)

Phase B: MVCC History Restore

  • Recovers in-memory MVCC state from transaction log
  • Marks uncommitted transactions as aborted
  • Restores visibility information for concurrent reads

Phase C: Instance Lease Check

  • Checks for active instance lease in base.adb
  • Prevents dual instance startup with fail-closed error
  • Automatically takes over expired leases (crash recovery)
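
The three outcomes of the lease check can be sketched as a small decision function. This is only an illustration of the behavior listed above, not AngaraBase's actual code: the `lease_decision` name, the use of Unix timestamps, and the treatment of the exact expiry instant as "expired" are assumptions.

```shell
# Hypothetical helper: decide what a starting instance does with an
# existing lease. $1 = current time, $2 = lease expiry, both as Unix
# timestamps in seconds; an empty $2 means no lease is recorded.
lease_decision() {
  now="$1"
  expires_at="$2"
  if [ -z "$expires_at" ]; then
    echo "acquire"      # no lease present
  elif [ "$now" -ge "$expires_at" ]; then
    echo "takeover"     # lease expired: crash recovery path (boundary is an assumption)
  else
    echo "refuse"       # lease still active: fail closed
  fi
}

lease_decision 1000 ""      # prints "acquire"
lease_decision 1000 970     # prints "takeover"
lease_decision 1000 1020    # prints "refuse"
```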

Restarting on Existing Files (Same Host)

For normal restart scenarios on the same machine:

  1. Stop the server (if running):
     # Graceful shutdown releases the lease automatically
     kill -TERM <angarabase-pid>
  2. Start normally:
     angarabase-server --config angarabase.conf
  3. Verify recovery:
     SELECT recovery_mode, lease_holder_id FROM sys.identity;

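After a graceful shutdown the lease is already released, so the new instance acquires it immediately. If the previous instance did not shut down cleanly, its lease remains in place until the TTL expires (default: 30 seconds); a start after that point takes over the expired lease automatically.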
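
The restart depends on the old process having actually exited before the new one starts. A minimal sketch of sending SIGTERM and waiting for the process to disappear follows; `stop_and_wait` and its 60-second default timeout are illustrative names and values, not part of AngaraBase, and the sketch assumes you are allowed to signal the server process.

```shell
# Hypothetical helper: send SIGTERM to PID $1 and wait until it is gone.
# Returns 0 once the PID no longer exists, 1 if it is still alive after
# $2 seconds (default 60). Assumes the caller may signal the process.
stop_and_wait() {
  pid="$1"
  timeout_s="${2:-60}"
  # If the signal cannot be delivered, treat the process as already gone.
  kill -TERM "$pid" 2>/dev/null || return 0
  waited=0
  # kill -0 sends no signal; it only tests whether the PID still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage:
# stop_and_wait <angarabase-pid> 60 && angarabase-server --config angarabase.conf
```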

Host Migration (Without dump/restore)

To move data files to a different host:

Prerequisites

  • Both hosts have the same AngaraBase version
  • Same page size (checked automatically)
  • Shared storage OR manual file copy

Step-by-Step Process

  1. Stop the source instance:
     # Graceful shutdown to release lease
     kill -TERM <angarabase-pid>

     # Verify shutdown completed
     ps aux | grep angarabase-server
  2. Verify lease is released:
     # Check that no process holds the lease
     # (Optional: use another AngaraBase instance to query sys.identity)
  3. Copy data files (if not using shared storage):
     # Copy entire data directory
     rsync -av /old/host/data/ /new/host/data/

     # Copy transaction log directory
     rsync -av /old/host/txlog/ /new/host/txlog/
  4. Start on new host:
     angarabase-server --config angarabase.conf
  5. Verify migration:
     SELECT lease_holder_hostname, recovery_mode FROM sys.identity;
     SELECT COUNT(*) FROM your_tables; -- Verify data integrity
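
One way to check that the copy step produced a complete, identical set of files is to compare checksum manifests of the source and destination directories before starting the new instance. The sketch below assumes GNU `sha256sum` is available on both hosts; the `manifest` function and the manifest file names are illustrative.

```shell
# Print "checksum  ./relative/path" for every regular file under $1,
# sorted by path so manifests from two hosts can be compared with diff.
manifest() {
  ( cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2 )
}

# Usage: run with the source instance stopped, then compare the results.
# manifest /old/host/data > old.manifest     # on the old host
# manifest /new/host/data > new.manifest     # on the new host
# diff old.manifest new.manifest && echo "copies match"
```

Paths are recorded relative to each directory, so the two manifests stay comparable even when the data directories live at different locations on the two hosts.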

Force Lease Takeover

If the previous instance crashed and its lease has not yet expired, you can force a takeover instead of waiting for expiration.

When to Use

  • Previous instance confirmed dead (host crashed, process killed)
  • The lease shows an expiry time in the past, but automatic takeover did not occur
  • Emergency recovery scenarios

How to Use

# Set environment variable before starting
export ANGARABASE_FORCE_LEASE_TAKEOVER=1
angarabase-server --config angarabase.conf

Safety Checks

Before forcing takeover, verify:

  • Previous instance process is definitely terminated
  • No other AngaraBase processes accessing the same files
  • Network partitions resolved (if applicable)

Warning: Force takeover with a running instance will cause data corruption.
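
Note that `export` leaves the variable set for the rest of the shell session, so a later routine restart from the same shell would also force a takeover. A hedged alternative is to scope the flag to a single command with `env`; the `start_with_forced_takeover` wrapper name is illustrative.

```shell
# Run one command with the force flag set, without exporting the flag
# into the calling shell. The command to run is passed as arguments.
start_with_forced_takeover() {
  env ANGARABASE_FORCE_LEASE_TAKEOVER=1 "$@"
}

# Usage (only after the safety checks above have passed):
# start_with_forced_takeover angarabase-server --config angarabase.conf
```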

Diagnostics

Check Lease Status

SELECT 
 lease_holder_id,
 lease_holder_hostname,
 lease_expires_at,
 lease_acquired_at,
 recovery_mode
FROM sys.identity;

Check Recovery Metrics

SELECT * FROM sys.health;

Lease Configuration

Environment variables (set before startup):

  • ANGARABASE_LEASE_TTL_S: Lease duration in seconds (default: 30)
  • ANGARABASE_LEASE_HEARTBEAT_S: Heartbeat interval in seconds (default: 10)
  • ANGARABASE_FORCE_LEASE_TAKEOVER: Force takeover flag (default: false; set to 1 to enable)
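
With the defaults, the heartbeat interval (10 s) is a third of the TTL (30 s). In any lease scheme the heartbeat presumably has to be shorter than the TTL, otherwise a healthy instance would let its own lease lapse between renewals. The sketch below is a pre-start sanity check under that assumption; `check_lease_settings` is a hypothetical helper, not an AngaraBase command, and it expects whole-number values.

```shell
# Hypothetical pre-start check: read the two lease settings (falling back
# to the documented defaults) and reject a heartbeat that is not shorter
# than the TTL. Values are assumed to be whole numbers of seconds.
check_lease_settings() {
  ttl="${ANGARABASE_LEASE_TTL_S:-30}"
  hb="${ANGARABASE_LEASE_HEARTBEAT_S:-10}"
  if [ "$hb" -ge "$ttl" ]; then
    echo "heartbeat (${hb}s) must be shorter than TTL (${ttl}s)" >&2
    return 1
  fi
  # Number of whole heartbeat intervals that fit into one TTL window.
  echo "$((ttl / hb)) heartbeats per TTL"
}

# Usage:
# check_lease_settings && angarabase-server --config angarabase.conf
```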

Limitations

WAL Backend Requirements

  • Full recovery: Requires transaction_log.backend = "file_bin"
  • Partial recovery: noop backend has no WAL replay (data loss possible)

Filesystem Considerations

  • Local filesystems: Full support (ext4, xfs, btrfs, etc.)
  • NFS/SAN: Instance lease works; verify file copy consistency
  • Network partitions: May cause false lease expiration

Version Compatibility

  • Same major.minor version required for host migration
  • Page size must match (checked automatically)
  • Configuration compatibility recommended

Troubleshooting

“Cannot start: database files are owned by another instance”

  • Cause: Active lease held by another instance
  • Solution: Wait for lease expiration or use force takeover (if safe)

“MVCC recovery failed”

  • Cause: Corrupted transaction log files
  • Solution: Check disk space and look for filesystem errors; a restore from backup may be necessary

“VERSION decode failed”

  • Cause: Corrupted version marker or incompatible format
  • Solution: Restore from backup; check filesystem integrity

Performance After Recovery

  • First queries may be slower (cold buffer pool)
  • MVCC state rebuilds incrementally
  • Statistics may need refresh (ANALYZE TABLE)
