Crash Recovery and Storage Portability
This guide covers crash recovery scenarios and how to safely restart AngaraBase instances on existing data files, including migration to different hosts.
Overview
AngaraBase includes an Instance Lease system that prevents dual-write corruption while enabling safe crash recovery and storage portability. Because the lease does not depend on traditional file locking, it works on any filesystem, including NFS and SAN storage, where such locking is unreliable.
What Happens During Crash Recovery
When AngaraBase starts on existing data files, it performs these recovery phases:
Phase A: WAL Recovery (file_bin backend only)
- Scans transaction log files for incomplete entries
- Truncates partial tail records to maintain consistency
- Replays committed page deltas (redo operations)
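To illustrate the tail-truncation step, the sketch below assumes a hypothetical newline-delimited text log in which every complete record ends with a newline, so a torn write leaves unterminated bytes at the end of the file. AngaraBase's real file_bin format is binary and is not described in this guide; the function name truncate_partial_tail is invented for this example, and it relies on GNU coreutils `truncate`.

```bash
# Toy illustration only: assumes a hypothetical text log where every complete
# record ends with "\n". The real file_bin WAL format is different.

# truncate_partial_tail FILE
# Drops any bytes after the last newline (a torn, partial tail record).
truncate_partial_tail() {
  local file=$1 size tail_len
  size=$(wc -c < "$file")
  # An empty file, or one whose last byte is a newline, has no partial tail.
  if [ "$size" -eq 0 ] || [ -z "$(tail -c 1 "$file")" ]; then
    return 0
  fi
  # tail -n 1 prints the unterminated last line; its length is the torn tail.
  tail_len=$(tail -n 1 "$file" | wc -c)
  truncate -s "$((size - tail_len))" "$file"
}
```

Only the incomplete tail is removed; complete records before it are kept for the redo pass.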
Phase B: MVCC History Restore
- Recovers in-memory MVCC state from transaction log
- Marks uncommitted transactions as aborted
- Restores visibility information for concurrent reads
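The "mark uncommitted transactions as aborted" step boils down to finding transactions that began but never committed. A minimal sketch, assuming a hypothetical text log with one `BEGIN <txid>` or `COMMIT <txid>` record per line (not AngaraBase's actual log format); list_uncommitted_txids is an illustrative name:

```bash
# Toy illustration only: assumes a hypothetical text log with one record per
# line, either "BEGIN <txid>" or "COMMIT <txid>". Not the real log format.

# list_uncommitted_txids [LOG_FILE...]   (reads stdin if no file is given)
# Prints, in numeric order, every txid that began but never committed;
# these are the transactions recovery would treat as aborted.
list_uncommitted_txids() {
  awk '
    $1 == "BEGIN"  { pending[$2] = 1 }
    $1 == "COMMIT" { delete pending[$2] }
    END { for (id in pending) print id }
  ' "$@" | sort -n
}
```

For example, a log containing `BEGIN 1`, `BEGIN 2`, `COMMIT 1`, `BEGIN 3` yields transactions 2 and 3.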
Phase C: Instance Lease Check
- Checks for an active instance lease in base.adb
- Prevents dual-instance startup with a fail-closed error
- Automatically takes over expired leases (crash recovery)
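The Phase C decision can be modeled as a small function of the current time, the lease expiry, and the force flag. This is a hypothetical model written for illustration, not AngaraBase's code; it assumes times are Unix epoch seconds and that a missing lease is passed as an empty string.

```bash
# Hypothetical model of the Phase C lease check, not AngaraBase's actual code.
# lease_decision NOW EXPIRES_AT [FORCE]
#   NOW, EXPIRES_AT: Unix epoch seconds; an empty EXPIRES_AT means no lease.
#   FORCE: "1" mirrors ANGARABASE_FORCE_LEASE_TAKEOVER=1.
# Prints "acquire", "force-takeover", or "refuse".
lease_decision() {
  local now=$1 expires_at=$2 force=${3:-0}
  if [ -z "$expires_at" ] || [ "$now" -ge "$expires_at" ]; then
    echo "acquire"          # no lease, or lease expired: take it over
  elif [ "$force" = "1" ]; then
    echo "force-takeover"   # live lease, but the operator explicitly overrides
  else
    echo "refuse"           # live lease held elsewhere: fail closed
  fi
}
```

For example, `lease_decision 100 130` prints `refuse`, because the lease is still valid for another 30 seconds.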
Restarting on Existing Files (Same Host)
For normal restart scenarios on the same machine:
- Stop the server (if running):
# Graceful shutdown releases the lease automatically
kill -TERM <angarabase-pid>
- Start normally:
angarabase-server --config angarabase.conf
- Verify recovery:
SELECT recovery_mode, lease_holder_id FROM sys.identity;
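After a graceful shutdown the lease is released, so the restarted instance acquires it immediately. If the previous instance crashed instead, the lease is taken over automatically once its TTL expires (default: 30 seconds).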
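The stop step can be scripted so that a restart waits until the old process has actually exited. A sketch, assuming a POSIX `ps -p`; stop_and_wait is a hypothetical helper name, and it only sees processes on the local host:

```bash
# stop_and_wait PID [TIMEOUT_S]
# Hypothetical helper: sends SIGTERM (graceful shutdown, which releases the
# lease) and polls until the process is gone. Checks the local host only.
# Returns 0 when stopped, 1 on timeout or permission error, 2 on bad input.
stop_and_wait() {
  local pid=$1 timeout=${2:-60} waited=0
  case $pid in
    ''|*[!0-9]*) echo "invalid PID: '$pid'" >&2; return 2 ;;
  esac
  if ! kill -TERM "$pid" 2>/dev/null; then
    # kill fails for both "no such process" and "permission denied".
    if ps -p "$pid" >/dev/null 2>&1; then
      echo "cannot signal PID $pid" >&2
      return 1
    fi
    return 0   # no such process: nothing to stop
  fi
  while ps -p "$pid" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "PID $pid still running after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

Usage: `stop_and_wait <angarabase-pid> 60 && angarabase-server --config angarabase.conf`. Starting only after the old process has exited avoids racing a shutdown that is still in progress.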
Host Migration (Without dump/restore)
To move data files to a different host:
Prerequisites
- Both hosts run the same AngaraBase version (major.minor must match)
- Same page size (checked automatically)
- Shared storage OR manual file copy
Step-by-Step Process
- Stop the source instance:
# Graceful shutdown to release lease
kill -TERM <angarabase-pid>
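# Verify shutdown completed (no output means the process has exited;
# the [a] pattern keeps grep from matching its own command line)
ps aux | grep '[a]ngarabase-server'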
- Verify lease is released:
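# A graceful shutdown releases the lease. After a crash, wait at least
# ANGARABASE_LEASE_TTL_S seconds (default: 30) for the lease to expire.
# (Optional: query sys.identity from another AngaraBase instance)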
- Copy data files (if not using shared storage):
# Copy entire data directory
rsync -av /old/host/data/ /new/host/data/
# Copy transaction log directory
rsync -av /old/host/txlog/ /new/host/txlog/
- Start on new host:
angarabase-server --config angarabase.conf
- Verify migration:
SELECT lease_holder_hostname, recovery_mode FROM sys.identity;
SELECT COUNT(*) FROM your_tables; -- Spot-check row counts against the source
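Row counts catch gross problems but not silent corruption in a copied file. One way to confirm that the copied directories match byte for byte, assuming GNU coreutils (`sha256sum`) on both hosts, is to build a checksum manifest on each side and compare them. The function name file_manifest is hypothetical.

```bash
# file_manifest DIR
# Prints "<sha256>  <relative path>" for every regular file under DIR,
# sorted by path, so manifests built on two hosts can be compared with diff.
file_manifest() {
  (cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2)
}
```

For example, with the source instance stopped, run `file_manifest /old/host/data > data.sha256` on the source host, copy data.sha256 to the new host, and run `file_manifest /new/host/data | diff data.sha256 -` there. No output from diff means the copies match. Repeat for the txlog directory.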
Force Lease Takeover
If the previous instance crashed and the lease hasn’t expired, use force takeover:
When to Use
- Previous instance confirmed dead (host crashed, process killed)
- The lease shows an expiry time in the past, but takeover did not happen automatically
- Emergency recovery scenarios
How to Use
# Set environment variable before starting
export ANGARABASE_FORCE_LEASE_TAKEOVER=1
angarabase-server --config angarabase.conf
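`export` keeps the variable set for the rest of the shell session, so a later routine restart from the same shell would also force a takeover. A small wrapper that sets the flag for a single command avoids that. This is a suggested practice, not part of AngaraBase, and run_with_forced_takeover is a hypothetical name; the safety checks in the next section still apply.

```bash
# run_with_forced_takeover COMMAND [ARGS...]
# Sets ANGARABASE_FORCE_LEASE_TAKEOVER=1 for this one command only, so the
# flag cannot leak into later restarts launched from the same shell.
run_with_forced_takeover() {
  if [ "$#" -eq 0 ]; then
    echo "usage: run_with_forced_takeover COMMAND [ARGS...]" >&2
    return 2
  fi
  ANGARABASE_FORCE_LEASE_TAKEOVER=1 "$@"
}
```

Usage: `run_with_forced_takeover angarabase-server --config angarabase.conf`.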
Safety Checks
Before forcing takeover, verify:
- The previous instance's process is definitely terminated
- No other AngaraBase process is accessing the same files
- Any network partition has been resolved (if applicable)
Warning: Forcing a takeover while another instance is still running will cause data corruption.
Diagnostics
Check Lease Status
SELECT
lease_holder_id,
lease_holder_hostname,
lease_expires_at,
lease_acquired_at,
recovery_mode
FROM sys.identity;
Check Recovery Metrics
SELECT * FROM sys.health;
Lease Configuration
Environment variables (set before startup):
- ANGARABASE_LEASE_TTL_S: lease duration in seconds (default: 30)
- ANGARABASE_LEASE_HEARTBEAT_S: heartbeat interval in seconds (default: 10)
- ANGARABASE_FORCE_LEASE_TAKEOVER: force takeover flag; set to 1 to enable (default: false)
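A pre-start check can catch lease settings that would let the lease lapse between heartbeats. The defaults fit three heartbeats into one TTL (30 / 10 = 3). The sketch below requires at least two per TTL; that threshold is an assumed rule of thumb, not a documented AngaraBase constraint, and check_lease_settings is a hypothetical name.

```bash
# check_lease_settings
# Validates ANGARABASE_LEASE_TTL_S and ANGARABASE_LEASE_HEARTBEAT_S
# (defaults 30 and 10) before startup. Returns 1 if they look unsafe.
check_lease_settings() {
  local ttl=${ANGARABASE_LEASE_TTL_S:-30} hb=${ANGARABASE_LEASE_HEARTBEAT_S:-10} v
  for v in "$ttl" "$hb"; do
    case $v in
      ''|0*|*[!0-9]*)
        echo "invalid lease setting '$v': expected a positive integer" >&2
        return 1 ;;
    esac
  done
  # Assumed rule of thumb: at least two heartbeats must fit in one TTL so a
  # single delayed heartbeat does not let the lease expire.
  if [ "$ttl" -lt $((2 * hb)) ]; then
    echo "heartbeat ${hb}s is too long for a ${ttl}s TTL" >&2
    return 1
  fi
  return 0
}
```

Usage: `check_lease_settings && angarabase-server --config angarabase.conf`.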
Limitations
WAL Backend Requirements
- Full recovery: requires transaction_log.backend = "file_bin"
- Partial recovery: the noop backend has no WAL replay (data loss possible)
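Before relying on crash recovery, confirm which backend a configuration file selects. This sketch assumes the config uses the dotted-key line form shown above (`transaction_log.backend = "file_bin"`); a `[transaction_log]` section header followed by a bare `backend = ...` key is not handled. The function name wal_backend is hypothetical.

```bash
# wal_backend CONFIG_FILE
# Prints the value from a line such as:  transaction_log.backend = "file_bin"
# (the last such line wins). Assumes the dotted-key form only.
wal_backend() {
  sed -n 's/^[[:space:]]*transaction_log\.backend[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | tail -n 1
}
```

Usage: `[ "$(wal_backend angarabase.conf)" = "file_bin" ] || echo "warning: no WAL replay configured" >&2`.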
Filesystem Considerations
- Local filesystems: Full support (ext4, xfs, btrfs, etc.)
- NFS/SAN: Instance lease works; verify file copy consistency
- Network partitions: May cause false lease expiration
Version Compatibility
- Same major.minor version required for host migration
- Page size must match (checked automatically)
- Configuration compatibility recommended
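The major.minor rule can be checked before copying any files. A minimal sketch that compares two version strings however you obtain them on each host; same_major_minor is a hypothetical helper and assumes dot-separated versions such as 2.3.1.

```bash
# same_major_minor VERSION_A VERSION_B
# Returns 0 if both versions share the same major.minor prefix
# (e.g. 2.3.1 and 2.3.7), 1 otherwise.
same_major_minor() {
  local a b
  a=$(printf '%s\n' "$1" | cut -d. -f1,2)
  b=$(printf '%s\n' "$2" | cut -d. -f1,2)
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Patch-level differences pass; any change in major or minor version fails.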
Troubleshooting
“Cannot start: database files are owned by another instance”
- Cause: Active lease held by another instance
- Solution: Wait for lease expiration or use force takeover (if safe)
“MVCC recovery failed”
- Cause: Corrupted transaction log files
- Solution: Check disk space, filesystem errors; may need backup restore
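The disk-space part of that check can be scripted with POSIX `df -P`, which prints one line per filesystem with available space in the fourth column. free_kib is a hypothetical helper; checking for filesystem errors is OS-specific and not covered here.

```bash
# free_kib DIR
# Prints the space available on the filesystem holding DIR, in KiB,
# using the POSIX output format of df (-P) with 1024-byte units (-k).
free_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

Usage: `free_kib /new/host/txlog`, run against both the data and transaction log directories.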
“VERSION decode failed”
- Cause: Corrupted version marker or incompatible format
- Solution: Restore from backup; check filesystem integrity
Performance After Recovery
- First queries may be slower (cold buffer pool)
- MVCC state rebuilds incrementally
- Statistics may need a refresh (ANALYZE TABLE)
See Also
- Instance Lifecycle - Conceptual overview
- Backup and Restore - Host migration alternatives
- Configuration - Lease settings reference