Instance Lifecycle
This document explains the conceptual model of AngaraBase instance identity, lifecycle, and the Instance Lease system that enables safe crash recovery and storage portability.
Instance Identity
Each AngaraBase instance has a unique identity established during initialization:
Core Identity Components
- cluster_id: UUID identifying the logical database cluster
- instance_id: UUID identifying this specific instance
- Data directory: Physical location of database files
- Transaction log directory: Physical location of WAL files
Identity Persistence
Identity is stored in two places:
- VERSION marker: Binary file with format version and IDs
- System catalog pages: In base.adb reserved pages with full metadata
Instance Lease System
The Instance Lease prevents multiple instances from accessing the same data files simultaneously, which would cause corruption.
Lease Structure
pub struct InstanceLeaseV0 {
    pub holder_id: String,        // UUID of owning instance
    pub acquired_at_unix_s: u64,  // When lease was taken
    pub expires_at_unix_s: u64,   // When lease expires (TTL)
    pub holder_pid: u32,          // Process ID (diagnostic)
    pub holder_hostname: String,  // Hostname (diagnostic)
}
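As a minimal sketch of how the expiry fields might be interpreted, the helpers below check a lease against a caller-supplied clock. The struct is repeated so the example compiles on its own. The names `is_expired_at`, `seconds_remaining_at`, and `example_lease` are hypothetical and not part of AngaraBase's API, and treating a lease as expired exactly at `expires_at_unix_s` is an assumption.

```rust
/// Copy of the lease record shown above, repeated so this sketch stands alone.
#[derive(Debug, Clone, PartialEq)]
pub struct InstanceLeaseV0 {
    pub holder_id: String,
    pub acquired_at_unix_s: u64,
    pub expires_at_unix_s: u64,
    pub holder_pid: u32,
    pub holder_hostname: String,
}

impl InstanceLeaseV0 {
    /// Hypothetical helper: true once `now_unix_s` reaches the expiry time.
    /// (Assumption: the boundary second itself counts as expired.)
    pub fn is_expired_at(&self, now_unix_s: u64) -> bool {
        now_unix_s >= self.expires_at_unix_s
    }

    /// Hypothetical helper: seconds left before expiry, saturating at zero.
    pub fn seconds_remaining_at(&self, now_unix_s: u64) -> u64 {
        self.expires_at_unix_s.saturating_sub(now_unix_s)
    }
}

/// Builds an illustrative lease taken at `now_unix_s` with the given TTL.
/// The holder ID and hostname are made-up placeholder values.
pub fn example_lease(now_unix_s: u64, ttl_s: u64) -> InstanceLeaseV0 {
    InstanceLeaseV0 {
        holder_id: "00000000-0000-0000-0000-000000000001".to_string(),
        acquired_at_unix_s: now_unix_s,
        expires_at_unix_s: now_unix_s + ttl_s,
        holder_pid: std::process::id(),
        holder_hostname: "host-a".to_string(),
    }
}
```

Passing the clock in as a parameter, rather than reading the system time inside the helper, keeps the expiry logic easy to test.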
Lease State Machine
[None] ──acquire──> [Held] ──heartbeat──> [Held]
  ↑                   │                      │
  │                   │                      │
  └──expired/release──┘                      │
                                             │
[Expired] <──────────timeout─────────────────┘
    │
    └──takeover──> [Held by new instance]
State Transitions
- None → Held: First instance startup or after graceful shutdown
- Held → Held: Periodic heartbeat updates (every 10s by default)
- Held → None: Graceful shutdown releases lease immediately
- Held → Expired: Heartbeat stops (crash, network partition)
- Expired → Held: New instance takes over after TTL expiration
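The transitions listed above can be sketched as a small pure function. This is an illustrative model only: the `LeaseState`, `LeaseEvent`, and `next_state` names are assumptions made for this example, and AngaraBase's internal representation may differ. The `[None]` state is called `NoLease` here to avoid confusion with Rust's `Option::None`.

```rust
/// States from the lease state machine (`NoLease` is `[None]` in the diagram).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeaseState {
    NoLease,
    Held,
    Expired,
}

/// Events that drive the state machine (names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeaseEvent {
    Acquire,
    Heartbeat,
    Release,
    Timeout,
    Takeover,
}

/// Returns the next state, or `None` if the event is not valid in `state`.
pub fn next_state(state: LeaseState, event: LeaseEvent) -> Option<LeaseState> {
    match (state, event) {
        // None → Held: first startup or start after graceful shutdown.
        (LeaseState::NoLease, LeaseEvent::Acquire) => Some(LeaseState::Held),
        // Held → Held: periodic heartbeat renewal.
        (LeaseState::Held, LeaseEvent::Heartbeat) => Some(LeaseState::Held),
        // Held → None: graceful shutdown releases the lease.
        (LeaseState::Held, LeaseEvent::Release) => Some(LeaseState::NoLease),
        // Held → Expired: heartbeats stopped and the TTL ran out.
        (LeaseState::Held, LeaseEvent::Timeout) => Some(LeaseState::Expired),
        // Expired → Held: a new instance takes over.
        (LeaseState::Expired, LeaseEvent::Takeover) => Some(LeaseState::Held),
        // Everything else is not a listed transition.
        _ => None,
    }
}
```

Returning `Option` makes unlisted transitions explicit: for example, a heartbeat arriving for an already expired lease yields `None` rather than silently reviving it.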
Lease Storage
- Location: Stored in SysCatalogMetaV0 within base.adb pages
- Persistence: Atomic updates with full page images
- Reliability: Works on NFS/SAN where flock() is unreliable
Startup Sequence
Phase 1: Pre-flight Checks
- Verify data directory exists and is initialized
- Check VERSION marker compatibility
- Validate page size matches compiled binary
Phase 2: Lease Acquisition
- Load system catalog from base.adb
- Check existing lease status:
  - No lease: Acquire immediately
  - Expired lease: Take over with warning
  - Active lease: Fail with informative error
  - Force takeover: Override active lease (dangerous)
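The lease status check above amounts to a four-way decision. The following is a minimal sketch of that decision, assuming the only inputs are the existing lease's expiry time (if any), the current time, and the force flag. `LeaseDecision` and `decide_lease_action` are hypothetical names, and the assumption that an expired lease is handled as a normal takeover even when the force flag is set is this sketch's, not the source's.

```rust
/// Possible outcomes of the Phase 2 lease check (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeaseDecision {
    /// No lease recorded: acquire immediately.
    Acquire,
    /// Lease exists but its TTL has passed: take over and log a warning.
    TakeOverExpired,
    /// Lease is still active but the operator forced a takeover (dangerous).
    ForceTakeover,
    /// Lease is still active: refuse to start with an informative error.
    RefuseActive,
}

/// Decides what to do given the existing lease's expiry time, if any.
pub fn decide_lease_action(
    existing_expires_at_unix_s: Option<u64>,
    now_unix_s: u64,
    force_takeover: bool,
) -> LeaseDecision {
    match existing_expires_at_unix_s {
        None => LeaseDecision::Acquire,
        Some(expires) if now_unix_s >= expires => LeaseDecision::TakeOverExpired,
        Some(_) if force_takeover => LeaseDecision::ForceTakeover,
        Some(_) => LeaseDecision::RefuseActive,
    }
}
```

Only the `ForceTakeover` branch can put two writers on the same files, which is why the force flag is consulted last and only when the lease is still active.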
Phase 3: Recovery
- WAL Recovery: Replay transaction log (file_bin backend)
- MVCC Recovery: Restore in-memory transaction state
- Heartbeat Start: Begin periodic lease renewal
Phase 4: Ready for Connections
- Start protocol listeners (pgwire, admin)
- Begin accepting client connections
- Continue heartbeat until shutdown
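To illustrate "continue heartbeat until shutdown", here is one possible shape for a background renewal loop using only the Rust standard library. It is a sketch under assumptions, not AngaraBase's implementation: `Heartbeat`, `start_heartbeat`, and `count_heartbeats` are hypothetical names, and the `renew` callback stands in for whatever actually rewrites the lease page.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Handle to a running heartbeat thread (hypothetical type).
pub struct Heartbeat {
    stop_tx: Sender<()>,
    handle: JoinHandle<()>,
}

/// Spawns a thread that calls `renew` once per `interval` until stopped.
pub fn start_heartbeat<F>(interval: Duration, mut renew: F) -> Heartbeat
where
    F: FnMut() + Send + 'static,
{
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || loop {
        // Waiting on the stop channel doubles as the sleep between renewals,
        // so a shutdown request interrupts the wait immediately.
        match stop_rx.recv_timeout(interval) {
            Err(RecvTimeoutError::Timeout) => renew(),
            Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
        }
    });
    Heartbeat { stop_tx, handle }
}

impl Heartbeat {
    /// Signals the thread to stop and waits for it to exit.
    pub fn stop(self) {
        let _ = self.stop_tx.send(());
        let _ = self.handle.join();
    }
}

/// Demo: runs a heartbeat for `run_for` and returns how many renewals fired.
pub fn count_heartbeats(interval: Duration, run_for: Duration) -> u64 {
    let renewals = Arc::new(AtomicU64::new(0));
    let counter = Arc::clone(&renewals);
    let hb = start_heartbeat(interval, move || {
        counter.fetch_add(1, Ordering::SeqCst);
    });
    thread::sleep(run_for);
    hb.stop();
    renewals.load(Ordering::SeqCst)
}
```

Using `recv_timeout` instead of a plain `sleep` means a graceful shutdown does not have to wait out a full heartbeat interval before releasing the lease.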
Recovery Modes
AngaraBase tracks the recovery mode for operational visibility:
Normal Startup
- Clean start on existing, properly shut down data
- No WAL replay required
- recovery_mode = "normal"
Crash Recovery
- Previous instance terminated unexpectedly
- WAL replay recovers committed transactions
- MVCC state rebuilt from transaction log
- recovery_mode = "crash_recovery"
Forced Takeover
- Operator used ANGARABASE_FORCE_LEASE_TAKEOVER=1
- May indicate emergency recovery scenario
- recovery_mode = "forced_takeover"
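The three modes can be modeled as an enum whose string form matches the `recovery_mode` values above. In this sketch, `RecoveryMode` and `classify_startup` are hypothetical names, and the precedence rule (a forced takeover is reported even if the previous shutdown was unclean) is an assumption the source does not spell out.

```rust
/// Recovery modes reported for operational visibility.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryMode {
    Normal,
    CrashRecovery,
    ForcedTakeover,
}

impl RecoveryMode {
    /// String value as it appears in `recovery_mode`.
    pub fn as_str(self) -> &'static str {
        match self {
            RecoveryMode::Normal => "normal",
            RecoveryMode::CrashRecovery => "crash_recovery",
            RecoveryMode::ForcedTakeover => "forced_takeover",
        }
    }
}

/// Classifies a startup from two observations (inputs are assumptions):
/// whether the previous instance shut down cleanly, and whether the
/// operator forced a lease takeover.
pub fn classify_startup(clean_shutdown: bool, forced_takeover: bool) -> RecoveryMode {
    if forced_takeover {
        // Assumed precedence: a forced takeover is always reported as such.
        RecoveryMode::ForcedTakeover
    } else if clean_shutdown {
        RecoveryMode::Normal
    } else {
        RecoveryMode::CrashRecovery
    }
}
```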
Shared Storage Scenarios
The Instance Lease system enables AngaraBase to work correctly on shared storage where multiple hosts can access the same files.
NFS/SAN Deployment
Host A ──┐
         ├── NFS/SAN ──> [data/] [txlog/]
Host B ──┘               [base.adb with lease]
Benefits
- Failover: Host B can take over if Host A crashes
- Maintenance: Move instance between hosts without dump/restore
- Testing: Run against production data copies safely
Limitations
- Single writer: Only one instance can write at a time
- Network partitions: May cause false lease expiration
- Performance: Network storage latency affects throughput
File Copy Scenarios
For non-shared storage, manual file copy enables:
- Backup testing: Verify backup integrity on different host
- Development: Use production data copy for debugging
- Migration: Move to new hardware without downtime
Configuration
Lease Timing
- ANGARABASE_LEASE_TTL_S: How long lease lasts (default: 30s)
- ANGARABASE_LEASE_HEARTBEAT_S: Renewal frequency (default: 10s)
Safety Controls
- ANGARABASE_FORCE_LEASE_TAKEOVER: Emergency override (default: false)
Recommended Settings
# Production: Longer TTL for network stability
export ANGARABASE_LEASE_TTL_S=60
export ANGARABASE_LEASE_HEARTBEAT_S=20
# Development: Shorter TTL for faster iteration
export ANGARABASE_LEASE_TTL_S=15
export ANGARABASE_LEASE_HEARTBEAT_S=5
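Both recommended profiles, like the defaults, keep the TTL at three heartbeat intervals (60/20, 15/5, 30/10). The sketch below reads the two timing variables with their documented defaults and computes that ratio. `LeaseTiming`, `parse_timing`, and `timing_from_env` are hypothetical names, and the check that the heartbeat interval is non-zero and shorter than the TTL is a sanity rule added for this example. The source does not say whether AngaraBase enforces it.

```rust
use std::env;

/// Documented defaults for the two timing variables.
pub const DEFAULT_TTL_S: u64 = 30;
pub const DEFAULT_HEARTBEAT_S: u64 = 10;

/// Parsed lease timing settings (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LeaseTiming {
    pub ttl_s: u64,
    pub heartbeat_s: u64,
}

impl LeaseTiming {
    /// Number of whole heartbeat intervals that fit into one TTL.
    pub fn intervals_per_ttl(&self) -> u64 {
        self.ttl_s / self.heartbeat_s
    }
}

/// Parses one optional raw value, falling back to `default` when unset.
fn parse_or_default(raw: Option<&str>, default: u64, name: &str) -> Result<u64, String> {
    match raw {
        None => Ok(default),
        Some(s) => s
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("{}: {}", name, e)),
    }
}

/// Builds timing from raw strings. Kept separate from the environment so it
/// can be tested without mutating process state.
pub fn parse_timing(ttl: Option<&str>, heartbeat: Option<&str>) -> Result<LeaseTiming, String> {
    let ttl_s = parse_or_default(ttl, DEFAULT_TTL_S, "ANGARABASE_LEASE_TTL_S")?;
    let heartbeat_s =
        parse_or_default(heartbeat, DEFAULT_HEARTBEAT_S, "ANGARABASE_LEASE_HEARTBEAT_S")?;
    // Assumed sanity rule: renewals must happen more often than the lease expires.
    if heartbeat_s == 0 || heartbeat_s >= ttl_s {
        return Err(format!(
            "heartbeat ({}s) must be non-zero and shorter than TTL ({}s)",
            heartbeat_s, ttl_s
        ));
    }
    Ok(LeaseTiming { ttl_s, heartbeat_s })
}

/// Reads the two variables from the process environment.
pub fn timing_from_env() -> Result<LeaseTiming, String> {
    let ttl = env::var("ANGARABASE_LEASE_TTL_S").ok();
    let heartbeat = env::var("ANGARABASE_LEASE_HEARTBEAT_S").ok();
    parse_timing(ttl.as_deref(), heartbeat.as_deref())
}
```

A ratio of three leaves room for a delayed or failed renewal before the lease lapses, which matters on network storage where individual writes can stall.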
Monitoring and Observability
Instance Status
-- Check current lease holder
SELECT lease_holder_id, lease_holder_hostname,
lease_expires_at, recovery_mode
FROM sys.identity;
-- Check system health
SELECT uptime_seconds, txn_commit_epoch_current
FROM sys.health;
Lease Events
AngaraBase logs lease events to stderr:
Instance lease acquired: holder=abc123...
Instance lease taken over: holder=def456...
Warning: lease heartbeat failed: I/O error
Instance lease released: holder=abc123...
Metrics Integration
Future versions will expose lease metrics via:
- Prometheus metrics endpoint
- sys.metrics virtual table
- Structured logging output
Security Considerations
Access Control
- Lease system does NOT provide authentication
- File system permissions still required
- Network access controls recommended for shared storage
Audit Trail
- Lease changes logged with timestamps
- Instance identity tracked in sys.identity
- Recovery mode visible for forensics
Troubleshooting
Common Issues
“Cannot start: database files are owned by another instance”
- Diagnosis: Active lease prevents startup
- Resolution: Wait for the lease to expire, or confirm the other instance is dead before overriding with ANGARABASE_FORCE_LEASE_TAKEOVER=1
Frequent lease takeovers
- Diagnosis: Network instability or resource contention
- Resolution: Increase TTL, check network/disk performance
“MVCC recovery failed”
- Diagnosis: Corrupted transaction log
- Resolution: Check filesystem, restore from backup if needed
Debug Information
-- Instance identity and lease
SELECT * FROM sys.identity;
-- Recent recovery statistics
SELECT * FROM sys.health;
-- Transaction log status
SELECT * FROM sys.settings WHERE name LIKE 'transaction_log.%';
Related Sections
Concepts (What to read next)
- Storage Engine: the data directory that the instance lease protects.
- Transactions and MVCC: what happens to active transactions during forced_takeover.
How-to (What to do)
- Crash recovery: operational procedures for recovering after a crash.
- Configuration: the instance_lease.* and recovery.* variables.
- Backup and Restore: how to protect the data directory and take a snapshot.
Reference
- System views sys.*: sys.identity and sys.health for lease state diagnostics.
- Known issues and SQLSTATE: the INSTANCE_* error section.