Index Durability
Overview
AngaraBase ensures that all indexes, including PRIMARY KEY and secondary BTree indexes, are persistent and survive system restarts or crashes. Indexes are managed by the IndexStore component, which handles the allocation of unique table identifiers (index_table_id) and coordinates with the Checkpoint worker to flush index pages to persistent storage.
There is one deliberate exception: indexes over volatile InMemory tables (storage='memory', durability='none'),
including CREATE TEMP TABLE, are catalog-visible for the session/query planner but do not allocate a persistent
index_table_id. They are rebuilt from the live in-memory heap when needed and disappear with the table.
Durability pipeline (PRIMARY KEY)
The durability of PRIMARY KEY indexes is guaranteed through a multi-stage process that integrates with the system catalog and the checkpointing mechanism:
- Table Creation: When a
CREATE TABLE ... PRIMARY KEYstatement is executed, the DDL executor allocates a uniqueindex_table_idusing the storage engine. - Catalog Persistence: The Primary Key definition is saved in the
SysCatalogwith its assignedindex_table_id. This ensures that the mapping between the table and its index is persistent. - Checkpoint Integration: The Checkpoint worker periodically calls
flush_all_indexes(). This operation forces all dirty index pages from memory to disk. - Recovery and Restoration: Upon system restart, the recovery process reads the index definitions from the catalog and restores the index state from the persisted pages on disk.
- Startup Backfill (Legacy Migration): For legacy databases where PRIMARY KEY indexes were created without an
index_table_id, AngaraBase performs an asynchronous background backfill after the database begins accepting connections (to avoid blocking startup). This process identifies legacy PKs, allocates the missingindex_table_id, updates the catalog, and persists the index pages.
Durability pipeline (secondary indexes)
Secondary indexes follow a “durable-by-default” path during creation and maintenance:
- Immediate Persistence: During
CREATE INDEX, the system immediately forces a synchronous flush of index pages to disk after the index build is complete. This ensures that the index is durable even if a crash occurs before the next checkpoint. - Periodic Flushing: Similar to Primary Keys, secondary indexes are included in the periodic
flush_all_indexes()calls by the Checkpoint worker.
Volatile InMemory indexes
InMemory tables with durability='none' use a volatile index path. This includes temporary tables, because
CREATE TEMP TABLE is forced to memory + durability=none even if the statement includes a durable storage hint.
Behavior:
CREATE INDEXsucceeds on volatile InMemory tables.- The index definition is visible in the catalog while the table exists.
index_table_idis intentionally absent, because there are no persistent index pages to restore.- The executor can still use a B-Tree/index-scan path by building an ephemeral index from current in-memory rows.
- On session disconnect or temp table cleanup, the table and its volatile index metadata are removed together.
Example:
SET search_path = public;
CREATE TEMP TABLE tt_items (
id INT,
code mvarchar(16)
);
INSERT INTO tt_items (id, code) VALUES (1, 'AbC ');
CREATE INDEX idx_tt_items_code ON tt_items (code);
-- Uses mvarchar equality semantics and can be served through the volatile index path.
SELECT id FROM tt_items WHERE code = 'abc';
This design keeps temp-table workloads fast and avoids WAL/checkpoint pressure, while preserving the same
comparison contract as durable B-Tree indexes. In particular, mvarchar index keys are normalized with the same
case-insensitive and trailing-space-insensitive rules used by expression evaluation.
Checkpoint fail-closed semantics
AngaraBase employs a fail-closed approach to index durability during the checkpoint process. The flush_all_indexes() operation returns a success status:
- Success: If all indexes are successfully flushed, the checkpoint continues, and the
end_checkpointrecord is written to the WAL. - Failure: If
flush_all_indexes()returnsfalse(indicating a flush error), the checkpoint is aborted. Theend_checkpointrecord is not written.
In the event of an aborted checkpoint, the next system startup will trigger a WAL replay starting from the last successfully completed checkpoint, ensuring that no index data is lost or left in an inconsistent state. The metric angarabase_checkpoint_index_flush_errors_total tracks these occurrences.
Monitoring
You can monitor the status of index durability and backfill operations using the following Prometheus metrics:
| Metric | Type | Description |
|---|---|---|
angarabase_pkey_backfill_in_progress | Gauge | 1 while startup PK backfill is running |
angarabase_pkey_backfill_ok_total | Counter | PK indexes successfully backfilled with TableId |
angarabase_pkey_backfill_fail_total | Counter | PK backfill failures (allocation or persist error) |
angarabase_checkpoint_index_flush_errors_total | Counter | Checkpoint aborted due to index flush failure |
angarabase_index_pkey_no_table_id_total | Gauge | Legacy PK indexes found at startup (before backfill) |
angarabase_index_restore_empty_total | Counter | Indexes that restored empty after restart (Alert if > 0) |
angarabase_wal_lsn_drift_resets_total | Counter | WAL VLF LSN drift resets after crash (Alert if > 0) |
angarabase_index_stale_tuple_fallbacks_total | Counter | UPDATE fallbacks to seq-scan due to stale index tuple |
Alerting Guidance & PromQL
- Empty Indexes:
rate(angarabase_index_restore_empty_total[5m]) > 0- Threshold: > 0 is critical. Indicates data loss or corruption in index persistence.
- WAL Drift:
rate(angarabase_wal_lsn_drift_resets_total[5m]) > 0- Threshold: > 0 is critical. Indicates WAL corruption or crash-safety failure.
- Stale Tuples:
rate(angarabase_index_stale_tuple_fallbacks_total[5m]) > 0.1- Threshold: Occasional fallbacks are normal, but a high rate indicates index corruption or MVCC issues.
Troubleshooting
“After restart the index is empty (entries=0)”
If an index appears empty after a restart despite having data previously (or if angarabase_index_restore_empty_total > 0), check the following:
- Legacy PKs: Check
angarabase_index_pkey_no_table_id_total. If it is greater than 0, the backfill might still be in progress (wait for completion) or have failed (check logs for allocation errors). - Checkpoint Failures: Verify if
angarabase_checkpoint_index_flush_errors_totalis incrementing. A failure to flush indexes prevents the checkpoint from completing. Check disk space and permissions. - Logs: Inspect system logs for
WAL replayerrors or messages indicating that index pages could not be restored.
“Checkpoint loop doesn’t complete”
If the checkpoint process seems stuck or keeps restarting:
- Flush Errors: Check the
angarabase_checkpoint_index_flush_errors_totalmetric. - Worker Logs: Look for
checkpoint_worker: flush_all_indexes failedin the logs. This indicates that the index store is unable to persist pages, possibly due to disk space issues or I/O errors.
“WAL LSN drift resets > 0”
If angarabase_wal_lsn_drift_resets_total is incrementing:
- This indicates that after a crash (e.g.,
kill -9), the WAL head LSN was out of sync with the actual physical WAL size, and the system had to reset it. - Check the logs for
append_commit FAILEDorWAL VLF LSN driftmessages. While the system recovers automatically, frequent occurrences might indicate underlying storage fsync issues.
“High rate of stale tuple fallbacks”
If angarabase_index_stale_tuple_fallbacks_total is spiking:
- This means UPDATE operations are finding index entries that point to non-existent or stale tuples, forcing a slow sequential scan fallback.
- This can happen during heavy concurrent UPDATEs. If it persists, consider REINDEXing the affected table.
SQL examples
-- Verify index usage and check for stale index fallbacks
EXPLAIN SELECT * FROM public.your_table WHERE id = 123;
-- If the query plan shows [stale_index], it means the index entry was stale
-- and the engine fell back to a sequential scan.
NOTE: System catalog tables for index metadata (
angara_sys.indexesandangara_sys.index_stats) are not part of the current user-facing SQL surface. UseEXPLAINto verify index usage.