Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Index Durability

Overview

AngaraBase ensures that all indexes, including PRIMARY KEY and secondary BTree indexes, are persistent and survive system restarts or crashes. Indexes are managed by the IndexStore component, which handles the allocation of unique table identifiers (index_table_id) and coordinates with the Checkpoint worker to flush index pages to persistent storage.

There is one deliberate exception: indexes over volatile InMemory tables (storage='memory', durability='none'), including CREATE TEMP TABLE, are catalog-visible for the session/query planner but do not allocate a persistent index_table_id. They are rebuilt from the live in-memory heap when needed and disappear with the table.

Durability pipeline (PRIMARY KEY)

The durability of PRIMARY KEY indexes is guaranteed through a multi-stage process that integrates with the system catalog and the checkpointing mechanism:

  1. Table Creation: When a CREATE TABLE ... PRIMARY KEY statement is executed, the DDL executor allocates a unique index_table_id using the storage engine.
  2. Catalog Persistence: The Primary Key definition is saved in the SysCatalog with its assigned index_table_id. This ensures that the mapping between the table and its index is persistent.
  3. Checkpoint Integration: The Checkpoint worker periodically calls flush_all_indexes(). This operation forces all dirty index pages from memory to disk.
  4. Recovery and Restoration: Upon system restart, the recovery process reads the index definitions from the catalog and restores the index state from the persisted pages on disk.
  5. Startup Backfill (Legacy Migration): For legacy databases where PRIMARY KEY indexes were created without an index_table_id, AngaraBase performs an asynchronous background backfill after the database begins accepting connections (to avoid blocking startup). This process identifies legacy PKs, allocates the missing index_table_id, updates the catalog, and persists the index pages.

Durability pipeline (secondary indexes)

Secondary indexes follow a “durable-by-default” path during creation and maintenance:

  1. Immediate Persistence: During CREATE INDEX, the system immediately forces a synchronous flush of index pages to disk after the index build is complete. This ensures that the index is durable even if a crash occurs before the next checkpoint.
  2. Periodic Flushing: Similar to Primary Keys, secondary indexes are included in the periodic flush_all_indexes() calls by the Checkpoint worker.

Volatile InMemory indexes

InMemory tables with durability='none' use a volatile index path. This includes temporary tables, because CREATE TEMP TABLE is forced to memory + durability=none even if the statement includes a durable storage hint.

Behavior:

  • CREATE INDEX succeeds on volatile InMemory tables.
  • The index definition is visible in the catalog while the table exists.
  • index_table_id is intentionally absent, because there are no persistent index pages to restore.
  • The executor can still use a B-Tree/index-scan path by building an ephemeral index from current in-memory rows.
  • On session disconnect or temp table cleanup, the table and its volatile index metadata are removed together.

Example:

SET search_path = public;

CREATE TEMP TABLE tt_items (
  id INT,
  code mvarchar(16)
);

INSERT INTO tt_items (id, code) VALUES (1, 'AbC  ');
CREATE INDEX idx_tt_items_code ON tt_items (code);

-- Uses mvarchar equality semantics and can be served through the volatile index path.
SELECT id FROM tt_items WHERE code = 'abc';

This design keeps temp-table workloads fast and avoids WAL/checkpoint pressure, while preserving the same comparison contract as durable B-Tree indexes. In particular, mvarchar index keys are normalized with the same case-insensitive and trailing-space-insensitive rules used by expression evaluation.

Checkpoint fail-closed semantics

AngaraBase employs a fail-closed approach to index durability during the checkpoint process. The flush_all_indexes() operation returns a success status:

  • Success: If all indexes are successfully flushed, the checkpoint continues, and the end_checkpoint record is written to the WAL.
  • Failure: If flush_all_indexes() returns false (indicating a flush error), the checkpoint is aborted. The end_checkpoint record is not written.

In the event of an aborted checkpoint, the next system startup will trigger a WAL replay starting from the last successfully completed checkpoint, ensuring that no index data is lost or left in an inconsistent state. The metric angarabase_checkpoint_index_flush_errors_total tracks these occurrences.

Monitoring

You can monitor the status of index durability and backfill operations using the following Prometheus metrics:

MetricTypeDescription
angarabase_pkey_backfill_in_progressGauge1 while startup PK backfill is running
angarabase_pkey_backfill_ok_totalCounterPK indexes successfully backfilled with TableId
angarabase_pkey_backfill_fail_totalCounterPK backfill failures (allocation or persist error)
angarabase_checkpoint_index_flush_errors_totalCounterCheckpoint aborted due to index flush failure
angarabase_index_pkey_no_table_id_totalGaugeLegacy PK indexes found at startup (before backfill)
angarabase_index_restore_empty_totalCounterIndexes that restored empty after restart (Alert if > 0)
angarabase_wal_lsn_drift_resets_totalCounterWAL VLF LSN drift resets after crash (Alert if > 0)
angarabase_index_stale_tuple_fallbacks_totalCounterUPDATE fallbacks to seq-scan due to stale index tuple

Alerting Guidance & PromQL

  • Empty Indexes: rate(angarabase_index_restore_empty_total[5m]) > 0
    • Threshold: > 0 is critical. Indicates data loss or corruption in index persistence.
  • WAL Drift: rate(angarabase_wal_lsn_drift_resets_total[5m]) > 0
    • Threshold: > 0 is critical. Indicates WAL corruption or crash-safety failure.
  • Stale Tuples: rate(angarabase_index_stale_tuple_fallbacks_total[5m]) > 0.1
    • Threshold: Occasional fallbacks are normal, but a high rate indicates index corruption or MVCC issues.

Troubleshooting

“After restart the index is empty (entries=0)”

If an index appears empty after a restart despite having data previously (or if angarabase_index_restore_empty_total > 0), check the following:

  1. Legacy PKs: Check angarabase_index_pkey_no_table_id_total. If it is greater than 0, the backfill might still be in progress (wait for completion) or have failed (check logs for allocation errors).
  2. Checkpoint Failures: Verify if angarabase_checkpoint_index_flush_errors_total is incrementing. A failure to flush indexes prevents the checkpoint from completing. Check disk space and permissions.
  3. Logs: Inspect system logs for WAL replay errors or messages indicating that index pages could not be restored.

“Checkpoint loop doesn’t complete”

If the checkpoint process seems stuck or keeps restarting:

  1. Flush Errors: Check the angarabase_checkpoint_index_flush_errors_total metric.
  2. Worker Logs: Look for checkpoint_worker: flush_all_indexes failed in the logs. This indicates that the index store is unable to persist pages, possibly due to disk space issues or I/O errors.

“WAL LSN drift resets > 0”

If angarabase_wal_lsn_drift_resets_total is incrementing:

  1. This indicates that after a crash (e.g., kill -9), the WAL head LSN was out of sync with the actual physical WAL size, and the system had to reset it.
  2. Check the logs for append_commit FAILED or WAL VLF LSN drift messages. While the system recovers automatically, frequent occurrences might indicate underlying storage fsync issues.

“High rate of stale tuple fallbacks”

If angarabase_index_stale_tuple_fallbacks_total is spiking:

  1. This means UPDATE operations are finding index entries that point to non-existent or stale tuples, forcing a slow sequential scan fallback.
  2. This can happen during heavy concurrent UPDATEs. If it persists, consider REINDEXing the affected table.

SQL examples

-- Verify index usage and check for stale index fallbacks
EXPLAIN SELECT * FROM public.your_table WHERE id = 123;

-- If the query plan shows [stale_index], it means the index entry was stale
-- and the engine fell back to a sequential scan.

NOTE: System catalog tables for index metadata (angara_sys.indexes and angara_sys.index_stats) are not part of the current user-facing SQL surface. Use EXPLAIN to verify index usage.