Storage Engine

Goal

Explain how AngaraBase stores data on disk, what file formats are used, and what storage engines are available (or planned).

Pluggable Storage Architecture

AngaraBase uses a modular storage architecture: the storage engine is responsible for physical data placement, while upper layers (SQL, transactions, indexes) interact via a unified interface. This allows swapping different engines for different workloads without modifying the SQL layer.
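
A minimal sketch of what such a unified interface could look like in Rust. The names `StorageEngine`, `Tid`, `ToyEngine`, and `store_and_read` are hypothetical and chosen for illustration only; they are not taken from the AngaraBase source.

```rust
use std::collections::HashMap;

/// Tuple identifier (page_id, slot_id). Hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Tid {
    pub page_id: u32,
    pub slot_id: u16,
}

/// Hypothetical interface that upper layers would program against.
pub trait StorageEngine {
    fn insert(&mut self, row: &[u8]) -> Tid;
    fn get(&self, tid: Tid) -> Option<&[u8]>;
    fn delete(&mut self, tid: Tid) -> bool;
}

/// Toy engine backed by a hash map; it stands in for a real row-store.
#[derive(Default)]
pub struct ToyEngine {
    rows: HashMap<Tid, Vec<u8>>,
    next_slot: u16,
}

impl StorageEngine for ToyEngine {
    fn insert(&mut self, row: &[u8]) -> Tid {
        let tid = Tid { page_id: 0, slot_id: self.next_slot };
        self.next_slot += 1;
        self.rows.insert(tid, row.to_vec());
        tid
    }

    fn get(&self, tid: Tid) -> Option<&[u8]> {
        self.rows.get(&tid).map(|v| v.as_slice())
    }

    fn delete(&mut self, tid: Tid) -> bool {
        self.rows.remove(&tid).is_some()
    }
}

/// Upper-layer code depends only on the trait, not on a concrete engine.
pub fn store_and_read(engine: &mut dyn StorageEngine, row: &[u8]) -> Option<Vec<u8>> {
    let tid = engine.insert(row);
    engine.get(tid).map(|r| r.to_vec())
}
```

Because `store_and_read` takes a `&mut dyn StorageEngine`, a different engine could be passed in without changing the caller, which is the property the modular design is after.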

The current engine is the Row-Store. Column-Store and In-Memory Engines are planned for the future.

Row-Store (Current Engine)

Page-Based Heap Storage

Data is stored in fixed-size pages (16 KB). Each table is represented by a set of heap pages, where rows are placed sequentially.
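
With a fixed page size, locating a page in a file is plain arithmetic. The sketch below assumes pages are stored back to back after a file header of `header_len` bytes; that parameter is hypothetical, since this chapter does not state the actual header size of an .adb file.

```rust
/// Page size stated in this chapter: 16 KB.
pub const PAGE_SIZE: u64 = 16 * 1024;

/// Byte offset of page `page_id`, assuming pages follow a header of
/// `header_len` bytes (hypothetical parameter, not from the source).
pub fn page_offset(header_len: u64, page_id: u64) -> u64 {
    header_len + page_id * PAGE_SIZE
}

/// Number of whole pages needed to hold `bytes` bytes (rounds up).
pub fn pages_needed(bytes: u64) -> u64 {
    (bytes + PAGE_SIZE - 1) / PAGE_SIZE
}
```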

Slotted Pages

Each page is structured as a slotted page:

┌──────────────────────────────────────┐
│ Page Header (LSN, checksum, flags)   │
├──────────────────────────────────────┤
│ Slot Array → [offset₁, offset₂…]     │
│ (grows downwards ↓)                  │
│                                      │
│              free space              │
│                                      │
│ (row data grows upwards ↑)           │
│                Row₂ data │ Row₁ data │
└──────────────────────────────────────┘
  • The header contains the LSN (log sequence number), checksum, page_type, and flags.
  • The slot array holds the offset of each row within the page. External references use a TID, the pair (page_id, slot_id), rather than a byte offset, so a row can be moved within the page without invalidating them.
  • Row data is written from the end of the page towards the beginning, so the slot array and the row data grow towards each other through the free space.
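
The layout above can be sketched in a few dozen lines of Rust. This is an illustration under stated assumptions, not the AngaraBase implementation: the header size (`HEADER_SIZE`), the slot encoding (`SLOT_SIZE`, a little-endian u16 offset followed by a u16 length), and the `SlottedPage` type are all hypothetical.

```rust
const PAGE_SIZE: usize = 16 * 1024;
const HEADER_SIZE: usize = 24; // hypothetical header size
const SLOT_SIZE: usize = 4; // hypothetical slot: u16 offset + u16 length

pub struct SlottedPage {
    buf: Vec<u8>,
    slot_count: usize,
    free_end: usize, // row data occupies buf[free_end..PAGE_SIZE]
}

impl SlottedPage {
    pub fn new() -> Self {
        SlottedPage { buf: vec![0; PAGE_SIZE], slot_count: 0, free_end: PAGE_SIZE }
    }

    /// Bytes between the end of the slot array and the start of row data.
    pub fn free_space(&self) -> usize {
        self.free_end - (HEADER_SIZE + self.slot_count * SLOT_SIZE)
    }

    /// Inserts a row and returns its slot_id, or None if it does not fit.
    pub fn insert(&mut self, row: &[u8]) -> Option<u16> {
        if row.len() + SLOT_SIZE > self.free_space() {
            return None;
        }
        // Row data grows from the end of the page towards the beginning.
        self.free_end -= row.len();
        self.buf[self.free_end..self.free_end + row.len()].copy_from_slice(row);
        // The slot array grows from the header towards the end of the page.
        let pos = HEADER_SIZE + self.slot_count * SLOT_SIZE;
        self.buf[pos..pos + 2].copy_from_slice(&(self.free_end as u16).to_le_bytes());
        self.buf[pos + 2..pos + 4].copy_from_slice(&(row.len() as u16).to_le_bytes());
        self.slot_count += 1;
        Some((self.slot_count - 1) as u16)
    }

    /// Looks a row up through its slot, as a TID lookup within one page would.
    pub fn get(&self, slot_id: u16) -> Option<&[u8]> {
        let i = slot_id as usize;
        if i >= self.slot_count {
            return None;
        }
        let pos = HEADER_SIZE + i * SLOT_SIZE;
        let off = u16::from_le_bytes([self.buf[pos], self.buf[pos + 1]]) as usize;
        let len = u16::from_le_bytes([self.buf[pos + 2], self.buf[pos + 3]]) as usize;
        Some(&self.buf[off..off + len])
    }
}
```

Callers only ever hold a slot_id, so compacting the page would mean rewriting the offsets stored in the slots while every slot_id stays valid.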

Page types (page_type): 0 = data (heap), 1 = index (reserved), 2 = meta (reserved), 3 = overflow (reserved).
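
The page type values listed above map naturally onto an enum. In this sketch the `PageType` name and its `from_u8` helper are hypothetical; only the numeric values come from this chapter.

```rust
/// Page types with the numeric values given in this chapter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum PageType {
    Data = 0,     // heap
    Index = 1,    // reserved
    Meta = 2,     // reserved
    Overflow = 3, // reserved
}

impl PageType {
    /// Decodes the page_type byte; unknown values yield None so that a
    /// reader can reject the page instead of guessing.
    pub fn from_u8(v: u8) -> Option<PageType> {
        match v {
            0 => Some(PageType::Data),
            1 => Some(PageType::Index),
            2 => Some(PageType::Meta),
            3 => Some(PageType::Overflow),
            _ => None,
        }
    }
}
```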

Page Checksums

Each page is protected by a checksum (CRC32C). When reading a page from disk, the checksum is verified; if it doesn’t match, the server returns an error rather than serving corrupted data (fail-closed with diagnostics).
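
A sketch of fail-closed verification, using a bitwise CRC32C (Castagnoli polynomial, reflected form 0x82F63B78) so that no external crate is needed. The page layout is an assumption made for illustration: the checksum is taken to be stored little-endian in the first four bytes and to cover the rest of the page. The names `seal_page`, `verify_page`, and `PageError` are hypothetical.

```rust
/// Bitwise CRC32C. Slow but dependency-free; production code would
/// typically use a table-driven or hardware-accelerated version.
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0x82F6_3B78 } else { crc >> 1 };
        }
    }
    !crc
}

#[derive(Debug, PartialEq)]
pub enum PageError {
    TooShort,
    ChecksumMismatch { stored: u32, computed: u32 },
}

/// Writes the checksum into the first four bytes (assumed layout).
/// Panics if the page is shorter than four bytes.
pub fn seal_page(page: &mut [u8]) {
    let c = crc32c(&page[4..]);
    page[..4].copy_from_slice(&c.to_le_bytes());
}

/// Returns an error instead of the page contents when the checksum differs.
pub fn verify_page(page: &[u8]) -> Result<(), PageError> {
    if page.len() < 4 {
        return Err(PageError::TooShort);
    }
    let stored = u32::from_le_bytes([page[0], page[1], page[2], page[3]]);
    let computed = crc32c(&page[4..]);
    if stored == computed {
        Ok(())
    } else {
        Err(PageError::ChecksumMismatch { stored, computed })
    }
}
```

Returning both the stored and the computed value in the error gives the caller something concrete to log, in the spirit of "fail-closed with diagnostics".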

File Formats

AngaraBase uses a per-database file model: each database consists of a pair of files.

| Extension | Purpose | Magic |
|-----------|---------|-------|
| .adb | Heap pages containing table data and indexes. Self-contained per-database storage file. | APG1 |
| .atl | Transaction log (WAL) for the specific database. Per-database WAL. | ADB1 |

AngaraTree indexes are stored inside the .adb file; page_type = 1 in the page header is reserved for them. There is no separate file for indexes.

Source of truth: crates/angarabase/src/on_disk.rs, angarabook/src/operations/upgrade-and-migration.md.
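
A small sketch of how a tool might tell the two file kinds apart by extension and magic. It assumes the magic occupies the first four bytes of the file, which this chapter does not state explicitly; `FileKind`, `expected_magic`, `kind_from_extension`, and `magic_matches` are hypothetical names. The authoritative layout lives in on_disk.rs.

```rust
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileKind {
    Data,  // .adb
    TxLog, // .atl
}

/// Magic values as listed in this chapter.
pub fn expected_magic(kind: FileKind) -> &'static [u8; 4] {
    match kind {
        FileKind::Data => b"APG1",
        FileKind::TxLog => b"ADB1",
    }
}

/// Maps a file name to a file kind based on its extension.
pub fn kind_from_extension(name: &str) -> Option<FileKind> {
    match Path::new(name).extension()?.to_str()? {
        "adb" => Some(FileKind::Data),
        "atl" => Some(FileKind::TxLog),
        _ => None,
    }
}

/// Checks the leading bytes of a file against the expected magic
/// (assumes the magic is at offset 0).
pub fn magic_matches(kind: FileKind, file_start: &[u8]) -> bool {
    file_start.starts_with(&expected_magic(kind)[..])
}
```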

Data Directory Layout

The data directory is defined by the storage.data_directory setting. Typical layout:

data_directory/
├── VERSION # initialization marker (AVR1, 256 bytes, CRC32C)
├── base.adb # system database (SysCatalog) — heap pages
├── base.atl # WAL for the system database
├── mydb.adb # user DB — heap pages + index pages
├── mydb.atl # WAL for the user DB
└── …

Unlike PostgreSQL, which splits its WAL into fixed-size segment files under pg_wal, AngaraBase keeps the WAL as a single .atl file per database. By default, this file is located in the same data_directory as the .adb file.

The storage.transaction_log_directory setting defines an alternative directory for .atl files (useful for placing WAL on a separate disk).

Key Settings

[storage]
data_directory = "/var/lib/angarabase/data"
transaction_log_directory = "/var/lib/angarabase/txlog"
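
Given these two settings, the file paths for a database could be resolved as in the sketch below. The `StorageConfig` struct and its methods are hypothetical; the fallback to data_directory when no transaction log directory is set follows the description above.

```rust
use std::path::PathBuf;

/// Hypothetical in-memory form of the [storage] section.
pub struct StorageConfig {
    pub data_directory: PathBuf,
    pub transaction_log_directory: Option<PathBuf>,
}

impl StorageConfig {
    /// Path of the .adb file for database `db`.
    pub fn data_file(&self, db: &str) -> PathBuf {
        self.data_directory.join(format!("{}.adb", db))
    }

    /// Path of the .atl file; falls back to data_directory when no
    /// separate transaction log directory is configured.
    pub fn wal_file(&self, db: &str) -> PathBuf {
        let dir = self
            .transaction_log_directory
            .as_ref()
            .unwrap_or(&self.data_directory);
        dir.join(format!("{}.atl", db))
    }
}
```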

For more details on these settings, see Configuration.

Column-Store (Planned, v6)

A columnar engine based on an Arrow/Parquet-like format, oriented towards analytical queries (OLAP). Data is stored by columns, supporting compression and vectorized scans.

Status: not implemented, planned for the v6 roadmap.

In-Memory Engine (Planned, v5 — AngaraMemory)

An engine for storing data in RAM. Three modes are planned:

| Mode | Description |
|------|-------------|
| volatile | Data in memory only; lost upon restart. |
| logged | Writes are duplicated in the WAL; recovered upon restart. |
| snapshotted | Periodic snapshots to disk + WAL. |

Status: in development.
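
The three planned modes differ only in what is persisted, which an enum can capture. Since the engine is not released, everything in this sketch (the `MemoryMode` type, its method names, and the mode strings as parse input) is a hypothetical reading of the table above, not an existing API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemoryMode {
    Volatile,
    Logged,
    Snapshotted,
}

impl MemoryMode {
    /// Parses a mode name as written in the table above.
    pub fn parse(s: &str) -> Option<MemoryMode> {
        match s {
            "volatile" => Some(MemoryMode::Volatile),
            "logged" => Some(MemoryMode::Logged),
            "snapshotted" => Some(MemoryMode::Snapshotted),
            _ => None,
        }
    }

    /// Logged and snapshotted modes both write to the WAL.
    pub fn writes_wal(self) -> bool {
        !matches!(self, MemoryMode::Volatile)
    }

    /// Only the snapshotted mode takes periodic snapshots.
    pub fn takes_snapshots(self) -> bool {
        matches!(self, MemoryMode::Snapshotted)
    }

    /// Data survives a restart whenever something is persisted.
    pub fn survives_restart(self) -> bool {
        self.writes_wal()
    }
}
```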

HTAP Direction

AngaraBase’s long-term strategy is HTAP (Hybrid Transactional/Analytical Processing):

  • Row-Store serves OLTP (transactional workloads).
  • Column-Store serves OLAP (analytics).
  • Asynchronous replication connects the two: data from the row-store is converted into columnar format for analytical queries.

This will allow running analytics on fresh data without ETL pipelines and without affecting transactional performance.
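
To make the row-to-column conversion concrete, here is a toy sketch. The `OrderRow` and `OrderColumns` types and both functions are invented for illustration and do not describe the planned replication mechanism, which is not yet implemented.

```rust
/// Row-oriented record, as an OLTP engine would store it (hypothetical schema).
#[derive(Debug, Clone, PartialEq)]
pub struct OrderRow {
    pub id: u64,
    pub amount_cents: i64,
}

/// Column-oriented batch: one vector per column.
#[derive(Debug, Default, PartialEq)]
pub struct OrderColumns {
    pub ids: Vec<u64>,
    pub amounts_cents: Vec<i64>,
}

/// Appends a batch of rows to the columnar copy, as an asynchronous
/// replication step might do after the rows are committed.
pub fn append_rows(cols: &mut OrderColumns, rows: &[OrderRow]) {
    for r in rows {
        cols.ids.push(r.id);
        cols.amounts_cents.push(r.amount_cents);
    }
}

/// An analytical query touches only the column it needs.
pub fn total_amount(cols: &OrderColumns) -> i64 {
    cols.amounts_cents.iter().sum()
}
```

The aggregate reads one contiguous vector and never touches the ids, which is the access pattern that makes columnar layouts attractive for OLAP.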
