Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Query Processing

Goal

Explain how AngaraBase processes SQL queries: from text to result. Useful for understanding EXPLAIN plans and performance diagnostics.

Pipeline Overview

Each SQL query goes through four stages:

SQL text ──▸ Parsing ──▸ Planning ──▸ Optimization ──▸ Execution ──▸ Result

1. Parsing: SQL → AST

The parser converts the query text into an Abstract Syntax Tree (AST). AngaraBase uses a PostgreSQL-compatible SQL dialect.

If the syntax is not supported, the server returns SQLSTATE 0A000 (feature_not_supported) with a description of the unsupported construct.

2. Planning: AST → Logical Plan

During the planning stage:

  • Name resolution — matching table, column, and function names to catalog objects.
  • Type checking — verifying types and applying automatic coercion if needed.

The result is a logical plan describing what needs to be done, but not how.

3. Optimization (AngaraPlan)

The AngaraPlan optimizer converts the logical plan into a physical one, choosing the most efficient execution strategy.

Cost-based optimizer (CBO): decisions are made based on statistics (AngaraStat) — row counts, value distributions, index availability.

Key optimizer decisions:

DecisionOptions
Access pathFull table scan, B+tree index scan, BRIN scan
Join methodHash join, nested loop join
Join orderReordering tables to minimize intermediate results

Robust planning: the optimizer is resilient to estimation errors — with a significant discrepancy between estimated and actual rows, the plan remains viable (avoids worst-case behavior).

LEO (Learning Optimizer): feedback loop — after query execution, actual statistics are used to improve future estimates.

4. Execution (AngaraFlow)

The AngaraFlow executor runs the physical plan in an iterator/streaming model (Volcano): each operator requests the next row from its child operator.

Core Operators:

OperatorDescription
ScanReading rows from the heap (full scan) or an index (index scan)
FilterApplying WHERE predicates
Hash JoinJoining via a hash table; Grace hash join for large datasets
Nested LoopJoining via nested loops (for small tables or index lookups)
Group ByAggregation (GROUP BY, HAVING)
SortSorting; external sort for data that doesn’t fit in memory
LimitLimiting the number of rows

AngaraStat (Statistics)

The optimizer uses statistics from system tables to estimate plan costs.

sys.table_stats

ColumnDescription
row_countEstimated number of rows in the table
mutation_epochMutation counter (to identify stale statistics)

sys.column_stats

ColumnDescription
ndvNumber of distinct values (HyperLogLog)
min_value / max_valueValue range boundaries
null_countNumber of NULL values
histogramValue distribution (equi-height histogram)
mcvMost common values (and their frequencies)

Managing Statistics Level

ALTER TABLE t SET (stats_level_max = 2);
LevelWhat is gathered
0Only row_count and mutation_epoch
1+ NDV, min/max, null_count
2+ Histograms, MCV (reservoir sampling)
3+ Extended statistics (reserved)

A higher level gives the optimizer more information but increases statistics gathering time.

Query Diagnostics

EXPLAIN

Viewing the execution plan without executing the query:

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

Variants:

EXPLAIN ANALYZE SELECT ...; -- executes the query, shows actual rows/time
EXPLAIN (BUFFERS) SELECT ...; -- + page read statistics
EXPLAIN (FORMAT JSON) SELECT ...; -- JSON output

System Views

ViewDescription
angara_stat_activityCurrently active queries
angara_stat_statementsAggregated statistics by query type
angara_top_queriesTop queries by execution time

Slow Query Log

Enabled via environment variables:

ANGARABASE_LOG_MIN_DURATION_MS=100 # log queries taking longer than 100 ms
ANGARABASE_LOG_QUERY_TEXT=1 # include the query text in the log

More details — Diagnostics.

Future

Planned query processing improvements:

ComponentDescription
AngaraVectorVectorized/SIMD execution — processing batches by column instead of row-at-a-time
AngaraParallelMorsel-driven parallelism — parallel execution across multiple cores
AngaraAdaptAdaptive processing — switching strategies during execution

Concepts (What to read next)

How-to (What to do)

  • DiagnosticsEXPLAIN ANALYZE, slow-query log, how to read a plan.
  • Tracing — distributed tracing of parse → plan → execute phases.
  • LoggingANGARABASE_LOG_QUERY_TEXT, ANGARABASE_LOG_MIN_DURATION_MS.

Reference