Query Processing
Goal
Explain how AngaraBase processes SQL queries: from text to result. Useful for understanding EXPLAIN plans and performance diagnostics.
Pipeline Overview
Each SQL query goes through four stages:
SQL text ──▸ Parsing ──▸ Planning ──▸ Optimization ──▸ Execution ──▸ Result
1. Parsing: SQL → AST
The parser converts the query text into an Abstract Syntax Tree (AST). AngaraBase uses a PostgreSQL-compatible SQL dialect.
If the syntax is not supported, the server returns SQLSTATE 0A000 (feature_not_supported) with a description of the unsupported construct.
2. Planning: AST → Logical Plan
During the planning stage:
- Name resolution — matching table, column, and function names to catalog objects.
- Type checking — verifying types and applying automatic coercion if needed.
The result is a logical plan describing what needs to be done, but not how.
3. Optimization (AngaraPlan)
The AngaraPlan optimizer converts the logical plan into a physical one, choosing the most efficient execution strategy.
Cost-based optimizer (CBO): decisions are made based on statistics (AngaraStat) — row counts, value distributions, index availability.
Key optimizer decisions:
| Decision | Options |
|---|---|
| Access path | Full table scan, B+tree index scan, BRIN scan |
| Join method | Hash join, nested loop join |
| Join order | Reordering tables to minimize intermediate results |
Robust planning: the optimizer is resilient to estimation errors — with a significant discrepancy between estimated and actual rows, the plan remains viable (avoids worst-case behavior).
LEO (Learning Optimizer): feedback loop — after query execution, actual statistics are used to improve future estimates.
4. Execution (AngaraFlow)
The AngaraFlow executor runs the physical plan in an iterator/streaming model (Volcano): each operator requests the next row from its child operator.
Core Operators:
| Operator | Description |
|---|---|
| Scan | Reading rows from the heap (full scan) or an index (index scan) |
| Filter | Applying WHERE predicates |
| Hash Join | Joining via a hash table; Grace hash join for large datasets |
| Nested Loop | Joining via nested loops (for small tables or index lookups) |
| Group By | Aggregation (GROUP BY, HAVING) |
| Sort | Sorting; external sort for data that doesn’t fit in memory |
| Limit | Limiting the number of rows |
AngaraStat (Statistics)
The optimizer uses statistics from system tables to estimate plan costs.
sys.table_stats
| Column | Description |
|---|---|
row_count | Estimated number of rows in the table |
mutation_epoch | Mutation counter (to identify stale statistics) |
sys.column_stats
| Column | Description |
|---|---|
ndv | Number of distinct values (HyperLogLog) |
min_value / max_value | Value range boundaries |
null_count | Number of NULL values |
histogram | Value distribution (equi-height histogram) |
mcv | Most common values (and their frequencies) |
Managing Statistics Level
ALTER TABLE t SET (stats_level_max = 2);
| Level | What is gathered |
|---|---|
| 0 | Only row_count and mutation_epoch |
| 1 | + NDV, min/max, null_count |
| 2 | + Histograms, MCV (reservoir sampling) |
| 3 | + Extended statistics (reserved) |
A higher level gives the optimizer more information but increases statistics gathering time.
Query Diagnostics
EXPLAIN
Viewing the execution plan without executing the query:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
Variants:
EXPLAIN ANALYZE SELECT ...; -- executes the query, shows actual rows/time
EXPLAIN (BUFFERS) SELECT ...; -- + page read statistics
EXPLAIN (FORMAT JSON) SELECT ...; -- JSON output
System Views
| View | Description |
|---|---|
angara_stat_activity | Currently active queries |
angara_stat_statements | Aggregated statistics by query type |
angara_top_queries | Top queries by execution time |
Slow Query Log
Enabled via environment variables:
ANGARABASE_LOG_MIN_DURATION_MS=100 # log queries taking longer than 100 ms
ANGARABASE_LOG_QUERY_TEXT=1 # include the query text in the log
More details — Diagnostics.
Future
Planned query processing improvements:
| Component | Description |
|---|---|
| AngaraVector | Vectorized/SIMD execution — processing batches by column instead of row-at-a-time |
| AngaraParallel | Morsel-driven parallelism — parallel execution across multiple cores |
| AngaraAdapt | Adaptive processing — switching strategies during execution |
Related Sections
Concepts (What to read next)
- Indexes — how the optimizer chooses an index for
SELECT/UPDATE. - Catalog and Metadata — where statistics used by the planner are stored.
- Transactions and MVCC — how the isolation level affects the execution plan.
How-to (What to do)
- Diagnostics —
EXPLAIN ANALYZE, slow-query log, how to read a plan. - Tracing — distributed tracing of parse → plan → execute phases.
- Logging —
ANGARABASE_LOG_QUERY_TEXT,ANGARABASE_LOG_MIN_DURATION_MS.
Reference
- SQL Overview — what parts of the standard are supported at the planner level.
- System Views
sys.*—sys.queries,sys.column_stats,sys.health.