How to Read Query Plans

This guide helps DBAs/SREs read EXPLAIN in AngaraBase without knowing the planner internals. The goal is not to manually “outplay” the optimizer, but to quickly answer operator questions:

which execution path the database chose;
why that path was chosen;
whether the query uses vector/parallel path;
whether the plan was reused from cache or rebuilt;
where to look for the cause of high latency.

Quick Start

For a regular plan:

EXPLAIN SELECT * FROM public.orders WHERE customer_id = 42;

For operator diagnostics:

EXPLAIN (DIAGNOSTIC)
SELECT * FROM public.orders WHERE customer_id = 42;

For detailed output:

EXPLAIN (VERBOSE, DIAGNOSTIC)
SELECT * FROM public.orders WHERE customer_id = 42;

For machine-readable evidence:

EXPLAIN (VERBOSE, DIAGNOSTIC, FORMAT JSON)
SELECT * FROM public.orders WHERE customer_id = 42;

If you need to see runtime counters, use EXPLAIN ANALYZE. It executes the query, so for DML use it carefully and only in a safe environment.

In ANALYZE mode, AngaraBase collects additional query execution facts in the runtime_facts block. This block appears if the query encountered waits, data spill to disk, or rejection due to resource limits.

Example JSON output:

"runtime_facts": {
  "spill_bytes": 4096,
  "wal_sync_wait_ms": 12,
  "resource_reject_count": 1,
  "last_runtime_reason": "spilled_memory_budget"
}

Example text output: runtime_facts: spill_bytes=4096 wal_sync_wait_ms=12 resource_reject_count=1 last_runtime_reason=spilled_memory_budget

Main fields:

spill_bytes — amount of data spilled to disk (for example, when HashJoin or Sort lacks memory).
wal_sync_wait_ms — WAL synchronization wait time. (May be omitted for SELECT, with durability=relaxed, or if the transaction successfully joined group commit without additional I/O wait).
resource_reject_count — number of rejections due to resource limits.
last_runtime_reason — reason code, for example spilled_memory_budget.

Note: only fields with non-zero values are emitted.

How to Read the Tree

Read the plan bottom-up. The bottom operator receives data from a table or index. Each next operator above applies a filter, join, aggregation, sort, or projection.

Example:

Project cost=0.00..1030.00 rows=100
  VectorFilter cost=0.00..1025.00 rows=100
    VectorSeqScan workers_planned=2 workers_launched=2 numa_affinity=disabled table=public.ux_stats_v2 cost=0.00..1000.00 rows=1000
--- Optimizer Diagnostics ---
query_fingerprint=1795416667712787713
plan_fingerprint=3192678580981205807
workload_class=select
replan_reason=none
cache_status=hit
reason_codes=stats_default_fallback

Read it like this:

VectorSeqScan reads table public.ux_stats_v2.
VectorFilter applies the WHERE condition.
Project keeps the required columns in the result.
The Optimizer Diagnostics block explains query/plan identifiers, cache, replan reason, and reason codes.

Operator Line Format

Let’s break down the line:

VectorSeqScan workers_planned=2 workers_launched=2 numa_affinity=disabled table=public.ux_stats_v2 cost=0.00..1000.00 rows=1000

Field	What it means	How an operator should read it
`VectorSeqScan`	Operator type. `Vector` = vector executor, `SeqScan` = sequential table read.	Reads the whole table in batches. Good for full scan / analytics, bad for point lookup on a large table without an index.
`workers_planned=2`	How many workers the planner wanted to use.	Plan allows parallelism.
`workers_launched=2`	How many workers were actually allocated.	If fewer than planned, runtime pressure or parallelism limits are possible.
`numa_affinity=disabled`	Whether binding to NUMA-node is enabled.	Usually `disabled` is normal for dev/cloud; on bare metal it may be a separate tuning question.
`table=public.ux_stats_v2`	Source table.	Verify that the expected table/schema is scanned.
`cost=0.00..1000.00`	`startup_cost..total_cost` in planner units.	Not milliseconds. Compare with alternative plans, not wall-clock.
`rows=1000`	Estimated output rows from the operator.	Large estimate error often leads to bad join order or unnecessary full scan.

Cost and Rows

cost is an internal work estimate, not execution time.

startup_cost — cost to get the first row.
total_cost — cost to get all rows.
rows — expected row count after the operator.

Common mistake: reading cost=1000 as 1000 ms. Do not do that. Cost is used by the optimizer to compare alternatives:

full scan vs index scan;
hash join vs nested/index join;
sort before or after filter;
aggregate over all rows or over already filtered input.

If rows is clearly far from reality, check statistics first:

ANALYZE public.orders;

SELECT *
FROM sys.table_stats
WHERE schema_name = 'public' AND table_name = 'orders';

SELECT *
FROM sys.column_stats
WHERE schema_name = 'public' AND table_name = 'orders';

Vector Prefix

Operators with the Vector prefix execute through the vectorized path:

VectorSeqScan
VectorIndexScan
VectorFilter
VectorProject
VectorWindowFunction
VectorSetOperation

The vector path processes data in batches, reducing per-row overhead. For operators this is usually a good sign, especially on scan/filter/aggregate workload.

If the expected Vector* disappeared:

Check query shape: whether an expression was added that is not yet supported by the vector executor.
Compare EXPLAIN (VERBOSE, DIAGNOSTIC) before/after the query change.
Inspect reason_codes and plan node types.
For latency regressions, use Performance tuning guide and Parallel runtime observability runbook.

Operator Glossary

Operator	What it does	When good	When suspicious
`Scan` / `VectorSeqScan`	Reads the whole table.	Small table, analytics, low filter selectivity.	Point lookup on a large table where an index should exist.
`IndexScan` / `VectorIndexScan`	Reads through an index, then checks residual filter if needed.	Selective predicate on indexed column.	If it returns a large share of the table, full scan may be cheaper.
`IndexOnlyScan`	Reads only the index, without heap fetch, if visibility map allows.	Coverage index + all-visible pages.	If it often falls back to heap, check visibility map / vacuum-like processes.
`Filter` / `VectorFilter`	Applies `WHERE`/predicate to input stream.	After scan/index scan.	If filter is above expensive join, check pushdown.
`Project` / `VectorProject`	Selects/computes output columns.	Normal top operator for `SELECT`.	Usually not an issue except very expensive expressions.
`Join`	Generic join node with `kind=inner/left/right/full/cross`.	Expected join type matches SQL.	`cross` almost always needs attention.
`HashSemiJoin`	Implements `EXISTS`/semi join through hash.	Good sign for decorrelated `EXISTS`.	If semi join was expected but nested/cross-like plan appears.
`HashAntiJoin`	Implements `NOT EXISTS`/anti join through hash.	Good sign for anti-semi workload.	If input is large and there is no memory headroom.
`NLIndexJoin`	Nested-loop probe by index.	Small outer input + selective index lookup.	Large outer input: may become many index probes.
`Aggregate`	`COUNT`, `SUM`, `GROUP BY`, and other aggregate operations.	After filter or join with already reduced input.	If aggregate must materialize huge input.
`Sort`	Sorts the stream.	For `ORDER BY`, merge-like paths.	Large sort without `LIMIT` or index on order key.
`Distinct`	Removes duplicates.	Needed for `DISTINCT`.	On large input without prior row reduction.
`Limit` / `Offset`	Limits or skips rows.	`LIMIT` can sharply reduce total cost.	Large `OFFSET` still forces reading/skipping many rows.
`WindowFunction`	Window functions.	Analytical queries.	If it requires large sort/partition.
`SetOperation`	`UNION`, `INTERSECT`, `EXCEPT`.	Set queries.	If unexpectedly expensive due to dedup/sort.
`LateralJoin`	LATERAL/derived-table dependent path.	Correlated derived inputs.	Can be expensive on large outer inputs.
`LateMaterialize`	Delayed column materialization. Reads only columns needed for filtering, then reads the rest later for rows that passed the filter.	High filter selectivity (selectivity < 0.3).	If selectivity is low, double read may be more expensive than normal.
`DmlInsert` / `DmlUpdate` / `DmlDelete`	Sentinel for DML.	EXPLAIN DML shows intent.	For runtime counters, use `EXPLAIN ANALYZE` carefully.
`Ddl`	Sentinel for DDL.	Shows DDL path.	Not a query performance hot path.

Scan Strategy Reason

For Scan (SeqScan) and IndexScan nodes, the planner prints the reason for choosing a specific scan strategy in the scan_strategy_reason field. This helps understand why the optimizer preferred sequential scan over index scan or vice versa.

Output examples:

index scan: high selectivity (0.0005) — index was chosen because the condition is highly selective.
seq scan chosen: low cardinality (0.1328) — SeqScan was chosen: selectivity is above [execution].index_cardinality_threshold (the planner considers the column “too low-cardinality” for an index on this predicate).
seq scan chosen: low selectivity (0.1111) — SeqScan was chosen: selectivity is not below [execution].index_scan_selectivity_threshold (separate gate after cardinality).

If you see seq scan chosen where you expected an index:

Check statistics freshness (ANALYZE).
Check distinct_estimate in sys.column_stats.
Tune thresholds in angarabase.conf ([execution]) or through env before startup (ANGARABASE_INDEX_CARDINALITY_THRESHOLD, ANGARABASE_INDEX_SCAN_SELECTIVITY_THRESHOLD), then restart the server. SET ... from psql in Simple Query protocol does not change these knobs (see Performance tuning).

Optimizer Diagnostics

EXPLAIN (DIAGNOSTIC) adds a block:

--- Optimizer Diagnostics ---
query_fingerprint=1795416667712787713
plan_fingerprint=3192678580981205807
workload_class=select
replan_reason=none
cache_status=hit
reason_codes=stats_default_fallback

`query_fingerprint`

Stable identifier of the query’s logical shape. Literal values usually should not create a new fingerprint for every constant.

Use it to correlate:

slow query;
metrics;
logs;
repeated EXPLAIN;
regression evidence.

`plan_fingerprint`

Identifier of the plan shape. If the query is the same but the plan changed, query_fingerprint remains the same while plan_fingerprint changes.

This is useful when investigating:

“after ANALYZE, the query became faster/slower”;
“after adding an index, the plan changed”;
“yesterday it was IndexOnlyScan, today it is SeqScan again”.

`workload_class`

Workload class:

select
write
ddl
other classes if a specific path marks them.

For operators, this helps separate OLTP read path from write/DDL events.

`replan_reason`

Why the plan was rebuilt or why there is no explicit reason.

Value	Meaning	What to do
`none`	No explicit replan reason. Usually normal path.	If `cache_status=hit`, cache is working.
`stats_drift`	Statistics changed enough that the old plan could be stale.	Check `ANALYZE` frequency, table churn, latency stability.
`schema_changed`	Schema changed: DDL, index, column, or another schema signal.	Normal after migrations; suspicious with frequent DDL in production.
`aqp_feedback`	Runtime feedback affected estimates/planning.	Check AQP metrics and workload skew.
`forced_fallback`	Planner/runtime chose a safe fallback.	Compare reason codes and unsupported expressions.

`cache_status`

Shows how the query relates to the plan cache.

Value	Meaning	How to interpret
`hit`	Plan reused.	Good for stable OLTP.
`miss`	Plan built again.	Normal for first run or new query shape.
`bypass`	Cache deliberately not used.	Check DDL, volatile shape, diagnostics mode, or safety path.
`invalidated`	Old plan dropped.	Look for `replan_reason`.
`unknown`	Runtime did not pass status.	Do not infer cache behavior only from this field.

`reason_codes`

Reasons for planner choice or fallback.

Code	Meaning	What to check
`stats_default_fallback`	Planner could not use detailed statistics and applied defaults.	Run `ANALYZE`, check `sys.table_stats` and `sys.column_stats`.
`index_only_eligible`	Plan can read only the index without heap fetch.	Check visibility map and index coverage.
`bitmap_candidate_rejected`	There was an alternative bitmap-like/index path, but another path or residual filter was chosen.	Compare predicate selectivity and presence of a suitable index.
`hash_join_fits_work_mem`	Hash join is considered memory-eligible.	When p99 grows, check memory pressure and join cardinality.
`used_multicol_stats`	Multi-column statistics were used.	Good sign for correlated predicates.

If reason_codes is empty, AngaraBase shows stats_default_fallback so operators do not get a “silent” diagnostic block.

JSON Format

For CI, evidence packs, and diffs between releases, use JSON:

EXPLAIN (VERBOSE, DIAGNOSTIC, FORMAT JSON)
SELECT * FROM public.orders WHERE customer_id = 42;

In JSON, the same entities are represented as fields:

Node Type
Startup Cost
Total Cost
Plan Rows
Plans
workers_planned
workers_launched
numa_affinity
query_fingerprint
plan_fingerprint
replan_reason
reason_codes
cache_status

For release evidence, do not compare the entire JSON byte-for-byte; compare stable properties: node class, join type, fingerprints, reason codes, and key estimates.

Common Reading Scenarios

1. Slow point lookup

Symptom:

VectorSeqScan table=public.orders ... rows=1000000

What to check:

Whether there is an index on the filter column.
Whether the planner sees statistics (sys.column_stats).
Whether diagnostics shows stats_default_fallback.
Whether filter selectivity is too low.

The desired plan for point lookup is usually closer to:

IndexScan index_name=... index_col=customer_id key_range=eq(...)

or:

IndexOnlyScan index_name=... index_col=customer_id index_only_reason="..."

2. Late Materialization

If the filter rejects a significant share of rows, the planner may choose the LateMaterialize node. This avoids expensive reading of all columns for rows that will be filtered out anyway.

The enable threshold is controlled by [execution].late_materialization_selectivity_threshold (default 0.3).

Plan example:

Project cost=10.00..50.00 rows=100
  LateMaterialize cost=5.00..45.00 rows=100
    VectorFilter (x > 100) cost=0.00..40.00 rows=100
      VectorSeqScan table=large_table cost=0.00..30.00 rows=1000

3. `EXISTS` should not be nested-loop

For query:

EXPLAIN (DIAGNOSTIC)
SELECT *
FROM public.orders o
WHERE EXISTS (
  SELECT 1
  FROM public.order_items i
  WHERE i.order_id = o.id
);

Good sign:

HashSemiJoin kind=semi on=...

This means the optimizer decorrelated EXISTS and chose hash semi join.

3. `NOT EXISTS` and anti join

Good sign:

HashAntiJoin kind=anti on=...

If input is large, check hash_join_fits_work_mem and memory metrics.

4. GROUP BY is too expensive

Symptom:

Aggregate cost=... rows=...
  VectorSeqScan table=...

What to check:

Whether rows can be filtered before aggregate.
Whether there are unnecessary projected columns.
Whether group key fits fast path (for example, single integer key).
Whether there are too many groups.
Whether the query requires sorting after aggregate.

5. Parallel planned, but latency is high

Symptom:

workers_planned=2 workers_launched=0

or workers_launched is less than workers_planned.

What to check:

Global parallel runtime limits.
CPU saturation.
pgwire/runtime queues.
Memory pressure.
Parallel runtime observability runbook.

6. Plan changed after `ANALYZE`

Compare:

query_fingerprint — should remain stable for the same SQL shape;
plan_fingerprint — changes if the plan shape changed;
replan_reason — should explain the rebuild;
reason_codes — show which new factors became available.

If IndexScan or IndexOnlyScan appears after ANALYZE, that is usually a good sign. If SeqScan appears on a large OLTP lookup, check selectivity and statistics.

Relationship with `sys.*` views

EXPLAIN shows the plan, and sys.* helps verify whether the optimizer has data for a good decision.

Minimum set:

SELECT *
FROM sys.table_stats
WHERE schema_name = 'public' AND table_name = 'orders';

SELECT *
FROM sys.column_stats
WHERE schema_name = 'public' AND table_name = 'orders';

SELECT *
FROM sys.multicolumn_stats
WHERE schema_name = 'public' AND table_name = 'orders';

SELECT *
FROM sys.workload_stats
WHERE schema_name = 'public' AND table_name = 'orders';

How to read:

row_count_estimate helps understand whether the optimizer knows table size.
distinct_estimate helps estimate equality predicate selectivity.
min_i64 / max_i64 help range predicates.
multicolumn_stats helps with correlated conditions.
workload_stats shows how the table is actually used.

Triage Checklist

When a user says “the query became slow”, proceed as follows:

Capture the plan:
```
EXPLAIN (VERBOSE, DIAGNOSTIC)
<query>;
```
If safe, capture runtime:
```
EXPLAIN ANALYZE
<query>;
```
Read the tree bottom-up.
Find the widest input (rows sharply above expected).
Check whether the expected operator class is used: IndexScan, IndexOnlyScan, HashSemiJoin, Aggregate, Vector*.
Check reason_codes.
If stats_default_fallback is present, run ANALYZE and compare the plan.
Compare query_fingerprint and plan_fingerprint before/after.
If the issue is in parallel path, go to Parallel runtime observability runbook.
If the issue is in storage/IO, go to Performance tuning guide.

Common Interpretation Mistakes

Mistake	Why it is wrong	Correct approach
`cost=1000` means 1000 ms	Cost is a relative optimizer model.	For timing, use `EXPLAIN ANALYZE` and latency metrics.
`SeqScan` is always bad	Full scan may be optimal for small tables or low-selectivity filters.	Check table size, selectivity, and index availability.
`IndexScan` is always better	Index scan may be worse than full scan if it returns a large share of the table.	Compare rows/cost and actual runtime.
`workers_planned=2` guarantees 2x speedup	Workers have overhead and may not start.	Check `workers_launched` and runtime metrics.
`replan_reason=none` means optimizer did nothing	It means there is no explicit replan reason.	Check `cache_status`, fingerprints, and reason codes.
`stats_default_fallback` can be ignored	It signals the optimizer may be guessing without statistics.	Run `ANALYZE` and check `sys.*` views.

When to Escalate

Escalate as a bug/perf issue if:

EXPLAIN (DIAGNOSTIC) does not show a diagnostic block (make sure you are not using EXPLAIN (DIAGNOSTIC ON) — the boolean suffix ON/OFF is not supported by AngaraBase and silently ignores the option; use EXPLAIN (DIAGNOSTIC) without suffix);
query_fingerprint is unstable for the same query shape;
plan_fingerprint changes without schema/stats/AQP reason;
replan_reason=stats_drift appears too often on a stable table;
IndexOnlyScan is chosen, but runtime constantly does heap fetch;
HashSemiJoin/HashAntiJoin disappear for simple EXISTS/NOT EXISTS;
workers_launched is systematically below workers_planned without a clear pressure signal;
JSON/text outputs contradict each other.

For bug report, attach:

query SQL;
EXPLAIN (VERBOSE, DIAGNOSTIC) text;
EXPLAIN (VERBOSE, DIAGNOSTIC, FORMAT JSON);
relevant rows from sys.table_stats, sys.column_stats, sys.multicolumn_stats, sys.workload_stats;
AngaraBase version and capability/profile snapshot, if available.

Performance tuning guide — what to do after reading the plan if the issue is latency/throughput.
Parallel runtime observability runbook — how to investigate workers_planned/workers_launched and runtime pressure.
Observability metrics checklist — which metrics to correlate with query_fingerprint and plan_fingerprint.
Diagnostics bundle runbook — how to collect evidence for support.

AngaraBook

How to Read Query Plans

Quick Start

Runtime Facts

How to Read the Tree

Operator Line Format

Cost and Rows

Vector Prefix

Operator Glossary

Scan Strategy Reason

Optimizer Diagnostics

`query_fingerprint`

`plan_fingerprint`

`workload_class`

`replan_reason`

`cache_status`

`reason_codes`

JSON Format

Common Reading Scenarios

1. Slow point lookup

2. Late Materialization

3. `EXISTS` should not be nested-loop

3. `NOT EXISTS` and anti join

4. GROUP BY is too expensive

5. Parallel planned, but latency is high

6. Plan changed after `ANALYZE`

Relationship with `sys.*` views

Triage Checklist

Common Interpretation Mistakes

When to Escalate

Next

Keyboard shortcuts

AngaraBook