How to Read Query Plans
This guide helps DBAs/SREs read EXPLAIN in AngaraBase without knowing
the planner internals. The goal is not to manually “outplay”
the optimizer, but to quickly answer operator questions:
- which execution path the database chose;
- why that path was chosen;
- whether the query uses vector/parallel path;
- whether the plan was reused from cache or rebuilt;
- where to look for the cause of high latency.
Quick Start
For a regular plan:
EXPLAIN SELECT * FROM public.orders WHERE customer_id = 42;
For operator diagnostics:
EXPLAIN (DIAGNOSTIC)
SELECT * FROM public.orders WHERE customer_id = 42;
For detailed output:
EXPLAIN (VERBOSE, DIAGNOSTIC)
SELECT * FROM public.orders WHERE customer_id = 42;
For machine-readable evidence:
EXPLAIN (VERBOSE, DIAGNOSTIC, FORMAT JSON)
SELECT * FROM public.orders WHERE customer_id = 42;
If you need to see runtime counters, use EXPLAIN ANALYZE.
It executes the query, so for DML use it carefully and only
in a safe environment.
Runtime Facts
In ANALYZE mode, AngaraBase collects additional query execution facts
in the runtime_facts block. This block appears if the query encountered
waits, data spill to disk, or rejection due to resource limits.
Example JSON output:
"runtime_facts": {
"spill_bytes": 4096,
"wal_sync_wait_ms": 12,
"resource_reject_count": 1,
"last_runtime_reason": "spilled_memory_budget"
}
Example text output:
runtime_facts: spill_bytes=4096 wal_sync_wait_ms=12 resource_reject_count=1 last_runtime_reason=spilled_memory_budget
Main fields:
spill_bytes— amount of data spilled to disk (for example, when HashJoin or Sort lacks memory).wal_sync_wait_ms— WAL synchronization wait time. (May be omitted forSELECT, withdurability=relaxed, or if the transaction successfully joined group commit without additional I/O wait).resource_reject_count— number of rejections due to resource limits.last_runtime_reason— reason code, for examplespilled_memory_budget.
Note: only fields with non-zero values are emitted.
How to Read the Tree
Read the plan bottom-up. The bottom operator receives data from a table or index. Each next operator above applies a filter, join, aggregation, sort, or projection.
Example:
Project cost=0.00..1030.00 rows=100
VectorFilter cost=0.00..1025.00 rows=100
VectorSeqScan workers_planned=2 workers_launched=2 numa_affinity=disabled table=public.ux_stats_v2 cost=0.00..1000.00 rows=1000
--- Optimizer Diagnostics ---
query_fingerprint=1795416667712787713
plan_fingerprint=3192678580981205807
workload_class=select
replan_reason=none
cache_status=hit
reason_codes=stats_default_fallback
Read it like this:
VectorSeqScanreads tablepublic.ux_stats_v2.VectorFilterapplies theWHEREcondition.Projectkeeps the required columns in the result.- The
Optimizer Diagnosticsblock explains query/plan identifiers, cache, replan reason, and reason codes.
Operator Line Format
Let’s break down the line:
VectorSeqScan workers_planned=2 workers_launched=2 numa_affinity=disabled table=public.ux_stats_v2 cost=0.00..1000.00 rows=1000
| Field | What it means | How an operator should read it |
|---|---|---|
VectorSeqScan | Operator type. Vector = vector executor, SeqScan = sequential table read. | Reads the whole table in batches. Good for full scan / analytics, bad for point lookup on a large table without an index. |
workers_planned=2 | How many workers the planner wanted to use. | Plan allows parallelism. |
workers_launched=2 | How many workers were actually allocated. | If fewer than planned, runtime pressure or parallelism limits are possible. |
numa_affinity=disabled | Whether binding to NUMA-node is enabled. | Usually disabled is normal for dev/cloud; on bare metal it may be a separate tuning question. |
table=public.ux_stats_v2 | Source table. | Verify that the expected table/schema is scanned. |
cost=0.00..1000.00 | startup_cost..total_cost in planner units. | Not milliseconds. Compare with alternative plans, not wall-clock. |
rows=1000 | Estimated output rows from the operator. | Large estimate error often leads to bad join order or unnecessary full scan. |
Cost and Rows
cost is an internal work estimate, not execution time.
startup_cost— cost to get the first row.total_cost— cost to get all rows.rows— expected row count after the operator.
Common mistake: reading cost=1000 as 1000 ms. Do not do that.
Cost is used by the optimizer to compare alternatives:
- full scan vs index scan;
- hash join vs nested/index join;
- sort before or after filter;
- aggregate over all rows or over already filtered input.
If rows is clearly far from reality, check statistics first:
ANALYZE public.orders;
SELECT *
FROM sys.table_stats
WHERE schema_name = 'public' AND table_name = 'orders';
SELECT *
FROM sys.column_stats
WHERE schema_name = 'public' AND table_name = 'orders';
Vector Prefix
Operators with the Vector prefix execute through the vectorized path:
VectorSeqScanVectorIndexScanVectorFilterVectorProjectVectorWindowFunctionVectorSetOperation
The vector path processes data in batches, reducing per-row overhead. For operators this is usually a good sign, especially on scan/filter/aggregate workload.
If the expected Vector* disappeared:
- Check query shape: whether an expression was added that is not yet supported by the vector executor.
- Compare
EXPLAIN (VERBOSE, DIAGNOSTIC)before/after the query change. - Inspect
reason_codesand plan node types. - For latency regressions, use Performance tuning guide and Parallel runtime observability runbook.
Operator Glossary
| Operator | What it does | When good | When suspicious |
|---|---|---|---|
Scan / VectorSeqScan | Reads the whole table. | Small table, analytics, low filter selectivity. | Point lookup on a large table where an index should exist. |
IndexScan / VectorIndexScan | Reads through an index, then checks residual filter if needed. | Selective predicate on indexed column. | If it returns a large share of the table, full scan may be cheaper. |
IndexOnlyScan | Reads only the index, without heap fetch, if visibility map allows. | Coverage index + all-visible pages. | If it often falls back to heap, check visibility map / vacuum-like processes. |
Filter / VectorFilter | Applies WHERE/predicate to input stream. | After scan/index scan. | If filter is above expensive join, check pushdown. |
Project / VectorProject | Selects/computes output columns. | Normal top operator for SELECT. | Usually not an issue except very expensive expressions. |
Join | Generic join node with kind=inner/left/right/full/cross. | Expected join type matches SQL. | cross almost always needs attention. |
HashSemiJoin | Implements EXISTS/semi join through hash. | Good sign for decorrelated EXISTS. | If semi join was expected but nested/cross-like plan appears. |
HashAntiJoin | Implements NOT EXISTS/anti join through hash. | Good sign for anti-semi workload. | If input is large and there is no memory headroom. |
NLIndexJoin | Nested-loop probe by index. | Small outer input + selective index lookup. | Large outer input: may become many index probes. |
Aggregate | COUNT, SUM, GROUP BY, and other aggregate operations. | After filter or join with already reduced input. | If aggregate must materialize huge input. |
Sort | Sorts the stream. | For ORDER BY, merge-like paths. | Large sort without LIMIT or index on order key. |
Distinct | Removes duplicates. | Needed for DISTINCT. | On large input without prior row reduction. |
Limit / Offset | Limits or skips rows. | LIMIT can sharply reduce total cost. | Large OFFSET still forces reading/skipping many rows. |
WindowFunction | Window functions. | Analytical queries. | If it requires large sort/partition. |
SetOperation | UNION, INTERSECT, EXCEPT. | Set queries. | If unexpectedly expensive due to dedup/sort. |
LateralJoin | LATERAL/derived-table dependent path. | Correlated derived inputs. | Can be expensive on large outer inputs. |
LateMaterialize | Delayed column materialization. Reads only columns needed for filtering, then reads the rest later for rows that passed the filter. | High filter selectivity (selectivity < 0.3). | If selectivity is low, double read may be more expensive than normal. |
DmlInsert / DmlUpdate / DmlDelete | Sentinel for DML. | EXPLAIN DML shows intent. | For runtime counters, use EXPLAIN ANALYZE carefully. |
Ddl | Sentinel for DDL. | Shows DDL path. | Not a query performance hot path. |
Scan Strategy Reason
For Scan (SeqScan) and IndexScan nodes, the planner prints the reason for choosing
a specific scan strategy in the scan_strategy_reason field. This helps
understand why the optimizer preferred sequential scan over index scan
or vice versa.
Output examples:
index scan: high selectivity (0.0005)— index was chosen because the condition is highly selective.seq scan chosen: low cardinality (0.1328)— SeqScan was chosen: selectivity is above[execution].index_cardinality_threshold(the planner considers the column “too low-cardinality” for an index on this predicate).seq scan chosen: low selectivity (0.1111)— SeqScan was chosen: selectivity is not below[execution].index_scan_selectivity_threshold(separate gate after cardinality).
If you see seq scan chosen where you expected an index:
- Check statistics freshness (
ANALYZE). - Check
distinct_estimateinsys.column_stats. - Tune thresholds in
angarabase.conf([execution]) or through env before startup (ANGARABASE_INDEX_CARDINALITY_THRESHOLD,ANGARABASE_INDEX_SCAN_SELECTIVITY_THRESHOLD), then restart the server.SET ...from psql in Simple Query protocol does not change these knobs (see Performance tuning).
Optimizer Diagnostics
EXPLAIN (DIAGNOSTIC) adds a block:
--- Optimizer Diagnostics ---
query_fingerprint=1795416667712787713
plan_fingerprint=3192678580981205807
workload_class=select
replan_reason=none
cache_status=hit
reason_codes=stats_default_fallback
query_fingerprint
Stable identifier of the query’s logical shape. Literal values usually should not create a new fingerprint for every constant.
Use it to correlate:
- slow query;
- metrics;
- logs;
- repeated
EXPLAIN; - regression evidence.
plan_fingerprint
Identifier of the plan shape. If the query is the same but the plan changed,
query_fingerprint remains the same while plan_fingerprint changes.
This is useful when investigating:
- “after
ANALYZE, the query became faster/slower”; - “after adding an index, the plan changed”;
- “yesterday it was
IndexOnlyScan, today it isSeqScanagain”.
workload_class
Workload class:
selectwriteddl- other classes if a specific path marks them.
For operators, this helps separate OLTP read path from write/DDL events.
replan_reason
Why the plan was rebuilt or why there is no explicit reason.
| Value | Meaning | What to do |
|---|---|---|
none | No explicit replan reason. Usually normal path. | If cache_status=hit, cache is working. |
stats_drift | Statistics changed enough that the old plan could be stale. | Check ANALYZE frequency, table churn, latency stability. |
schema_changed | Schema changed: DDL, index, column, or another schema signal. | Normal after migrations; suspicious with frequent DDL in production. |
aqp_feedback | Runtime feedback affected estimates/planning. | Check AQP metrics and workload skew. |
forced_fallback | Planner/runtime chose a safe fallback. | Compare reason codes and unsupported expressions. |
cache_status
Shows how the query relates to the plan cache.
| Value | Meaning | How to interpret |
|---|---|---|
hit | Plan reused. | Good for stable OLTP. |
miss | Plan built again. | Normal for first run or new query shape. |
bypass | Cache deliberately not used. | Check DDL, volatile shape, diagnostics mode, or safety path. |
invalidated | Old plan dropped. | Look for replan_reason. |
unknown | Runtime did not pass status. | Do not infer cache behavior only from this field. |
reason_codes
Reasons for planner choice or fallback.
| Code | Meaning | What to check |
|---|---|---|
stats_default_fallback | Planner could not use detailed statistics and applied defaults. | Run ANALYZE, check sys.table_stats and sys.column_stats. |
index_only_eligible | Plan can read only the index without heap fetch. | Check visibility map and index coverage. |
bitmap_candidate_rejected | There was an alternative bitmap-like/index path, but another path or residual filter was chosen. | Compare predicate selectivity and presence of a suitable index. |
hash_join_fits_work_mem | Hash join is considered memory-eligible. | When p99 grows, check memory pressure and join cardinality. |
used_multicol_stats | Multi-column statistics were used. | Good sign for correlated predicates. |
If reason_codes is empty, AngaraBase shows stats_default_fallback
so operators do not get a “silent” diagnostic block.
JSON Format
For CI, evidence packs, and diffs between releases, use JSON:
EXPLAIN (VERBOSE, DIAGNOSTIC, FORMAT JSON)
SELECT * FROM public.orders WHERE customer_id = 42;
In JSON, the same entities are represented as fields:
Node TypeStartup CostTotal CostPlan RowsPlansworkers_plannedworkers_launchednuma_affinityquery_fingerprintplan_fingerprintreplan_reasonreason_codescache_status
For release evidence, do not compare the entire JSON byte-for-byte; compare stable properties: node class, join type, fingerprints, reason codes, and key estimates.
Common Reading Scenarios
1. Slow point lookup
Symptom:
VectorSeqScan table=public.orders ... rows=1000000
What to check:
- Whether there is an index on the filter column.
- Whether the planner sees statistics (
sys.column_stats). - Whether diagnostics shows
stats_default_fallback. - Whether filter selectivity is too low.
The desired plan for point lookup is usually closer to:
IndexScan index_name=... index_col=customer_id key_range=eq(...)
or:
IndexOnlyScan index_name=... index_col=customer_id index_only_reason="..."
2. Late Materialization
If the filter rejects a significant share of rows, the planner may choose the LateMaterialize node. This avoids expensive reading of all columns for rows that will be filtered out anyway.
The enable threshold is controlled by [execution].late_materialization_selectivity_threshold (default 0.3).
Plan example:
Project cost=10.00..50.00 rows=100
LateMaterialize cost=5.00..45.00 rows=100
VectorFilter (x > 100) cost=0.00..40.00 rows=100
VectorSeqScan table=large_table cost=0.00..30.00 rows=1000
3. EXISTS should not be nested-loop
For query:
EXPLAIN (DIAGNOSTIC)
SELECT *
FROM public.orders o
WHERE EXISTS (
SELECT 1
FROM public.order_items i
WHERE i.order_id = o.id
);
Good sign:
HashSemiJoin kind=semi on=...
This means the optimizer decorrelated EXISTS and chose hash semi join.
3. NOT EXISTS and anti join
Good sign:
HashAntiJoin kind=anti on=...
If input is large, check hash_join_fits_work_mem and memory metrics.
4. GROUP BY is too expensive
Symptom:
Aggregate cost=... rows=...
VectorSeqScan table=...
What to check:
- Whether rows can be filtered before aggregate.
- Whether there are unnecessary projected columns.
- Whether group key fits fast path (for example, single integer key).
- Whether there are too many groups.
- Whether the query requires sorting after aggregate.
5. Parallel planned, but latency is high
Symptom:
workers_planned=2 workers_launched=0
or workers_launched is less than workers_planned.
What to check:
- Global parallel runtime limits.
- CPU saturation.
- pgwire/runtime queues.
- Memory pressure.
- Parallel runtime observability runbook.
6. Plan changed after ANALYZE
Compare:
query_fingerprint— should remain stable for the same SQL shape;plan_fingerprint— changes if the plan shape changed;replan_reason— should explain the rebuild;reason_codes— show which new factors became available.
If IndexScan or IndexOnlyScan appears after ANALYZE, that is usually
a good sign. If SeqScan appears on a large OLTP lookup, check
selectivity and statistics.
Relationship with sys.* views
EXPLAIN shows the plan, and sys.* helps verify whether the optimizer
has data for a good decision.
Minimum set:
SELECT *
FROM sys.table_stats
WHERE schema_name = 'public' AND table_name = 'orders';
SELECT *
FROM sys.column_stats
WHERE schema_name = 'public' AND table_name = 'orders';
SELECT *
FROM sys.multicolumn_stats
WHERE schema_name = 'public' AND table_name = 'orders';
SELECT *
FROM sys.workload_stats
WHERE schema_name = 'public' AND table_name = 'orders';
How to read:
row_count_estimatehelps understand whether the optimizer knows table size.distinct_estimatehelps estimate equality predicate selectivity.min_i64/max_i64help range predicates.multicolumn_statshelps with correlated conditions.workload_statsshows how the table is actually used.
Triage Checklist
When a user says “the query became slow”, proceed as follows:
-
Capture the plan:
EXPLAIN (VERBOSE, DIAGNOSTIC) <query>; -
If safe, capture runtime:
EXPLAIN ANALYZE <query>; -
Read the tree bottom-up.
-
Find the widest input (
rowssharply above expected). -
Check whether the expected operator class is used:
IndexScan,IndexOnlyScan,HashSemiJoin,Aggregate,Vector*. -
Check
reason_codes. -
If
stats_default_fallbackis present, runANALYZEand compare the plan. -
Compare
query_fingerprintandplan_fingerprintbefore/after. -
If the issue is in parallel path, go to Parallel runtime observability runbook.
-
If the issue is in storage/IO, go to Performance tuning guide.
Common Interpretation Mistakes
| Mistake | Why it is wrong | Correct approach |
|---|---|---|
cost=1000 means 1000 ms | Cost is a relative optimizer model. | For timing, use EXPLAIN ANALYZE and latency metrics. |
SeqScan is always bad | Full scan may be optimal for small tables or low-selectivity filters. | Check table size, selectivity, and index availability. |
IndexScan is always better | Index scan may be worse than full scan if it returns a large share of the table. | Compare rows/cost and actual runtime. |
workers_planned=2 guarantees 2x speedup | Workers have overhead and may not start. | Check workers_launched and runtime metrics. |
replan_reason=none means optimizer did nothing | It means there is no explicit replan reason. | Check cache_status, fingerprints, and reason codes. |
stats_default_fallback can be ignored | It signals the optimizer may be guessing without statistics. | Run ANALYZE and check sys.* views. |
When to Escalate
Escalate as a bug/perf issue if:
EXPLAIN (DIAGNOSTIC)does not show a diagnostic block (make sure you are not usingEXPLAIN (DIAGNOSTIC ON)— the boolean suffixON/OFFis not supported by AngaraBase and silently ignores the option; useEXPLAIN (DIAGNOSTIC)without suffix);query_fingerprintis unstable for the same query shape;plan_fingerprintchanges without schema/stats/AQP reason;replan_reason=stats_driftappears too often on a stable table;IndexOnlyScanis chosen, but runtime constantly does heap fetch;HashSemiJoin/HashAntiJoindisappear for simpleEXISTS/NOT EXISTS;workers_launchedis systematically belowworkers_plannedwithout a clear pressure signal;- JSON/text outputs contradict each other.
For bug report, attach:
- query SQL;
EXPLAIN (VERBOSE, DIAGNOSTIC)text;EXPLAIN (VERBOSE, DIAGNOSTIC, FORMAT JSON);- relevant rows from
sys.table_stats,sys.column_stats,sys.multicolumn_stats,sys.workload_stats; - AngaraBase version and capability/profile snapshot, if available.
Next
- Performance tuning guide — what to do after reading the plan if the issue is latency/throughput.
- Parallel runtime observability runbook — how to
investigate
workers_planned/workers_launchedand runtime pressure. - Observability metrics checklist — which metrics
to correlate with
query_fingerprintandplan_fingerprint. - Diagnostics bundle runbook — how to collect evidence for support.