Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

How to Read Query Plans

This guide helps DBAs/SREs read EXPLAIN in AngaraBase without knowing the planner internals. The goal is not to manually “outplay” the optimizer, but to quickly answer operator questions:

  • which execution path the database chose;
  • why that path was chosen;
  • whether the query uses vector/parallel path;
  • whether the plan was reused from cache or rebuilt;
  • where to look for the cause of high latency.

Quick Start

For a regular plan:

EXPLAIN SELECT * FROM public.orders WHERE customer_id = 42;

For operator diagnostics:

EXPLAIN (DIAGNOSTIC)
SELECT * FROM public.orders WHERE customer_id = 42;

For detailed output:

EXPLAIN (VERBOSE, DIAGNOSTIC)
SELECT * FROM public.orders WHERE customer_id = 42;

For machine-readable evidence:

EXPLAIN (VERBOSE, DIAGNOSTIC, FORMAT JSON)
SELECT * FROM public.orders WHERE customer_id = 42;

If you need to see runtime counters, use EXPLAIN ANALYZE. It executes the query, so for DML use it carefully and only in a safe environment.

Runtime Facts

In ANALYZE mode, AngaraBase collects additional query execution facts in the runtime_facts block. This block appears if the query encountered waits, data spill to disk, or rejection due to resource limits.

Example JSON output:

"runtime_facts": {
  "spill_bytes": 4096,
  "wal_sync_wait_ms": 12,
  "resource_reject_count": 1,
  "last_runtime_reason": "spilled_memory_budget"
}

Example text output: runtime_facts: spill_bytes=4096 wal_sync_wait_ms=12 resource_reject_count=1 last_runtime_reason=spilled_memory_budget

Main fields:

  • spill_bytes — amount of data spilled to disk (for example, when HashJoin or Sort lacks memory).
  • wal_sync_wait_ms — WAL synchronization wait time. (May be omitted for SELECT, with durability=relaxed, or if the transaction successfully joined group commit without additional I/O wait).
  • resource_reject_count — number of rejections due to resource limits.
  • last_runtime_reason — reason code, for example spilled_memory_budget.

Note: only fields with non-zero values are emitted.

How to Read the Tree

Read the plan bottom-up. The bottom operator receives data from a table or index. Each next operator above applies a filter, join, aggregation, sort, or projection.

Example:

Project cost=0.00..1030.00 rows=100
  VectorFilter cost=0.00..1025.00 rows=100
    VectorSeqScan workers_planned=2 workers_launched=2 numa_affinity=disabled table=public.ux_stats_v2 cost=0.00..1000.00 rows=1000
--- Optimizer Diagnostics ---
query_fingerprint=1795416667712787713
plan_fingerprint=3192678580981205807
workload_class=select
replan_reason=none
cache_status=hit
reason_codes=stats_default_fallback

Read it like this:

  1. VectorSeqScan reads table public.ux_stats_v2.
  2. VectorFilter applies the WHERE condition.
  3. Project keeps the required columns in the result.
  4. The Optimizer Diagnostics block explains query/plan identifiers, cache, replan reason, and reason codes.

Operator Line Format

Let’s break down the line:

VectorSeqScan workers_planned=2 workers_launched=2 numa_affinity=disabled table=public.ux_stats_v2 cost=0.00..1000.00 rows=1000
FieldWhat it meansHow an operator should read it
VectorSeqScanOperator type. Vector = vector executor, SeqScan = sequential table read.Reads the whole table in batches. Good for full scan / analytics, bad for point lookup on a large table without an index.
workers_planned=2How many workers the planner wanted to use.Plan allows parallelism.
workers_launched=2How many workers were actually allocated.If fewer than planned, runtime pressure or parallelism limits are possible.
numa_affinity=disabledWhether binding to NUMA-node is enabled.Usually disabled is normal for dev/cloud; on bare metal it may be a separate tuning question.
table=public.ux_stats_v2Source table.Verify that the expected table/schema is scanned.
cost=0.00..1000.00startup_cost..total_cost in planner units.Not milliseconds. Compare with alternative plans, not wall-clock.
rows=1000Estimated output rows from the operator.Large estimate error often leads to bad join order or unnecessary full scan.

Cost and Rows

cost is an internal work estimate, not execution time.

  • startup_cost — cost to get the first row.
  • total_cost — cost to get all rows.
  • rows — expected row count after the operator.

Common mistake: reading cost=1000 as 1000 ms. Do not do that. Cost is used by the optimizer to compare alternatives:

  • full scan vs index scan;
  • hash join vs nested/index join;
  • sort before or after filter;
  • aggregate over all rows or over already filtered input.

If rows is clearly far from reality, check statistics first:

ANALYZE public.orders;

SELECT *
FROM sys.table_stats
WHERE schema_name = 'public' AND table_name = 'orders';

SELECT *
FROM sys.column_stats
WHERE schema_name = 'public' AND table_name = 'orders';

Vector Prefix

Operators with the Vector prefix execute through the vectorized path:

  • VectorSeqScan
  • VectorIndexScan
  • VectorFilter
  • VectorProject
  • VectorWindowFunction
  • VectorSetOperation

The vector path processes data in batches, reducing per-row overhead. For operators this is usually a good sign, especially on scan/filter/aggregate workload.

If the expected Vector* disappeared:

  1. Check query shape: whether an expression was added that is not yet supported by the vector executor.
  2. Compare EXPLAIN (VERBOSE, DIAGNOSTIC) before/after the query change.
  3. Inspect reason_codes and plan node types.
  4. For latency regressions, use Performance tuning guide and Parallel runtime observability runbook.

Operator Glossary

OperatorWhat it doesWhen goodWhen suspicious
Scan / VectorSeqScanReads the whole table.Small table, analytics, low filter selectivity.Point lookup on a large table where an index should exist.
IndexScan / VectorIndexScanReads through an index, then checks residual filter if needed.Selective predicate on indexed column.If it returns a large share of the table, full scan may be cheaper.
IndexOnlyScanReads only the index, without heap fetch, if visibility map allows.Coverage index + all-visible pages.If it often falls back to heap, check visibility map / vacuum-like processes.
Filter / VectorFilterApplies WHERE/predicate to input stream.After scan/index scan.If filter is above expensive join, check pushdown.
Project / VectorProjectSelects/computes output columns.Normal top operator for SELECT.Usually not an issue except very expensive expressions.
JoinGeneric join node with kind=inner/left/right/full/cross.Expected join type matches SQL.cross almost always needs attention.
HashSemiJoinImplements EXISTS/semi join through hash.Good sign for decorrelated EXISTS.If semi join was expected but nested/cross-like plan appears.
HashAntiJoinImplements NOT EXISTS/anti join through hash.Good sign for anti-semi workload.If input is large and there is no memory headroom.
NLIndexJoinNested-loop probe by index.Small outer input + selective index lookup.Large outer input: may become many index probes.
AggregateCOUNT, SUM, GROUP BY, and other aggregate operations.After filter or join with already reduced input.If aggregate must materialize huge input.
SortSorts the stream.For ORDER BY, merge-like paths.Large sort without LIMIT or index on order key.
DistinctRemoves duplicates.Needed for DISTINCT.On large input without prior row reduction.
Limit / OffsetLimits or skips rows.LIMIT can sharply reduce total cost.Large OFFSET still forces reading/skipping many rows.
WindowFunctionWindow functions.Analytical queries.If it requires large sort/partition.
SetOperationUNION, INTERSECT, EXCEPT.Set queries.If unexpectedly expensive due to dedup/sort.
LateralJoinLATERAL/derived-table dependent path.Correlated derived inputs.Can be expensive on large outer inputs.
LateMaterializeDelayed column materialization. Reads only columns needed for filtering, then reads the rest later for rows that passed the filter.High filter selectivity (selectivity < 0.3).If selectivity is low, double read may be more expensive than normal.
DmlInsert / DmlUpdate / DmlDeleteSentinel for DML.EXPLAIN DML shows intent.For runtime counters, use EXPLAIN ANALYZE carefully.
DdlSentinel for DDL.Shows DDL path.Not a query performance hot path.

Scan Strategy Reason

For Scan (SeqScan) and IndexScan nodes, the planner prints the reason for choosing a specific scan strategy in the scan_strategy_reason field. This helps understand why the optimizer preferred sequential scan over index scan or vice versa.

Output examples:

  • index scan: high selectivity (0.0005) — index was chosen because the condition is highly selective.
  • seq scan chosen: low cardinality (0.1328) — SeqScan was chosen: selectivity is above [execution].index_cardinality_threshold (the planner considers the column “too low-cardinality” for an index on this predicate).
  • seq scan chosen: low selectivity (0.1111) — SeqScan was chosen: selectivity is not below [execution].index_scan_selectivity_threshold (separate gate after cardinality).

If you see seq scan chosen where you expected an index:

  1. Check statistics freshness (ANALYZE).
  2. Check distinct_estimate in sys.column_stats.
  3. Tune thresholds in angarabase.conf ([execution]) or through env before startup (ANGARABASE_INDEX_CARDINALITY_THRESHOLD, ANGARABASE_INDEX_SCAN_SELECTIVITY_THRESHOLD), then restart the server. SET ... from psql in Simple Query protocol does not change these knobs (see Performance tuning).

Optimizer Diagnostics

EXPLAIN (DIAGNOSTIC) adds a block:

--- Optimizer Diagnostics ---
query_fingerprint=1795416667712787713
plan_fingerprint=3192678580981205807
workload_class=select
replan_reason=none
cache_status=hit
reason_codes=stats_default_fallback

query_fingerprint

Stable identifier of the query’s logical shape. Literal values usually should not create a new fingerprint for every constant.

Use it to correlate:

  • slow query;
  • metrics;
  • logs;
  • repeated EXPLAIN;
  • regression evidence.

plan_fingerprint

Identifier of the plan shape. If the query is the same but the plan changed, query_fingerprint remains the same while plan_fingerprint changes.

This is useful when investigating:

  • “after ANALYZE, the query became faster/slower”;
  • “after adding an index, the plan changed”;
  • “yesterday it was IndexOnlyScan, today it is SeqScan again”.

workload_class

Workload class:

  • select
  • write
  • ddl
  • other classes if a specific path marks them.

For operators, this helps separate OLTP read path from write/DDL events.

replan_reason

Why the plan was rebuilt or why there is no explicit reason.

ValueMeaningWhat to do
noneNo explicit replan reason. Usually normal path.If cache_status=hit, cache is working.
stats_driftStatistics changed enough that the old plan could be stale.Check ANALYZE frequency, table churn, latency stability.
schema_changedSchema changed: DDL, index, column, or another schema signal.Normal after migrations; suspicious with frequent DDL in production.
aqp_feedbackRuntime feedback affected estimates/planning.Check AQP metrics and workload skew.
forced_fallbackPlanner/runtime chose a safe fallback.Compare reason codes and unsupported expressions.

cache_status

Shows how the query relates to the plan cache.

ValueMeaningHow to interpret
hitPlan reused.Good for stable OLTP.
missPlan built again.Normal for first run or new query shape.
bypassCache deliberately not used.Check DDL, volatile shape, diagnostics mode, or safety path.
invalidatedOld plan dropped.Look for replan_reason.
unknownRuntime did not pass status.Do not infer cache behavior only from this field.

reason_codes

Reasons for planner choice or fallback.

CodeMeaningWhat to check
stats_default_fallbackPlanner could not use detailed statistics and applied defaults.Run ANALYZE, check sys.table_stats and sys.column_stats.
index_only_eligiblePlan can read only the index without heap fetch.Check visibility map and index coverage.
bitmap_candidate_rejectedThere was an alternative bitmap-like/index path, but another path or residual filter was chosen.Compare predicate selectivity and presence of a suitable index.
hash_join_fits_work_memHash join is considered memory-eligible.When p99 grows, check memory pressure and join cardinality.
used_multicol_statsMulti-column statistics were used.Good sign for correlated predicates.

If reason_codes is empty, AngaraBase shows stats_default_fallback so operators do not get a “silent” diagnostic block.

JSON Format

For CI, evidence packs, and diffs between releases, use JSON:

EXPLAIN (VERBOSE, DIAGNOSTIC, FORMAT JSON)
SELECT * FROM public.orders WHERE customer_id = 42;

In JSON, the same entities are represented as fields:

  • Node Type
  • Startup Cost
  • Total Cost
  • Plan Rows
  • Plans
  • workers_planned
  • workers_launched
  • numa_affinity
  • query_fingerprint
  • plan_fingerprint
  • replan_reason
  • reason_codes
  • cache_status

For release evidence, do not compare the entire JSON byte-for-byte; compare stable properties: node class, join type, fingerprints, reason codes, and key estimates.

Common Reading Scenarios

1. Slow point lookup

Symptom:

VectorSeqScan table=public.orders ... rows=1000000

What to check:

  1. Whether there is an index on the filter column.
  2. Whether the planner sees statistics (sys.column_stats).
  3. Whether diagnostics shows stats_default_fallback.
  4. Whether filter selectivity is too low.

The desired plan for point lookup is usually closer to:

IndexScan index_name=... index_col=customer_id key_range=eq(...)

or:

IndexOnlyScan index_name=... index_col=customer_id index_only_reason="..."

2. Late Materialization

If the filter rejects a significant share of rows, the planner may choose the LateMaterialize node. This avoids expensive reading of all columns for rows that will be filtered out anyway.

The enable threshold is controlled by [execution].late_materialization_selectivity_threshold (default 0.3).

Plan example:

Project cost=10.00..50.00 rows=100
  LateMaterialize cost=5.00..45.00 rows=100
    VectorFilter (x > 100) cost=0.00..40.00 rows=100
      VectorSeqScan table=large_table cost=0.00..30.00 rows=1000

3. EXISTS should not be nested-loop

For query:

EXPLAIN (DIAGNOSTIC)
SELECT *
FROM public.orders o
WHERE EXISTS (
  SELECT 1
  FROM public.order_items i
  WHERE i.order_id = o.id
);

Good sign:

HashSemiJoin kind=semi on=...

This means the optimizer decorrelated EXISTS and chose hash semi join.

3. NOT EXISTS and anti join

Good sign:

HashAntiJoin kind=anti on=...

If input is large, check hash_join_fits_work_mem and memory metrics.

4. GROUP BY is too expensive

Symptom:

Aggregate cost=... rows=...
  VectorSeqScan table=...

What to check:

  1. Whether rows can be filtered before aggregate.
  2. Whether there are unnecessary projected columns.
  3. Whether group key fits fast path (for example, single integer key).
  4. Whether there are too many groups.
  5. Whether the query requires sorting after aggregate.

5. Parallel planned, but latency is high

Symptom:

workers_planned=2 workers_launched=0

or workers_launched is less than workers_planned.

What to check:

  1. Global parallel runtime limits.
  2. CPU saturation.
  3. pgwire/runtime queues.
  4. Memory pressure.
  5. Parallel runtime observability runbook.

6. Plan changed after ANALYZE

Compare:

  • query_fingerprint — should remain stable for the same SQL shape;
  • plan_fingerprint — changes if the plan shape changed;
  • replan_reason — should explain the rebuild;
  • reason_codes — show which new factors became available.

If IndexScan or IndexOnlyScan appears after ANALYZE, that is usually a good sign. If SeqScan appears on a large OLTP lookup, check selectivity and statistics.

Relationship with sys.* views

EXPLAIN shows the plan, and sys.* helps verify whether the optimizer has data for a good decision.

Minimum set:

SELECT *
FROM sys.table_stats
WHERE schema_name = 'public' AND table_name = 'orders';

SELECT *
FROM sys.column_stats
WHERE schema_name = 'public' AND table_name = 'orders';

SELECT *
FROM sys.multicolumn_stats
WHERE schema_name = 'public' AND table_name = 'orders';

SELECT *
FROM sys.workload_stats
WHERE schema_name = 'public' AND table_name = 'orders';

How to read:

  • row_count_estimate helps understand whether the optimizer knows table size.
  • distinct_estimate helps estimate equality predicate selectivity.
  • min_i64 / max_i64 help range predicates.
  • multicolumn_stats helps with correlated conditions.
  • workload_stats shows how the table is actually used.

Triage Checklist

When a user says “the query became slow”, proceed as follows:

  1. Capture the plan:

    EXPLAIN (VERBOSE, DIAGNOSTIC)
    <query>;
    
  2. If safe, capture runtime:

    EXPLAIN ANALYZE
    <query>;
    
  3. Read the tree bottom-up.

  4. Find the widest input (rows sharply above expected).

  5. Check whether the expected operator class is used: IndexScan, IndexOnlyScan, HashSemiJoin, Aggregate, Vector*.

  6. Check reason_codes.

  7. If stats_default_fallback is present, run ANALYZE and compare the plan.

  8. Compare query_fingerprint and plan_fingerprint before/after.

  9. If the issue is in parallel path, go to Parallel runtime observability runbook.

  10. If the issue is in storage/IO, go to Performance tuning guide.

Common Interpretation Mistakes

MistakeWhy it is wrongCorrect approach
cost=1000 means 1000 msCost is a relative optimizer model.For timing, use EXPLAIN ANALYZE and latency metrics.
SeqScan is always badFull scan may be optimal for small tables or low-selectivity filters.Check table size, selectivity, and index availability.
IndexScan is always betterIndex scan may be worse than full scan if it returns a large share of the table.Compare rows/cost and actual runtime.
workers_planned=2 guarantees 2x speedupWorkers have overhead and may not start.Check workers_launched and runtime metrics.
replan_reason=none means optimizer did nothingIt means there is no explicit replan reason.Check cache_status, fingerprints, and reason codes.
stats_default_fallback can be ignoredIt signals the optimizer may be guessing without statistics.Run ANALYZE and check sys.* views.

When to Escalate

Escalate as a bug/perf issue if:

  • EXPLAIN (DIAGNOSTIC) does not show a diagnostic block (make sure you are not using EXPLAIN (DIAGNOSTIC ON) — the boolean suffix ON/OFF is not supported by AngaraBase and silently ignores the option; use EXPLAIN (DIAGNOSTIC) without suffix);
  • query_fingerprint is unstable for the same query shape;
  • plan_fingerprint changes without schema/stats/AQP reason;
  • replan_reason=stats_drift appears too often on a stable table;
  • IndexOnlyScan is chosen, but runtime constantly does heap fetch;
  • HashSemiJoin/HashAntiJoin disappear for simple EXISTS/NOT EXISTS;
  • workers_launched is systematically below workers_planned without a clear pressure signal;
  • JSON/text outputs contradict each other.

For bug report, attach:

  • query SQL;
  • EXPLAIN (VERBOSE, DIAGNOSTIC) text;
  • EXPLAIN (VERBOSE, DIAGNOSTIC, FORMAT JSON);
  • relevant rows from sys.table_stats, sys.column_stats, sys.multicolumn_stats, sys.workload_stats;
  • AngaraBase version and capability/profile snapshot, if available.

Next