Troubleshooting Guide
This document moves the key operator fast path from the legacy troubleshooting runbook into AngaraBook.
Scope
It covers common AngaraBase operational incidents and quick actions for diagnostics/remediation.
Incident: False Positive Commit Conflicts (40001)
Symptoms
- Clients receive SQLSTATE 40001 on every COMMIT attempt.
- Startup logs may contain recovery warnings.
Typical causes
- Version below the fix release for the VLF recovery path.
- storage.data_directory and storage.transaction_log_directory are mixed up.
Actions
- Upgrade to the fixed version.
- Separate storage.data_directory and storage.transaction_log_directory.
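The directory check can be sketched as a small helper. This is a minimal sketch, not an AngaraBase tool: the function name and the example paths are hypothetical, and treating a nested layout (one directory inside the other) as "not separated" is a conservative assumption rather than something the runbook states.

```python
from pathlib import Path


def directories_overlap(data_dir: str, txn_log_dir: str) -> bool:
    """Return True if the two paths are identical or one contains the other."""
    data = Path(data_dir).resolve()
    log = Path(txn_log_dir).resolve()
    # Identical paths, or either path nested inside the other, count as overlap.
    return data == log or data in log.parents or log in data.parents


if __name__ == "__main__":
    # Hypothetical values of storage.data_directory and
    # storage.transaction_log_directory taken from the active config.
    print(directories_overlap("/srv/angarabase/data", "/srv/angarabase/txlog"))
```

A True result suggests the two settings should be reviewed before looking further into the 40001 errors.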
Incident: Backpressure active (no-steal)
Symptoms
- buffer_pool_backpressure_active == 1
- buffer_pool_uncommitted_dirty_ratio and txn_write_set_limit_exceeded_total are growing
Actions
- Reduce write transaction batch size.
- Enable buffer_pool.backpressure.mode = fail_fast if needed.
- Lower txn.max_write_set_pages and/or increase buffer_pool.size_bytes.
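The symptom check for this incident can be expressed over a few consecutive metric samples. The sketch below assumes the samples were already scraped into plain lists, oldest first. The helper names and the definition of "growing" (never decreasing and ending higher than it started) are illustrative assumptions, not part of AngaraBase.

```python
def is_growing(samples):
    """True if the series never decreases and ends higher than it starts."""
    if len(samples) < 2:
        return False
    non_decreasing = all(b >= a for a, b in zip(samples, samples[1:]))
    return non_decreasing and samples[-1] > samples[0]


def backpressure_symptoms(active, dirty_ratio_samples, write_set_exceeded_samples):
    """Combine the three signals listed under Symptoms.

    active: latest value of buffer_pool_backpressure_active
    dirty_ratio_samples: buffer_pool_uncommitted_dirty_ratio over time
    write_set_exceeded_samples: txn_write_set_limit_exceeded_total over time
    """
    return (
        active == 1
        and is_growing(dirty_ratio_samples)
        and is_growing(write_set_exceeded_samples)
    )


if __name__ == "__main__":
    print(backpressure_symptoms(1, [0.41, 0.48, 0.57], [10, 14, 23]))
```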
Incident: p99 spikes during checkpoint
Symptoms
- Growth in checkpoint_duration_seconds and latency spikes.
Actions
- Increase checkpoint.target_ms.
- Limit writeback.max_bytes_per_sec.
- Tune checkpoint.dirty_ratio_hard for earlier background writeback.
Incident: commit fsync tails / durable_lsn lag grows
Symptoms
- Growth in commit_ack_latency_seconds and durable_lsn_lag_bytes.
Actions
- Move WAL/TL to a separate volume if possible.
- Tune group_commit.max_wait_us.
- Reduce writeback interference.
Start / stop (operator baseline)
angarabase-server --config /etc/angarabase/angarabase.conf
Minimum checks before startup:
- valid config;
- correct data/txn log directories;
- sufficient disk limits and fsync latency budget.
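The pre-start checks above could be scripted roughly as follows. This is a sketch under stated assumptions: preflight and measure_fsync_latency are hypothetical helpers, the config check only confirms that the file exists (it does not validate AngaraBase config syntax), and the fsync budget is a number the operator supplies.

```python
import os
import tempfile
import time


def measure_fsync_latency(directory, rounds=20, payload=b"x" * 4096):
    """Return write+fsync latencies in seconds for a probe file in `directory`."""
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync-probe-")
    try:
        for _ in range(rounds):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the write down to the storage device
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


def preflight(config_path, data_dir, txn_log_dir, fsync_budget_s):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not os.path.isfile(config_path):
        problems.append(f"config file not found: {config_path}")
    for name, directory in (("data", data_dir), ("txn log", txn_log_dir)):
        if not os.path.isdir(directory):
            problems.append(f"{name} directory missing: {directory}")
        elif not os.access(directory, os.W_OK):
            problems.append(f"{name} directory not writable: {directory}")
    if not problems:
        # Probe the transaction log volume only once the basic checks pass.
        worst = max(measure_fsync_latency(txn_log_dir))
        if worst > fsync_budget_s:
            problems.append(f"worst fsync {worst:.4f}s exceeds budget {fsync_budget_s}s")
    return problems
```

Calling preflight with the paths from the active config before running angarabase-server gives a quick go/no-go signal. The probe writes and removes a small temporary file in the transaction log directory.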
Incident: CRC mismatch in Delete Vector blob
Symptoms
- Query fails with error: CRC mismatch for DV blob <path> (segment <id>): expected <exp>, got <got>
- Possible during compaction or applying columnar DELETE.
What it means
The .bdel file (Delete Vector blob) is corrupted. The blob_uri field points to the exact file path, and segment_id is the segment identifier inside the blob. The error is fail-closed: reading stops and data is not modified.
Actions
- Find the corrupted file by blob_uri from the error message.
- Check storage volume integrity (IO errors in dmesg, S.M.A.R.T.).
- If the file is irreversibly corrupted, restore from backup (disaster-recovery.md).
- For recurring CRC errors, enable monitoring of angarabase_columnar_pending_deleted_rows to track DV fragmentation pressure.
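To locate the corrupted file quickly, the fields can be pulled out of the error line with a regular expression built from the message template shown under Symptoms. This is a minimal sketch: the sample path and checksum values are invented for illustration, and the real formatting of the placeholders (for example hex versus decimal checksums) is an assumption.

```python
import re

# Mirrors the template:
# CRC mismatch for DV blob <path> (segment <id>): expected <exp>, got <got>
DV_CRC_ERROR = re.compile(
    r"CRC mismatch for DV blob (?P<path>.+) \(segment (?P<segment>[^)]+)\): "
    r"expected (?P<expected>[^\s,]+), got (?P<got>\S+)"
)


def parse_dv_crc_error(line):
    """Return the path, segment, expected and got fields, or None if no match."""
    match = DV_CRC_ERROR.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical log line; only the message template comes from the runbook.
    sample = (
        "ERROR CRC mismatch for DV blob /srv/angarabase/data/t1/0007.bdel "
        "(segment 7): expected 0xdeadbeef, got 0x0badf00d"
    )
    print(parse_dv_crc_error(sample))
```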
Triage fast-path
- Check the binary version and active config.
- Capture baseline metrics (commit_ack_latency, checkpoint, backpressure).
- Check recovery/txn log state.
- Apply remediation for the corresponding incident.
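The last triage step, mapping an observed symptom to an incident section, can be sketched as a lookup table. The incident names and metric names come from this guide. The labels sqlstate_40001 and dv_crc_mismatch are hypothetical tags for symptoms that have no metric named here, and match_incidents is an illustrative helper.

```python
# Each entry pairs an incident title from this guide with its signals.
INCIDENT_SIGNALS = [
    ("False Positive Commit Conflicts (40001)", {"sqlstate_40001"}),
    ("Backpressure active (no-steal)", {
        "buffer_pool_backpressure_active",
        "buffer_pool_uncommitted_dirty_ratio",
        "txn_write_set_limit_exceeded_total",
    }),
    ("p99 spikes during checkpoint", {"checkpoint_duration_seconds"}),
    ("commit fsync tails / durable_lsn lag grows", {
        "commit_ack_latency_seconds",
        "durable_lsn_lag_bytes",
    }),
    ("CRC mismatch in Delete Vector blob", {"dv_crc_mismatch"}),
]


def match_incidents(observed_signals):
    """Return the incident titles whose signals overlap the observed ones."""
    observed = set(observed_signals)
    return [name for name, signals in INCIDENT_SIGNALS if signals & observed]


if __name__ == "__main__":
    print(match_incidents({"durable_lsn_lag_bytes"}))
```

An empty result means the symptom does not map to any incident in this guide, which is the case the diagnostics bundle runbook is meant for.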
Next
- Diagnostics bundle runbook — what to attach to a ticket if the symptom does not map to a runbook.
- Disaster recovery playbook — for lease loss or corrupted datadir cases.
- Performance tuning guide — if the symptom is degradation, not outage.
- Operations overview — navigation across other operator materials.