Troubleshooting Guide

This document moves the key operator fast path from the legacy troubleshooting runbook into AngaraBook.

Scope

This guide covers common AngaraBase operational incidents and the quick diagnostic and remediation actions for each.

Incident: False Positive Commit Conflicts (40001)

Symptoms

  • Clients receive SQLSTATE 40001 on every COMMIT attempt.
  • Startup logs may contain recovery warnings.

Typical causes

  • The server version predates the release that fixes the VLF recovery path.
  • storage.data_directory and storage.transaction_log_directory are not separated (they share or overlap a path).

Actions

  • Upgrade to a release that includes the VLF recovery fix.
  • Separate storage.data_directory and storage.transaction_log_directory.
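
The separation check above can be sketched as a small script. This is a minimal sketch, not part of AngaraBase: the function name directories_overlap is hypothetical, and the two paths must be read from the active config by hand, since the config file format is not described here.

```python
from pathlib import Path


def directories_overlap(data_dir: str, txn_log_dir: str) -> bool:
    """Return True if both paths are identical or one is nested in the other."""
    # resolve() normalizes ".." segments and follows symlinks where they exist.
    data = Path(data_dir).resolve()
    log = Path(txn_log_dir).resolve()
    return data == log or data in log.parents or log in data.parents
```

A True result for the values of storage.data_directory and storage.transaction_log_directory means the two settings are not separated and should be corrected before restart.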

Incident: Backpressure active (no-steal)

Symptoms

  • buffer_pool_backpressure_active == 1
  • buffer_pool_uncommitted_dirty_ratio and txn_write_set_limit_exceeded_total are growing

Actions

  • Reduce write transaction batch size.
  • Set buffer_pool.backpressure.mode = fail_fast if needed.
  • Lower txn.max_write_set_pages and/or increase buffer_pool.size_bytes.
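
To confirm the symptoms before changing any settings, two metric snapshots taken some time apart can be compared. The sketch below assumes the snapshots are already available as plain name-to-value mappings; how they are collected is not specified here, and the function name backpressure_symptoms is hypothetical.

```python
from typing import List, Mapping

# Metrics that should be growing when this incident applies (names from this guide).
GROWTH_METRICS = (
    "buffer_pool_uncommitted_dirty_ratio",
    "txn_write_set_limit_exceeded_total",
)


def backpressure_symptoms(prev: Mapping[str, float],
                          curr: Mapping[str, float]) -> List[str]:
    """Compare two snapshots and list the backpressure symptoms that are present."""
    findings = []
    if curr.get("buffer_pool_backpressure_active", 0) == 1:
        findings.append("buffer_pool_backpressure_active == 1")
    for name in GROWTH_METRICS:
        # A metric missing from a snapshot is treated as 0.
        if curr.get(name, 0) > prev.get(name, 0):
            findings.append(name + " is growing")
    return findings
```

An empty result suggests a different incident; all three findings together match the symptoms listed above.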

Incident: p99 spikes during checkpoint

Symptoms

  • Growth in checkpoint_duration_seconds and latency spikes.

Actions

  • Increase checkpoint.target_ms.
  • Limit writeback.max_bytes_per_sec.
  • Tune checkpoint.dirty_ratio_hard for earlier background writeback.

Incident: commit fsync tails / durable_lsn lag grows

Symptoms

  • Growth in commit_ack_latency_seconds and durable_lsn_lag_bytes.

Actions

  • Move the WAL/TL (transaction log) to a separate volume if possible.
  • Tune group_commit.max_wait_us.
  • Reduce writeback interference.

Start / stop (operator baseline)

angarabase-server --config /etc/angarabase/angarabase.conf

Minimum checks before startup:

  • valid config;
  • correct data/txn log directories;
  • sufficient disk limits and fsync latency budget.
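
The fsync latency check can be approximated with a short probe run against the volume that will hold the transaction log. This is a rough sketch only: the function name fsync_latencies_ms, the sample count, and the 4 KiB block size are illustrative assumptions, and the probe does not reproduce the server's actual write pattern.

```python
import os
import tempfile
import time
from typing import List


def fsync_latencies_ms(directory: str, samples: int = 20,
                       block_size: int = 4096) -> List[float]:
    """Append small blocks to a temp file in `directory`, timing each fsync in ms."""
    latencies = []
    payload = b"\0" * block_size
    # The temporary file is removed automatically when the block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        for _ in range(samples):
            f.write(payload)
            f.flush()  # push Python's buffer to the OS before timing fsync
            start = time.perf_counter()
            os.fsync(f.fileno())
            latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies
```

Calling max(fsync_latencies_ms(path)) with the transaction log directory as path gives a first impression of tail latency; compare it with the latency budget you have set for commits.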

Incident: CRC mismatch in Delete Vector blob

Symptoms

  • Query fails with error: CRC mismatch for DV blob <path> (segment <id>): expected <exp>, got <got>
  • Can occur during compaction or while applying a columnar DELETE.

What it means

The .bdel file (Delete Vector blob) is corrupted. The blob_uri field (<path> in the message) gives the exact file path, and segment_id (<id>) identifies the segment inside the blob. The error is fail-closed: reading stops and no data is modified.

Actions

  1. Find the corrupted file by blob_uri from the error message.
  2. Check storage volume integrity (IO errors in dmesg, S.M.A.R.T.).
  3. If the file is irreversibly corrupted, restore from backup (disaster-recovery.md).
  4. For recurring CRC errors, enable monitoring of angarabase_columnar_pending_deleted_rows to track DV fragmentation pressure.
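
Step 1 can be automated when the error arrives in logs. The sketch below extracts the fields from a message in the format shown under Symptoms. The function name parse_dv_crc_error is hypothetical, and the values are kept as strings because their encoding (hex or decimal) is not specified here.

```python
import re
from typing import Dict, Optional

# Mirrors: CRC mismatch for DV blob <path> (segment <id>): expected <exp>, got <got>
_DV_CRC_ERROR = re.compile(
    r"CRC mismatch for DV blob (?P<blob_uri>.+?) "
    r"\(segment (?P<segment_id>[^)]+)\): "
    r"expected (?P<expected>[^,\s]+), got (?P<got>\S+)"
)


def parse_dv_crc_error(message: str) -> Optional[Dict[str, str]]:
    """Return blob_uri, segment_id, expected and got, or None if no match."""
    match = _DV_CRC_ERROR.search(message)
    return match.groupdict() if match else None
```

The returned blob_uri is the file to inspect in steps 2 and 3.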

Triage fast-path

  1. Check the binary version and active config.
  2. Capture baseline metrics (commit_ack_latency, checkpoint, backpressure).
  3. Check recovery/txn log state.
  4. Apply remediation for the corresponding incident.
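
Step 2 can be sketched as a filter over a metrics dump. This example assumes the metrics are available as text in the Prometheus exposition format, which this guide does not confirm. The function name baseline_snapshot and the keyword list are illustrative.

```python
import re
from typing import Dict

# Substrings taken from the baseline list in step 2.
BASELINE_KEYWORDS = ("commit_ack_latency", "checkpoint", "backpressure")

# Metric name, optional {labels}, then the sample value.
_SAMPLE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(\{.*\})?\s+(\S+)")


def baseline_snapshot(metrics_text: str) -> Dict[str, float]:
    """Keep only the samples whose metric name contains a baseline keyword."""
    snapshot = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, labels, value = match.group(1), match.group(2) or "", match.group(3)
        if any(keyword in name for keyword in BASELINE_KEYWORDS):
            try:
                snapshot[name + labels] = float(value)
            except ValueError:
                continue  # ignore samples with an unparsable value
    return snapshot
```

Saving one snapshot before remediation and one after makes it easier to tell whether the chosen action had an effect.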
