AngaraBase Tracing Operations Guide

Audience: Database operators, SREs, performance engineers


Overview

AngaraBase includes structured tracing built on the Rust tracing crate, with OpenTelemetry support. It replaces the legacy env_logger setup and provides end-to-end visibility into the query execution pipeline.

Key Benefits:

  • End-to-end spans: accept → authenticate → parse → plan → execute → storage → response
  • Each query has a unique trace_id
  • JSON-structured logs for machine parsing
  • Integration with Jaeger/Tempo through OpenTelemetry
  • Automatic tracing-context propagation across async/sync boundaries

Configuration

Basic Setup

In angarabase.conf:

[diagnostics]
# Tracing output format: "text" (human-readable) or "json" (structured)
tracing_format = "json"

# Log level filtering (same as RUST_LOG)
log_level = "info"

Environment Variables

Variable                      Description                       Example
RUST_LOG                      Log level filter                  RUST_LOG=angarabase=debug,tokio=info
ANGARABASE_OTLP_ENDPOINT      OpenTelemetry collector endpoint  http://jaeger:4318/v1/traces
ANGARABASE_TRACE_SAMPLE_RATE  Sampling rate (0.0-1.0)           0.1 (10% sampling)
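
To make the meaning of a sampling rate concrete, the sketch below turns a rate such as 0.1 into a keep/drop decision per trace. Hashing the trace_id is an assumption chosen for illustration (it keeps the decision stable for a given trace); the source does not say how AngaraBase samples, and should_sample is a hypothetical helper.

```python
import hashlib

def should_sample(trace_id: str, rate: float) -> bool:
    """Deterministically keep roughly `rate` of all traces (illustrative only)."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be within 0.0-1.0")
    digest = hashlib.sha256(trace_id.encode()).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64) and compare
    # it against the matching fraction of that range.
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < rate * 2**64

# Hypothetical trace IDs; roughly 10% of them should be kept at rate 0.1.
kept = sum(should_sample(f"trace-{i}", 0.1) for i in range(10_000))
print(f"kept {kept} of 10000 traces")
```

Because the decision depends only on the trace_id, every span of a sampled trace is kept together, which is the property a per-trace sampler needs.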

JSON vs Text Format

Text format (human-readable):

2026-03-12T10:30:45.123Z INFO angarabase::query::executor: query_start session_id=12345 sql="SELECT * FROM users"
2026-03-12T10:30:45.125Z DEBUG angarabase::query::planner: plan_created plan_hash=abc123 estimated_rows=1000

JSON format (machine-parseable):

{"timestamp":"2026-03-12T10:30:45.123Z","level":"INFO","target":"angarabase::query::executor","fields":{"session_id":12345,"sql":"SELECT * FROM users"},"span":{"name":"query_execution","trace_id":"abc123"}}
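
As a small illustration of machine parsing, the following sketch reads one JSON log line shaped like the sample above and pulls out the fields operators usually need. It assumes one JSON object per line; summarize is a hypothetical helper, not part of AngaraBase.

```python
import json

# Sample record copied from the JSON format example above.
LINE = '{"timestamp":"2026-03-12T10:30:45.123Z","level":"INFO","target":"angarabase::query::executor","fields":{"session_id":12345,"sql":"SELECT * FROM users"},"span":{"name":"query_execution","trace_id":"abc123"}}'

def summarize(line):
    """Return (trace_id, span name, level, fields) for one JSON log line."""
    record = json.loads(line)
    # Assumption: events emitted outside any span may lack the "span" key.
    span = record.get("span", {})
    return span.get("trace_id"), span.get("name"), record["level"], record.get("fields", {})

trace_id, span_name, level, fields = summarize(LINE)
print(trace_id, span_name, level, fields["session_id"])
```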

Tracing Architecture

Span Hierarchy

query_execution (root span)
├── parse (SQL → AST)
├── plan (AST → execution plan)
├── execute
│ ├── storage_io (page reads/writes)
│ ├── lock_acquisition
│ └── wal_flush
└── commit (autocommit finalization)
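
One way to use this hierarchy during analysis is to check how much of a parent span's time its children explain. The sketch below transcribes the tree into a dictionary and computes the unattributed remainder. It assumes child spans run sequentially rather than in parallel, and the durations are made-up illustration values.

```python
# Parent -> direct children, transcribed from the hierarchy above.
SPAN_CHILDREN = {
    "query_execution": ["parse", "plan", "execute", "commit"],
    "execute": ["storage_io", "lock_acquisition", "wal_flush"],
}

def unattributed_ms(durations, parent):
    """Time spent in `parent` that none of its direct children account for."""
    children = sum(durations.get(child, 0) for child in SPAN_CHILDREN.get(parent, []))
    return durations[parent] - children

# Hypothetical durations (ms) for a single trace.
durations = {
    "query_execution": 120, "parse": 5, "plan": 15, "execute": 90, "commit": 4,
    "storage_io": 60, "lock_acquisition": 10, "wal_flush": 12,
}
print(unattributed_ms(durations, "query_execution"))
print(unattributed_ms(durations, "execute"))
```

A large remainder on the root span points at time spent outside the instrumented phases, for example in the network layer.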

Span Propagation

AngaraBase automatically propagates tracing context through:

  1. Async boundaries: tokio::task::spawn_blocking calls
  2. Thread pool: Worker thread execution
  3. Network layer: AngaraNet io_uring/tokio integration

Implementation pattern:

// Capture the current span before leaving the async context.
let span = tracing::Span::current();
let result = tokio::task::spawn_blocking(move || {
    // Entering the span on the blocking thread lets sync code inherit the tracing context.
    let _enter = span.enter();
    engine.execute(&query)
})
.await;

Operational Procedures

Enabling Tracing

  1. Development/Debug:
     RUST_LOG=angarabase=trace ./angarabase-server
  2. Production (JSON logs):
     # angarabase.conf
     [diagnostics]
     tracing_format = "json"
     log_level = "info"
  3. OpenTelemetry Export:
     export ANGARABASE_OTLP_ENDPOINT=http://jaeger:4318/v1/traces
     export ANGARABASE_TRACE_SAMPLE_RATE=0.05 # 5% sampling
     ./angarabase-server

Monitoring Query Performance

1. Slow Query Detection

Text logs:

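# Assumes the text format logs the duration as duration_ms=<integer>
grep "query_execution.*duration_ms" /var/log/angarabase.log | \
 awk 'match($0, /duration_ms=[0-9]+/) && substr($0, RSTART + 12, RLENGTH - 12) + 0 > 1000' | \
 head -10 # Queries > 1 second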

JSON logs:

jq 'select(.span.name == "query_execution" and .fields.duration_ms > 1000)' \
 /var/log/angarabase.log
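
Where jq is not available, or the log mixes JSON and text lines, the same filter can be written as a short script. This is a sketch under the assumption that span-close events carry duration_ms and sql in fields, as the jq filters in this guide imply; slow_queries is a hypothetical helper.

```python
import json

def slow_queries(lines, threshold_ms=1000, span_name="query_execution"):
    """Yield (trace_id, duration_ms, sql) for spans slower than threshold_ms."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip text-format or truncated lines
        if not isinstance(record, dict):
            continue
        span = record.get("span") or {}
        fields = record.get("fields") or {}
        duration = fields.get("duration_ms")
        if (span.get("name") == span_name
                and isinstance(duration, (int, float))
                and duration > threshold_ms):
            yield span.get("trace_id"), duration, fields.get("sql")

# Made-up sample lines; in practice pass an open log file instead.
sample = [
    '{"level":"INFO","fields":{"duration_ms":1500,"sql":"SELECT 1"},"span":{"name":"query_execution","trace_id":"t1"}}',
    '{"level":"INFO","fields":{"duration_ms":20,"sql":"SELECT 2"},"span":{"name":"query_execution","trace_id":"t2"}}',
    'not json',
]
for hit in slow_queries(sample):
    print(hit)
```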

2. Per-Phase Timing Analysis

Look for spans with names: parse, plan, execute, commit

Example JSON query:

jq 'select(.span.name == "execute" and .fields.duration_ms > 500)' \
 /var/log/angarabase.log | \
 jq '{trace_id: .span.trace_id, duration_ms: .fields.duration_ms, estimated_rows: .fields.estimated_rows}'
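
To compare phases within a single query rather than one phase across queries, the records can be grouped by trace_id. The sketch below assumes each phase span emits one event carrying duration_ms, with trace_id under span as in the JSON sample earlier; phase_breakdown and dominant_phase are hypothetical helpers.

```python
import json
from collections import defaultdict

PHASES = ("parse", "plan", "execute", "commit")

def phase_breakdown(lines):
    """Map trace_id -> {phase: duration_ms} for the four top-level phases."""
    traces = defaultdict(dict)
    for line in lines:
        record = json.loads(line)
        span = record.get("span", {})
        duration = record.get("fields", {}).get("duration_ms")
        if span.get("name") in PHASES and duration is not None:
            traces[span.get("trace_id")][span["name"]] = duration
    return dict(traces)

def dominant_phase(phases):
    """Return the phase name with the largest duration."""
    return max(phases, key=phases.get)

# Made-up sample events for one trace.
sample = [
    '{"fields":{"duration_ms":4},"span":{"name":"parse","trace_id":"t1"}}',
    '{"fields":{"duration_ms":12},"span":{"name":"plan","trace_id":"t1"}}',
    '{"fields":{"duration_ms":640},"span":{"name":"execute","trace_id":"t1"}}',
    '{"fields":{"duration_ms":3},"span":{"name":"commit","trace_id":"t1"}}',
]
breakdown = phase_breakdown(sample)
print(breakdown["t1"], dominant_phase(breakdown["t1"]))
```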

3. Lock Contention Detection

# Find queries waiting on locks
jq 'select(.fields.wait_event_type == "Lock")' /var/log/angarabase.log
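
A list of matching events says little about which lock is contended. As a sketch, the events can be tallied by wait_event, assuming both wait_event_type and wait_event appear under fields; lock_wait_counts is a hypothetical helper.

```python
import json
from collections import Counter

def lock_wait_counts(lines):
    """Count lock-wait events per specific wait_event (e.g. RowLock)."""
    counts = Counter()
    for line in lines:
        fields = json.loads(line).get("fields", {})
        if fields.get("wait_event_type") == "Lock":
            counts[fields.get("wait_event", "unknown")] += 1
    return counts

# Made-up sample events.
sample = [
    '{"fields":{"wait_event_type":"Lock","wait_event":"RowLock"}}',
    '{"fields":{"wait_event_type":"IO","wait_event":"PageRead"}}',
    '{"fields":{"wait_event_type":"Lock","wait_event":"RowLock"}}',
]
print(lock_wait_counts(sample).most_common())
```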

Troubleshooting Common Issues

High Parse Time

# Find queries with slow parsing
jq 'select(.span.name == "parse" and .fields.duration_ms > 100)' \
 /var/log/angarabase.log | jq '.fields.sql'

Common causes:

  • Complex SQL with many JOINs
  • Large IN clauses
  • Deeply nested subqueries

High Plan Time

# Find queries with slow planning
jq 'select(.span.name == "plan" and .fields.duration_ms > 200)' \
 /var/log/angarabase.log

Common causes:

  • Missing statistics
  • Complex join ordering
  • Large number of tables

High Execute Time

# Correlate with storage I/O
jq 'select(.span.name == "storage_io" and .fields.duration_ms > 1000)' \
 /var/log/angarabase.log

Common causes:

  • Sequential scans
  • I/O bottlenecks
  • Lock contention

Integration with External Tools

Jaeger Integration

  1. Setup Jaeger:
     # 4318 is Jaeger's OTLP/HTTP receiver; 14268 accepts only Jaeger's native Thrift format
     docker run -d --name jaeger \
       -p 4318:4318 -p 16686:16686 \
       jaegertracing/all-in-one:latest
  2. Configure AngaraBase:
     export ANGARABASE_OTLP_ENDPOINT=http://localhost:4318/v1/traces
  3. View traces: http://localhost:16686

Grafana/Tempo Integration

# tempo.yaml
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        http:
          endpoint: 0.0.0.0:4318

storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo/traces

Log Aggregation (ELK Stack)

Logstash configuration:

input {
  file {
    path => "/var/log/angarabase.log"
    codec => "json"
  }
}

filter {
  if [span][name] {
    mutate {
      add_field => { "trace_operation" => "%{[span][name]}" }
    }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "angarabase-traces-%{+YYYY.MM.dd}"
  }
}

Performance Impact

Overhead Measurements

Configuration                        CPU Overhead   Latency Impact
Text format, INFO level              < 1%           < 5ms p99
JSON format, INFO level              < 2%           < 8ms p99
JSON + OTLP export, 5% sampling      < 3%           < 10ms p99
JSON + OTLP export, 100% sampling    < 8%           < 20ms p99

Production Recommendations

  1. Use JSON format for structured log processing
  2. Set appropriate log levels: INFO for production, DEBUG for troubleshooting
  3. Configure OTLP sampling: 1-10% for high-traffic systems
  4. Monitor log volume: JSON logs are ~2x larger than text
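
To size sampling and log retention, a back-of-the-envelope calculation is often enough. The sketch below uses hypothetical traffic figures (20,000 queries per second, 5,000 log events per second, 200 bytes per text event); only the "JSON is about 2x text" ratio comes from this guide.

```python
SECONDS_PER_DAY = 86_400

def exported_traces_per_second(qps, sample_rate):
    """Traces sent to the collector per second at a given sampling rate."""
    return qps * sample_rate

def daily_log_bytes(events_per_second, avg_event_bytes):
    """Uncompressed log volume written per day."""
    return events_per_second * avg_event_bytes * SECONDS_PER_DAY

# Hypothetical workload figures, for illustration only.
traces = exported_traces_per_second(qps=20_000, sample_rate=0.05)
text_bytes = daily_log_bytes(events_per_second=5_000, avg_event_bytes=200)
json_bytes = 2 * text_bytes  # JSON logs are roughly 2x larger than text

print(f"{traces:.0f} traces/s exported")
print(f"text: {text_bytes / 1e9:.1f} GB/day, JSON: {json_bytes / 1e9:.1f} GB/day")
```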

Alerting and Monitoring

Key Metrics to Monitor

  1. Query Duration Distribution:
     histogram_quantile(0.95,
       rate(angarabase_query_duration_seconds_bucket[5m])
     )
  2. Slow Query Count:
     increase(angarabase_slow_queries_total[5m])
  3. Tracing Overhead (event rate, used as a proxy):
     rate(angarabase_tracing_events_total[5m])

Alerting Rules

groups:
- name: angarabase_tracing
  rules:
  - alert: HighQueryLatency
    expr: |
      histogram_quantile(0.95,
        rate(angarabase_query_duration_seconds_bucket[5m])
      ) > 1.0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "AngaraBase query latency is high"

  - alert: TracingVolumeHigh
    expr: |
      rate(angarabase_tracing_events_total[5m]) > 1000
    for: 5m
    labels:
      severity: info
    annotations:
      summary: "AngaraBase tracing volume is high"

Security Considerations

Sensitive Data in Logs

⚠️ WARNING: Tracing logs may contain SQL queries with sensitive data.

Mitigation strategies:

  1. Query parameter redaction:
     [diagnostics]
     redact_query_params = true # Replace literals with ?
  2. Log rotation and retention:
     # logrotate configuration
     /var/log/angarabase.log {
         daily
         rotate 7
         compress
         delaycompress
         missingok
         notifempty
     }
  3. Access control:
     chmod 640 /var/log/angarabase.log
     chown angarabase:angarabase-ops /var/log/angarabase.log
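
To show what replacing literals with ? does to a logged statement, here is a minimal regex-based sketch. It is not AngaraBase's redaction implementation, which the source does not describe; redact_literals is a hypothetical helper that could, for example, scrub logs captured before redaction was enabled. It handles single-quoted strings and plain numbers only.

```python
import re

# A single-quoted string ('' is an escaped quote) or a standalone number.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")

def redact_literals(sql: str) -> str:
    """Replace string and numeric literals in a SQL statement with '?'."""
    return _LITERAL.sub("?", sql)

print(redact_literals("SELECT * FROM users WHERE email = 'a@b.com' AND age > 30"))
print(redact_literals("INSERT INTO t1 VALUES ('it''s', 4.5)"))
```

Identifiers that contain digits, such as t1, are left alone because the number pattern requires a word boundary on both sides.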

OTLP Export Security

# Use TLS for OTLP export
export ANGARABASE_OTLP_ENDPOINT=https://jaeger.internal:4318/v1/traces
export ANGARABASE_OTLP_TLS_CERT=/etc/ssl/certs/angarabase.crt
export ANGARABASE_OTLP_TLS_KEY=/etc/ssl/private/angarabase.key

Troubleshooting

Common Issues

1. No Tracing Output

Symptoms: No tracing spans in logs

Causes:

  • RUST_LOG not set or too restrictive
  • tracing_format misconfigured
  • Tracing not enabled in binary

Solution:

# Check tracing is compiled in
./angarabase-server --version | grep tracing

# Enable debug tracing
RUST_LOG=angarabase=debug ./angarabase-server

2. Missing Span Context

Symptoms: Broken trace chains, missing parent-child relationships

Causes:

  • Missing span.enter() in spawn_blocking calls
  • Incorrect async context propagation

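Solution: Verify that the code follows the span propagation pattern: capture Span::current() before calling spawn_blocking and enter that span inside the closure.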

3. High Log Volume

Symptoms: Disk space issues, performance degradation

Causes:

  • Log level too verbose (TRACE in production)
  • No log rotation
  • High query volume

Solution:

# Reduce log level
export RUST_LOG=angarabase=info

# Enable log rotation
logrotate -f /etc/logrotate.d/angarabase

4. OTLP Export Failures

Symptoms: Traces not appearing in Jaeger/Tempo

Causes:

  • Network connectivity issues
  • Incorrect endpoint configuration
  • Authentication failures

Solution:

# Test OTLP endpoint
curl -v $ANGARABASE_OTLP_ENDPOINT/health

# Check AngaraBase logs for export errors
grep "otlp.*error" /var/log/angarabase.log
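
When curl is not installed on the database host, a plain TCP check separates network problems from configuration or authentication problems. The sketch below only tests that the endpoint's host and port accept connections; it does not speak OTLP. endpoint_reachable is a hypothetical helper.

```python
import os
import socket
from urllib.parse import urlparse

def endpoint_reachable(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint's host:port succeeds."""
    url = urlparse(endpoint)
    if not url.hostname:
        return False  # not a URL with a scheme and host
    port = url.port or (443 if url.scheme == "https" else 80)
    try:
        with socket.create_connection((url.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False

# Check the configured endpoint, if any.
endpoint = os.environ.get("ANGARABASE_OTLP_ENDPOINT")
if endpoint:
    print(endpoint, "reachable" if endpoint_reachable(endpoint) else "unreachable")
```

If the port is reachable but traces still do not arrive, the endpoint path or the TLS and authentication settings are the more likely cause.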

Advanced Topics

Custom Span Attributes

AngaraBase automatically adds these attributes to spans:

Attribute          Description             Example
session_id         Database session ID     12345
query_fingerprint  SQL query hash          abc123def
plan_hash          Execution plan hash     def456ghi
estimated_rows     Query planner estimate  1000
actual_rows        Actual rows returned    987
wait_event_type    Current wait type       Lock, IO, Net
wait_event         Specific wait event     RowLock, PageRead
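
Comparing estimated_rows with actual_rows is a quick way to spot planner misestimates, which often accompany missing statistics. The sketch below computes a symmetric error factor; the threshold of 10 is an arbitrary illustration value and both function names are hypothetical.

```python
def estimate_error_factor(estimated_rows: int, actual_rows: int) -> float:
    """How many times off the estimate is, in either direction (>= 1.0)."""
    # Clamp to 1 so that zero-row results do not divide by zero.
    low, high = sorted((max(estimated_rows, 1), max(actual_rows, 1)))
    return high / low

def is_misestimated(estimated_rows: int, actual_rows: int, threshold: float = 10.0) -> bool:
    return estimate_error_factor(estimated_rows, actual_rows) > threshold

# Values from the attribute table above: a close estimate.
print(round(estimate_error_factor(1000, 987), 3), is_misestimated(1000, 987))
# Hypothetical bad estimate: 1000 rows expected, 5 returned.
print(estimate_error_factor(1000, 5), is_misestimated(1000, 5))
```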

Correlation with System Metrics

CPU correlation:

# Find high-CPU queries
jq 'select(.span.name == "execute" and .fields.cpu_time_ms > 1000)' \
 /var/log/angarabase.log

I/O correlation:

# Find I/O-heavy queries 
jq 'select(.fields.io_reads > 1000 or .fields.io_writes > 100)' \
 /var/log/angarabase.log

Custom Dashboards

Grafana query examples:

  1. Query throughput by operation:
     sum(rate(angarabase_queries_total[5m])) by (operation)
  2. Average query duration by phase:
     avg(angarabase_query_phase_duration_seconds) by (phase)
  3. Lock contention rate:
     sum(rate(angarabase_wait_events_total{type="Lock"}[5m]))

References

  • RFC-2026-360: Structured Logging and Tracing v0
  • RFC-2026-461: Async Runtime Migration Strategy v0
  • CODING_STANDARDS.md: Tracing guidelines (§9)
  • ASYNC_GUIDELINES.md: Span propagation patterns
  • Tracing crate docs: https://docs.rs/tracing/
  • OpenTelemetry spec: https://opentelemetry.io/docs/

Contact: Database SRE team
Last updated: 2026-03-12 by current implementation phase