AngaraBase Tracing Operations Guide
Audience: Database operators, SREs, performance engineers
Overview
AngaraBase includes structured tracing built on the tracing crate, with OpenTelemetry support. This system
replaces the legacy env_logger and provides end-to-end visibility into the query execution pipeline.
Key Benefits:
- End-to-end spans: accept → authenticate → parse → plan → execute → storage → response
- Every query has a unique trace_id
- JSON-structured logs for machine parsing
- Integration with Jaeger/Tempo via OpenTelemetry
- Automatic propagation of tracing context across async/sync boundaries
Configuration
Basic Setup
In angarabase.conf:
[diagnostics]
# Tracing output format: "text" (human-readable) or "json" (structured)
tracing_format = "json"
# Log level filtering (same as RUST_LOG)
log_level = "info"
Environment Variables
| Variable | Description | Example |
|---|---|---|
| RUST_LOG | Log level filter | RUST_LOG=angarabase=debug,tokio=info |
| ANGARABASE_OTLP_ENDPOINT | OpenTelemetry collector endpoint | http://jaeger:14268/api/traces |
| ANGARABASE_TRACE_SAMPLE_RATE | Sampling rate (0.0-1.0) | 0.1 (10% sampling) |
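To make the 0.0-1.0 sampling rate concrete, here is a minimal sketch of a ratio-based sampling decision keyed on the trace ID. It is not AngaraBase's actual sampler: the function names, the u64 trace ID, and the fallback to 1.0 for invalid values are all assumptions made for illustration.

```rust
/// Parse a sampling rate. Falling back to 1.0 (sample everything) for a
/// missing, malformed, or out-of-range value is an assumption of this sketch.
fn parse_sample_rate(raw: Option<&str>) -> f64 {
    raw.and_then(|s| s.trim().parse::<f64>().ok())
        .filter(|r| (0.0..=1.0).contains(r))
        .unwrap_or(1.0)
}

/// Deterministic decision: the same trace ID always gets the same answer,
/// so all spans of one trace are kept or dropped together.
fn should_sample(trace_id: u64, rate: f64) -> bool {
    if rate >= 1.0 {
        return true;
    }
    // Map the rate onto the u64 range and keep IDs below the threshold.
    let threshold = (rate * u64::MAX as f64) as u64;
    trace_id < threshold
}

fn main() {
    let raw = std::env::var("ANGARABASE_TRACE_SAMPLE_RATE").ok();
    let rate = parse_sample_rate(raw.as_deref());
    // Spread 1000 synthetic trace IDs evenly over the u64 range.
    let step = u64::MAX / 1000;
    let kept = (0..1000u64).filter(|i| should_sample(i * step, rate)).count();
    println!("rate={rate}: kept {kept} of 1000 synthetic traces");
}
```

Deciding from the trace ID rather than a fresh random number keeps every span of a sampled trace together, which is why ratio samplers are usually built this way.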
JSON vs Text Format
Text format (human-readable):
2026-03-12T10:30:45.123Z INFO angarabase::query::executor: query_start session_id=12345 sql="SELECT * FROM users"
2026-03-12T10:30:45.125Z DEBUG angarabase::query::planner: plan_created plan_hash=abc123 estimated_rows=1000
JSON format (machine-parseable):
{"timestamp":"2026-03-12T10:30:45.123Z","level":"INFO","target":"angarabase::query::executor","fields":{"session_id":12345,"sql":"SELECT * FROM users"},"span":{"name":"query_execution","trace_id":"abc123"}}
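When only text-format logs are available, the key=value fields can still be recovered mechanically. The sketch below assumes the layout of the sample lines above (bare or double-quoted values, no escaped quotes); `parse_fields` is a hypothetical helper, not part of AngaraBase.

```rust
use std::collections::BTreeMap;

/// Extract `key=value` pairs from a text-format log line.
/// Values may be bare (`session_id=12345`) or double-quoted (`sql="..."`).
fn parse_fields(line: &str) -> BTreeMap<String, String> {
    let mut fields = BTreeMap::new();
    let mut rest = line;
    while let Some(eq) = rest.find('=') {
        // Key: the run of non-whitespace characters just before '='.
        let key = rest[..eq].rsplit(char::is_whitespace).next().unwrap_or("");
        let after = &rest[eq + 1..];
        let (value, consumed) = if let Some(quoted) = after.strip_prefix('"') {
            // Quoted value: runs to the next double quote (no escape handling).
            let end = quoted.find('"').unwrap_or(quoted.len());
            (&quoted[..end], (end + 2).min(after.len()))
        } else {
            // Bare value: runs to the next whitespace.
            let end = after.find(char::is_whitespace).unwrap_or(after.len());
            (&after[..end], end)
        };
        if !key.is_empty() {
            fields.insert(key.to_string(), value.to_string());
        }
        rest = &after[consumed..];
    }
    fields
}

fn main() {
    let line = "2026-03-12T10:30:45.123Z INFO angarabase::query::executor: \
                query_start session_id=12345 sql=\"SELECT * FROM users\"";
    for (key, value) in parse_fields(line) {
        println!("{key} => {value}");
    }
}
```

For anything beyond ad-hoc inspection, the JSON format avoids this kind of guesswork.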
Tracing Architecture
Span Hierarchy
query_execution (root span)
├── parse (SQL → AST)
├── plan (AST → execution plan)
├── execute
│ ├── storage_io (page reads/writes)
│ ├── lock_acquisition
│ └── wal_flush
└── commit (autocommit finalization)
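The hierarchy is useful because a parent's duration includes its children, so the time spent in a phase itself is its duration minus its direct children. A minimal sketch of that calculation follows; the `Span` struct and all durations are invented for illustration, and it assumes child spans do not overlap in time.

```rust
/// Simplified span record (hypothetical; not AngaraBase's internal type).
struct Span {
    name: &'static str,
    duration_ms: u64,
    children: Vec<Span>,
}

impl Span {
    fn new(name: &'static str, duration_ms: u64, children: Vec<Span>) -> Self {
        Span { name, duration_ms, children }
    }

    /// Time spent in this span excluding its direct children.
    fn self_time_ms(&self) -> u64 {
        let child_total: u64 = self.children.iter().map(|c| c.duration_ms).sum();
        self.duration_ms.saturating_sub(child_total)
    }
}

/// Walk the tree and return the span with the largest self time.
fn slowest_self_time(span: &Span) -> (&'static str, u64) {
    span.children
        .iter()
        .map(slowest_self_time)
        .fold((span.name, span.self_time_ms()), |best, cand| {
            if cand.1 > best.1 { cand } else { best }
        })
}

/// Example trace shaped like the hierarchy above, with made-up timings.
fn sample_trace() -> Span {
    Span::new("query_execution", 120, vec![
        Span::new("parse", 2, vec![]),
        Span::new("plan", 8, vec![]),
        Span::new("execute", 100, vec![
            Span::new("storage_io", 60, vec![]),
            Span::new("lock_acquisition", 25, vec![]),
            Span::new("wal_flush", 10, vec![]),
        ]),
        Span::new("commit", 5, vec![]),
    ])
}

fn main() {
    let (name, ms) = slowest_self_time(&sample_trace());
    println!("largest self time: {name} ({ms} ms)");
}
```

In this made-up trace, execute looks slow at 100 ms, but almost all of that time belongs to storage_io.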
Span Propagation
AngaraBase automatically propagates tracing context across:
- Async boundaries: tokio::task::spawn_blocking calls
- Thread pool: Worker thread execution
- Network layer: AngaraNet io_uring/tokio integration
Implementation pattern:
// Capture the current span before handing work to the blocking pool.
let span = tracing::Span::current();
tokio::task::spawn_blocking(move || {
    // Re-enter the span so the sync code inherits the tracing context.
    let _enter = span.enter();
    engine.execute(&query)
}).await
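The explicit capture is needed because the current context is tracked per thread, so a closure running on another thread starts with none. The std-only sketch below models this with a thread-local trace ID. It is a simplified stand-in, not the tracing crate's implementation, and `CURRENT_TRACE`, `with_trace`, and the other names are hypothetical.

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    // Stand-in for the per-thread "current span" (simplified model).
    static CURRENT_TRACE: RefCell<Option<String>> = RefCell::new(None);
}

fn current_trace() -> Option<String> {
    CURRENT_TRACE.with(|c| c.borrow().clone())
}

/// Run `f` with `trace_id` as the current context, then restore the old one.
fn with_trace<T>(trace_id: &str, f: impl FnOnce() -> T) -> T {
    let prev = CURRENT_TRACE.with(|c| c.borrow_mut().replace(trace_id.to_string()));
    let out = f();
    CURRENT_TRACE.with(|c| *c.borrow_mut() = prev);
    out
}

/// The spawned thread sees no context: nothing was carried across.
fn lost_without_capture() -> Option<String> {
    with_trace("abc123", || thread::spawn(current_trace).join().unwrap())
}

/// Capture on the calling thread, re-enter on the worker thread.
fn kept_with_capture() -> Option<String> {
    with_trace("abc123", || {
        let captured = current_trace().expect("set by with_trace");
        thread::spawn(move || with_trace(&captured, current_trace))
            .join()
            .unwrap()
    })
}

fn main() {
    println!("without capture: {:?}", lost_without_capture());
    println!("with capture:    {:?}", kept_with_capture());
}
```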
Operational Procedures
Enabling Tracing
- Development/Debug:
RUST_LOG=angarabase=trace ./angarabase-server
- Production (JSON logs):
# angarabase.conf
[diagnostics]
tracing_format = "json"
log_level = "info"
- OpenTelemetry Export:
export ANGARABASE_OTLP_ENDPOINT=http://jaeger:14268/api/traces
export ANGARABASE_TRACE_SAMPLE_RATE=0.05 # 5% sampling
./angarabase-server
Monitoring Query Performance
1. Slow Query Detection
Text logs:
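# Assumes the duration is logged as a duration_ms=<n> field
grep "query_execution.*duration_ms=" /var/log/angarabase.log | \
awk 'match($0, /duration_ms=[0-9]+/) && substr($0, RSTART + 12, RLENGTH - 12) + 0 > 1000' | \
head -10 # Queries > 1 second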
JSON logs:
jq 'select(.span.name == "query_execution" and .fields.duration_ms > 1000)' \
/var/log/angarabase.log
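The same filter can be embedded in a small tool when jq is not available. The sketch below uses naive string matching instead of a JSON parser, so it assumes compact single-line output like the sample above and does not handle escaped or reordered nested keys. `extract_number` and `is_slow_query` are hypothetical helpers; a real tool would use a proper JSON library.

```rust
/// Naively extract a numeric field by key (assumes compact JSON: `"key":123`).
fn extract_number(line: &str, key: &str) -> Option<f64> {
    let needle = format!("\"{key}\":");
    let start = line.find(&needle)? + needle.len();
    let rest = line[start..].trim_start();
    let end = rest
        .find(|c: char| !(c.is_ascii_digit() || c == '.' || c == '-'))
        .unwrap_or(rest.len());
    rest[..end].parse().ok()
}

/// True for query_execution spans whose duration exceeds the threshold.
fn is_slow_query(line: &str, threshold_ms: f64) -> bool {
    line.contains("\"name\":\"query_execution\"")
        && extract_number(line, "duration_ms").map_or(false, |d| d > threshold_ms)
}

fn main() {
    // Invented sample lines in the JSON layout shown earlier.
    let lines = [
        r#"{"level":"INFO","fields":{"duration_ms":1520,"sql":"SELECT 1"},"span":{"name":"query_execution","trace_id":"abc123"}}"#,
        r#"{"level":"INFO","fields":{"duration_ms":12,"sql":"SELECT 2"},"span":{"name":"query_execution","trace_id":"def456"}}"#,
    ];
    for line in lines.iter().filter(|l| is_slow_query(l, 1000.0)) {
        println!("{line}");
    }
}
```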
2. Per-Phase Timing Analysis
Look for spans with names: parse, plan, execute, commit
Example JSON query:
jq 'select(.span.name == "execute" and .fields.duration_ms > 500)
| {trace_id: .span.trace_id, duration_ms: .fields.duration_ms, estimated_rows: .fields.estimated_rows}' \
/var/log/angarabase.log
3. Lock Contention Detection
# Find queries waiting on locks
jq 'select(.fields.wait_event_type == "Lock")' /var/log/angarabase.log
Troubleshooting Common Issues
High Parse Time
# Find queries with slow parsing
jq 'select(.span.name == "parse" and .fields.duration_ms > 100)' \
/var/log/angarabase.log | jq '.fields.sql'
Common causes:
- Complex SQL with many JOINs
- Large IN clauses
- Deeply nested subqueries
High Plan Time
# Find queries with slow planning
jq 'select(.span.name == "plan" and .fields.duration_ms > 200)' \
/var/log/angarabase.log
Common causes:
- Missing statistics
- Complex join ordering
- Large number of tables
High Execute Time
# Correlate with storage I/O
jq 'select(.span.name == "storage_io" and .fields.duration_ms > 1000)' \
/var/log/angarabase.log
Common causes:
- Sequential scans
- I/O bottlenecks
- Lock contention
Integration with External Tools
Jaeger Integration
- Setup Jaeger:
docker run -d --name jaeger \
-p 14268:14268 -p 16686:16686 \
jaegertracing/all-in-one:latest
- Configure AngaraBase:
export ANGARABASE_OTLP_ENDPOINT=http://localhost:14268/api/traces
- View traces: http://localhost:16686
Grafana/Tempo Integration
# tempo.yaml
server:
  http_listen_port: 3200
distributor:
  receivers:
    otlp:
      protocols:
        http:
          endpoint: 0.0.0.0:4318
storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo/traces
Log Aggregation (ELK Stack)
Logstash configuration:
input {
  file {
    path => "/var/log/angarabase.log"
    codec => "json"
  }
}
filter {
  if [span][name] {
    mutate {
      add_field => { "trace_operation" => "%{[span][name]}" }
    }
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "angarabase-traces-%{+YYYY.MM.dd}"
  }
}
Performance Impact
Overhead Measurements
| Configuration | CPU Overhead | Latency Impact |
|---|---|---|
| Text format, INFO level | < 1% | < 5ms p99 |
| JSON format, INFO level | < 2% | < 8ms p99 |
| JSON + OTLP export, 5% sampling | < 3% | < 10ms p99 |
| JSON + OTLP export, 100% sampling | < 8% | < 20ms p99 |
Production Recommendations
- Use JSON format for structured log processing
- Set appropriate log levels: INFO for production, DEBUG for troubleshooting
- Configure OTLP sampling: 1-10% for high-traffic systems
- Monitor log volume: JSON logs are ~2x larger than text
Alerting and Monitoring
Key Metrics to Monitor
- Query Duration Distribution:
histogram_quantile(0.95,
rate(angarabase_query_duration_seconds_bucket[5m])
)
- Slow Query Count:
increase(angarabase_slow_queries_total[5m])
- Tracing Overhead:
rate(angarabase_tracing_events_total[5m])
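When reading the p95 figure, it helps to know how histogram_quantile derives a value from cumulative buckets: it finds the bucket containing the target rank and interpolates linearly inside it. The sketch below follows the behavior described in the Prometheus documentation; the bucket bounds and counts are invented, and the real function operates on per-second rates rather than raw counts, which does not change the interpolation.

```rust
/// Buckets are (upper_bound_seconds, cumulative_count), sorted by bound,
/// with the last bucket being +Inf.
fn histogram_quantile(q: f64, buckets: &[(f64, f64)]) -> Option<f64> {
    let total = buckets.last()?.1;
    if total <= 0.0 || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let rank = q * total;
    // First bucket whose cumulative count reaches the target rank.
    let idx = buckets.iter().position(|&(_, count)| count >= rank)?;
    let (upper, count) = buckets[idx];
    if upper.is_infinite() {
        // Quantile lies in the +Inf bucket: report the highest finite bound.
        return idx.checked_sub(1).map(|i| buckets[i].0);
    }
    // The lowest bucket is assumed to start at 0.
    let (lower, prev) = if idx == 0 { (0.0, 0.0) } else { buckets[idx - 1] };
    if count == prev {
        // Empty bucket: avoid dividing by zero.
        return Some(upper);
    }
    Some(lower + (upper - lower) * (rank - prev) / (count - prev))
}

fn main() {
    // Invented distribution: 100 queries in total.
    let buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 98.0), (f64::INFINITY, 100.0)];
    for q in [0.5, 0.95, 0.99] {
        println!("p{:.0} = {:?}", q * 100.0, histogram_quantile(q, &buckets));
    }
}
```

Because the result is interpolated, its accuracy depends on bucket boundaries: a p95 near the 1.0 s alert threshold is only as precise as the buckets around 1.0 s.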
Alerting Rules
groups:
  - name: angarabase_tracing
    rules:
      - alert: HighQueryLatency
        expr: |
          histogram_quantile(0.95,
            rate(angarabase_query_duration_seconds_bucket[5m])
          ) > 1.0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "AngaraBase query latency is high"
      - alert: TracingVolumeHigh
        expr: |
          rate(angarabase_tracing_events_total[5m]) > 1000
        for: 5m
        labels:
          severity: info
        annotations:
          summary: "AngaraBase tracing volume is high"
Security Considerations
Sensitive Data in Logs
⚠️ WARNING: Tracing logs may contain SQL queries with sensitive data.
Mitigation strategies:
- Query parameter redaction:
[diagnostics]
redact_query_params = true # Replace literals with ?
- Log rotation and retention:
# logrotate configuration
/var/log/angarabase.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
}
- Access control:
chmod 640 /var/log/angarabase.log
chown angarabase:angarabase-ops /var/log/angarabase.log
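To illustrate what "replace literals with ?" means in practice, here is a minimal sketch of literal redaction. It is not AngaraBase's implementation: `redact_literals` is a hypothetical helper that handles only single-quoted strings and plain numeric literals, and ignores comments, dollar-quoting, and other dialect-specific syntax.

```rust
/// Replace single-quoted string literals and numeric literals with `?`.
fn redact_literals(sql: &str) -> String {
    let mut out = String::with_capacity(sql.len());
    let mut chars = sql.chars().peekable();
    // True while the previous character belongs to an identifier,
    // so digits in names like `col1` are left alone.
    let mut in_identifier = false;
    while let Some(c) = chars.next() {
        if c == '\'' {
            // Skip to the closing quote; '' inside a string is an escaped quote.
            while let Some(q) = chars.next() {
                if q == '\'' {
                    if chars.peek() == Some(&'\'') {
                        chars.next();
                    } else {
                        break;
                    }
                }
            }
            out.push('?');
            in_identifier = false;
        } else if c.is_ascii_digit() && !in_identifier {
            // Numeric literal: consume the remaining digits and decimal points.
            while matches!(chars.peek(), Some(d) if d.is_ascii_digit() || *d == '.') {
                chars.next();
            }
            out.push('?');
            in_identifier = false;
        } else {
            out.push(c);
            in_identifier = c.is_alphanumeric() || c == '_';
        }
    }
    out
}

fn main() {
    let sql = "SELECT * FROM users WHERE email = 'a@b.com' AND age > 30";
    println!("{}", redact_literals(sql));
}
```

Redacted statements also group naturally by shape, which is useful when aggregating logs by query_fingerprint.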
OTLP Export Security
# Use TLS for OTLP export
export ANGARABASE_OTLP_ENDPOINT=https://jaeger.internal:14268/api/traces
export ANGARABASE_OTLP_TLS_CERT=/etc/ssl/certs/angarabase.crt
export ANGARABASE_OTLP_TLS_KEY=/etc/ssl/private/angarabase.key
Troubleshooting
Common Issues
1. No Tracing Output
Symptoms: No tracing spans in logs
Causes:
- RUST_LOG not set or too restrictive
- tracing_format misconfigured
- Tracing not enabled in binary
Solution:
# Check tracing is compiled in
./angarabase-server --version | grep tracing
# Enable debug tracing
RUST_LOG=angarabase=debug ./angarabase-server
2. Missing Span Context
Symptoms: Broken trace chains, missing parent-child relationships
Causes:
- Missing span.enter() in spawn_blocking calls
- Incorrect async context propagation
Solution: Verify that each spawn_blocking call captures tracing::Span::current() before spawning and enters that span inside the closure.
3. High Log Volume
Symptoms: Disk space issues, performance degradation
Causes:
- Log level too verbose (TRACE in production)
- No log rotation
- High query volume
Solution:
# Reduce log level
export RUST_LOG=angarabase=info
# Enable log rotation
logrotate -f /etc/logrotate.d/angarabase
4. OTLP Export Failures
Symptoms: Traces not appearing in Jaeger/Tempo
Causes:
- Network connectivity issues
- Incorrect endpoint configuration
- Authentication failures
Solution:
# Test OTLP endpoint
curl -v $ANGARABASE_OTLP_ENDPOINT/health
# Check AngaraBase logs for export errors
grep "otlp.*error" /var/log/angarabase.log
Advanced Topics
Custom Span Attributes
AngaraBase automatically adds these attributes to spans:
| Attribute | Description | Example |
|---|---|---|
| session_id | Database session ID | 12345 |
| query_fingerprint | SQL query hash | abc123def |
| plan_hash | Execution plan hash | def456ghi |
| estimated_rows | Query planner estimate | 1000 |
| actual_rows | Actual rows returned | 987 |
| wait_event_type | Current wait type | Lock, IO, Net |
| wait_event | Specific wait event | RowLock, PageRead |
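One practical use of estimated_rows and actual_rows together is spotting planner misestimates, which often point to missing statistics. The sketch below computes a symmetric error factor (how many times off the estimate was, in either direction). The function name and the clamp of zero to one row are assumptions for illustration, not an AngaraBase metric.

```rust
/// Factor by which the planner's estimate missed, in either direction.
/// 1.0 means a perfect estimate; zero counts are clamped to 1 row.
fn estimate_error(estimated_rows: u64, actual_rows: u64) -> f64 {
    let est = estimated_rows.max(1) as f64;
    let act = actual_rows.max(1) as f64;
    (est / act).max(act / est)
}

fn main() {
    // Values from the attribute table: a close estimate.
    println!("{:.3}", estimate_error(1000, 987));
    // A made-up misestimate of the kind worth investigating.
    println!("{:.1}", estimate_error(10, 1000));
}
```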
Correlation with System Metrics
CPU correlation:
# Find high-CPU queries
jq 'select(.span.name == "execute" and .fields.cpu_time_ms > 1000)' \
/var/log/angarabase.log
I/O correlation:
# Find I/O-heavy queries
jq 'select(.fields.io_reads > 1000 or .fields.io_writes > 100)' \
/var/log/angarabase.log
Custom Dashboards
Grafana query examples:
- Query throughput by operation:
sum(rate(angarabase_queries_total[5m])) by (operation)
- Average query duration by phase:
avg(angarabase_query_phase_duration_seconds) by (phase)
- Lock contention rate:
sum(rate(angarabase_wait_events_total{type="Lock"}[5m]))
References
- RFC-2026-360: Structured Logging and Tracing v0
- RFC-2026-461: Async Runtime Migration Strategy v0
- CODING_STANDARDS.md: Tracing guidelines (§9)
- ASYNC_GUIDELINES.md: Span propagation patterns
- Tracing crate docs: https://docs.rs/tracing/
- OpenTelemetry spec: https://opentelemetry.io/docs/
Contact: Database SRE team
Last updated: 2026-03-12 by current implementation phase