
OpenTelemetry Integration

Send traces, metrics, and logs from OpenTelemetry Collector to Arc for unified observability.

Overview

The Arc OpenTelemetry Exporter enables you to send all your telemetry data from the OpenTelemetry Collector to Arc:

  • ✅ Traces: Distributed traces with full span hierarchy
  • ✅ Metrics: All metric types (gauge, counter, histogram, summary)
  • ✅ Logs: Structured logs with attributes
  • 🚀 High Performance: Uses Arc's columnar MessagePack format
  • 📦 Compression: Automatic gzip compression
  • 🔄 Retry Logic: Configurable retry with exponential backoff
  • 🔐 Authentication: Bearer token support

Performance:

  • Traces: 500K-1M spans/sec
  • Metrics: 3M-6M data points/sec
  • Logs: 1M-2M logs/sec

Why OpenTelemetry + Arc?

Traditional observability requires 3+ separate systems:

  • Jaeger for traces
  • Prometheus for metrics
  • Loki/Elasticsearch for logs
  • Manual correlation between systems

With Arc + OpenTelemetry:

  • ✅ All signals in one database
  • ✅ Join traces, metrics, and logs in SQL
  • ✅ No manual correlation needed
  • ✅ Single query for complete context
  • ✅ One storage backend to manage

This is unified observability.

Installation

Option 1: OpenTelemetry Collector Builder (OCB)

Add to your builder-config.yaml:

exporters:
  - gomod: github.com/basekick-labs/arc-opentelemetry-exporter v0.1.0

Build the collector:

ocb --config builder-config.yaml

Option 2: Pre-built Binary

Download from the releases page:

# Download latest release
wget https://github.com/basekick-labs/arc-opentelemetry-exporter/releases/download/v0.1.0/otelcol-arc-linux-amd64

# Make executable
chmod +x otelcol-arc-linux-amd64

# Run
./otelcol-arc-linux-amd64 --config=config.yaml

Quick Start

1. Start Arc

docker run -d -p 8000:8000 \
  -e STORAGE_BACKEND=local \
  -v arc-data:/app/data \
  ghcr.io/basekick-labs/arc:25.11.1

2. Get Your API Token

# Check logs for admin token
docker logs <container-id> 2>&1 | grep "Admin token"

# Or create a new token
curl -X POST http://localhost:8000/api/v1/auth/tokens \
-H "Authorization: Bearer $ARC_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "otel-collector",
"description": "OpenTelemetry Collector access"
}'

3. Create Collector Configuration

Create otel-config.yaml:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1000

exporters:
  arc:
    endpoint: http://localhost:8000
    auth_token: your-arc-token-here

    # Recommended: Separate databases per signal type
    traces_database: traces
    metrics_database: metrics
    logs_database: logs

    # Optional: Custom measurement names
    traces_measurement: distributed_traces
    logs_measurement: logs

    # Optional: HTTP settings
    timeout: 30s
    compression: gzip

    # Optional: Retry configuration
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [arc]

    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [arc]

    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [arc]

4. Run OpenTelemetry Collector

./otelcol-arc-linux-amd64 --config=otel-config.yaml

5. Send Telemetry Data

Your applications instrumented with OpenTelemetry SDKs will now send data to Arc!

Example: Python Application

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Configure tracer
trace.set_tracer_provider(TracerProvider())
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
span_processor = BatchSpanProcessor(otlp_exporter)
trace.get_tracer_provider().add_span_processor(span_processor)

# Create spans
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("my-operation"):
    # Your code here
    print("Trace sent to Arc via OTel Collector!")

Configuration

Database Organization Strategies

Strategy 1: Single Database (Simple)

All signals in one database:

exporters:
  arc:
    endpoint: http://localhost:8000
    database: default

Structure:

default/
├── distributed_traces
├── logs
├── system_cpu_usage
└── http_requests_total

Pros: Simple, easy cross-signal correlation

Cons: All data in one namespace

Strategy 2: Separate Databases (Recommended)

Separate databases for each signal type:

exporters:
  arc:
    endpoint: http://localhost:8000
    traces_database: traces
    metrics_database: metrics
    logs_database: logs

Structure:

traces/
└── distributed_traces

metrics/
├── system_cpu_usage
├── system_memory_usage
└── http_requests_total

logs/
└── logs

Pros:

  • Clean separation of concerns
  • Independent retention policies per signal
  • Independent scaling and storage backends
  • Easier permission management
  • Matches traditional observability architecture

Cons: Slightly more complex configuration

✅ Recommended for production deployments.

Configuration Options

| Option | Description | Default |
|--------|-------------|---------|
| endpoint | Arc API endpoint | Required |
| auth_token | Authentication token | Optional |
| database | Single database for all signals | default |
| traces_database | Database for traces | - |
| metrics_database | Database for metrics | - |
| logs_database | Database for logs | - |
| traces_measurement | Table name for traces | distributed_traces |
| logs_measurement | Table name for logs | logs |
| timeout | HTTP request timeout | 30s |
| compression | Compression type | gzip |
| retry_on_failure.enabled | Enable retry logic | true |
| retry_on_failure.initial_interval | Initial retry interval | 5s |
| retry_on_failure.max_interval | Maximum retry interval | 30s |
| retry_on_failure.max_elapsed_time | Maximum total retry time | 300s |

Data Format

The exporter uses Arc's high-performance columnar MessagePack format with dynamic columns. All OpenTelemetry attributes automatically become individual columns for optimal query performance.

Traces

All span attributes and resource attributes become columns:

{
  "m": "distributed_traces",
  "columns": {
    "time": [1699900000000],
    "trace_id": ["5b8efff798038103d269b633813fc60c"],
    "span_id": ["def456..."],
    "parent_span_id": ["ghi789..."],
    "service_name": ["api-gateway"],
    "operation_name": ["HTTP GET /users"],
    "span_kind": ["server"],
    "duration_ns": [1234567],
    "status_code": [0],
    "http.method": ["GET"],
    "http.status_code": [200],
    "http.url": ["/api/users"],
    "host.name": ["server-1"]
  }
}

Dynamic schema: Columns created automatically from span and resource attributes.
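
For illustration, a minimal Python sketch of how a payload in this shape can be assembled and serialized, assuming the msgpack package; the exporter builds, compresses, and sends these payloads for you, so this is not code you normally write yourself:

import msgpack

# Columnar payload: one list per column, all lists the same length
# (field names "m" and "columns" mirror the trace example above)
payload = {
    "m": "distributed_traces",
    "columns": {
        "time": [1699900000000],
        "trace_id": ["5b8efff798038103d269b633813fc60c"],
        "service_name": ["api-gateway"],
        "operation_name": ["HTTP GET /users"],
        "duration_ns": [1234567],
        "http.method": ["GET"],
        "http.status_code": [200],
    },
}

# Serialize to MessagePack; the exporter gzips this body before sending it to Arc
body = msgpack.packb(payload)
print(f"Encoded payload: {len(body)} bytes")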

Metrics

Each metric name becomes its own table. All attributes become columns:

{
  "m": "http_requests_total",
  "columns": {
    "time": [1699900000000],
    "value": [42.0],
    "service": ["api"],
    "method": ["GET"],
    "status": ["200"],
    "host.name": ["server-1"]
  }
}

Metric name sanitization:

  • system.cpu.usage → system_cpu_usage
  • http.server.duration → http_server_duration
  • process-memory-bytes → process_memory_bytes
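
A minimal sketch of this rule, assuming sanitization simply replaces dots and dashes with underscores (the exporter may normalize additional characters):

import re

def sanitize_metric_name(name: str) -> str:
    # Replace dots and dashes with underscores: "system.cpu.usage" -> "system_cpu_usage"
    return re.sub(r"[.\-]", "_", name)

assert sanitize_metric_name("system.cpu.usage") == "system_cpu_usage"
assert sanitize_metric_name("process-memory-bytes") == "process_memory_bytes"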

Logs

All log attributes and resource attributes become columns:

{
  "m": "logs",
  "columns": {
    "time": [1699900000000],
    "severity": ["ERROR"],
    "severity_number": [17],
    "body": ["Database connection failed"],
    "trace_id": ["abc123..."],
    "span_id": ["def456..."],
    "service_name": ["api-gateway"],
    "http.method": ["POST"],
    "user_id": ["12345"],
    "host.name": ["server-1"]
  }
}

Querying Data

Traces

-- Recent traces
SELECT * FROM traces.distributed_traces
WHERE time > NOW() - INTERVAL '1 hour'
LIMIT 100;

-- Traces by service
SELECT
    service_name,
    operation_name,
    duration_ns / 1000000 AS duration_ms,
    "http.method",
    "http.status_code"
FROM traces.distributed_traces
WHERE service_name = 'api-gateway'
  AND time > NOW() - INTERVAL '1 hour'
ORDER BY time DESC;

-- Debug specific trace
SELECT * FROM traces.distributed_traces
WHERE trace_id = '5b8efff798038103d269b633813fc60c'
ORDER BY time;

-- Slow requests (p99 latency)
SELECT
    service_name,
    operation_name,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY duration_ns) / 1000000 AS p99_ms
FROM traces.distributed_traces
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY service_name, operation_name
ORDER BY p99_ms DESC;

Metrics

Each metric is its own table with all attributes as columns:

-- CPU usage
SELECT
    time,
    value,
    "host.name",
    cpu,
    state
FROM metrics.system_cpu_usage
WHERE time > NOW() - INTERVAL '1 hour'
  AND "host.name" = 'server-1'
ORDER BY time DESC;

-- HTTP requests by method and status
SELECT
    time_bucket(INTERVAL '1 minute', time) AS minute,
    method,
    status,
    SUM(value) AS total_requests
FROM metrics.http_requests_total
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY minute, method, status
ORDER BY minute DESC;

-- Memory usage aggregated
SELECT
    time_bucket(INTERVAL '5 minutes', time) AS bucket,
    "host.name",
    AVG(value) AS avg_memory_bytes
FROM metrics.system_memory_usage
WHERE time > NOW() - INTERVAL '6 hours'
GROUP BY bucket, "host.name"
ORDER BY bucket DESC;

Logs

All attributes are individual columns for fast filtering:

-- Recent error logs
SELECT
    time,
    severity,
    body,
    service_name,
    "host.name",
    trace_id
FROM logs.logs
WHERE severity IN ('ERROR', 'FATAL')
  AND time > NOW() - INTERVAL '1 hour'
ORDER BY time DESC;

-- Logs for specific trace (correlation)
SELECT
    time,
    severity,
    body,
    service_name
FROM logs.logs
WHERE trace_id = '5b8efff798038103d269b633813fc60c'
ORDER BY time;

-- Count errors by service
SELECT
    service_name,
    "host.name",
    COUNT(*) AS error_count
FROM logs.logs
WHERE severity IN ('ERROR', 'FATAL')
  AND time > NOW() - INTERVAL '1 hour'
GROUP BY service_name, "host.name"
ORDER BY error_count DESC;

Unified Observability: Join Across Signals

Arc's most powerful feature: correlate traces, metrics, and logs in a single SQL query.

Example 1: Failed Requests with Full Context

Get traces, error logs, and CPU metrics for failed requests:

SELECT
    t.time,
    t.trace_id,
    t.service_name,
    t.operation_name,
    t.duration_ns / 1000000 AS duration_ms,
    t."http.status_code",
    l.severity,
    l.body AS error_message,
    cpu.value AS cpu_usage
FROM traces.distributed_traces t
LEFT JOIN logs.logs l
    ON t.trace_id = l.trace_id
LEFT JOIN metrics.system_cpu_usage cpu
    ON t.service_name = cpu.service_name
    AND time_bucket(INTERVAL '1 minute', t.time) = time_bucket(INTERVAL '1 minute', cpu.time)
WHERE t.status_code >= 2 -- OTel status: 2 = Error
  AND t.time > NOW() - INTERVAL '1 hour'
ORDER BY t.time DESC
LIMIT 100;

Result: Traces + error logs + CPU usage at time of failure — all in one query!

Example 2: Service Health Dashboard

Complete service health metrics:

WITH trace_stats AS (
    SELECT
        time_bucket(INTERVAL '5 minutes', time) AS bucket,
        service_name,
        COUNT(*) AS request_count,
        AVG(duration_ns / 1000000) AS avg_latency_ms,
        SUM(CASE WHEN status_code >= 2 THEN 1 ELSE 0 END) AS error_count
    FROM traces.distributed_traces
    WHERE time > NOW() - INTERVAL '1 hour'
    GROUP BY bucket, service_name
),
error_logs AS (
    SELECT
        time_bucket(INTERVAL '5 minutes', time) AS bucket,
        service_name,
        COUNT(*) AS log_error_count
    FROM logs.logs
    WHERE severity IN ('ERROR', 'FATAL')
      AND time > NOW() - INTERVAL '1 hour'
    GROUP BY bucket, service_name
),
cpu_stats AS (
    SELECT
        time_bucket(INTERVAL '5 minutes', time) AS bucket,
        service_name,
        AVG(value) AS avg_cpu
    FROM metrics.system_cpu_usage
    WHERE time > NOW() - INTERVAL '1 hour'
    GROUP BY bucket, service_name
)
SELECT
    ts.bucket AS time,
    ts.service_name,
    ts.request_count,
    ROUND(ts.avg_latency_ms, 2) AS avg_latency_ms,
    ts.error_count,
    ROUND((ts.error_count::float / NULLIF(ts.request_count, 0) * 100), 2) AS error_rate_pct,
    el.log_error_count,
    ROUND(cs.avg_cpu, 2) AS avg_cpu_usage
FROM trace_stats ts
LEFT JOIN error_logs el ON ts.bucket = el.bucket AND ts.service_name = el.service_name
LEFT JOIN cpu_stats cs ON ts.bucket = cs.bucket AND ts.service_name = cs.service_name
ORDER BY ts.bucket DESC, ts.service_name;

Result:

  • Request volume and latency (traces)
  • Error rate (traces)
  • Error log count (logs)
  • CPU usage (metrics)

All from one database, in one query!

Example 3: Debug Incident Timeline

Unified timeline of traces and logs for a single request:

SELECT
    t.time,
    'trace' AS signal_type,
    t.operation_name AS event,
    t.duration_ns / 1000000 AS duration_ms,
    t.status_code,
    NULL AS severity,
    NULL AS body
FROM traces.distributed_traces t
WHERE t.trace_id = '5b8efff798038103d269b633813fc60c'

UNION ALL

SELECT
    l.time,
    'log' AS signal_type,
    l.service_name AS event,
    NULL AS duration_ms,
    NULL AS status_code,
    l.severity,
    l.body
FROM logs.logs l
WHERE l.trace_id = '5b8efff798038103d269b633813fc60c'

ORDER BY time;

Result: Complete chronological view of all events for a single request!

Use Cases

Application Performance Monitoring

Monitor service latency, error rates, and throughput:

SELECT
    time_bucket(INTERVAL '1 minute', time) AS minute,
    service_name,
    COUNT(*) AS requests,
    AVG(duration_ns / 1000000) AS avg_latency_ms,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY duration_ns) / 1000000 AS p95_ms,
    SUM(CASE WHEN status_code >= 2 THEN 1 ELSE 0 END) AS errors
FROM traces.distributed_traces
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY minute, service_name
ORDER BY minute DESC;

Distributed Tracing

Analyze request flows across microservices:

-- Trace all spans in a distributed transaction
SELECT
    span_id,
    parent_span_id,
    service_name,
    operation_name,
    duration_ns / 1000000 AS duration_ms,
    "http.method",
    "http.url"
FROM traces.distributed_traces
WHERE trace_id = 'your-trace-id'
ORDER BY time;
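
To turn that flat result set into a call tree, a small helper like the following works on the columns returned above; this is a sketch that assumes the rows arrive as a list of dicts, one per span:

from collections import defaultdict

def print_span_tree(spans: list[dict]) -> None:
    """Print spans indented by depth, using the span_id/parent_span_id columns."""
    children = defaultdict(list)
    for span in spans:
        children[span.get("parent_span_id") or None].append(span)

    def walk(parent_id, depth=0):
        for span in children.get(parent_id, []):
            duration_ms = span["duration_ns"] / 1_000_000
            indent = "  " * depth
            print(f"{indent}{span['service_name']} {span['operation_name']} ({duration_ms:.2f} ms)")
            walk(span["span_id"], depth + 1)

    walk(None)  # root spans have no parent_span_id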

Log Analysis

Search and analyze structured logs:

-- Find all errors from a specific user session
SELECT
    time,
    severity,
    body,
    service_name,
    user_id,
    "http.method",
    "http.url"
FROM logs.logs
WHERE user_id = '12345'
  AND severity IN ('ERROR', 'WARN')
  AND time > NOW() - INTERVAL '24 hours'
ORDER BY time DESC;

Performance Optimization

1. Use Batch Processor

Always use the batch processor for high throughput:

processors:
  batch:
    timeout: 1s
    send_batch_size: 1000  # Adjust based on your load

2. Configure Retry Logic

Handle transient failures:

exporters:
  arc:
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s

3. Enable Compression

Reduce network bandwidth:

exporters:
  arc:
    compression: gzip  # Default and recommended

4. Tune Collector Resources

For high-volume deployments:

service:
  telemetry:
    metrics:
      level: detailed
  extensions: [health_check]

5. Use Separate Databases

For production, use database-per-signal strategy:

exporters:
  arc:
    traces_database: traces
    metrics_database: metrics
    logs_database: logs

Troubleshooting

Collector Not Sending Data

# Check collector logs
./otelcol-arc-linux-amd64 --config=config.yaml

# Verify Arc is accessible
curl http://localhost:8000/health

# Test authentication
curl -H "Authorization: Bearer YOUR_TOKEN" \
  http://localhost:8000/api/v1/auth/verify
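
If the collector and Arc both respond, a quick end-to-end smoke test from Python (reusing the Quick Start SDK setup) can confirm the OTLP path; this is a hedged sketch, not part of the exporter itself:

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Point at the collector's OTLP gRPC endpoint from the Quick Start config
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

# Emit a single test span, then flush so it is exported before the script exits
with trace.get_tracer("arc-smoke-test").start_as_current_span("arc-smoke-test"):
    pass
provider.force_flush()

The test span should then appear in traces.distributed_traces within a few seconds (see the query under "Data Not Appearing in Arc" below).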

High Memory Usage

Reduce batch size:

processors:
  batch:
    timeout: 1s
    send_batch_size: 500  # Reduce from 1000

Data Not Appearing in Arc

-- Check if data is being written
SHOW TABLES FROM traces;
SHOW TABLES FROM metrics;
SHOW TABLES FROM logs;

-- Verify recent data
SELECT COUNT(*) FROM traces.distributed_traces
WHERE time > NOW() - INTERVAL '5 minutes';

Connection Timeouts

Increase timeout:

exporters:
  arc:
    timeout: 60s  # Increase from 30s

Ready for unified observability!

Made with ❤️ by Basekick Labs