Configuration Overview
Arc uses a centralized arc.conf
configuration file (TOML format) with environment variable overrides for flexibility.
Configuration Files
Primary: arc.conf
The main configuration file with production-ready defaults:
# Server Configuration
[server]
host = "0.0.0.0"
port = 8000
workers = 8
# Authentication
[auth]
enabled = true
default_token = "" # Auto-generated if empty
# Query Cache
[query_cache]
enabled = true
ttl_seconds = 60
# Storage Backend
[storage]
backend = "local" # Options: local, minio, s3, gcs
[storage.local]
base_path = "./data/arc"
database = "default"
Environment Variables
Override any setting via environment variables:
# Server
ARC_HOST=0.0.0.0
ARC_PORT=8000
ARC_WORKERS=42
# Storage
STORAGE_BACKEND=minio
MINIO_ENDPOINT=localhost:9000
MINIO_BUCKET=arc
# Cache
QUERY_CACHE_ENABLED=true
QUERY_CACHE_TTL=60
Legacy: .env File
Backward compatible with .env
files, but arc.conf
is recommended.
Configuration Priority
Settings are applied in this order (highest to lowest):
- Environment variables (e.g.,
ARC_WORKERS=16
) - arc.conf file
- Built-in defaults
Key Configuration Areas
Server
- Workers: Set to 3x CPU cores for optimal performance
- Host/Port: Network binding configuration
- Logging: Log level and output format
Storage
- Backend selection: Local, MinIO, AWS S3, GCS
- Connection settings: Endpoints, credentials, buckets
- Database namespaces: Multi-tenant organization
Authentication
- API tokens: Create and manage access tokens
- Permissions: Read/write access control
- Token rotation: Security best practices
Performance
- Buffer sizes: Ingestion throughput tuning
- Query cache: Speed up repeated queries
- Compaction: Optimize query performance
Advanced Features
- Write-Ahead Log (WAL): Zero data loss guarantee
- Compaction: Automatic file optimization
- Monitoring: Metrics and health checks
Quick Configuration Examples
Maximum Performance (Native)
[server]
workers = 42 # 3x CPU cores (14 cores × 3)
[storage]
backend = "local"
[storage.local]
base_path = "/mnt/nvme/arc-data" # NVMe storage
[ingestion]
buffer_size = 200000
buffer_age_seconds = 10
[compaction]
enabled = true
schedule = "5 * * * *"
Distributed Deployment
[server]
workers = 32
[storage]
backend = "minio"
[storage.minio]
endpoint = "http://minio-cluster:9000"
access_key = "minioadmin"
secret_key = "minioadmin123"
bucket = "arc"
use_ssl = false
[compaction]
enabled = true
max_concurrent_jobs = 4
Cloud Deployment (AWS S3)
[server]
workers = 16
[storage]
backend = "s3"
[storage.s3]
bucket = "arc-production"
region = "us-east-1"
# Uses IAM role or AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY env vars
[query_cache]
enabled = true
ttl_seconds = 300
[compaction]
enabled = true
compression = "zstd"
compression_level = 3
High Durability (WAL Enabled)
[server]
workers = 24
[wal]
enabled = true
sync_mode = "fdatasync"
dir = "./data/wal"
max_size_mb = 500
max_age_seconds = 3600
[storage]
backend = "minio"
[storage.minio]
endpoint = "http://localhost:9000"
bucket = "arc"
Configuration Validation
Arc validates configuration on startup:
# Test configuration
./start.sh native --validate
# Or manually
python3 -c "from api.config import load_config; load_config()"
Environment-Specific Configurations
Development
[server]
workers = 4
port = 8000
[storage]
backend = "local"
[storage.local]
base_path = "./dev_data"
[auth]
enabled = false # Disable for local development
[query_cache]
enabled = false # See fresh queries
Staging
[server]
workers = 16
[storage]
backend = "minio"
[storage.minio]
bucket = "arc-staging"
database = "staging"
[auth]
enabled = true
[compaction]
enabled = true
schedule = "0 */2 * * *" # Every 2 hours
Production
[server]
workers = 42
[storage]
backend = "s3"
[storage.s3]
bucket = "arc-production"
region = "us-east-1"
database = "production"
[auth]
enabled = true
[wal]
enabled = true
sync_mode = "fdatasync"
[compaction]
enabled = true
schedule = "5 * * * *" # Every hour at :05
max_concurrent_jobs = 4
[query_cache]
enabled = true
ttl_seconds = 300
Best Practices
1. Use arc.conf for Permanent Settings
Store configuration in arc.conf
and version control it (without secrets):
[storage.minio]
endpoint = "http://minio:9000"
bucket = "arc"
# Access keys via environment variables
2. Use Environment Variables for Secrets
export MINIO_ACCESS_KEY="your_access_key"
export MINIO_SECRET_KEY="your_secret_key"
export ARC_ADMIN_TOKEN="your_admin_token"
3. Set Workers Based on Workload
- Light (4 workers): Development, testing
- Medium (8-16 workers): Small production workloads
- Heavy (24-42 workers): High-throughput production
4. Enable Features Progressively
Start simple, add features as needed:
- Basic configuration (storage + auth)
- Query caching (for repeated queries)
- Compaction (for query optimization)
- WAL (for zero data loss)
5. Monitor Configuration Impact
Check metrics after configuration changes:
# Memory usage
curl http://localhost:8000/metrics/memory
# Query performance
curl http://localhost:8000/metrics/query-pool
# Compaction status
curl http://localhost:8000/api/compaction/status
Troubleshooting
Configuration Not Loading
# Check syntax
python3 -c "import toml; toml.load('arc.conf')"
# Verify file location
ls -la arc.conf
# Check permissions
chmod 644 arc.conf
Environment Variables Not Working
# Verify they're set
env | grep ARC_
env | grep STORAGE_
env | grep MINIO_
# Export before starting Arc
export ARC_WORKERS=16
./start.sh native
Performance Issues
# Check worker count
ps aux | grep uvicorn | wc -l
# Should be 3x CPU cores for optimal performance
nproc # Get CPU count