
Performance Benchmarks

Arc delivers industry-leading performance for analytical workloads. This page documents our benchmark methodology and results.

Summary

| Metric | Result |
| --- | --- |
| Ingestion (MessagePack) | 18M+ records/sec |
| Query (Arrow) | 6M+ rows/sec |
| Query (JSON) | 2.23M rows/sec |
| Line Protocol | 1.92M records/sec |

Test Hardware

AMD Ryzen 9 5950X Workstation

  • CPU: AMD Ryzen 9 5950X (16 cores, 32 threads)
  • RAM: 64 GB DDR4
  • Storage: NVMe SSD

Ingestion Throughput

Arc achieves 18M+ records/second on a single node using columnar MessagePack format.

Test Environment

  • CPU: AMD Ryzen 9 5950X (16 cores, 32 threads)
  • RAM: 64 GB DDR4
  • Storage: NVMe SSD
  • Batch size: 10,000 records

Results

  • Peak throughput: 18M+ records/sec
  • Sustained throughput: 15M+ records/sec
  • Write latency p50: 2ms
  • Write latency p99: Under 10ms
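The p50/p99 figures above are percentiles over measured write latencies. As a quick illustration of how such figures are derived (the helper and sample values below are ours, not Arc code), here is a nearest-rank percentile over a latency sample:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

latencies_ms = [1, 2, 2, 2, 3, 3, 4, 5, 9, 12]  # hypothetical per-request latencies
p50 = percentile(latencies_ms, 50)  # median latency
p99 = percentile(latencies_ms, 99)  # tail latency
```
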

This is Arc's fastest ingestion path: binary MessagePack with a columnar data layout.

Line Protocol

InfluxDB-compatible text protocol, suitable for existing tooling.

| Metric | Value |
| --- | --- |
| Throughput | 1.92M records/sec |
| p50 Latency | 49.53ms |
| p99 Latency | 108.53ms |
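For reference, a line protocol record is plain text of the form `measurement,tags fields timestamp`. A minimal formatter (an illustrative helper, not part of any Arc client library) looks like:

```python
import time

def to_line_protocol(measurement, tags, fields, ts_ns=None):
    """Format one record as InfluxDB line protocol:
    measurement,tag=val,... field=val,... timestamp"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    ts = ts_ns if ts_ns is not None else time.time_ns()
    return f"{measurement},{tag_str} {field_str} {ts}"

line = to_line_protocol("cpu", {"host": "web-01"}, {"usage": 42.5},
                        ts_ns=1700000000000000000)
# "cpu,host=web-01 usage=42.5 1700000000000000000"
```

Because every record is parsed from text, this path trades throughput for compatibility with existing InfluxDB tooling.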

Protocol Comparison

| Protocol | Throughput | Relative Speed |
| --- | --- | --- |
| MessagePack Columnar | 18M+ rec/s | 100% (baseline) |
| Line Protocol | 1.92M rec/s | 10% |

Why MessagePack is faster:

  • Binary format (no parsing overhead)
  • Columnar layout matches Parquet storage
  • Native gzip compression support
  • Batch-optimized for high throughput
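The columnar layout pivots row-oriented records into one array per field, matching the `{"m": ..., "columns": {...}}` payload shape shown in the Performance Tips below. A stdlib-only sketch (the helper name is ours, not an Arc API):

```python
def rows_to_columnar(measurement, rows):
    """Pivot row-oriented records into a columnar payload:
    one list per field, ready for batch serialization."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return {"m": measurement, "columns": columns}

rows = [
    {"time": 1, "host": "a", "value": 0.5},
    {"time": 2, "host": "b", "value": 0.7},
]
payload = rows_to_columnar("cpu", rows)
# {"m": "cpu", "columns": {"time": [1, 2], "host": ["a", "b"], "value": [0.5, 0.7]}}
```

Arrays of like-typed values serialize compactly and map directly onto Parquet's column chunks, which is why this layout avoids a row-to-column conversion on the server.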

Query Throughput

Arc delivers 6M+ rows/second for analytical queries.

Test Query

SELECT
time_bucket(INTERVAL '1 minute', time) AS bucket,
AVG(value) AS avg_value
FROM prod.metrics
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY bucket
ORDER BY bucket DESC;
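The `time_bucket` grouping above floors each timestamp to its minute and averages the values within each bucket. The equivalent logic in plain Python (illustrative only, with made-up sample data):

```python
from datetime import datetime, timezone

def time_bucket_minute(ts):
    """Floor a datetime to its 1-minute bucket,
    like time_bucket(INTERVAL '1 minute', time)."""
    return ts.replace(second=0, microsecond=0)

samples = [
    (datetime(2024, 1, 1, 12, 0, 15, tzinfo=timezone.utc), 1.0),
    (datetime(2024, 1, 1, 12, 0, 45, tzinfo=timezone.utc), 3.0),
    (datetime(2024, 1, 1, 12, 1, 10, tzinfo=timezone.utc), 5.0),
]
buckets = {}
for ts, value in samples:
    buckets.setdefault(time_bucket_minute(ts), []).append(value)
averages = {bucket: sum(vals) / len(vals) for bucket, vals in buckets.items()}
# the 12:00 bucket averages to 2.0, the 12:01 bucket to 5.0
```
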

Results

  • Throughput: 6M+ rows/sec
  • Latency (100K rows): Under 50ms
  • Latency (1M rows): Under 200ms

Arrow IPC vs JSON

| Format | Throughput | Response Size (50K rows) |
| --- | --- | --- |
| Arrow IPC | 6M+ rows/s | 1.71 MB |
| JSON | 2.23M rows/s | 2.41 MB |

Arrow advantages:

  • Zero-copy conversion to Pandas/Polars
  • 29% smaller response payload
  • Native columnar format
  • Ideal for large result sets (10K+ rows)
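Much of the payload-size gap comes down to text versus fixed-width binary encoding. A stdlib-only illustration (not Arc's actual Arrow IPC encoder) comparing 50K float64 values in both forms:

```python
import json
import random
import struct

# 50,000 float64 samples, comparable to the 50K-row result set above
random.seed(0)
values = [random.random() for _ in range(50_000)]

# JSON: each float serializes as ~17-19 characters of text
json_bytes = json.dumps(values).encode()

# Binary columnar: each float64 is exactly 8 bytes, no delimiters
binary_bytes = struct.pack(f"<{len(values)}d", *values)
# binary_bytes is less than half the size of json_bytes
```

Arrow's real advantage goes beyond size: the wire format matches the in-memory layout, so clients like Pandas and Polars can consume it without re-parsing.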

Go vs Python Implementation

Arc was rewritten from Python to Go, delivering significant improvements:

| Metric | Go | Python | Improvement |
| --- | --- | --- | --- |
| Ingestion | 18M+ rec/s | 4.21M rec/s | +342% |
| Memory Stability | Stable | 372MB leak / 500 queries | Fixed |
| Deployment | Single binary | Multi-worker processes | Simpler |
| Cold Start | <100ms | 2-3 seconds | 20x faster |

Why Go is Faster

  1. Stable Memory: Go's GC returns memory to OS. Python leaked memory under sustained load.
  2. Native Concurrency: Goroutines handle thousands of connections with minimal overhead.
  3. Single Binary: No Python interpreter or dependency management.
  4. Production GC: Sub-millisecond pause times at scale.

ClickBench Results

Industry-standard analytical query benchmark on the hits dataset.

Test Environment:

  • Instance: AWS c6a.4xlarge (16 vCPU, 32GB RAM)
  • Dataset: 100M rows (14GB Parquet)
  • Queries: 43 analytical queries

| Run | Total Time | Queries |
| --- | --- | --- |
| Cold (cache flushed) | 120.25s | 43 |
| Warm | 35.70s | 43 |

Comparison with Other Databases

| Database | Warm Run | Relative Speed |
| --- | --- | --- |
| Arc | 35.70s | 1.00x (baseline) |
| QuestDB | 64.26s | 1.80x slower |
| TimescaleDB | 335.22s | 9.39x slower |

On this benchmark's warm run, Arc is 1.80x faster than QuestDB and 9.39x faster than TimescaleDB.

Detailed Query Performance

All 43 analytical queries completed successfully:

| Query | Run 1 (Cold) | Run 2 | Run 3 (Best) | Speedup |
| --- | --- | --- | --- | --- |
| Q0 | 0.0656s | 0.0493s | 0.0372s | 1.76x |
| Q1 | 0.0788s | 0.0593s | 0.0628s | 1.25x |
| Q2 | 0.1617s | 0.1006s | 0.0838s | 1.93x |
| Q3 | 0.3933s | 0.1135s | 0.0866s | 4.54x |
| Q4 | 1.0929s | 0.3696s | 0.3703s | 2.95x |
| ... | ... | ... | ... | ... |

Why Arc is Fast

1. DuckDB Query Engine

Arc leverages DuckDB's columnar execution engine:

  • Vectorized execution: Process thousands of values per CPU instruction
  • Parallel query execution: Utilize all CPU cores automatically
  • Advanced optimizations: Join reordering, predicate pushdown, filter pushdown
  • SIMD instructions: Use modern CPU features (AVX2, AVX-512)

2. Parquet Columnar Storage

  • Columnar format: Read only columns needed for queries
  • Compression: 80% smaller than raw data (Snappy/ZSTD)
  • Predicate pushdown: Skip entire row groups based on statistics
  • Efficient scans: DuckDB reads Parquet natively

3. Go Runtime Efficiency

  • Stable memory: Go's GC returns memory to OS
  • Native concurrency: Goroutines handle thousands of connections
  • Single binary: No interpreter overhead
  • Sub-ms GC pauses: Production-ready garbage collection

Performance Tips

Maximize Ingestion Throughput

  1. Use MessagePack columnar format

    data = {"m": "cpu", "columns": {...}}  # 18M+ rec/s
    # vs
    data = "cpu,host=x value=1" # 1.92M rec/s
  2. Batch your writes (10,000+ records per request)

  3. Enable gzip compression for network efficiency

  4. Use multiple workers (35 optimal for M3 Max)
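Tip 2's batching amounts to chunking the record stream into fixed-size groups before each write. A minimal sketch (the helper is illustrative, not an Arc API):

```python
def batch(records, size=10_000):
    """Split a record sequence into fixed-size batches for bulk writes."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

records = list(range(25_000))
sizes = [len(b) for b in batch(records)]
# three batches: two full 10,000-record batches and a 5,000-record remainder
```

Each batch would then be serialized (e.g. as a columnar MessagePack payload) and sent in a single request, amortizing per-request overhead across thousands of records.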

Maximize Query Throughput

  1. Use Arrow format for large result sets (10K+ rows)

    response = requests.post(url + "/api/v1/query/arrow", ...)
  2. Enable compaction for query optimization

    [compaction]
    enabled = true
  3. Use time-range filters (partition pruning)

    WHERE time > now() - INTERVAL '1 hour'

Scaling Characteristics

Vertical Scaling

  • CPU: Near-linear scaling with core count
  • Memory: Auto-configured to ~50% system RAM

Storage Backend Impact

| Backend | Write Overhead | Query Overhead |
| --- | --- | --- |
| Local NVMe | Baseline | Baseline |
| MinIO (local) | +5-10% | +2-5% |
| AWS S3 | +20-30% | +10-20% |

Reproducibility

Run benchmarks locally:

git clone https://github.com/basekick-labs/arc.git
cd arc
make bench

Next Steps