# Automated Scheduling
Automatically execute continuous queries and retention policies on configurable schedules. Eliminate manual data lifecycle management and build efficient data pipelines.
## Overview
Arc OSS provides continuous queries and retention policies with manual API-triggered execution. Arc Enterprise adds automatic scheduling — define your schedules once, and Arc handles execution automatically.
Arc Enterprise includes two schedulers:
| Scheduler | Purpose | Default Schedule |
|---|---|---|
| CQ Scheduler | Runs continuous queries at their configured intervals | Per-CQ interval |
| Retention Scheduler | Enforces retention policies on a cron schedule | Daily at 3am (`0 3 * * *`) |
## CQ Scheduler
The CQ Scheduler automatically executes continuous queries at their configured intervals. Each continuous query runs independently on its own schedule.
### How It Works
1. Define continuous queries with intervals via the CQ API
2. Enable the CQ scheduler (requires an enterprise license)
3. Arc automatically executes each CQ at its configured interval
4. Results are written to the destination measurement
### Configuration
The CQ Scheduler is enabled when continuous queries are enabled and a valid enterprise license is present:
```toml
[continuous_query]
enabled = true
```

Or via environment variable:

```bash
ARC_CONTINUOUS_QUERY_ENABLED=true
```
Each continuous query defines its own execution interval when created through the API.
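To check what the scheduler will run, you can list the configured queries. This sketch assumes the creation endpoint shown in the example setup below also accepts GET, which may differ in your Arc version:

```bash
# List continuous queries to inspect each one's interval and enabled flag.
# Assumes the /api/v1/continuous-queries endpoint also supports GET;
# verify against your Arc Enterprise version.
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/api/v1/continuous-queries
```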
## Retention Scheduler
The Retention Scheduler automatically enforces retention policies on a cron schedule, deleting data that has exceeded its retention period.
### How It Works
1. Define retention policies via the Retention API
2. Enable the retention scheduler (requires an enterprise license)
3. Arc evaluates all active policies on the configured schedule
4. Expired data is automatically deleted
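To review which policies the scheduler will enforce, a similar listing sketch applies (same GET assumption as the CQ example above):

```bash
# List retention policies to confirm what the scheduler will enforce.
# Assumes the /api/v1/retention endpoint also supports GET; verify
# against your Arc Enterprise version.
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/api/v1/retention
```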
### Configuration
```toml
[retention]
enabled = true

[scheduler]
retention_schedule = "0 3 * * *"  # Cron: daily at 3am
```

Or via environment variables:

```bash
ARC_RETENTION_ENABLED=true
ARC_SCHEDULER_RETENTION_SCHEDULE="0 3 * * *"
```
The schedule uses standard 5-field cron syntax: `minute hour day-of-month month day-of-week`.
| Schedule | Meaning |
|---|---|
| `0 3 * * *` | Daily at 3:00 AM |
| `0 */6 * * *` | Every 6 hours |
| `0 2 * * 0` | Weekly on Sunday at 2:00 AM |
| `30 1 1 * *` | Monthly on the 1st at 1:30 AM |
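For example, to enforce retention every 6 hours instead of the daily default, set the schedule via the environment variables shown above:

```bash
# Enforce retention every 6 hours instead of the daily 3am default.
export ARC_RETENTION_ENABLED=true
export ARC_SCHEDULER_RETENTION_SCHEDULE="0 */6 * * *"
```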
## Data Lifecycle Pipeline
Combine CQ and retention scheduling to build a complete data lifecycle pipeline:
```
Raw Data (1-second resolution)
  │
  │  CQ: 1-minute aggregation (runs every minute)
  ▼
1-Minute Data
  │
  │  CQ: 1-hour aggregation (runs every hour)
  ▼
1-Hour Data
  │
  │  CQ: 1-day aggregation (runs daily)
  ▼
1-Day Data

Retention Schedule (runs daily at 3am):
├── Delete raw data older than 7 days
├── Delete 1-minute data older than 30 days
├── Delete 1-hour data older than 365 days
└── Keep 1-day data indefinitely
```
### Example Setup
1. Create continuous queries for downsampling:
```bash
# 1-minute aggregation
curl -X POST http://localhost:8000/api/v1/continuous-queries \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "downsample_1min",
    "database": "telemetry",
    "source_measurement": "sensors_raw",
    "destination_measurement": "sensors_1min",
    "query": "SELECT time_bucket('\''1 minute'\'', timestamp) as timestamp, device_id, AVG(temperature) as temperature, MAX(pressure) as pressure FROM sensors_raw WHERE timestamp >= $start AND timestamp < $end GROUP BY 1, 2",
    "interval": "1m",
    "enabled": true
  }'

# 1-hour aggregation
curl -X POST http://localhost:8000/api/v1/continuous-queries \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "downsample_1hr",
    "database": "telemetry",
    "source_measurement": "sensors_1min",
    "destination_measurement": "sensors_1hr",
    "query": "SELECT time_bucket('\''1 hour'\'', timestamp) as timestamp, device_id, AVG(temperature) as temperature, MAX(pressure) as pressure FROM sensors_1min WHERE timestamp >= $start AND timestamp < $end GROUP BY 1, 2",
    "interval": "1h",
    "enabled": true
  }'
```
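The pipeline diagram above also includes a daily rollup. A third CQ following the same pattern might look like this; the `sensors_1day` measurement name and the `1d` interval string are assumptions extending the convention above:

```bash
# 1-day aggregation (sketch; follows the same pattern as the CQs above,
# with an assumed "1d" interval string)
curl -X POST http://localhost:8000/api/v1/continuous-queries \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "downsample_1day",
    "database": "telemetry",
    "source_measurement": "sensors_1hr",
    "destination_measurement": "sensors_1day",
    "query": "SELECT time_bucket('\''1 day'\'', timestamp) as timestamp, device_id, AVG(temperature) as temperature, MAX(pressure) as pressure FROM sensors_1hr WHERE timestamp >= $start AND timestamp < $end GROUP BY 1, 2",
    "interval": "1d",
    "enabled": true
  }'
```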
2. Create retention policies:
```bash
# Delete raw data after 7 days
curl -X POST http://localhost:8000/api/v1/retention \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "raw_7d",
    "database": "telemetry",
    "measurement": "sensors_raw",
    "retention_days": 7,
    "enabled": true
  }'

# Delete 1-minute data after 30 days
curl -X POST http://localhost:8000/api/v1/retention \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "1min_30d",
    "database": "telemetry",
    "measurement": "sensors_1min",
    "retention_days": 30,
    "enabled": true
  }'
```
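The pipeline diagram also expires 1-hour data after a year; a matching policy follows the same pattern:

```bash
# Delete 1-hour data after 365 days (matches the pipeline diagram above)
curl -X POST http://localhost:8000/api/v1/retention \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "1hr_365d",
    "database": "telemetry",
    "measurement": "sensors_1hr",
    "retention_days": 365,
    "enabled": true
  }'
```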
With enterprise scheduling enabled, these queries and policies run automatically — no cron jobs, no external orchestration.
## Best Practices
- **Schedule retention during off-peak hours** — File deletion generates I/O. The default 3am schedule avoids impacting daytime workloads.
- **Add buffer days to retention policies** — Use the `buffer_days` parameter in retention policies to provide a safety margin before deletion (see the sketch after this list).
- **Test CQ queries manually first** — Before enabling automatic execution, run your continuous query SQL manually to verify correct results.
- **Combine with tiered storage** — Use tiered storage to move data to cold storage before retention deletes it, keeping long-term archives at low cost.
- **Monitor CQ execution** — Check Arc logs for CQ execution results and errors. Failed CQ executions are logged at WARN level.
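As a sketch of the buffer-days practice, the policy below adds a 2-day safety margin; the `buffer_days` field name comes from the note above, so verify it against your Arc version:

```bash
# Retention policy with a 2-day safety buffer before deletion.
# The buffer_days field is taken from the Best Practices note above;
# confirm the exact field name against your Arc Enterprise version.
curl -X POST http://localhost:8000/api/v1/retention \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "raw_7d_buffered",
    "database": "telemetry",
    "measurement": "sensors_raw",
    "retention_days": 7,
    "buffer_days": 2,
    "enabled": true
  }'
```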
## Next Steps
- Continuous Queries — Create and manage continuous queries (OSS docs)
- Retention Policies — Create and manage retention policies (OSS docs)
- Tiered Storage — Combine scheduling with tiered storage for optimal cost management