# Automated Scheduling
Automatically execute continuous queries and retention policies on configurable schedules. Eliminate manual data lifecycle management and build efficient data pipelines.
## Overview
Arc OSS provides continuous queries and retention policies with manual API-triggered execution. Arc Enterprise adds automatic scheduling — define your schedules once, and Arc handles execution automatically.
Arc Enterprise includes two schedulers:
| Scheduler | Purpose | Default Schedule |
|---|---|---|
| CQ Scheduler | Runs continuous queries at their configured intervals | Per-CQ interval |
| Retention Scheduler | Enforces retention policies on a cron schedule | Daily at 3am (`0 3 * * *`) |
## CQ Scheduler
The CQ Scheduler automatically executes continuous queries at their configured intervals. Each continuous query runs independently on its own schedule.
### How It Works
1. Define continuous queries with intervals via the CQ API
2. Enable the CQ scheduler (requires an enterprise license)
3. Arc automatically executes each CQ at its configured interval
4. Results are written to the destination measurement
### Configuration
The CQ Scheduler is enabled when continuous queries are enabled and a valid enterprise license is present:
```toml
[continuous_query]
enabled = true
```

Or via environment variable:

```bash
ARC_CONTINUOUS_QUERY_ENABLED=true
```
Each continuous query defines its own execution interval when created through the API.
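To check what the scheduler will run, you can list the configured queries. This sketch assumes the creation endpoint shown in the example setup below also accepts GET, which may differ in your Arc version:

```bash
# List continuous queries to inspect each one's interval and enabled flag.
# Assumes the /api/v1/continuous-queries endpoint also supports GET;
# verify against your Arc Enterprise version.
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/api/v1/continuous-queries
```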
## Retention Scheduler
The Retention Scheduler automatically enforces retention policies on a cron schedule, deleting data that has exceeded its retention period.
### How It Works
1. Define retention policies via the Retention API
2. Enable the retention scheduler (requires an enterprise license)
3. Arc evaluates all active policies on the configured schedule
4. Expired data is automatically deleted
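To review which policies the scheduler will enforce, a similar listing sketch applies (same GET assumption as the CQ example above):

```bash
# List retention policies to confirm what the scheduler will enforce.
# Assumes the /api/v1/retention endpoint also supports GET; verify
# against your Arc Enterprise version.
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/api/v1/retention
```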
### Configuration
```toml
[retention]
enabled = true

[scheduler]
retention_schedule = "0 3 * * *"  # Cron: daily at 3am
```

Or via environment variables:

```bash
ARC_RETENTION_ENABLED=true
ARC_SCHEDULER_RETENTION_SCHEDULE="0 3 * * *"
```
The schedule uses standard 5-field cron syntax: `minute hour day-of-month month day-of-week`.
| Schedule | Meaning |
|---|---|
| `0 3 * * *` | Daily at 3:00 AM |
| `0 */6 * * *` | Every 6 hours |
| `0 2 * * 0` | Weekly on Sunday at 2:00 AM |
| `30 1 1 * *` | Monthly on the 1st at 1:30 AM |
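For example, to enforce retention every 6 hours instead of the daily default, set the schedule via the environment variables shown above:

```bash
# Enforce retention every 6 hours instead of the daily 3am default.
export ARC_RETENTION_ENABLED=true
export ARC_SCHEDULER_RETENTION_SCHEDULE="0 */6 * * *"
```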
## Data Lifecycle Pipeline
Combine CQ and retention scheduling to build a complete data lifecycle pipeline:
```
Raw Data (1-second resolution)
  │
  │  CQ: 1-minute aggregation (runs every minute)
  ▼
1-Minute Data
  │
  │  CQ: 1-hour aggregation (runs every hour)
  ▼
1-Hour Data
  │
  │  CQ: 1-day aggregation (runs daily)
  ▼
1-Day Data

Retention Schedule (runs daily at 3am):
├── Delete raw data older than 7 days
├── Delete 1-minute data older than 30 days
├── Delete 1-hour data older than 365 days
└── Keep 1-day data indefinitely
```
### Example Setup
1. Create continuous queries for downsampling:
```bash
# 1-minute aggregation
curl -X POST http://localhost:8000/api/v1/continuous-queries \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "downsample_1min",
    "database": "telemetry",
    "source_measurement": "sensors_raw",
    "destination_measurement": "sensors_1min",
    "query": "SELECT time_bucket('\''1 minute'\'', timestamp) as timestamp, device_id, AVG(temperature) as temperature, MAX(pressure) as pressure FROM sensors_raw WHERE timestamp >= $start AND timestamp < $end GROUP BY 1, 2",
    "interval": "1m",
    "enabled": true
  }'

# 1-hour aggregation
curl -X POST http://localhost:8000/api/v1/continuous-queries \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "downsample_1hr",
    "database": "telemetry",
    "source_measurement": "sensors_1min",
    "destination_measurement": "sensors_1hr",
    "query": "SELECT time_bucket('\''1 hour'\'', timestamp) as timestamp, device_id, AVG(temperature) as temperature, MAX(pressure) as pressure FROM sensors_1min WHERE timestamp >= $start AND timestamp < $end GROUP BY 1, 2",
    "interval": "1h",
    "enabled": true
  }'
```
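The pipeline diagram above also includes a daily rollup. A third CQ following the same pattern might look like this; the `sensors_1day` measurement name and the `1d` interval string are assumptions extending the convention above:

```bash
# 1-day aggregation (sketch; follows the same pattern as the CQs above,
# with an assumed "1d" interval string)
curl -X POST http://localhost:8000/api/v1/continuous-queries \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "downsample_1day",
    "database": "telemetry",
    "source_measurement": "sensors_1hr",
    "destination_measurement": "sensors_1day",
    "query": "SELECT time_bucket('\''1 day'\'', timestamp) as timestamp, device_id, AVG(temperature) as temperature, MAX(pressure) as pressure FROM sensors_1hr WHERE timestamp >= $start AND timestamp < $end GROUP BY 1, 2",
    "interval": "1d",
    "enabled": true
  }'
```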
2. Create retention policies:
```bash
# Delete raw data after 7 days
curl -X POST http://localhost:8000/api/v1/retention \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "raw_7d",
    "database": "telemetry",
    "measurement": "sensors_raw",
    "retention_days": 7,
    "enabled": true
  }'

# Delete 1-minute data after 30 days
curl -X POST http://localhost:8000/api/v1/retention \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "1min_30d",
    "database": "telemetry",
    "measurement": "sensors_1min",
    "retention_days": 30,
    "enabled": true
  }'
```
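The pipeline diagram also expires 1-hour data after a year; a matching policy follows the same pattern:

```bash
# Delete 1-hour data after 365 days (matches the pipeline diagram above)
curl -X POST http://localhost:8000/api/v1/retention \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "1hr_365d",
    "database": "telemetry",
    "measurement": "sensors_1hr",
    "retention_days": 365,
    "enabled": true
  }'
```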
With enterprise scheduling enabled, these queries and policies run automatically — no cron jobs, no external orchestration.
## Best Practices
- **Schedule retention during off-peak hours** — File deletion generates I/O. The default 3am schedule avoids impacting daytime workloads.
- **Add buffer days to retention policies** — Use the `buffer_days` parameter in retention policies to provide a safety margin before deletion (see the sketch after this list).
- **Test CQ queries manually first** — Before enabling automatic execution, run your continuous query SQL manually to verify correct results.
- **Combine with tiered storage** — Use tiered storage to move data to cold storage before retention deletes it, keeping long-term archives at low cost.
- **Monitor CQ execution** — Check Arc logs for CQ execution results and errors. Failed CQ executions are logged at WARN level.
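As a sketch of the buffer-days practice, the policy below adds a 2-day safety margin; the `buffer_days` field name comes from the note above, so verify it against your Arc version:

```bash
# Retention policy with a 2-day safety buffer before deletion.
# The buffer_days field is taken from the Best Practices note above;
# confirm the exact field name against your Arc Enterprise version.
curl -X POST http://localhost:8000/api/v1/retention \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "raw_7d_buffered",
    "database": "telemetry",
    "measurement": "sensors_raw",
    "retention_days": 7,
    "buffer_days": 2,
    "enabled": true
  }'
```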
## Next Steps
- Continuous Queries — Create and manage continuous queries (OSS docs)
- Retention Policies — Create and manage retention policies (OSS docs)
- Tiered Storage — Combine scheduling with tiered storage for optimal cost management