Product Analytics

Build a product analytics backend with Arc Cloud -- columnar storage, SQL queries, and sub-second aggregations over millions of events.

Why Arc Cloud for Analytics

Traditional product analytics platforms lock you into proprietary query languages and rigid schemas. Arc Cloud gives you:

Columnar Storage: Parquet-backed storage compresses event data efficiently and accelerates analytical queries
Full SQL: Use DuckDB SQL for funnels, cohorts, retention, and ad-hoc analysis -- no proprietary query language
High Ingestion Throughput: Ingest tens of thousands to millions of events per second depending on your tier
Schema Auto-Detection: Send events and Arc Cloud creates tables and columns automatically
Cost-Effective Retention: Keep raw events for weeks and aggregated rollups indefinitely

Schema Design

Design your events around a measurement with tags and fields. Arc Cloud auto-creates columns on first insert, so you can start simple and add fields over time.

Recommended Event Schema

Column	Type	Description
`time`	timestamp (ns)	When the event occurred
`source`	tag	Event origin (e.g., `web`, `mobile`, `api`)
`event_name`	field (string)	Event type (e.g., `page_view`, `signup`, `purchase`)
`user_id`	field (string)	Unique user identifier
`session_id`	field (string)	Session identifier for grouping
`page`	field (string)	Page URL or screen name
`referrer`	field (string)	Traffic source
`properties`	field (string)	JSON-encoded key-value pairs for event-specific data

Example in Line Protocol

events,source=web event_name="purchase",user_id="u_8f3k2",session_id="sess_a1b2c3",page="/checkout",referrer="google",properties="{\"product_id\":\"prod_99\",\"amount\":49.99,\"currency\":\"USD\"}" 1711200721000000000

Ingesting Events

Send events to your Arc Cloud instance using line protocol or MessagePack. Batch multiple events per request for optimal throughput.

Single Event

curl -X POST "https://<instance-id>.arc.<region>.basekick.net/api/v1/write/line-protocol?db=analytics" \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: text/plain" \
  -d 'events,source=web event_name="page_view",user_id="u_8f3k2",session_id="sess_a1b2c3",page="/pricing",referrer="google" 1711200721000000000'

Batch Ingestion

For production use, batch multiple events per request (one line per event):

curl -X POST "https://<instance-id>.arc.<region>.basekick.net/api/v1/write/line-protocol?db=analytics" \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: text/plain" \
  -d 'events,source=web event_name="page_view",user_id="u_8f3k2",page="/pricing" 1711200721000000000
events,source=web event_name="signup",user_id="u_8f3k2",page="/signup" 1711200725000000000
events,source=mobile event_name="page_view",user_id="u_9x7m1",page="/home" 1711200792000000000'

JavaScript Example

const ARC_URL = "https://<instance-id>.arc.<region>.basekick.net";
const ARC_TOKEN = "<your-token>";

// Buffer events and flush in batches
const eventBuffer = [];

function trackEvent(eventName, userId, properties = {}) {
  const timestampNs = Date.now() * 1_000_000; // ms to ns
  const propsJson = JSON.stringify(properties).replace(/"/g, '\\"');
  eventBuffer.push(
    `events,source=web event_name="${eventName}",user_id="${userId}",session_id="${getSessionId()}",page="${window.location.pathname}",referrer="${document.referrer}",properties="${propsJson}" ${timestampNs}`
  );

  if (eventBuffer.length >= 100) {
    flushEvents();
  }
}

async function flushEvents() {
  if (eventBuffer.length === 0) return;

  const batch = eventBuffer.splice(0, eventBuffer.length);
  await fetch(`${ARC_URL}/api/v1/write/line-protocol?db=analytics`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${ARC_TOKEN}`,
      "Content-Type": "text/plain",
    },
    body: batch.join("\n"),
  });
}

Python SDK Example

from arc_tsdb_client import ArcClient

client = ArcClient(
    url="https://<instance-id>.arc.<region>.basekick.net",
    token="<your-token>",
    database="analytics",
)

with client.buffered_writer(batch_size=5000, flush_interval=5.0) as writer:
    writer.write(
        measurement="events",
        tags={"source": "web"},
        fields={
            "event_name": "page_view",
            "user_id": "u_8f3k2",
            "session_id": "sess_a1b2c3",
            "page": "/pricing",
            "referrer": "google",
        },
    )

Querying Events

Query your data using SQL via the /api/v1/query endpoint:

curl -X POST "https://<instance-id>.arc.<region>.basekick.net/api/v1/query" \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT * FROM analytics.events ORDER BY time DESC LIMIT 10", "format": "json"}'

For large result sets, use the Arrow endpoint for efficient binary transfer:

curl -X POST "https://<instance-id>.arc.<region>.basekick.net/api/v1/query/arrow" \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT * FROM analytics.events ORDER BY time DESC LIMIT 100000"}' \
  -o results.arrow

Example Queries

Daily Active Users

SELECT
    date_trunc('day', epoch_us(time)) AS day,
    COUNT(DISTINCT user_id) AS daily_active_users
FROM analytics.events
WHERE time > NOW() - INTERVAL '30 days'
GROUP BY day
ORDER BY day;

Funnel Analysis

Track conversion through a sequence of steps:

WITH step1 AS (
    SELECT DISTINCT user_id
    FROM analytics.events
    WHERE event_name = 'page_view' AND page = '/pricing'
      AND time > NOW() - INTERVAL '7 days'
),
step2 AS (
    SELECT DISTINCT user_id
    FROM analytics.events
    WHERE event_name = 'signup'
      AND time > NOW() - INTERVAL '7 days'
      AND user_id IN (SELECT user_id FROM step1)
),
step3 AS (
    SELECT DISTINCT user_id
    FROM analytics.events
    WHERE event_name = 'purchase'
      AND time > NOW() - INTERVAL '7 days'
      AND user_id IN (SELECT user_id FROM step2)
)
SELECT
    'Viewed Pricing' AS step,
    (SELECT COUNT(*) FROM step1) AS users
UNION ALL
SELECT 'Signed Up', (SELECT COUNT(*) FROM step2)
UNION ALL
SELECT 'Purchased', (SELECT COUNT(*) FROM step3);

Retention Cohorts

Calculate week-over-week retention by signup cohort:

WITH cohorts AS (
    SELECT
        user_id,
        date_trunc('week', epoch_us(MIN(time))) AS cohort_week
    FROM analytics.events
    WHERE event_name = 'signup'
    GROUP BY user_id
),
activity AS (
    SELECT
        e.user_id,
        c.cohort_week,
        date_trunc('week', epoch_us(e.time)) AS activity_week
    FROM analytics.events e
    JOIN cohorts c ON e.user_id = c.user_id
)
SELECT
    cohort_week,
    COUNT(DISTINCT user_id) AS cohort_size,
    COUNT(DISTINCT CASE WHEN activity_week = cohort_week + INTERVAL '1 week' THEN user_id END) AS week_1,
    COUNT(DISTINCT CASE WHEN activity_week = cohort_week + INTERVAL '2 weeks' THEN user_id END) AS week_2,
    COUNT(DISTINCT CASE WHEN activity_week = cohort_week + INTERVAL '3 weeks' THEN user_id END) AS week_3,
    COUNT(DISTINCT CASE WHEN activity_week = cohort_week + INTERVAL '4 weeks' THEN user_id END) AS week_4
FROM activity
GROUP BY cohort_week
ORDER BY cohort_week;

Top Pages

SELECT
    page,
    COUNT(*) AS views,
    COUNT(DISTINCT user_id) AS unique_visitors,
    COUNT(DISTINCT session_id) AS sessions
FROM analytics.events
WHERE event_name = 'page_view'
  AND time > NOW() - INTERVAL '7 days'
GROUP BY page
ORDER BY views DESC
LIMIT 20;

Aggregation Rollups with Continuous Queries

Pre-aggregate metrics into rollup tables so dashboards query summarized data instead of scanning raw events.

Daily Rollup

Create a continuous query that aggregates events into daily summaries:

curl -X POST https://<instance-id>.arc.<region>.basekick.net/api/v1/continuous_queries \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "events_daily_rollup",
    "database": "analytics",
    "source_measurement": "events",
    "destination_measurement": "events_daily",
    "query": "SELECT date_trunc('\''day'\'', epoch_us(time)) AS time, event_name, COUNT(*) AS event_count, COUNT(DISTINCT user_id) AS unique_users, COUNT(DISTINCT session_id) AS unique_sessions FROM analytics.events GROUP BY date_trunc('\''day'\'', epoch_us(time)), event_name",
    "interval": "1d",
    "is_active": true
  }'

Hourly Rollup

For more granular dashboards, create an hourly rollup:

curl -X POST https://<instance-id>.arc.<region>.basekick.net/api/v1/continuous_queries \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "events_hourly_rollup",
    "database": "analytics",
    "source_measurement": "events",
    "destination_measurement": "events_hourly",
    "query": "SELECT date_trunc('\''hour'\'', epoch_us(time)) AS time, event_name, page, COUNT(*) AS event_count, COUNT(DISTINCT user_id) AS unique_users FROM analytics.events GROUP BY date_trunc('\''hour'\'', epoch_us(time)), event_name, page",
    "interval": "1h",
    "is_active": true
  }'

Querying Rollups

Once rollups are populated, query them for fast dashboard responses:

-- Daily trend from rollup (fast)
SELECT time, event_name, event_count, unique_users
FROM analytics.events_daily
WHERE time > NOW() - INTERVAL '30 days'
ORDER BY time;

-- Hourly breakdown from rollup (fast)
SELECT time, page, event_count
FROM analytics.events_hourly
WHERE event_name = 'page_view'
  AND time > NOW() - INTERVAL '24 hours'
ORDER BY time;

Retention Policies

Keep raw events for recent analysis and rollups for long-term trends.

Keep Raw Events for 30 Days

curl -X POST https://<instance-id>.arc.<region>.basekick.net/api/v1/retention \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "raw_events_30d",
    "database": "analytics",
    "measurement": "events",
    "retention_days": 30,
    "buffer_days": 3,
    "is_active": true
  }'

Keep Rollups Indefinitely

Do not set a retention policy on events_daily or events_hourly -- they will be retained indefinitely. Rollup data is compact (typically 100-1000x smaller than raw events), so long-term storage costs are minimal.

Storage Strategy Summary

Data	Retention	Purpose
`events` (raw)	30 days	Detailed drill-down, debugging
`events_hourly`	No limit	Hourly dashboards, intra-day analysis
`events_daily`	No limit	Long-term trends, reporting

Next Steps

Data Ingestion Patterns -- All ingestion methods and optimization
SQL Querying Guide -- Full SQL reference for Arc
Continuous Queries -- Advanced rollup configuration
Retention Policies -- Manage data lifecycle

Why Arc Cloud for Analytics​

Schema Design​

Recommended Event Schema​

Example in Line Protocol​

Ingesting Events​

Single Event​

Batch Ingestion​

JavaScript Example​

Python SDK Example​

Querying Events​

Example Queries​

Daily Active Users​

Funnel Analysis​

Retention Cohorts​

Top Pages​

Aggregation Rollups with Continuous Queries​

Daily Rollup​

Hourly Rollup​

Querying Rollups​

Retention Policies​

Keep Raw Events for 30 Days​

Keep Rollups Indefinitely​

Storage Strategy Summary​

Next Steps​