Skip to main content

Data Model

Memtrace uses a hybrid storage model: time-series data in Arc, metadata in SQLite.

Arc Storage (Time-Series Data)

All memories are stored in a single events measurement with columns optimized for filtering and querying.

Schema

ColumnTypeDescription
timeTIMESTAMPAuto-set by Arc, nanosecond precision
org_idVARCHAROrganization/tenant ID
agent_idVARCHARAgent that created this memory
session_idVARCHARSession scope (empty if unscoped)
memory_typeVARCHARepisodic / session / decision / entity
event_typeVARCHARApp-defined (e.g., page_crawled, error)
contentVARCHARPrimary content text
metadata_jsonVARCHARJSON-encoded arbitrary key-value data
tags_csvVARCHARComma-separated tags
dedup_keyVARCHARDeduplication key (SHA256)
importanceDOUBLE0.0-1.0 score
parent_idVARCHARLink to parent memory (threading)

Example Row

time:          2026-02-13T15:30:00.123456789Z
org_id: org_abc123
agent_id: web_crawler
session_id: sess_xyz789
memory_type: episodic
event_type: page_crawled
content: Crawled https://example.com/products — found 12 items
metadata_json: {"url":"https://example.com/products","item_count":12}
tags_csv: crawling,products,success
dedup_key: sha256_hash_of_key_fields
importance: 0.7
parent_id: null

Arc Columnar Format

Arc stores data in Apache Parquet format — a columnar storage format optimized for analytical queries.

Benefits of Columnar Storage

Compression: Columnar data compresses 10x-100x better than row-based JSON. Repeated values in columns (like agent_id or memory_type) compress extremely well.

Query speed: Time-windowed queries only read relevant time partitions. Column pruning means queries only read the columns they need.

Example:

-- Only reads: time, agent_id, content, memory_type
SELECT time, content
FROM events
WHERE agent_id = 'my_agent'
AND time > now() - interval '2 hours'
AND memory_type = 'episodic'

Partitioning

Arc partitions data by time:

{database}/{measurement}/{year}/{month}/{day}/{hour}/{filename}.parquet

Example:

memtrace/events/2026/02/13/15/1708005000_1708008600.parquet

This means time-windowed queries (e.g., "last 2 hours") only scan relevant files, not the entire dataset.

SQLite Metadata Store

Local SQLite database stores metadata that's queried separately from time-series data.

Organizations Table

CREATE TABLE organizations (
id TEXT PRIMARY KEY,
name TEXT NOT NULL,
created_at INTEGER NOT NULL
);

Agents Table

CREATE TABLE agents (
id TEXT PRIMARY KEY,
org_id TEXT NOT NULL,
name TEXT NOT NULL,
config_json TEXT,
created_at INTEGER NOT NULL,
FOREIGN KEY (org_id) REFERENCES organizations(id)
);

Sessions Table

CREATE TABLE sessions (
id TEXT PRIMARY KEY,
org_id TEXT NOT NULL,
agent_id TEXT NOT NULL,
name TEXT,
metadata_json TEXT,
created_at INTEGER NOT NULL,
ended_at INTEGER,
FOREIGN KEY (org_id) REFERENCES organizations(id),
FOREIGN KEY (agent_id) REFERENCES agents(id)
);

API Keys Table

CREATE TABLE api_keys (
id TEXT PRIMARY KEY,
org_id TEXT NOT NULL,
key_hash TEXT NOT NULL, -- bcrypt hash
name TEXT,
created_at INTEGER NOT NULL,
last_used_at INTEGER,
FOREIGN KEY (org_id) REFERENCES organizations(id)
);

Security note: API keys are bcrypt-hashed. The plaintext key is shown only once at creation and never stored.

Deduplication Strategy

Memtrace prevents duplicate memories using a SHA256 dedup key.

Dedup Key Generation

key := SHA256(agent_id + event_type + content[:200])

The key is derived from:

  • agent_id — Which agent created it
  • event_type — What type of event
  • content[:200] — First 200 chars of content

Dedup Window

Before writing a memory, Memtrace checks Arc for an existing memory with the same dedup_key within a configurable time window (default: 24h).

Query:

SELECT COUNT(*) FROM events
WHERE dedup_key = 'sha256_hash'
AND time > now() - interval '24 hours'

If a match is found, the write is skipped.

Configuration

[deduplication]
enabled = true
window = "24h"

Use cases:

  • Prevent agents from logging "API call failed" 100 times in a loop
  • Avoid duplicate "user clicked button" events from double-clicks
  • Skip redundant "crawled page X" if already crawled recently

Shared Memory Model

Memtrace supports multi-agent shared memory through scoping.

Organization Scope

All agents in an organization can query each other's memories by filtering on org_id.

Query:

SELECT * FROM events
WHERE org_id = 'org_abc123'
AND time > now() - interval '4 hours'

Session Scope

Multiple agents can write to the same session, sharing a bounded context.

Query:

SELECT * FROM events
WHERE session_id = 'sess_xyz789'
AND time > now() - interval '2 hours'

Tag-Based Filtering

Agents can query memories by tags, regardless of which agent created them.

Query:

SELECT * FROM events
WHERE org_id = 'org_abc123'
AND tags_csv LIKE '%customer_123%'
AND time > now() - interval '7 days'

Write Batching

Memtrace buffers writes in-memory and flushes to Arc in batches.

Batch Configuration

[write_buffer]
max_size = 1000 # Flush after N memories
max_interval = "5s" # Flush after N seconds

Batch Format

Writes are sent to Arc in columnar msgpack format — the same format Arc uses internally. This avoids conversion overhead.

Example batch:

{
"database": "memtrace",
"measurement": "events",
"columns": ["time", "agent_id", "content", "memory_type", ...],
"rows": [
[1708005000, "agent_1", "Memory 1", "episodic", ...],
[1708005001, "agent_1", "Memory 2", "episodic", ...],
[1708005002, "agent_2", "Memory 3", "decision", ...]
]
}

Benefits

  • High throughput: 10,000+ writes/sec without overwhelming Arc
  • Reduced network overhead: One HTTP request for 1000 memories instead of 1000 requests
  • Arc optimization: Columnar batches compress better and write faster

Query Patterns

Time-Windowed Recall

Most common query: "What happened in the last N hours?"

SELECT time, content, memory_type, event_type
FROM events
WHERE agent_id = 'my_agent'
AND time > now() - interval '2 hours'
ORDER BY time DESC

Session Context

Get all memories for a session, grouped by type.

SELECT memory_type, time, content, event_type
FROM events
WHERE session_id = 'sess_xyz789'
AND time > now() - interval '4 hours'
ORDER BY memory_type, time DESC

Find memories by tag across agents.

SELECT time, agent_id, content
FROM events
WHERE org_id = 'org_abc123'
AND tags_csv LIKE '%customer_123%'
AND time > now() - interval '7 days'
ORDER BY time DESC

Importance Filtering

Get only high-importance memories.

SELECT time, content, importance
FROM events
WHERE agent_id = 'my_agent'
AND importance >= 0.8
AND time > now() - interval '24 hours'
ORDER BY importance DESC, time DESC

Next Steps

  • Explore the API Reference to see how these queries map to REST endpoints
  • Read about Memory Types to understand the semantic model
  • Check out Use Cases for real-world examples