# ClickHouse Analytics
ClickHouse schema, ETL pipeline, and query patterns for NexusCommerce analytics.
## Overview
NexusCommerce uses ClickHouse as a dedicated analytics store alongside the primary Supabase (PostgreSQL) database. ClickHouse's columnar storage and vectorized query execution deliver sub-second analytics queries on datasets that would take seconds or minutes in PostgreSQL.
The schema lives in `services/clickhouse/schema.sql`. The ETL pipeline runs continuously in the NestJS API, syncing new records from Supabase to ClickHouse.
## Key Concepts
- **Columnar Storage** — ClickHouse stores data column by column rather than row by row. Analytics queries typically access a few columns across many rows, which is extremely efficient in a columnar layout.
- **MergeTree Engine** — The primary ClickHouse table engine family. CollapsingMergeTree is used for tables that receive updates (e.g., order status changes); AggregatingMergeTree is used for pre-computed aggregations.
- **Materialized View** — A pre-computed query result that ClickHouse updates incrementally as new data arrives. NexusCommerce's dashboard charts load from materialized views, not from raw tables.
- **ETL** — Extract, Transform, Load. The NestJS API's ETL service reads new/updated records from Supabase change events (via Supabase Realtime or polling) and inserts them into ClickHouse.
## Getting Started
Initialize the ClickHouse schema:

```bash
docker compose run --rm clickhouse-init
```

This executes `services/clickhouse/schema.sql` against the ClickHouse instance.

Verify the schema:

```bash
clickhouse-client --query "SHOW TABLES FROM nexuscommerce"
```

## Features
### Core Tables
#### orders
```sql
CREATE TABLE nexuscommerce.orders (
    tenant_id UUID,
    id UUID,
    external_order_id String,
    marketplace LowCardinality(String),
    status LowCardinality(String),
    fulfillment_method LowCardinality(String),
    total_amount Decimal(10, 2),
    currency LowCardinality(String),
    created_at DateTime,
    updated_at DateTime,
    sign Int8
) ENGINE = CollapsingMergeTree(sign)
PARTITION BY toYYYYMM(created_at)
ORDER BY (tenant_id, created_at, id);
```

The `sign` column supports updates without deleting old data: to change an order's state, insert a `sign = -1` row that cancels the previously written state alongside a `sign = 1` row carrying the new state. CollapsingMergeTree collapses matching pairs during background merges.
#### pricing_events
```sql
CREATE TABLE nexuscommerce.pricing_events (
    tenant_id UUID,
    sku String,
    marketplace LowCardinality(String),
    old_price Decimal(10, 2),
    new_price Decimal(10, 2),
    trigger LowCardinality(String), -- 'rule', 'ai_recommendation', 'manual'
    rule_id Nullable(String),
    created_at DateTime
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY (tenant_id, sku, created_at);
```
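Because the table is an append-only log, trigger effectiveness can be read straight off it. A sketch (the tenant UUID is invented):

```sql
-- Price changes in the last 7 days, broken down by what triggered them.
SELECT
    trigger,
    count() AS changes,
    avg(new_price - old_price) AS avg_delta
FROM nexuscommerce.pricing_events
WHERE tenant_id = '11111111-1111-1111-1111-111111111111'
  AND created_at >= now() - INTERVAL 7 DAY
GROUP BY trigger;
```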
#### inventory_events

```sql
CREATE TABLE nexuscommerce.inventory_events (
    tenant_id UUID,
    sku String,
    warehouse_id String,
    before_quantity Int32,
    after_quantity Int32,
    reason LowCardinality(String),
    created_at DateTime
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY (tenant_id, sku, created_at);
```
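Since each event records the quantity after the change, current stock can be reconstructed at query time; a sketch (the tenant UUID is invented):

```sql
-- Current on-hand quantity per SKU/warehouse: argMax picks after_quantity
-- from the most recent event for each group.
SELECT
    sku,
    warehouse_id,
    argMax(after_quantity, created_at) AS on_hand
FROM nexuscommerce.inventory_events
WHERE tenant_id = '11111111-1111-1111-1111-111111111111'
GROUP BY sku, warehouse_id;
```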
### Materialized Views

#### daily_revenue
Pre-computes daily revenue per tenant per marketplace:
```sql
CREATE MATERIALIZED VIEW nexuscommerce.daily_revenue
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(day)
ORDER BY (tenant_id, marketplace, day)
AS SELECT
    tenant_id,
    marketplace,
    toStartOfDay(created_at) AS day,
    sumState(total_amount * sign) AS revenue,
    sumState(toInt64(sign)) AS order_count
FROM nexuscommerce.orders
GROUP BY tenant_id, marketplace, day;
```

Note that the view aggregates `total_amount * sign` rather than filtering on `sign = 1`: a materialized view only sees rows as they are inserted, so a `-1` cancellation row must subtract the amount it cancels. Filtering cancellations out would double-count an order's revenue on every status change.

Query daily revenue:

```sql
SELECT
    day,
    marketplace,
    sumMerge(revenue) AS revenue,
    sumMerge(order_count) AS orders
FROM nexuscommerce.daily_revenue
WHERE tenant_id = :tenantId
  AND day >= :startDate
GROUP BY day, marketplace
ORDER BY day;
```

## ETL Pipeline
The NestJS ETL service (`apps/api/src/analytics/etl.service.ts`) runs on a 5-minute cron:

1. Query Supabase for records with `updated_at > last_etl_run`.
2. Batch records (1,000 per batch).
3. Insert into ClickHouse via the HTTP interface.
4. Update the `last_etl_run` checkpoint in Redis.

For orders (which can change status), the ETL inserts two rows per update: a `-1` sign row (cancelling the old state) and a `+1` sign row (inserting the new state). ClickHouse's CollapsingMergeTree engine merges these pairs away during background compaction.
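The HTTP insert from step 3 can be exercised by hand. A sketch with invented data, using ClickHouse's `JSONEachRow` input format (one JSON object per line, keys matching the target table's columns):

```bash
# Insert one pricing_events row over the HTTP interface (default port 8123).
curl 'http://localhost:8123/?query=INSERT+INTO+nexuscommerce.pricing_events+FORMAT+JSONEachRow' \
  --data-binary @- <<'EOF'
{"tenant_id":"11111111-1111-1111-1111-111111111111","sku":"SKU-1","marketplace":"amazon","old_price":19.99,"new_price":17.99,"trigger":"rule","rule_id":"r-42","created_at":"2024-06-01 10:00:00"}
EOF
```

The real service batches 1,000 rows per request; large, infrequent inserts suit MergeTree far better than row-at-a-time writes.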
## Example Analytics Queries
Revenue by marketplace, last 30 days:
```sql
SELECT
    marketplace,
    sumMerge(revenue) AS total_revenue,
    sumMerge(order_count) AS total_orders
FROM nexuscommerce.daily_revenue
WHERE tenant_id = :tenantId
  AND day >= now() - INTERVAL 30 DAY
GROUP BY marketplace
ORDER BY total_revenue DESC;
```

Top 10 SKUs by revenue, last 7 days:
```sql
-- line_items is assumed to be a String column holding a JSON array of line
-- items (not shown in the schema excerpt above). ARRAY JOIN unrolls one row
-- per item; note this attributes the full order total to each SKU on the order.
SELECT
    JSONExtractString(item, 'sku') AS sku,
    sum(total_amount) AS revenue
FROM nexuscommerce.orders
ARRAY JOIN JSONExtractArrayRaw(line_items) AS item
WHERE tenant_id = :tenantId
  AND created_at >= now() - INTERVAL 7 DAY
  AND sign = 1
GROUP BY sku
ORDER BY revenue DESC
LIMIT 10;
```

Return rate by marketplace:
```sql
-- join_use_nulls makes non-matching rows produce NULL; ClickHouse's default
-- fills them with the column type's default value instead, which would make
-- the IS NOT NULL test always true.
SELECT
    o.marketplace,
    countIf(r.id IS NOT NULL) / count() AS return_rate
FROM nexuscommerce.orders o
LEFT JOIN nexuscommerce.returns r ON r.order_id = o.id
WHERE o.tenant_id = :tenantId
  AND o.created_at >= now() - INTERVAL 30 DAY
  AND o.sign = 1
GROUP BY o.marketplace
SETTINGS join_use_nulls = 1;
```

## Configuration
| Variable | Description | Default |
|---|---|---|
| `CLICKHOUSE_HOST` | ClickHouse server host | `localhost` |
| `CLICKHOUSE_PORT` | HTTP interface port | `8123` |
| `CLICKHOUSE_DATABASE` | Database name | `nexuscommerce` |
| `CLICKHOUSE_USER` | Username | `default` |
| `CLICKHOUSE_PASSWORD` | Password | Required in production |
| `ETL_INTERVAL_MINUTES` | How often the ETL pipeline runs | `5` |
| `ETL_BATCH_SIZE` | Records per ClickHouse insert batch | `1000` |
| `CLICKHOUSE_RETENTION_MONTHS` | Data retention window (months) | `24` |
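Put together, a local development configuration might look like this (illustrative values only; `CLICKHOUSE_PASSWORD` is shown as a placeholder):

```
CLICKHOUSE_HOST=localhost
CLICKHOUSE_PORT=8123
CLICKHOUSE_DATABASE=nexuscommerce
CLICKHOUSE_USER=default
CLICKHOUSE_PASSWORD=change-me
ETL_INTERVAL_MINUTES=5
ETL_BATCH_SIZE=1000
CLICKHOUSE_RETENTION_MONTHS=24
```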