How to Use ClickHouse for Warehouse Automation Analytics: Schemas, Aggregations, and Retention
2026-02-20

Practical ClickHouse patterns for warehouse engineers: schema, aggregations, retention, and JS integration to ship fast KPIs.

Stop reinventing warehouse KPIs — get sub-second analytics that scale

Problem: your warehouse automation stack emits a sea of events (scanners, conveyors, PLCs, WMS, AMRs) but dashboards are slow, queries time out, and retention policies are a mess. You need reliable, low-latency KPIs — picks/hour, dock-to-stock, order cycle time — powered by a backend that scales with throughput and keeps costs under control.

What you'll get in this guide (TL;DR)

  • Concrete schema patterns for ClickHouse optimized for warehouse telemetry
  • Aggregation strategies (materialized views, AggregatingMergeTree, projections) to serve fast KPIs
  • Retention and TTL recipes to balance query performance and storage cost
  • JS query strategies with runnable examples (Node/@clickhouse/client, fetch), plus React/Vue/vanilla/Web Component integration snippets
  • Operational tips for 2026 trends: streaming ingestion, hybrid OLAP/real‑time workloads, and multi‑tenant clusters

Why ClickHouse for warehouse automation analytics in 2026

ClickHouse is now a mainstream OLAP engine used for high-throughput telemetry and metrics. In 2026 the platform and ecosystem matured — improved cluster management, client libraries, and funding — making it a go-to choice for time-series and event analytics at scale. Bloomberg and other outlets reported major funding rounds in late 2025, signaling accelerated innovation and enterprise readiness. At the same time, warehouse automation trends show a shift toward integrated, data-driven control planes where analytics must be near real-time to affect dispatch and labor decisions.

"As warehouse automation evolves beyond standalone systems, integrated analytics drives measurable productivity gains." — Designing Tomorrow's Warehouse, 2026 playbook webinar

Core principles (apply these first)

  • Narrow event table: store raw events in a compact, append-only table optimized for inserts and range scans by time.
  • Pre-aggregate for KPIs: keep per-minute or per-hour aggregates for heavy dashboards — materialized views or AggregatingMergeTree make queries predictable.
  • Low-cardinality dims: use LowCardinality for dimensions like sku, zone to improve GROUP BY performance.
  • Partition by coarse time buckets: use toYYYYMM or toYYYYMMDD (depending on volume) to keep partition counts manageable.
  • TTL and lifecycle: age raw events aggressively, keep aggregates longer.

Schema design patterns for warehouse KPIs

1) Event table: the narrow, highly compressible source of truth

Design the raw event table to be append-optimized using MergeTree. Use sensible column types, LowCardinality for repeating strings, and an ORDER BY that matches your most common filters (warehouse_id, event date, sku).

CREATE TABLE warehouse_events
(
  timestamp     DateTime64(3),
  warehouse_id  UInt32,
  zone_id       LowCardinality(String),
  sku           LowCardinality(String),
  event_type    Enum8('pick'=1, 'putaway'=2, 'move'=3, 'scan'=4),
  worker_id     UInt32,
  qty           Int32,
  meta          String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (warehouse_id, toDate(timestamp), sku)
TTL timestamp + INTERVAL 30 DAY
SETTINGS index_granularity = 8192;

Notes: ORDER BY includes toDate(timestamp) to speed daily range queries while keeping sorting stable. TTL at 30 days for raw events is a starting point — adjust to your compliance and analytics needs.

2) Aggregates store: materialized hourly/minute rollups

To serve KPIs at dashboard speed, maintain a pre-aggregated table. Use AggregatingMergeTree or materialized views that write into a summarized table. AggregatingMergeTree stores aggregate function states and is efficient for incremental merges and rollups.

CREATE TABLE kpi_minute_agg
(
  minute        DateTime,
  warehouse_id  UInt32,
  zone_id       LowCardinality(String),
  sku           LowCardinality(String),
  picks_state   AggregateFunction(sum, Int64),
  qty_state     AggregateFunction(sum, Int64)
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(minute)
ORDER BY (warehouse_id, minute, sku);

CREATE MATERIALIZED VIEW mv_minute_agg
TO kpi_minute_agg
AS
SELECT
  toStartOfMinute(toDateTime(timestamp)) AS minute,
  warehouse_id,
  zone_id,
  sku,
  sumState(toInt64(event_type = 'pick')) AS picks_state,
  sumState(toInt64(qty)) AS qty_state
FROM warehouse_events
GROUP BY minute, warehouse_id, zone_id, sku;

Query aggregated results with the -Merge combinator functions (e.g. sumMerge, avgMerge) or finalizeAggregation — selecting the state columns directly returns opaque binary aggregate states.

3) Dimension tables and dictionaries

Keep small dimension tables (e.g., sku master, zone metadata) as MergeTree or external dictionaries for faster joins. ClickHouse dictionaries excel at serving high-cardinality lookups from memory-backed sources or Redis.

CREATE TABLE sku_master
(
  sku LowCardinality(String),
  description String,
  category LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY sku;

4) Partitioning & sorting rules

  • Partition by month to avoid millions of small partitions.
  • ORDER BY should place most-filtered columns first — warehouse_id then time then sku.
  • Consider a sampling key (SAMPLE BY) for ad-hoc approximate analytics on very large raw tables.

Aggregation strategies explained

Materialized views vs AggregatingMergeTree vs Projections

  • Materialized views are simple to set up and write transformed rows into target tables synchronously when inserts happen.
  • AggregatingMergeTree stores partial aggregate states; use it when you want merge-time aggregation and smaller storage for intermediate states.
  • Projections (ClickHouse 22+) are on-disk precomputed representations of queries inside the same table and can be faster for some patterns because they are managed by the table engine.

Examples of KPI SQL

Picks per hour by zone (from minute aggregates):

SELECT
  toStartOfHour(minute) AS hour,
  zone_id,
  sumMerge(picks_state) AS picks
FROM kpi_minute_agg
WHERE warehouse_id = 42
  AND minute >= now() - INTERVAL 24 HOUR
GROUP BY hour, zone_id
ORDER BY hour, zone_id;

Dock-to-stock average (raw events stream):

-- assumes warehouse_events also carries an order_id column and that
-- event_type includes 'dock_arrival' and 'stocked' values
SELECT avg(dateDiff('second', dock_ts, stocked_ts)) AS avg_seconds
FROM (
  SELECT order_id,
         minIf(timestamp, event_type = 'dock_arrival') AS dock_ts,
         minIf(timestamp, event_type = 'stocked') AS stocked_ts
  FROM warehouse_events
  WHERE event_type IN ('dock_arrival', 'stocked')
  GROUP BY order_id
  HAVING countIf(event_type = 'dock_arrival') > 0
     AND countIf(event_type = 'stocked') > 0
);

Retention and storage lifecycle (practice)

Retention should be two-tier: short retention for raw events (30–90 days) and longer retention for rolled up aggregates (12–36 months). Use TTL for row-level deletes and partition drops for large-scale purges.

Inline TTL (row-level)

ALTER TABLE warehouse_events
MODIFY TTL timestamp + INTERVAL 30 DAY;

Move old data to colder volumes

-- volumes and disks are defined in the server's storage_configuration
-- (config.xml storage policies), not via DDL; reference them by name:
ALTER TABLE warehouse_events
MODIFY TTL
  timestamp + INTERVAL 30 DAY TO VOLUME 'cold_volume',
  timestamp + INTERVAL 120 DAY TO DISK 'slow_disk';

Move recent data to fast NVMe, older partitions to cheaper HDD or object storage via ClickHouse storage policies (2026 features improved object-store integration).

Archival pattern: drop partitions after compact export

  1. Export month to S3 in Parquet format
  2. Drop the partition to reclaim space
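
The two steps above are plain SQL, so a small helper in your service layer can generate them. This is a sketch, not a turnkey tool: the `s3` table function URL, the bucket layout, and the `toYYYYMM(timestamp)` partition key are assumptions matching the DDL earlier in this guide — verify the export (e.g. compare row counts) before running the drop.

```javascript
// Build the archive-then-drop statement pair for one monthly partition.
function archiveStatements(table, partition, bucketUrl) {
  // 1) Export the month's rows to S3 as Parquet via the s3 table function
  const exportSql =
    `INSERT INTO FUNCTION s3('${bucketUrl}/${table}/${partition}.parquet', 'Parquet') ` +
    `SELECT * FROM ${table} WHERE toYYYYMM(timestamp) = ${partition}`;
  // 2) Drop the partition to reclaim space once the export is verified
  const dropSql = `ALTER TABLE ${table} DROP PARTITION ${partition}`;
  return { exportSql, dropSql };
}
```

Run `exportSql` first, check the exported file, then submit `dropSql` through the same client — never the other way around.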

JS query strategies for production (Node + browser patterns)

In production, avoid exposing ClickHouse directly to the browser. Use a thin API service (Node/Go) that queries ClickHouse and enforces auth, rate limits and predictable timeouts.

Server: Node.js example with @clickhouse/client

// npm i @clickhouse/client express
const { createClient } = require('@clickhouse/client');
const express = require('express');
const app = express();

const client = createClient({
  url: 'https://clickhouse.example.internal:8443',
  username: 'svc_analytics',
  password: process.env.CH_PASS
});

app.get('/api/kpi/picks-per-hour', async (req, res) => {
  const { warehouseId } = req.query;
  const sql = `
    SELECT toStartOfHour(minute) AS hour, zone_id, sumMerge(picks_state) AS picks
    FROM kpi_minute_agg
    WHERE warehouse_id = {wid:UInt32} AND minute >= now() - INTERVAL 24 HOUR
    GROUP BY hour, zone_id
    ORDER BY hour, zone_id
  `;

  try {
    const resultSet = await client.query({
      query: sql,
      query_params: { wid: Number(warehouseId) },
      format: 'JSONEachRow'
    });
    res.json(await resultSet.json());
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: 'query_failed' });
  }
});

app.listen(3000);

Browser: React example (calls your API)

import React, { useEffect, useState } from 'react';

export default function PicksPerHour({ warehouseId }) {
  const [data, setData] = useState([]);
  useEffect(() => {
    let mounted = true;
    fetch(`/api/kpi/picks-per-hour?warehouseId=${warehouseId}`)
      .then(r => r.json())
      .then(d => mounted && setData(d))
      .catch(e => console.error(e));
    return () => { mounted = false; };
  }, [warehouseId]);

  return (
    <div>
      <h3>Picks / Hour (24h)</h3>
      <ul>
        {data.map((r, i) => (
          <li key={i}>{r.hour} — {r.zone_id}: {r.picks}</li>
        ))}
      </ul>
    </div>
  );
}

Vanilla fetch (small widget or serverless function)

async function fetchKPIs(warehouseId) {
  const res = await fetch(`/api/kpi/picks-per-hour?warehouseId=${warehouseId}`);
  return await res.json();
}

Vue composition API snippet

import { ref, onMounted } from 'vue'
export function usePicks(warehouseId) {
  const data = ref([])
  onMounted(async () => {
    const res = await fetch(`/api/kpi/picks-per-hour?warehouseId=${warehouseId}`)
    data.value = await res.json()
  })
  return { data }
}

Lightweight Web Component

class PicksWidget extends HTMLElement {
  connectedCallback() {
    this.warehouseId = this.getAttribute('warehouse-id')
    this.load()
  }
  async load() {
    const res = await fetch(`/api/kpi/picks-per-hour?warehouseId=${this.warehouseId}`)
    const data = await res.json()
    this.innerHTML = '<pre>' + JSON.stringify(data, null, 2) + '</pre>'
  }
}
customElements.define('picks-widget', PicksWidget)

Practical KPI queries and patterns

Here are KPI queries you’ll use daily — implement them as pre-aggregations where needed.

Throughput (picks/hour)

SELECT
  toStartOfHour(timestamp) AS hour,
  zone_id,
  countIf(event_type='pick') AS picks
FROM warehouse_events
WHERE timestamp >= now() - INTERVAL 24 HOUR
GROUP BY hour, zone_id
ORDER BY hour, zone_id;

Order cycle time percentiles

-- assumes an order_id column and 'dock_arrival'/'shipped' event_type values
SELECT
  quantileExact(0.50)(dateDiff('second', dock_ts, shipped_ts)) AS p50,
  quantileExact(0.95)(dateDiff('second', dock_ts, shipped_ts)) AS p95
FROM (
  SELECT order_id,
         minIf(timestamp, event_type='dock_arrival') AS dock_ts,
         minIf(timestamp, event_type='shipped') AS shipped_ts
  FROM warehouse_events
  GROUP BY order_id
  HAVING countIf(event_type='dock_arrival') > 0
     AND countIf(event_type='shipped') > 0
);

Operational tips and 2026 best practices

  • Use Kafka → ClickHouse Kafka engine for low-latency ingestion and backpressure handling. In 2026 Kafka+ClickHouse remains a dominant pattern for telemetry.
  • Deploy replicas and use distributed tables for multi-rack resilience. Keep query timeouts and max_memory_usage configured per-service.
  • Monitor background merges and pauses — retention TTLs rely on merges to free space.
  • Use low-cardinality types and dictionaries to reduce memory footprint for dimensions.
  • Secure access: place ClickHouse in VPC, use mTLS or HTTPS, and never expose direct DB endpoints to browsers.
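
The timeout and memory guardrails above can live in one place in the API layer with @clickhouse/client's per-query `clickhouse_settings`. The concrete limit values below are assumptions — tune them to your SLOs and cluster size.

```javascript
// Wrap every dashboard query with per-service guardrails so one bad
// query can't starve the cluster.
function guardedQueryOptions(query, query_params) {
  return {
    query,
    query_params,
    format: 'JSONEachRow',
    clickhouse_settings: {
      max_execution_time: 5,            // seconds before the server kills the query
      max_memory_usage: '2000000000',   // ~2 GB per query
      readonly: '1'                     // dashboard services never need writes
    }
  };
}
// usage: const rs = await client.query(guardedQueryOptions(sql, { wid: 42 }));
```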

Short case study: Forklift sensor stream → dashboard (step-by-step)

  1. Ingest sensor events to Kafka; topic: forklift-events
  2. ClickHouse Kafka engine reads stream into raw table warehouse_events
  3. Materialized view writes per-minute aggregates to kpi_minute_agg
  4. API service (Node/@clickhouse/client) exposes KPI endpoints with parameterized queries
  5. React dashboard consumes endpoints and updates charts; anomalies trigger alerts via webhook

Result: sub-second dashboards for current shift, and cost-controlled storage (raw events TTL 30 days, aggregated KPIs 3 years).
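
The anomaly check in step 5 can be as simple as comparing each zone's latest interval against its trailing mean. A minimal sketch — the 50% drop threshold and the row shape (`zone_id`, `picks` per interval, oldest first) are assumptions, not part of any ClickHouse API:

```javascript
// Flag zones whose latest picks count fell below `ratio` of their
// trailing mean; feed the result to your webhook sender.
function findAnomalies(rows, ratio = 0.5) {
  const byZone = new Map();
  for (const r of rows) {
    if (!byZone.has(r.zone_id)) byZone.set(r.zone_id, []);
    byZone.get(r.zone_id).push(r.picks);
  }
  const anomalies = [];
  for (const [zone, series] of byZone) {
    const latest = series[series.length - 1];
    const history = series.slice(0, -1);
    if (history.length === 0) continue; // need a baseline
    const mean = history.reduce((a, b) => a + b, 0) / history.length;
    if (latest < mean * ratio) anomalies.push({ zone, latest, mean });
  }
  return anomalies;
}
```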

Checklist: Deploying ClickHouse for warehouse analytics

  • Design narrow append-optimized raw table
  • Define pre-aggregates for heavy dashboards
  • Use LowCardinality for dimensions
  • Partition by month, order by (warehouse, date, sku)
  • Apply TTL and move old data to cold volumes
  • Protect ClickHouse behind an API layer
  • Use Kafka ingestion for resiliency and replay
  • Benchmark queries and set query timeouts

Advanced strategies & future-looking notes (2026+)

Expect to combine ClickHouse with real-time feature stores and LLM-driven automation for predictive dispatching. With the ClickHouse ecosystem recently expanding, you'll see better connectors to object stores, improved security defaults, and first-class observability integrations. Architect for hybrid latency: pre-aggregates for dashboards and streaming analytics for control-plane decisions.

Actionable takeaways

  • Start with a narrow events table and an hourly/minute aggregate; measure query latency and adjust.
  • Retain raw events just long enough to debug production incidents; keep aggregates longer for reporting.
  • Implement a thin Node.js API with @clickhouse/client to centralize access control and caching.
  • Leverage TLS, VPCs, and role-based accounts — never call ClickHouse directly from the browser.
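
For the caching takeaway, a tiny TTL memoizer in the API layer keeps repeated dashboard hits from refiring the same ClickHouse query. A sketch — the 15-second TTL and the single-key signature are assumptions; swap in Redis for multi-instance deployments:

```javascript
// Wrap an async query function with an in-memory TTL cache keyed on
// its first argument.
function cached(fn, ttlMs = 15000) {
  const store = new Map();
  return async (key, ...args) => {
    const hit = store.get(key);
    if (hit && Date.now() - hit.at < ttlMs) return hit.value; // fresh hit
    const value = await fn(key, ...args);                     // miss: run query
    store.set(key, { at: Date.now(), value });
    return value;
  };
}
// usage: const getPicks = cached(wid => runPicksQuery(wid));
```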

Next steps & call to action

If you manage a warehouse automation stack, start by modelling one KPI end-to-end this week: ingest events, create materialized minute aggregates, and ship a dashboard widget using the React snippet above. For a reproducible starter, clone our sample repo (ClickHouse DDL, Node API, React widget) and adapt partitions/TTL to your compliance requirements.

Want the repo and a 30-minute walkthrough? Contact our team or download the starter kit at javascripts.shop/clickhouse-warehouse-starter (includes DDL, sample data generator, and CI-friendly tests).


Related Topics

#tutorial #ClickHouse #warehouse