architecture · clickhouse · performance

How We Handle 1 Million Logs Per Second with ClickHouse

A deep dive into Purl's architecture: why we chose ClickHouse, our schema design, and the optimizations that make sub-second queries possible.

Otabek Ismoilov · February 5, 2026 · 7 min read

Why ClickHouse for Logs?

When we built Purl, we evaluated every major database for log storage: Elasticsearch, PostgreSQL, TimescaleDB, and ClickHouse. ClickHouse won decisively for three reasons:

  1. Columnar storage: Log queries typically touch 2-3 columns (timestamp, level, message). ClickHouse only reads the columns you query, skipping the rest.
  2. Extreme compression: ClickHouse achieves 10-20x compression on log data using LZ4 and ZSTD codecs. 1TB of raw logs becomes 50-100GB on disk.
  3. Vectorized query execution: Queries process data in batches of 8192 rows using SIMD instructions, achieving throughput that's orders of magnitude faster than row-based databases.
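To illustrate the columnar advantage, here is the shape of a typical log query (a hypothetical example against the logs table defined in the next section): only two narrow columns are read, while the wide message and metadata columns never leave disk.

```sql
-- Reads only the level and timestamp columns;
-- message, host, and metadata are skipped entirely.
SELECT level, count() AS n
FROM logs
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY level;
```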

Schema Design

Our log table uses the MergeTree engine family (plain MergeTree here; ReplacingMergeTree is a drop-in alternative if you need deduplication):

```sql
CREATE TABLE logs (
    timestamp DateTime64(3),
    level LowCardinality(String),
    message String,
    source LowCardinality(String),
    host LowCardinality(String),
    metadata Map(String, String),
    ingested_at DateTime DEFAULT now()
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (source, level, timestamp)
TTL timestamp + INTERVAL 90 DAY
SETTINGS index_granularity = 8192
```

Key design decisions:

  • LowCardinality for level, source, host — these have few unique values, so dictionary encoding saves 90%+ space
  • Partitioning by day — enables instant partition drops for retention cleanup instead of expensive DELETE queries
  • ORDER BY (source, level, timestamp) — optimizes the most common query patterns
  • TTL — automatic data expiration, no cron jobs needed
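The partition-drop point is worth seeing concretely: dropping a day's partition is a near-instant metadata operation, while a row-level DELETE is a mutation that rewrites data parts (the partition ID below is hypothetical).

```sql
-- Instant: detaches and removes the whole day's data files.
ALTER TABLE logs DROP PARTITION '20251101';

-- By contrast, this mutation rewrites every affected part:
-- ALTER TABLE logs DELETE WHERE timestamp < '2025-11-02';
```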

Query Optimization

Materialized Views for Patterns

We use materialized views to pre-aggregate log patterns in real-time:

```sql
CREATE MATERIALIZED VIEW log_patterns_mv
ENGINE = SummingMergeTree()
ORDER BY (source, pattern_hash, hour)
AS SELECT
    source,
    cityHash64(replaceRegexpAll(message, '[0-9]+', 'N')) AS pattern_hash,
    toStartOfHour(timestamp) AS hour,
    count() AS count,
    any(message) AS sample
FROM logs
GROUP BY source, pattern_hash, hour
```

This gives us instant pattern detection — the "Patterns" tab in Purl's dashboard queries this materialized view, not the raw log table.
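A sketch of what a Patterns-tab query against the view might look like (the exact dashboard query and the 'api' source name are assumptions; column names are as defined above). Note the sum(count): SummingMergeTree merges rows asynchronously in the background, so aggregating at query time is still required for correct totals.

```sql
SELECT
    pattern_hash,
    any(sample) AS sample_message,
    sum(count) AS occurrences
FROM log_patterns_mv
WHERE source = 'api'
  AND hour >= now() - INTERVAL 24 HOUR
GROUP BY pattern_hash
ORDER BY occurrences DESC
LIMIT 20;
```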

Projection for Time-Series Queries

For the dashboard's time-series charts, we use projections:

```sql
ALTER TABLE logs ADD PROJECTION logs_by_time (
    SELECT
        toStartOfMinute(timestamp) AS minute,
        level,
        count() AS count
    GROUP BY minute, level
)
```

This allows the dashboard to render log volume charts without scanning the full table.
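For example, a chart query of this shape can be answered from the projection instead of the base table; ClickHouse's optimizer selects the projection automatically when the query matches its definition (the 6-hour window is illustrative).

```sql
SELECT
    toStartOfMinute(timestamp) AS minute,
    level,
    count() AS count
FROM logs
WHERE timestamp >= now() - INTERVAL 6 HOUR
GROUP BY minute, level
ORDER BY minute;
```

One caveat: ADD PROJECTION only applies to newly inserted parts, so existing data must be backfilled once with ALTER TABLE logs MATERIALIZE PROJECTION logs_by_time.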

Benchmarks

On a single node (4 vCPU, 8GB RAM, NVMe SSD):

  • Ingest: 150,000 logs/second sustained
  • Query (last 1 hour): 23ms average
  • Query (last 24 hours, 50M rows): 180ms average
  • Full-text search: 450ms for regex across 100M logs
  • Storage: 1 billion logs in ~45GB (compressed)

For comparison, Elasticsearch on the same hardware handles ~20,000 logs/second with 3-5 second query times at the same scale.

Operational Simplicity

The biggest win isn't performance — it's operations. ClickHouse requires almost zero maintenance:

  • No shard management
  • No rebalancing
  • No JVM tuning
  • Automatic TTL-based cleanup
  • Single binary deployment

This is why Purl can offer production-grade log management starting with nothing more than docker compose up.
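A minimal sketch of what that setup looks like (image tag, ports, and volume name are illustrative; Purl's actual compose file may differ):

```yaml
services:
  clickhouse:
    image: clickhouse/clickhouse-server:24.8
    ports:
      - "8123:8123"   # HTTP interface
      - "9000:9000"   # native protocol
    volumes:
      - clickhouse-data:/var/lib/clickhouse

volumes:
  clickhouse-data:
```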