Why ClickHouse fails as a general-purpose logging solution

Mike Neville-O'Neill
Head of Product
June 10, 2025

Let's talk about ClickHouse. It's fast, it's efficient, and it's open source. But it's designed as a general-purpose analytical database, and turning it into a logging solution at scale demands significant time and engineering resources.

As someone who's spent over a decade working on logging systems and watched countless organizations struggle with the same fundamental challenges, I've seen this pattern play out too many times: A team frustrated with expensive, legacy log management solutions turns to ClickHouse, excited by early performance wins on well-structured data. They rebuild some of their critical logging infrastructure around it, roll it out, and everything seems great...until it doesn't.

Why teams try ClickHouse for logs

The appeal is undeniable. When you first point ClickHouse at a clean, structured dataset, the performance is genuinely impressive:

  • Speed: ClickHouse is built for analytical queries and can scan billions of rows in seconds on modest hardware.
  • Cost: As an open-source solution, the licensing costs are zero – in contrast to the egregious pricing from traditional logging vendors.
  • Familiarity: If your team already works with SQL, the learning curve is minimal compared to proprietary query languages.
  • Flexibility: You're in complete control of your data, schema, and retention policies.

For teams already operating in SQL-heavy environments or using ClickHouse for product analytics, the extension to logging feels natural. After all, logs are just another dataset to query, right?

Early benchmarks usually reinforce this perception. When you're testing on controlled, clean data with predictable patterns, ClickHouse shines. You might start with a specific subset of logs – perhaps API access logs or well-structured application events – and see impressive query speeds compared to bloated commercial solutions.

The problem isn't that ClickHouse doesn't work for logs. The problem is that it doesn't work for all logs, all the time, at scale, without significant engineering investment. And that's where organizations get trapped.

The architectural mismatch

Logs in the wild are messy, inconsistent, and unpredictable – the direct opposite of what ClickHouse was designed to handle.

The columnar conundrum

ClickHouse's columnar storage engine is optimized for queries that scan large portions of specific columns. This works brilliantly for analytics workloads where you frequently compute aggregations across well-defined dimensions. But logs? They're fundamentally different:

  • Unpredictable schemas: Log formats change. New fields appear and disappear without warning. Applications emit different information depending on context.
  • Nested structures: Modern logs, especially from cloud environments, can contain deeply nested JSON with inconsistent structures.
  • High cardinality: Some log fields have enormous numbers of unique values (think session IDs, request IDs, etc.), which ClickHouse's sparse primary index offers little help with.
  • Mixed query patterns: Log analysis involves a mix of aggregation, filtering, full-text search, and pattern matching – not just the analytical queries ClickHouse excels at.

When you force-fit logs into ClickHouse, you're essentially asking a database optimized for structured data analytics to handle semi-structured or unstructured data. It's like racing a Formula 1 car on an off-road trail – impressive engineering, wrong application.
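
To make the mismatch concrete, here is a minimal sketch against a hypothetical log table (the table name, columns, and queries are illustrative assumptions, not a recommended design). The aggregation plays to ClickHouse's strengths; the needle-in-a-haystack search does not.

```sql
-- Hypothetical log table (schema is an assumption, for illustration only)
CREATE TABLE app_logs
(
    timestamp   DateTime64(3),
    service     LowCardinality(String),
    level       LowCardinality(String),
    request_id  String,   -- high-cardinality identifier
    message     String,   -- free-form text
    attributes  String    -- semi-structured JSON kept as a raw string
)
ENGINE = MergeTree
ORDER BY (service, timestamp);

-- Analytical query: an aggregation over a few columns, ClickHouse's sweet spot
SELECT service, count() AS errors
FROM app_logs
WHERE level = 'ERROR' AND timestamp > now() - INTERVAL 1 DAY
GROUP BY service;

-- Log-search query: filters on a high-cardinality field and free text,
-- neither of which the sparse primary index can prune
SELECT timestamp, message
FROM app_logs
WHERE request_id = 'req-8f3a91c2'          -- hypothetical ID, not in the sorting key
   OR message LIKE '%connection reset%'    -- substring match over every row
ORDER BY timestamp DESC
LIMIT 100;
```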

Schema rigidity vs. log reality

In the real world, logs evolve frequently:

  • Developers add new fields to debug specific issues
  • Third-party systems change their log formats without notice
  • Cloud providers modify their event structures
  • Microservices architectures multiply the number of services, each with its own logging conventions

With ClickHouse, schema changes require careful planning: adding a column is straightforward, but anything that touches the sorting key, the partitioning scheme, or existing column types can mean rewriting data or rebuilding tables entirely.
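
As a rough illustration (using the hypothetical app_logs table from the sketch above), the asymmetry looks like this: a new column is a cheap metadata change, but the sorting key can only be extended with newly added columns, and reordering it or re-partitioning means creating a new table and reinserting the data.

```sql
-- Cheap: adding a column is a metadata-only change
ALTER TABLE app_logs ADD COLUMN tenant_id String DEFAULT '';

-- Constrained: MODIFY ORDER BY may only append newly added columns to the
-- end of the existing sorting key; anything more means rebuilding the table
ALTER TABLE app_logs MODIFY ORDER BY (service, timestamp, tenant_id);
```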

A purpose-built logging system accommodates this reality. At Bronto, we've designed a schema-less architecture that adapts to your logs, not the other way around. We don't force you to define schemas upfront or perform complex transformations before ingestion.

Indexing limitations

ClickHouse's primary indexing mechanism, the sparse primary index, works by creating index marks every N rows (8,192 by default). This approach excels for batch analytical queries but creates significant compromises for log search:

  • Limited selectivity: The sparse indexing approach works best when you're scanning large portions of the dataset, not when you're looking for specific events.
  • Suboptimal for high-cardinality fields: Log data often contains high-cardinality fields like user IDs, session IDs, or trace IDs. ClickHouse's indexing struggles with these.
  • Text search limitations: Full-text search in ClickHouse is functional but not optimized for the complex pattern matching often needed in log analysis.
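
The usual workaround is to bolt on data-skipping indexes. Here is a hedged sketch of that approach on the hypothetical app_logs table (index names and parameters are assumptions): these indexes let ClickHouse skip some granules, but they are not the inverted-index-style lookups a purpose-built log search engine provides, and they add their own tuning and materialization overhead.

```sql
-- Bloom-filter skip index for point lookups on a high-cardinality field
ALTER TABLE app_logs
    ADD INDEX idx_request_id request_id TYPE bloom_filter(0.01) GRANULARITY 4;

-- Token-based bloom filter for coarse full-text matching on the message column
ALTER TABLE app_logs
    ADD INDEX idx_message message TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 4;

-- Existing data is not covered until the index is explicitly materialized
ALTER TABLE app_logs MATERIALIZE INDEX idx_request_id;
```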

Real-world scaling shatters the illusion

The true challenges emerge as your ClickHouse logging solution scales. What worked smoothly in development with gigabytes becomes increasingly problematic at terabyte or petabyte scale:

Query performance degrades

Initially, queries return in seconds or less. But as your data grows, the same searches can take minutes in production. Without careful tuning and constant optimization, query performance becomes increasingly unpredictable. You'll find your team spending more time optimizing queries than actually using them to solve problems.
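
That tuning loop often looks something like the following, sketched against the hypothetical app_logs table from earlier: engineers end up running EXPLAIN on slow searches just to see whether the primary index and skip indexes are pruning anything at all.

```sql
-- Check whether the primary index and skip indexes prune granules for a
-- slow free-text search (output omitted; the point is the workflow)
EXPLAIN indexes = 1
SELECT timestamp, message
FROM app_logs
WHERE message LIKE '%deadline exceeded%'
  AND timestamp > now() - INTERVAL 6 HOUR;
```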

Operational overhead becomes crushing

Managing a ClickHouse cluster at scale is a specialized skill. You'll need to handle:

  • Cluster sizing, replication, and capacity planning
  • Partitioning decisions and schema migrations
  • Merge behavior and compression tuning
  • Backup and recovery, plus ongoing cluster health monitoring

While it's an extreme case, a team at a large e-commerce platform shared with us that they had six full-time engineers dedicated just to keeping their ClickHouse logging infrastructure operational. Engineering time is an enormous hidden cost that rarely factors into initial calculations.

Reliability issues emerge

At scale, reliability becomes a serious concern:

  • Ingestion pipeline failures: As log volumes spike, ingestion often falls behind, because ingestion and search are not separated the way they are in Bronto.
  • Complex failure modes: Troubleshooting ClickHouse problems requires deep database expertise that most DevOps teams don't have.
  • Recovery complexity: Restoring from failures or data corruption can be time-consuming and error-prone.

The DIY trap

When engineers suggest "just use ClickHouse for logging," they're often underestimating what that actually means. ClickHouse isn't a logging platform – it's a powerful but complex analytical database that requires substantial expertise to turn into a production-ready observability solution.

The ClickHouse learning curve 

Unlike traditional databases that most engineers know, ClickHouse has its own unique concepts and requirements. Here are a few examples:

  • MergeTree engines and storage optimization - Understanding when to use ReplicatedMergeTree vs. ReplacingMergeTree vs. dozens of other variants
  • Partitioning and primary key design - Critical decisions that affect query performance and can't easily be changed later
  • Merge behavior tuning - Configuring background processes that determine how your data gets organized and compressed

This isn't general database knowledge – it's ClickHouse-specific expertise that takes months to develop.
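
To give a feel for how many of these decisions are front-loaded, here is a hedged sketch of a replicated log table; every clause is a choice the team has to get right early, and the names, paths, and values are illustrative assumptions.

```sql
CREATE TABLE app_logs_replicated
(
    timestamp  DateTime64(3),
    service    LowCardinality(String),
    level      LowCardinality(String),
    message    String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/app_logs', '{replica}')
PARTITION BY toYYYYMMDD(timestamp)           -- partition granularity is hard to change later
ORDER BY (service, level, timestamp)         -- drives the sparse primary index; effectively fixed
TTL toDateTime(timestamp) + INTERVAL 30 DAY  -- retention is enforced by background merges
SETTINGS merge_with_ttl_timeout = 3600;      -- one of many merge-behavior knobs
```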

Building the missing pieces 

Even with ClickHouse expertise, you still need to build an entire logging platform around it:

  • Ingestion pipelines - Handling backpressure, parsing multiple log formats, and dealing with schema evolution
  • Query interfaces - Building APIs and UIs that let users actually search and analyze their logs
  • Access control - Implementing multi-tenancy and permissions (remember, ClickHouse's multi-tenancy is limited)
  • Alerting and monitoring - Creating systems to notify teams when issues occur
  • Operational tooling - Backup/recovery, monitoring cluster health, and handling schema migrations
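
As one example of the ingestion piece, a common self-managed pattern is a Kafka engine table feeding the storage table through a materialized view. This is a hedged sketch: the broker, topic, and field names are assumptions, and backpressure, malformed events, and schema drift all still have to be handled around it.

```sql
-- Raw log lines consumed from Kafka (one String column, LineAsString format)
CREATE TABLE logs_queue
(
    raw String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'app-logs',
         kafka_group_name = 'clickhouse-logs',
         kafka_format = 'LineAsString';

-- Parse JSON lines and route them into the hypothetical app_logs table
CREATE MATERIALIZED VIEW logs_queue_mv TO app_logs AS
SELECT
    parseDateTime64BestEffort(JSONExtractString(raw, 'timestamp'), 3) AS timestamp,
    JSONExtractString(raw, 'service')    AS service,
    JSONExtractString(raw, 'level')      AS level,
    JSONExtractString(raw, 'request_id') AS request_id,
    JSONExtractString(raw, 'message')    AS message,
    raw                                  AS attributes
FROM logs_queue;
```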

The hidden costs

What starts as "let's save money by using open source" often becomes:

  • Months of engineering time just to get basic functionality working
  • Ongoing operational overhead that scales with your data volume
  • The opportunity cost of engineers focusing on infrastructure instead of product features
  • The risk of outages due to misconfiguration or lack of operational expertise

For most teams, the total cost of ownership of a DIY ClickHouse solution exceeds that of a purpose-built logging platform.

Bronto: Built for logs, not analytics

When we designed Bronto, we started with a fundamentally different approach. Instead of adapting an analytics database for logging, we built a system specifically for the messy, unpredictable nature of log data.

  • Separation of compute and storage: Bronto decouples storage from compute, allowing each to scale independently. Your searches remain lightning-fast regardless of data volume.
  • Indexless architecture with bloom filters: We use bloom filters to achieve sub-second search performance without the overhead and complexity of traditional indexing. This approach is particularly effective for high-cardinality fields common in logs.
  • Schema-agnostic ingestion: Bronto adapts to your logs, not the other way around. New fields, changing structures, and varying formats are all handled automatically without schema migrations or pipeline adjustments.
  • Optimized for search patterns: Our architecture is specifically designed for the mixed query patterns common in log analysis: full-text search, pattern matching, filtering, and aggregation.

Unlike the compromises and tradeoffs inherent in repurposing ClickHouse for logging, Bronto delivers:

  • Sub-second search on terabytes of data: Even complex queries across massive datasets return in milliseconds.
  • Seconds on petabytes: Where traditional solutions time out or require rehydration from cold storage, Bronto delivers results in seconds.
  • 12-month retention by default: No more difficult tradeoffs between retention and cost.
  • No operational overhead: As a true SaaS solution, Bronto eliminates the need for cluster management, tuning, or optimization.
  • Cost predictability: Our pricing model is transparent and aligned with the value you derive, not punitive for ingesting more data.

The outcome: No more tradeoffs

The logging industry has conditioned us to accept impossible tradeoffs: coverage vs. cost, performance vs. retention, usability vs. flexibility. With purpose-built solutions like Bronto, these tradeoffs simply disappear.

You can have:

  • Complete log coverage across all your systems
  • Lightning-fast search regardless of data volume
  • Extended retention without prohibitive costs
  • Zero operational overhead

This fundamentally changes what logs are for.

When you can instantly search across years of data, logs shift from a compliance checkbox or last-resort troubleshooting tool to an active part of your operational toolkit.

Engineers can explore patterns over time, identify subtle correlations, and gain insights that were previously impossible when limited by 7-day retention or painfully slow queries. Security teams can investigate incidents spanning months without complex data rehydration. Product teams can analyze user behavior patterns across extended periods. Business teams can track API performance trends, analyze customer usage patterns month-over-month, and identify seasonal variations in service adoption – all with the granular detail that logs provide rather than pre-aggregated metrics.

Conclusion

ClickHouse is an impressive technology for what it was designed to do: structured data analytics at scale. But turning it into a general-purpose logging solution requires enormous engineering investment and still results in fundamental compromises.

Instead of spending years rebuilding what already exists, consider whether your team's time is better spent solving your core business problems. The logging layer should be a foundation that just works, not a perpetual engineering project.

Learn about Bronto and logging best practices