
Mar 7th, ‘25 / 8 min read

High vs Low Cardinality: Is Your Observability Stack Failing?

High cardinality can overwhelm monitoring systems, leading to slow queries and blind spots. Here’s why it matters and how to handle it effectively.


Imagine trying to find a friend in a packed stadium with 50,000 people versus spotting them in a quiet coffee shop. That’s the difference between high and low cardinality data. And if you’re working with distributed systems or microservices, this isn’t just a theoretical distinction—it’s a fundamental challenge that can make or break your observability setup.

High cardinality data is often the hidden culprit behind slow queries, overwhelming dashboards, and blind spots in monitoring. When every user, session, or transaction generates unique identifiers, traditional monitoring tools struggle to keep up. The result? The signals you need most get buried in the noise.

So, what’s the way forward? Understanding how cardinality impacts observability isn’t just an optimization exercise—it’s critical for detecting and troubleshooting issues in real time.

Let’s break down why this problem exists, why most solutions fall short, and what an effective approach looks like.

What is Cardinality: Beyond the Textbook Definition

Cardinality refers to the number of unique values a field can contain. Simple enough on paper, deceptively complex in practice:

  • Low cardinality: Few unique values (think HTTP status codes, environment names, boolean flags)
  • Medium cardinality: A moderate number of values (countries, service names, endpoint paths)
  • High cardinality: Massive number of unique values (user IDs, request IDs, session tokens, container IDs)

Here's what this looks like in the wild:

| Cardinality Type | Example | Unique Values | Storage Scale | Query Performance | Cost Impact |
|---|---|---|---|---|---|
| Low | Environment (prod/stage/dev) | 3-5 | Kilobytes | Milliseconds | Negligible |
| Medium | Service names | 50-200 | Megabytes | Milliseconds-seconds | Low |
| High | Container IDs | Millions+ | Gigabytes-terabytes | Seconds-minutes | Exponential |
| Extreme | Request IDs + user IDs + paths | Billions+ | Petabytes | Minutes-hours/timeouts | Potentially unlimited |

The technical reality behind this table? Each unique combination creates a new time series to track – and most monitoring systems simply weren't built for the cardinality explosion we're seeing in modern architectures.

💡
If high cardinality is causing issues in your monitoring, here’s how Last9 tackles the challenge by sharding a stream to keep systems fast and efficient: How We Tame High Cardinality by Sharding a Stream.

How to Stop Your Monitoring from Working Against You

The Mathematics of Monitoring

Let's do the actual math on why cardinality gets out of hand so quickly:

Take a single metric: http_request_duration_seconds. Now add dimensions:

  • Service (50 microservices)
  • Method (GET, POST, PUT, DELETE)
  • Path (100 unique endpoints)
  • Status code (10 possibilities)
  • Region (5 data centers)
  • Customer tier (3 options: free, pro, enterprise)

Potential combinations: 50 × 4 × 100 × 10 × 5 × 3 = 3,000,000 unique time series for ONE metric.

At one data point per minute, that's 3 million writes every 60 seconds. At one data point every 10 seconds (common for critical metrics), you're at 18 million writes per minute.

And we haven't even added the highest cardinality dimensions yet:

  • User ID (millions)
  • Container ID (constantly changing, potentially millions)
  • Session ID (potentially billions)

Add just one of these high cardinality fields, and your 3 million time series becomes 3 billion. This isn't theory – it's the exact scenario bringing monitoring systems to their knees daily.
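
To make the arithmetic concrete, here's a small Python sketch. The dimension sizes are the illustrative figures from above, not measurements from any real system:

```python
from math import prod

# Illustrative dimension sizes from the example above
dimensions = {
    "service": 50,
    "method": 4,        # GET, POST, PUT, DELETE
    "path": 100,
    "status_code": 10,
    "region": 5,
    "customer_tier": 3,
}

series = prod(dimensions.values())
print(f"Unique time series for one metric: {series:,}")               # 3,000,000

# One data point per series every 10 seconds -> writes per minute
writes_per_minute = series * (60 // 10)
print(f"Writes per minute at 10s resolution: {writes_per_minute:,}")  # 18,000,000

# Multiply by even 1,000 observed user IDs per existing combination
# and 3 million series becomes 3 billion
print(f"With 1,000x user IDs: {series * 1_000:,}")                    # 3,000,000,000
```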

💡
Handling high cardinality data effectively requires the right approach to storage and querying. Here's why thinking of your observability stack as a data warehouse, not just a database, can make a difference: Think Data Warehouse, Not Database.

The Technical Debt You Didn't Know You Were Accumulating

Here's what happens behind the scenes when cardinality spirals:

  1. Index bloat: Time-series databases use indexes to quickly locate data. High cardinality causes these indexes to grow exponentially, consuming RAM.
  2. Write amplification: Each unique time series requires metadata management overhead, turning what should be simple writes into complex database operations.
  3. Shard imbalance: In distributed databases, high cardinality data often creates "hot spots" where certain shards receive disproportionate traffic.
  4. Garbage collection storms: Java-based monitoring systems can experience massive GC pauses when memory fills with index data.
  5. Cache invalidation: Query caches become effectively useless as the likelihood of repeat queries drops with higher cardinality.

The result isn't just slow queries – it's complete system failure exactly when you need monitoring most.

Why High Cardinality Data Is Both Essential and Impossible

The Debugging Dilemma: Can't Live With It, Can't Troubleshoot Without It

The irony of high cardinality data is that it's precisely what you need for effective troubleshooting:

  • User-specific issues: Without user IDs, how do you track down why one specific customer is experiencing timeouts?
  • Rare edge cases: Many critical bugs only manifest under specific combinations of conditions – combinations that get filtered out when you drop high cardinality dimensions.
  • Performance outliers: The p99 latency that's killing your conversion rate often comes from specific traffic patterns only visible with high cardinality data.
  • Security incidents: Detecting unusual access patterns requires tracking session and IP-level data – both extremely high cardinality.

The Query Performance vs. Data Granularity Tradeoff

Every observability team faces this impossible choice:

  1. Keep everything: Maintain full data fidelity but watch query performance degrade to unusable levels and costs skyrocket.
  2. Aggregate aggressively: Improve query performance but lose the ability to drill down to specific instances when troubleshooting.
  3. Sample randomly: Keep a percentage of data but risk missing the critical events that matter most.
  4. Drop high cardinality dimensions: Query faster but lose context that's essential for root cause analysis.

None of these are acceptable options for modern systems where minutes of downtime can cost millions.

💡
Managing high cardinality data at scale isn't just a technical challenge—it directly impacts reliability. See how Last9 helped ensure observability for 25 million concurrent live-streaming viewers.

Why Traditional Monitoring Tools Break Under Cardinality Pressure

The root of the cardinality problem lies in how time-series databases are architected:

LSM Trees vs. B+ Trees: Storage Engine Limitations

Most time-series databases use either Log-Structured Merge Trees (LSM) or B+ Trees as their underlying storage engines. Neither was designed with extreme cardinality in mind:

  • LSM Trees (used in Cassandra, InfluxDB): Optimize for write performance but struggle with high cardinality because they require frequent compaction operations that become exponentially more expensive.
  • B+ Trees (used in Postgres, MySQL): Provide better query performance but suffer from write amplification and fragmentation under high cardinality loads.

Both approaches hit physical limits when cardinality exceeds certain thresholds, causing:

  • Write throughput collapse
  • Read latency spikes
  • Disk space explosion
  • Memory exhaustion

The Hidden Costs of Indexing Everything

Monitoring systems typically index every label and tag to make queries fast. With high cardinality, these indexes can become larger than the actual data:

  • A 1GB dataset might require 10GB+ of index data with high cardinality dimensions
  • These indexes must be loaded into memory for acceptable query performance
  • As cardinality grows, memory requirements grow beyond what's economically feasible

How Cardinality Breaks Each Component

Metrics:

Metrics are designed for lightweight, regular reporting of system health. High cardinality fundamentally conflicts with this purpose:

  • Storage efficiency: Metrics use efficient formats like 64-bit floats, but this advantage disappears when metadata size exceeds data size.
  • Query patterns: Metric queries typically examine large time ranges across many series, which becomes impossible with millions of unique series.
  • Retention policies: Most metrics systems use downsampling for older data, which becomes computationally prohibitive with high cardinality.

Logs:

Logs naturally contain high cardinality fields, creating unique challenges:

  • Inverted indexes: Search engines like Elasticsearch use inverted indexes that grow with each unique term.
  • Shard strategies: Log systems struggle to distribute high cardinality data evenly across shards.
  • Tokenization overhead: Parsing and indexing high cardinality fields consumes significant CPU.
  • Field limits: Many log systems cap the number of indexed fields to avoid mapping explosion (Elasticsearch's default total-fields mapping limit is 1,000).

Traces:

Distributed tracing generates inherently high cardinality data:

  • Every request gets a unique trace ID
  • Every service adds span IDs
  • Context propagation adds user and session details

This creates a perfect cardinality storm that few systems can handle at scale.


How Top Companies Actually Handle Cardinality Today

Most sophisticated engineering organizations use a combination of strategies to manage cardinality:

Tagging Governance:

Leading organizations implement strict tagging policies:

  • Cardinality budgets: Teams are allocated a maximum number of unique label combinations
  • Reserved labels: High cardinality fields like user ID can only be used with explicit approval
  • Naming conventions: Standards that prevent redundant or overlapping tags
  • Tag lifecycle management: Automated processes to identify and remove unused tags

Strategic Pre-Aggregation:

Rather than storing raw data alone, many teams pre-aggregate along known query dimensions (a sketch follows the list):

  • Query-aligned rollups: Creating pre-aggregated views that match common query patterns
  • Materialized aggregation tables: Computing and storing common aggregates rather than raw data
  • Downsampling pipelines: Processing high-frequency data into lower-resolution formats with reduced cardinality
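
As a minimal illustration of the downsampling idea, here's a Python sketch. The sample format, timestamps, and label choices are assumptions for the example, not any particular vendor's pipeline:

```python
from collections import defaultdict

# Raw samples: (unix_timestamp, labels_dict, value)
raw_samples = [
    (1709800005, {"service": "checkout", "pod": "checkout-7f9c", "user_id": "u123"}, 0.21),
    (1709800017, {"service": "checkout", "pod": "checkout-7f9c", "user_id": "u456"}, 0.35),
    (1709800049, {"service": "checkout", "pod": "checkout-2b1a", "user_id": "u789"}, 0.27),
]

# Keep only low/medium cardinality labels in the rollup
KEEP_LABELS = ("service",)

def rollup_1m(samples):
    """Average values per (one-minute bucket, reduced label set)."""
    buckets = defaultdict(list)
    for ts, labels, value in samples:
        minute = ts - (ts % 60)
        key = (minute,) + tuple(labels.get(label, "") for label in KEEP_LABELS)
        buckets[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

# user_id and pod are dropped, so three raw series collapse into
# two one-minute buckets keyed only by service
print(rollup_1m(raw_samples))
```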

The Multi-Tier Storage Approach: Different Tools for Different Cardinalities

The most sophisticated setups use tiered approaches:

  • Low cardinality tier: Traditional time-series databases for service-level metrics
  • Medium cardinality tier: Specialized storage for namespace and pod-level metrics
  • High cardinality tier: Sampling-based approach for request-level data
  • Special case handling: Custom solutions for critical high cardinality dimensions

Practical Steps for Managing Cardinality Today

Whether you're using Last9 or other tools, here are concrete steps for managing cardinality:

Implementing a Cardinality Budget System

Define clear limits for your organization:

  1. Set team and service quotas: Allocate cardinality budgets based on criticality
    • Critical services: Higher cardinality allowance
    • Background services: Lower cardinality limits
  2. Create automated enforcement: Scripts that reject new metrics that would exceed budgets

  3. Baseline current cardinality: Use query tools to identify current unique combinations, for example:

SELECT count(distinct(concat(label1, label2, label3))) FROM metrics
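
If your metrics live in a Prometheus-compatible backend, a baseline-and-budget check can be as small as the sketch below. The endpoint URL and the per-metric budgets are placeholders; the PromQL counts active series per metric name:

```python
import requests

PROM_URL = "http://localhost:9090"  # placeholder: your Prometheus-compatible endpoint

# Hypothetical budgets: max active series allowed per metric name
BUDGETS = {
    "http_request_duration_seconds_bucket": 50_000,
    "default": 10_000,
}

def series_per_metric():
    """Return {metric_name: active_series_count} via an instant query."""
    resp = requests.get(
        f"{PROM_URL}/api/v1/query",
        params={"query": 'count by (__name__) ({__name__=~".+"})'},
        timeout=30,
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return {r["metric"]["__name__"]: int(float(r["value"][1])) for r in result}

for metric, count in sorted(series_per_metric().items(), key=lambda kv: -kv[1]):
    budget = BUDGETS.get(metric, BUDGETS["default"])
    if count > budget:
        print(f"OVER BUDGET: {metric} has {count:,} series (budget {budget:,})")
```

On a large installation this query can itself be expensive; Prometheus also exposes rough per-metric series counts at /api/v1/status/tsdb, which is cheaper for a quick check.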

Building a Dimensional Hierarchy for Intelligent Aggregation

Hierarchically structure your labels:

region > availability_zone > service > instance > endpoint

This allows for progressive drill-down and intelligent aggregation at each level, as the sketch below illustrates.
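
A minimal sketch of that idea in Python, rolling the same samples up at successively coarser levels of the hierarchy (label names mirror the hierarchy above, with availability_zone shortened to az; the values are made up):

```python
from collections import defaultdict

# Each sample carries the full hierarchy of labels plus a value
samples = [
    ({"region": "us-east", "az": "us-east-1a", "service": "checkout", "instance": "i-01", "endpoint": "/pay"}, 120),
    ({"region": "us-east", "az": "us-east-1b", "service": "checkout", "instance": "i-02", "endpoint": "/pay"}, 95),
    ({"region": "eu-west", "az": "eu-west-1a", "service": "search",   "instance": "i-07", "endpoint": "/q"},   30),
]

HIERARCHY = ["region", "az", "service", "instance", "endpoint"]

def rollup(samples, depth):
    """Sum values grouped by the first `depth` levels of the hierarchy."""
    keys = HIERARCHY[:depth]
    totals = defaultdict(int)
    for labels, value in samples:
        totals[tuple(labels[k] for k in keys)] += value
    return dict(totals)

print(rollup(samples, 1))  # by region: {('us-east',): 215, ('eu-west',): 30}
print(rollup(samples, 3))  # by region/az/service: finer detail, still bounded cardinality
```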

Label Normalization Techniques That Preserve Meaning While Reducing Cardinality

Implement transformations that reduce cardinality without losing value (a few of these are sketched in code after the list):

  • Path templating: Convert /user/123/profile to /user/{id}/profile
  • Bucketing: Group values into ranges (response times: fast/medium/slow)
  • Consistent hashing: For high cardinality values you need to preserve but don't need to read directly
  • Fingerprinting: Create stable identifiers for similar error messages
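
Here's a small Python sketch of the first three techniques. The regexes, bucket boundaries, and hash length are illustrative choices, not recommendations:

```python
import hashlib
import re

def template_path(path: str) -> str:
    """Path templating: collapse numeric IDs and UUID-like segments."""
    path = re.sub(r"/\d+(?=/|$)", "/{id}", path)
    path = re.sub(r"/[0-9a-f]{8}-[0-9a-f-]{27,}(?=/|$)", "/{uuid}", path, flags=re.I)
    return path

def bucket_latency(seconds: float) -> str:
    """Bucketing: turn a continuous value into a handful of labels."""
    if seconds < 0.1:
        return "fast"
    if seconds < 1.0:
        return "medium"
    return "slow"

def hash_label(value: str, length: int = 8) -> str:
    """Hashing: a stable, bounded-size stand-in for a value you want to
    correlate on but never need to read back."""
    return hashlib.sha256(value.encode()).hexdigest()[:length]

print(template_path("/user/123/profile"))   # /user/{id}/profile
print(bucket_latency(0.42))                 # medium
print(hash_label("session-9f2c7a1d"))       # e.g. a short hex digest, stable for the same input
```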

Query Optimization for High Cardinality Workloads

Rewrite queries to work efficiently with high cardinality data (an example follows the list):

  • Use time-bound queries instead of unbounded searches
  • Leverage HAVING clauses to filter post-aggregation
  • Apply WHERE clauses on low cardinality fields first
  • Avoid expensive operations like DISTINCT and GROUP BY on high cardinality fields
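
As one concrete illustration, the sketch below composes a time-bounded Prometheus range query that filters on low cardinality labels inside the selector and aggregates by another low cardinality field. The endpoint URL, metric, and label names are placeholders:

```python
import requests
import time

PROM_URL = "http://localhost:9090"  # placeholder endpoint

# Filter on low cardinality labels in the selector, aggregate by a low
# cardinality field (status), and bound the query to the last hour --
# rather than scanning every series for every user over all time.
query = (
    'sum by (status) ('
    'rate(http_requests_total{environment="prod", service="checkout"}[5m])'
    ')'
)

end = int(time.time())
resp = requests.get(
    f"{PROM_URL}/api/v1/query_range",
    params={"query": query, "start": end - 3600, "end": end, "step": "60s"},
    timeout=30,
)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    print(series["metric"], len(series["values"]), "points")
```
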
💡
High cardinality isn’t just a performance challenge—it can also drive up monitoring costs. Here’s why and what you can do about it: Why Your Monitoring Costs Are High.

Future-Proofing Observability

The field is rapidly evolving with new approaches:

Vector Representations of High Cardinality Data

Newer technologies treat high cardinality data as vectors in multi-dimensional space:

  • Embedding-based approaches that map similar high cardinality values closer together
  • Approximate nearest neighbor algorithms that find "close enough" matches
  • Dimensionality reduction techniques that preserve relationships while reducing cardinality

Machine Learning for Anomaly-Preserving Compression

ML approaches that can dramatically reduce cardinality while preserving anomalies:

  • Autoencoder models that learn normal patterns and preserve only deviations
  • Clustering algorithms that group similar time series
  • Forecasting models that store only prediction errors instead of raw values

The GraphQL Approach to Observability Queries

Next-generation query interfaces that handle high cardinality more gracefully:

  • Query languages that allow precise specification of only needed dimensions
  • Federated queries that can span multiple storage backends based on cardinality
  • Just-in-time aggregation that adapts to cardinality encountered during query execution

Last9 is actively integrating these cutting-edge approaches into its platform, staying ahead of the cardinality explosion curve.

How to Evaluate Observability Solutions in a High Cardinality Space

When evaluating monitoring and observability tools, ask these critical questions:

  1. How does the cost scale with cardinality? Look for sub-linear cost scaling rather than linear or exponential.
  2. What are the hard cardinality limits? Many tools have undocumented limits that cause failures under load.
  3. How is high cardinality data sampled? Random sampling loses the most important outliers.
  4. Can I query across storage tiers? Many solutions fragment data into separate systems as it ages.
  5. What's the query performance degradation at scale? Tools often benchmark empty systems, not fully loaded ones.

Last9 is designed specifically to excel in these dimensions without any sampling, providing predictable performance and costs even as cardinality grows.

Conclusion

Effectively managing high cardinality data brings several advantages:

  • More accurate performance optimizations
  • Faster resolution of customer-impacting issues
  • Improved security through detailed anomaly detection
  • Deeper visibility into user behavior and system performance

Yet, handling this data at scale can be challenging, often leading to slow dashboards and noisy alerts. We help address this by enabling faster queries, optimized cardinality workflows, and smarter alerting—so high-cardinality data becomes a resource, not a bottleneck.

Book some time with us to learn more, or try Last9 for free today!

Authors
Anjali Udasi

Helping to make tech a little less intimidating. I love breaking down complex concepts into easy-to-understand terms.