Time-series databases have become the backbone of modern observability, financial analytics, and IoT systems. But there's a common challenge that can bring even the most robust systems to their knees: high cardinality.
When your database starts tracking millions of unique values across various dimensions, performance doesn't just dip—it can collapse entirely.
Let's understand the technical details of what happens when cardinality spikes and how you can architect your systems to handle it.
High Cardinality: Beyond Just "Many Unique Values"
High cardinality isn't merely about having many unique values in a field—it's about the combinatorial explosion that happens when those values interact with time series data.
In a typical time-series database, you're storing:
- Metrics (measurements like CPU usage or request latency)
- Timestamps (when those measurements occurred)
- Dimensions/Tags (contextual information like host, region, customer ID)
High cardinality occurs when these dimensional values create an enormous number of unique time series. For example, if you track:
- 1,000 servers
- 50 metrics per server
- 10 different service names
- 100 customer IDs
You could potentially end up with 1,000 × 50 × 10 × 100 = 50 million unique time series. Each one requires its own tracking, indexing, and storage considerations.
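You can sanity-check this before it bites you. Here's a minimal sketch in PostgreSQL-style SQL, assuming a metrics table with host, metric_name, service, and customer_id columns (the names are illustrative); it compares the worst-case product of distinct tag values against the cardinality that actually occurs:
-- Worst case: the product of distinct values per tag
-- Actual: distinct tag combinations observed (usually much lower)
SELECT
  (SELECT COUNT(DISTINCT host) FROM metrics)
  * (SELECT COUNT(DISTINCT metric_name) FROM metrics)
  * (SELECT COUNT(DISTINCT service) FROM metrics)
  * (SELECT COUNT(DISTINCT customer_id) FROM metrics) AS worst_case_series,
  (SELECT COUNT(DISTINCT (host, metric_name, service, customer_id)) FROM metrics) AS actual_series;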
When RAM Becomes Your Primary Bottleneck
Time-series databases typically maintain in-memory data structures to facilitate rapid queries. As cardinality increases, these structures grow in proportion to the number of unique series, and their memory footprint quickly becomes the dominant constraint.
Memory Allocation Patterns Under High Cardinality
When cardinality grows beyond certain thresholds, several memory-intensive components start to struggle:
- Index Structures: B-trees or hash tables mapping tag combinations to time series grow with O(n) space complexity relative to the number of unique series.
- Metadata Overhead: Each unique time series requires metadata (series name, tag set, creation time, etc.) that consumes 200-500 bytes per series before even storing actual data points.
- Caching Layers: Query caches become increasingly ineffective as the working set size exceeds available memory.
Let's look at a memory profile from a production InfluxDB instance:
| Component | Memory Usage (10K series) | Memory Usage (1M series) | Memory Usage (100M series) |
|---|---|---|---|
| TSM Index | 400MB | 6GB | 80GB+ |
| Series Metadata | 5MB | 500MB | 50GB |
| WAL Cache | 200MB | 1.5GB | 15GB+ |
| Query Cache | 100MB | 500MB | Effectively Disabled |
| Total | ~700MB | ~8.5GB | 145GB+ |
At 100 million series, you're looking at a system that requires specialized hardware just to load its indices.
The Query Performance Complexity Challenge
High cardinality doesn't just eat memory—it fundamentally changes query execution patterns and complexity.
How Query Plans Become Pathological
When a time-series database executes a query against high-cardinality data, several execution phases become problematic:
- Series Selection Phase: The database must first identify which series match the query's tag predicates. With millions of series, this becomes a massive filtering operation.
- Index Traversal: Even with optimized indices, traversing millions of unique keys takes time. The difference between traversing 10,000 vs. 10 million index entries isn't linear—it's often super-linear due to cache misses and disk I/O.
- Merging Time Series: Aggregating across high-cardinality dimensions requires merging enormous numbers of time series, creating CPU-bound bottlenecks.
Here's a query execution profile showing where time is spent:
EXPLAIN ANALYZE SELECT mean("value") FROM "cpu"
WHERE ("region" = 'us-west' OR "region" = 'us-east')
AND time > now() - 1h
GROUP BY "host", "service", time(5m)
Query execution breakdown:
- Series selection: 2.453s (was 0.021s at low cardinality)
- Index scan: 1.876s (was 0.015s at low cardinality)
- Series read: 0.312s
- Merge and aggregate: 3.208s (was 0.124s at low cardinality)
- Total execution: 7.849s (was 0.167s at low cardinality)
A 47x performance degradation for the same logical query is common when cardinality jumps by 2-3 orders of magnitude.
When Indexing Bottlenecks Write Throughput
While read performance suffers under high cardinality, write performance often degrades even more dramatically.
The Hidden Costs of Each Write Operation
When writing a data point to a high-cardinality time-series database:
- Series Lookup: The database must determine if this is a new series or an existing one (sketched in SQL after this list).
- Index Updates: If new, indices must be updated—a potentially expensive operation requiring locks.
- Compaction Impact: Background compaction processes, which merge and optimize data files, struggle with fragmented data across millions of series.
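To make the per-write series lookup concrete, here's a simplified sketch in plain SQL of what the engine effectively does internally; the series_dict table and tag_set encoding are illustrative, not any vendor's actual storage format:
-- A dictionary mapping each unique tag set to a series id
CREATE TABLE IF NOT EXISTS series_dict (
  id      BIGSERIAL PRIMARY KEY,
  tag_set TEXT NOT NULL UNIQUE   -- e.g. 'host=server123,service=auth'
);
-- Every incoming point pays for this lookup; a brand-new tag set
-- also pays for inserting into the unique index
INSERT INTO series_dict (tag_set)
VALUES ('host=server123,service=auth')
ON CONFLICT (tag_set) DO UPDATE SET tag_set = EXCLUDED.tag_set
RETURNING id;
With 100 million rows in that dictionary, the unique index behind the ON CONFLICT check is exactly the structure that stops fitting in memory.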
This creates a cascading effect on write throughput:
| Cardinality Level | Write Throughput | Write Latency | Compaction Backlog |
|---|---|---|---|
| 10,000 series | 500,000 points/s | 0.5ms | Negligible |
| 1 million series | 100,000 points/s | 2.5ms | Occasional delays |
| 100 million series | 10,000 points/s | 25ms+ | Chronic, growing |
At extreme cardinalities, you might face a situation where compaction can never catch up with incoming writes—a death spiral for database performance.
Which Time-Series Database Handles Cardinality Best?
Different time-series databases employ fundamentally different architectures to manage cardinality challenges.
InfluxDB's TSI: Moving From Memory to Disk
InfluxDB's evolution shows how critical cardinality management has become:
- InfluxDB 1.x with the in-memory index: Keeps the entire series index in RAM, which works brilliantly until the ~1 million series threshold.
- InfluxDB 1.x with TSI: Introduced a disk-based Time Series Index that can handle 10-100x more series by moving index structures to disk.
- InfluxDB IOx: The newer engine, completely redesigned around a columnar storage model built on Apache Arrow and DataFusion to better handle high-cardinality workloads.
The TSI architecture uses a multi-level index approach:
- Level 1: In-memory mutable index
- Level 2: Memory-mapped immutable index files
- Level 3: Larger compacted index files
This tiered approach allows it to handle much higher cardinality before performance degrades, though disk I/O becomes the new bottleneck.
TimescaleDB's Hypertable Architecture: Using PostgreSQL's Mature Indexing
TimescaleDB takes a different approach by extending PostgreSQL:
- Chunk-based partitioning: Data is automatically partitioned into chunks based on time, making queries on recent data much faster.
- PostgreSQL indexes: Leverages B-tree, hash, and GiST indexes for efficient tag queries.
- Constraint exclusion: Automatically skips chunks that can't match query conditions.
This architecture handles high cardinality differently:
-- TimescaleDB's approach to high-cardinality queries
EXPLAIN (ANALYZE, BUFFERS)
SELECT time_bucket('5 minutes', time) AS five_min,
hostname,
AVG(cpu_usage)
FROM metrics
WHERE time > NOW() - INTERVAL '1 day'
GROUP BY five_min, hostname;
-- Shows chunk exclusion in action:
"Planning Time: 2.051 ms"
"Execution Time: 851.283 ms"
"Chunks excluded during planning: 165"
"Chunks scanned during execution: 12"
By scanning only relevant chunks, TimescaleDB reduces the impact of high cardinality, though it still struggles as cardinality grows beyond millions of series.
Prometheus and VictoriaMetrics: Label-Based vs. Inverted Index
Prometheus and its high-performance alternative VictoriaMetrics take very different approaches:
- Prometheus: Uses a custom TSDB with an in-memory label index. Simple but vulnerable to high cardinality.
- VictoriaMetrics: Uses an advanced inverted index specifically optimized for high-cardinality time-series data.
The performance difference is stark:
| Query Type | Prometheus (1M series) | VictoriaMetrics (1M series) | Prometheus (10M series) | VictoriaMetrics (10M series) |
|---|---|---|---|---|
| Instant Query | 200ms | 50ms | 2s+ | 120ms |
| Range Query | 1.5s | 400ms | 15s+ | 800ms |
| High-Cardinality Aggregation | 5s+ | 800ms | Failed | 3s |
VictoriaMetrics demonstrates that with the right architecture, high cardinality doesn't have to be catastrophic.
How to Prevent Cardinality Problems with Advanced Data Modeling
The most effective solutions to cardinality problems happen at the data modeling stage, long before data hits your database.
Hierarchical Tag Structures: Aggregating Without Losing Information
Instead of recording every possible dimension as a separate tag, build hierarchical structures:
# Poor design (high cardinality)
measurement,region=us-west-2,az=us-west-2a,rack=r42,host=server123,service=auth cpu_usage=72.5
# Better design (lower cardinality)
measurement,host_path=/us-west-2/us-west-2a/r42/server123,service=auth cpu_usage=72.5
This allows you to query at any level of the hierarchy while keeping the total number of tags manageable.
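For instance, assuming the hierarchical path lands in a host_path column in a TimescaleDB-style store (table and column names are illustrative), a prefix match recovers any level of the hierarchy:
-- Average CPU across every host in availability zone us-west-2a,
-- regardless of rack or individual server
SELECT
  time_bucket('5 minutes', time) AS five_min,
  AVG(cpu_usage) AS avg_cpu
FROM metrics
WHERE host_path LIKE '/us-west-2/us-west-2a/%'
  AND time > NOW() - INTERVAL '1 hour'
GROUP BY five_min;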
Controlled Cardinality Through Domain-Specific Normalization
Normalize high-cardinality fields based on domain knowledge:
- IP Addresses: Store subnet rather than individual IPs
- User IDs: Store cohorts, segments, or hash buckets
- URLs: Store route patterns rather than full URLs with parameters
- Session IDs: Create time-bucketed session groups
Implementation example:
-- Instead of storing full URLs
INSERT INTO http_metrics (url, status_code, response_time)
VALUES ('/products/12345?user=789&source=email', 200, 127);
-- Store normalized route patterns
INSERT INTO http_metrics (url_pattern, status_code, response_time)
VALUES ('/products/:id', 200, 127);
-- Still allows powerful aggregation
SELECT url_pattern, AVG(response_time), COUNT(*)
FROM http_metrics
GROUP BY url_pattern;
This reduces cardinality by orders of magnitude while preserving analytical power.
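The normalization itself can live in your ingestion pipeline or in the database. Here's a rough sketch using PostgreSQL's regexp_replace; the patterns are examples only and would need tuning for your own URL scheme:
-- Strip query strings and collapse numeric path segments before insert
CREATE OR REPLACE FUNCTION normalize_url(raw TEXT) RETURNS TEXT AS $$
  SELECT regexp_replace(
           regexp_replace(raw, '\?.*$', ''),  -- drop query parameters
           '/[0-9]+', '/:id', 'g')            -- replace numeric ids with :id
$$ LANGUAGE sql IMMUTABLE;

SELECT normalize_url('/products/12345?user=789&source=email');
-- returns '/products/:id'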
Strategic Downsampling: Reducing Cardinality Over Time
Implement tiered storage with different cardinality levels based on age:
Raw tier (1-7 days): Full cardinality
Intermediate tier (8-30 days): Reduced cardinality (drop some dimensions)
Long-term tier (31+ days): Minimal cardinality (highly aggregated)
Implementation example:
-- Continuous aggregate for downsampling with reduced cardinality
CREATE MATERIALIZED VIEW metrics_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', time) AS hourly,
service,
region,
-- Drop host-level cardinality in this view
AVG(cpu_usage) AS avg_cpu,
MAX(cpu_usage) AS max_cpu,
MIN(cpu_usage) AS min_cpu,
COUNT(*) AS sample_count
FROM metrics
GROUP BY hourly, service, region;
-- Set retention policy for high-cardinality raw data
SELECT add_retention_policy('metrics', INTERVAL '7 days');
This approach gives you full detail for recent data while automatically reducing both storage and cardinality for historical data.
Performance Optimization Techniques When Reducing Cardinality Isn't an Option
Sometimes you simply can't reduce cardinality. Here are advanced techniques to optimize performance anyway:
How to Optimize Resources with Partial Indexes
Rather than indexing every possible tag combination, create partial indices for the most common query patterns:
-- TimescaleDB/PostgreSQL partial index example
CREATE INDEX ON metrics(host, time DESC)
WHERE service = 'api-gateway';
-- This dramatically speeds up queries for a specific service
-- without wasting resources on indexing everything
Partial indices can reduce index size by 90%+ while still accelerating the most important queries.
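A query shaped like that index then stays fast regardless of how many other services exist (the host value here is just an illustration):
-- Hits the partial index: service predicate matches, (host, time DESC) drives the scan
SELECT time, cpu_usage
FROM metrics
WHERE service = 'api-gateway'
  AND host = 'web-042'
ORDER BY time DESC
LIMIT 100;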
Precompute Common Aggregations for Faster Queries
For frequently accessed aggregations across high-cardinality dimensions, create materialized query tables:
-- Create a materialized view that pre-aggregates data
CREATE MATERIALIZED VIEW daily_service_metrics AS
SELECT
DATE_TRUNC('day', time) AS day,
service,
region,
AVG(latency) AS avg_latency,
percentile_cont(0.95) WITHIN GROUP (ORDER BY latency) AS p95_latency,
COUNT(*) AS request_count
FROM request_logs
GROUP BY day, service, region;
-- Create an index on the materialized view
CREATE INDEX ON daily_service_metrics(service, day);
-- Refresh on a schedule
REFRESH MATERIALIZED VIEW daily_service_metrics;
This trades storage space and some write overhead for dramatic query performance improvements on common access patterns.
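If the pg_cron extension is available, the refresh can be scheduled from inside the database; here's a sketch (job name and schedule are illustrative):
-- Rebuild the pre-aggregated view 15 minutes past every hour
SELECT cron.schedule(
  'refresh-daily-service-metrics',
  '15 * * * *',
  'REFRESH MATERIALIZED VIEW daily_service_metrics');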
Rewriting Queries for Cardinality Awareness
Rewrite queries to limit the impact of high cardinality:
-- Inefficient query (scans all series)
SELECT mean("cpu_usage") FROM "system"
WHERE time > now() - 1h
GROUP BY "host";
-- More efficient query (uses tag value pre-filtering)
SELECT mean("cpu_usage") FROM "system"
WHERE time > now() - 1h
AND "datacenter" = 'us-west'
GROUP BY "host";
Always filter on lower-cardinality tags first to reduce the working set before dealing with high-cardinality dimensions.
Advanced Tooling for Cardinality Management
Modern time-series databases come with specialized tools for managing cardinality issues.
Deep Cardinality Inspection Tools
These tools help you understand exactly where your cardinality is coming from:
# InfluxDB cardinality exploration
influx_inspect cardinality -db mydb -limit 20
# Output:
Tag key Approx. unique values
--------- -------------------
container_id 24,532,850
request_id 18,945,327
pod_name 5,624,103
path 2,345,678
host 124,532
service 86
environment 12
region 8
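If your metrics live in a SQL store instead, a rough equivalent is to count distinct values per tag column; here's a sketch assuming the tag columns shown exist on a metrics table:
-- Which tags contribute the most unique values over the last day?
SELECT
  COUNT(DISTINCT container_id) AS container_ids,
  COUNT(DISTINCT host)         AS hosts,
  COUNT(DISTINCT service)      AS services,
  COUNT(DISTINCT (container_id, host, service)) AS combined_series
FROM metrics
WHERE time > NOW() - INTERVAL '1 day';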
Continuous Cardinality Monitoring
Set up monitoring for cardinality itself:
-- Query to track cardinality growth in TimescaleDB
-- (counts distinct tag combinations per day; adjust the tag columns to your schema)
SELECT
  time_bucket('1 day', time) AS day,
  COUNT(DISTINCT (host, service, region)) AS active_series
FROM metrics
WHERE time > NOW() - INTERVAL '30 days'
GROUP BY day
ORDER BY day;
Implement alerts when cardinality grows beyond expected rates to catch problems early.
Automated Cardinality Management
Advanced systems implement automated cardinality controls:
// Example of a cardinality limiter in application code
function addMetric(metric, tags, value) {
// Hash the tags to get a consistent series identifier
const seriesHash = hashTags(tags);
// Check if this hash is in our allow list
if (cardinalityManager.isAllowed(seriesHash)) {
// Send the metric normally
client.writePoint(metric, tags, value);
} else if (cardinalityManager.shouldSample(seriesHash)) {
// This series isn't in our allow list, but we'll sample it
// at a lower rate to track its importance
client.writePoint(metric, tags, value);
// Record that we've seen this series
cardinalityManager.recordSeries(seriesHash, tags);
}
// Otherwise silently drop the point
}
// Periodically adjust the allow list based on observed frequency
setInterval(() => {
cardinalityManager.promoteFrequentSeries();
cardinalityManager.demoteRareSeries();
}, 3600 * 1000); // Every hour
This approach allows systems to automatically maintain cardinality within acceptable limits.
How Time-Series Databases Are Evolving for High Cardinality
Database vendors recognize that high cardinality is a growing challenge and are making architectural changes to address it.
Column-Oriented Storage Models
Newer time-series databases are moving toward columnar storage:
- InfluxDB IOx: Based on Apache Arrow and DataFusion
- TimescaleDB Hyperscale: Adding columnar capabilities to TimescaleDB
- QuestDB: Using column-oriented SIMD-optimized storage
Columnar storage has significant advantages for high-cardinality time-series data:
- Better compression ratios (10-100x for many workloads)
- Improved query performance through vectorized execution
- More efficient filtering across high-cardinality dimensions
Approximate Query Processing
Many use cases don't require exact answers. Approximate query processing can provide dramatic performance improvements:
-- Exact (slow on high cardinality)
SELECT COUNT(DISTINCT user_id) FROM events;
-- Approximate (much faster, slight accuracy tradeoff)
SELECT approx_count_distinct(user_id) FROM events;
Modern time-series databases are adding more approximate algorithms like HyperLogLog for cardinality estimation, T-Digest for percentiles, and Count-Min Sketch for frequency estimation.
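The same trade-off applies to percentiles. With the TimescaleDB Toolkit installed, for example, a sketch-based approximate percentile avoids holding every value in memory (the request_logs table and latency column are carried over from the earlier examples):
-- Approximate p95 latency per service using a sketch-based aggregate
SELECT
  service,
  approx_percentile(0.95, percentile_agg(latency)) AS p95_latency
FROM request_logs
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY service;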
Hardware Acceleration
Specialized hardware is becoming more important for high-cardinality workloads:
- GPU acceleration: For parallel processing of filtering operations
- FPGA-based indexing: For ultra-fast series lookup
- Persistent memory: Bridging the gap between RAM and SSD
These advancements may eventually change the economics of high-cardinality time series.
When to Consider Alternative Database Technologies
At extreme cardinality levels, traditional time-series databases may not be the right solution. Consider these alternatives:
Specialized High-Cardinality Time-Series Databases
- VictoriaMetrics: Built from the ground up for high cardinality
- ClickHouse: Column-oriented DBMS with excellent time-series capabilities
- Apache Druid: Designed for sub-second OLAP queries on high-cardinality data
Event Log Oriented Solutions
For some use cases, switching from metrics to logs may be more appropriate:
- Elasticsearch: For high-cardinality text-heavy time-series data
- Loki: Log aggregation system with label-based indexing
- Hadoop/Spark: For batch processing of extreme-cardinality historical data
The decision factors should include:
- Query patterns (real-time vs. batch)
- Retention requirements
- Aggregation needs
- Budget constraints
Conclusion
High cardinality doesn’t have to slow down your time-series database. At Last9, we help teams handle high-cardinality observability at scale—trusted by Disney+ Hotstar, CleverTap, and Replit.
We’ve monitored 11 of the 20 largest live-streaming events in history, integrating with OpenTelemetry and Prometheus to unify metrics, logs, and traces. The result? Better performance, lower costs, and real-time insights for smarter monitoring and alerting.
Book some time with us if you want to chat more about it!