
How Streaming Aggregation Works

Understand how streaming aggregation makes high-cardinality data easier to work with—preserving detail, reducing query load, and controlling costs.

May 28th, ‘25

In our last post, we looked at how high cardinality adds depth to your metrics. Dimensions like customer_id, endpoint, or region carry meaning and help answer who is affected, what went wrong, and where it’s happening.

But the more context you have, the more time series your system needs to track. That can slow things down — queries take longer, storage grows, and compute usage spikes. The issue isn’t the richness of the data. It’s when and how that data gets processed.

That’s where streaming aggregation helps. By processing metrics up front, you avoid overloading your database and keep query times predictable.

How Traditional Querying Slows Down Your Dashboards

Ordinarily, as metrics arrive, they’re written as-is to your time series database. When you run a query like:

sum by (type) (sum_over_time(metric_name[3m]))

Here’s what happens behind the scenes:

  • All matching samples are pulled from storage
  • They’re grouped by timestamp and label combinations
  • Aggregation logic runs across the entire dataset — every single time

This process repeats with every dashboard refresh. And as your data volume grows, so does the cost of doing this over and over — millions of time series scanned just to render a single panel.


What Changes When You Enable Streaming Aggregation

Streaming aggregation changes when and how your system processes metrics. Rather than crunching numbers reactively at query time, you define what matters earlier, closer to ingestion.

That doesn’t mean throwing away detail. It means structuring your metrics to reflect how your team thinks about problems.

Here’s what that could look like:

  • Skip labels like hostname, instance, or pod when you don’t need per-host granularity
  • Precompute PromQL-heavy constructs—like histogram quantiles or error rates (see the sketch just after this list)
  • Build focused aggregates like:
sum without (instance, pod) (grpc_client_handling_seconds_bucket{})
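
For instance, here's a minimal sketch of the error-rate idea above, assuming a conventional http_requests_total counter with code, instance, and pod labels (the metric and label names are illustrative):

# 5xx request rate, summed across instances and pods at ingest
sum without (instance, pod) (rate(http_requests_total{code=~"5.."}[1m]))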

The result:

  • Faster dashboards — with fewer surprises during high load
  • Lower compute and storage — without compromising on visibility
  • More meaningful metrics — shaped to your team’s use cases and workflows

High cardinality gives you the raw material—streaming aggregation helps you shape it into something useful.

Why Not Just Use Recording Rules?

Recording rules are a useful way to precompute queries, but they run after data is already ingested. That means you’ve still stored the raw, high-cardinality time series — and paid the cost of collecting and retaining them.

Streaming aggregation works earlier in the pipeline. It handles metrics as they arrive, so you can shape data at the source and skip writing unnecessary series altogether.

You can still use recording rules for longer-term summaries or business-level rollups. But if your goal is to reduce ingestion load, speed up queries, and keep your storage lean, streaming aggregation gets you there first.
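
For comparison, a Prometheus recording rule would precompute the same aggregate on a schedule, but only against raw series that have already been ingested and stored. The record name in the comment follows the common level:metric:operation convention and is illustrative:

# record: job:grpc_client_handling_seconds_bucket:sum
sum without (instance, pod) (grpc_client_handling_seconds_bucket)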

How Streaming Aggregation Manages Time Series Growth

High-cardinality labels like hostname provide useful context—but not every query needs that level of detail. With streaming aggregation, you can be intentional about which labels you keep and when.

Here’s how dropping just one label (e.g., hostname) can dramatically change the number of time series:

Query                                  Labels Used                             Series Before   Series After
Total errors in the last 5 minutes     code, hostname                          1,000           10
Average latency of /login requests     le, handler, hostname                   90,000          900
P95 of successful POST /login calls    le, handler, hostname, code, method     3,600,000       36,000

For example:
The average latency query includes le, handler, and hostname. With 15 handlers × 60 histogram buckets × 100 hosts, that’s 90,000 unique series. Drop the hostname label during aggregation, and you bring that down to 900, without losing the overall picture.

Reducing time series helps keep storage in check, but that’s only part of the benefit. It also makes your system more responsive in real time.

Take this common alerting scenario:

sum(increase(http_request_duration_seconds_count{code=~"5.*"}[1m])) by (hostname) > 100

This query hits thousands of series every minute, just to trigger a basic alert. That’s a lot of unnecessary pressure, especially during an incident.

With streaming aggregation, you can compute this logic as the data comes in and store only the result. The alert becomes faster and lighter, without compromising accuracy.
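
As a rough sketch, suppose ingest-time aggregation writes the per-hostname 5xx count as its own series (the name http_5xx_requests_1m is hypothetical). The alert then only has to scan that small set of series:

# Written at ingest: sum by (hostname) (increase(http_request_duration_seconds_count{code=~"5.*"}[1m]))
http_5xx_requests_1m > 100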

Time-Based Aggregation: Tumbling vs. Sliding Windows

Beyond which labels you keep, when you aggregate matters just as much. Streaming aggregation uses time windows to group metrics as they arrive. The two most common strategies are:

  • Tumbling windows: Fixed, non-overlapping intervals.
    Ideal when you want to group metrics into clean chunks—like “requests per minute.”
    Example: Group metrics into 1-minute buckets — 0:00 to 0:01, 0:01 to 0:02, etc.


  • Sliding windows: Overlapping intervals that move forward continuously.
    Useful for smoothing out data or triggering alerts based on rolling averages.
    Example: Calculate a rolling average every 10 seconds over the past 1 minute.

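In rough PromQL terms (an analogy, not an exact implementation), the range stays fixed and the evaluation interval decides whether the windows overlap:

# Evaluated once per minute, this yields non-overlapping 1-minute buckets (tumbling)
# Evaluated every 10 seconds, the same expression yields rolling, overlapping windows (sliding)
increase(http_requests_total[1m])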

With recording rules, aggregation happens after metrics are already stored—so time windows are applied post-ingestion. In contrast, streaming aggregation applies windowing as metrics arrive. This means your window strategy directly shapes what gets stored and how queries perform.

Time windows define when metrics are grouped. For simple counters or gauges, that’s enough. But for histograms, you also need to coordinate how they’re grouped. All parts of a histogram — _bucket, _sum, and _count — must use the same window boundaries to keep your percentiles and averages accurate.

How Histograms Work in Streaming Aggregation

Histograms represent distributions over time, not just single values. And when using streaming aggregation, you need to be deliberate in how you handle them.

Each histogram metric consists of three related time series:

  • <metric>_bucket – counts for each latency bucket
  • <metric>_sum – total sum of all values
  • <metric>_count – total number of observations

To maintain accuracy, all three need to be aggregated together during ingestion. Dropping a label or changing a window on just one of them will throw off queries that rely on a complete view of the distribution.

Let’s say you’re aggregating request latency metrics across all instances:

sum without (instance) (http_request_duration_seconds_bucket{})
sum without (instance) (http_request_duration_seconds_sum{})
sum without (instance) (http_request_duration_seconds_count{})

This keeps the histogram shape intact, so your downstream queries for averages, P95s, or traffic volumes stay correct, without needing reprocessing at query time.
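
For example, the usual downstream queries work unchanged against the aggregated series:

# P95 latency from the aggregated buckets
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Average latency from the aggregated sum and count
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])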

Key Streaming Aggregation Techniques for Cardinality Control

Streaming aggregation offers three practical approaches to structuring high-cardinality metrics:

1. Label Dropping: Removing Unnecessary Detail

Not every label adds meaningful value. Some dimensions—like unique IDs or highly dynamic paths—create huge amounts of series, but rarely help with debugging or alerting.

Streaming aggregation lets you drop these labels at ingest, before they inflate your storage or slow down queries.

Original metric:

http_request_duration_seconds{service="payment", path="/api/v1/transactions/123456", user_id="u-89273", instance="pod-8371"}

After dropping unnecessary labels:

http_request_duration_seconds{service="payment", path="/api/v1/transactions", instance="pod-8371"}

Here, we’ve kept the parts that help answer “is this endpoint slow?” and removed the parts that don’t move the needle for observability. The result? Same business signal—fewer distractions.
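
Expressed as a PromQL-style rule, the drop itself could look like the sketch below (illustrative; normalizing the path ID would typically be a separate transformation step):

# Aggregate away user_id while keeping service, path, and instance
sum without (user_id) (http_request_duration_seconds{service="payment"})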

2. Label Transformation: Preserving Meaning Without Overload

Sometimes, you do want to keep user or business context — but not at full granularity. That’s where label transformation helps.

Streaming aggregation allows you to group raw values into broader categories in real time, preserving the signal while avoiding unnecessary churn.

Original metric:

api_request_count{customer="acme-corp", endpoint="/users/12345/profile", region="us-west-2a"}

After transformation:

api_request_count{customer_tier="enterprise", endpoint_type="user_profile", region="us-west-2"}

This way, you can still answer questions like:

  • Are enterprise customers seeing more errors?
  • How are user profile endpoints performing across regions?

You keep the insight. You just don’t pay the cost of tracking every user and every path variation individually.
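
Part of this can be expressed in plain PromQL with label_replace (regexes and values here are illustrative; mapping customer to customer_tier usually needs a lookup table rather than a regex, so it's left out of this sketch):

# Derive endpoint_type from the raw endpoint, then collapse the availability zone into its region
label_replace(
  label_replace(api_request_count, "endpoint_type", "user_profile", "endpoint", "/users/[^/]+/profile"),
  "region", "$1", "region", "(us-west-[0-9]+)[a-z]"
)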

3. Pre-Aggregation: Creating Statistical Summaries Across Dimensions

One of the most effective ways to work with high-cardinality metrics is to pre-aggregate them across the dimensions you care about, right at ingestion.

Original metric (millions of time series):

request_duration_seconds{service="auth", endpoint="/verify", user_id="u-12345", region="us-east-1"}

After pre-aggregation (just a few time series):

request_duration_seconds_avg{service="auth", endpoint="/verify", region="us-east-1"}
request_duration_seconds_p95{service="auth", endpoint="/verify", region="us-east-1"}
request_duration_seconds_count{service="auth", endpoint="/verify", region="us-east-1"}

By computing summaries like average, p95, and count at ingest, you avoid storing user-level detail while still capturing latency trends and volume across key dimensions like service, endpoint, and region.

This approach is especially useful for dashboards and alerts that don’t require full granularity but still need fast, meaningful insights.
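
For instance, a panel or alert can now query the pre-aggregated series directly:

# Worst P95 per region for the /verify endpoint, scanning only a handful of series
max by (region) (request_duration_seconds_p95{service="auth", endpoint="/verify"})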

Common Pitfalls with Streaming Aggregation and How to Avoid Them

Even experienced teams can run into issues when implementing streaming aggregation. Here are a few common ones — and how to avoid them.

Pitfall 1: Overly Aggressive Aggregation

If you aggressively drop high-cardinality labels (like user_id or instance) during streaming aggregation, you lose the ability to drill down during incidents. You’ll know something is wrong — but not who it affects or where it’s happening.

How to avoid it:
Keep a limited window of raw metrics—either:

  • For a short recent time range (e.g., last 10–15 minutes), or
  • For a sample of traffic (e.g., 1–5% of requests)

This gives you enough granularity for on-demand investigation, without storing everything at full fidelity.

To go further, you can also set up a fallback.

  • Under normal conditions, you aggregate as usual.
  • But if error rates or latency cross a threshold, you temporarily bypass aggregation for specific services or labels, so raw data is preserved in real time for debugging.

This keeps your system efficient day to day, but still gives you the detail you need when things break.

Pitfall 2: Stale Aggregation Rules

Aggregation rules that worked well initially can quickly become outdated as your system evolves. Sticking to static rules risks missing new or shifting high-cardinality dimensions, leading to blind spots in your monitoring.

High cardinality provides rich, detailed insights—but only if your aggregation rules evolve alongside your data.

How to avoid it:
Regularly review and update your aggregation rules. Use alerts or monitoring tools to detect emerging cardinality dimensions early, so you can adjust aggregation before problems grow.
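
If you're running a Prometheus-compatible TSDB, one quick way to spot emerging cardinality is to track series counts per metric name, for example:

# Top 10 metric names by active series count
topk(10, count by (__name__) ({__name__=~".+"}))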

Pitfall 3: Confusing Metric Naming

When you have multiple versions of a metric (raw, averaged, and percentile values), it's easy to lose track of what each one means. Teams often end up asking which one they should query, or worse, picking the wrong one during an incident.

How to avoid it:
Adopt a clear, consistent naming convention that makes the metric’s purpose obvious at a glance.

For example:

Instead of this:

http_request_duration_seconds
http_request_duration_seconds_1
http_request_duration_seconds_2

Use something like:

http_request_duration_seconds_raw     # unaggregated
http_request_duration_seconds_avg     # average latency
http_request_duration_seconds_p95     # 95th percentile latency
http_request_duration_seconds_count   # total number of requests

When naming reflects meaning, onboarding gets easier, debugging gets faster, and nobody has to guess what a metric is showing—especially during incidents.

Pitfall 4: Resource Constraints

Streaming aggregation is compute-intensive. As the number of metrics and labels grows, real-time processing can put pressure on your collectors or ingestion pipeline, leading to slowdowns, dropped data, or increased lag.

How to avoid it:
Keep an eye on CPU usage, memory consumption, and processing latency. If aggregation begins to impact performance, consider:

  • Scaling horizontally — run multiple collector instances
  • Splitting responsibilities — handle ingestion and aggregation in separate stages
  • Using dedicated components — offload heavy computations to a standalone service or sidecar

Example:
If your collector is responsible for both scraping and aggregating, it’s more likely to hit resource limits under load. Moving aggregation to a separate service can help maintain reliability without sacrificing detail.

How Last9 Helps You Work with High Cardinality

High-cardinality metrics carry valuable detail—but without structure, they can be expensive to store and slow to query.

With Last9’s PromQL-based Streaming Aggregation, you can define rules to drop labels, transform dimensions, and generate new metrics—right at ingest. No changes needed to your application code or configuration.

This lets you stay ahead of cardinality limits while keeping your metrics usable and lightweight.

Up next: We’ll look at how high cardinality plays out in Prometheus—and how to manage it without changing your current setup.
