Apr 24th, ‘25 / 10 min read

Why Grafana's Rate Function Is Your Dashboard's Best Kept Secret

Grafana’s rate() function helps you make sense of noisy metrics, spot trends faster, and build dashboards that tell a clearer story.

What happens when your counter metrics just keep climbing up and up? You're left wondering what those numbers mean right now. Counter metrics in Grafana can be confusing — they just keep going up until something resets them. That's where the rate function comes in to save the day.

Think of rate as your dashboard's secret weapon. It takes those ever-increasing counter numbers and transforms them into something you can use: how fast things are happening right now.

In this guide, we'll break down everything you need to know about the Grafana rate function - from basic concepts to advanced applications that will level up your monitoring game.

What Is the Grafana Rate Function?

The rate function in Grafana calculates how quickly a counter metric increases over time. Think of it like your car's speedometer - instead of just showing total miles driven (the counter), it shows how fast you're currently going (the rate).

In technical terms, the rate function takes a counter metric and converts it to a per-second rate of change. This transformation is essential for properly analyzing metrics like:

  • HTTP requests per second
  • CPU utilization rates
  • Network traffic throughput
  • Error occurrences
  • Disk write operations

Without the rate function, counter metrics would just keep climbing upward, making it nearly impossible to spot performance issues or unusual patterns.
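To make the idea concrete, here is a minimal Python sketch of the per-second calculation. It is a simplification: the real rate() in Prometheus also handles counter resets and extrapolates to the edges of the window. The function name and sample data are made up for illustration.

```python
# Simplified sketch: average per-second increase of a counter over a window.
# Each sample is (unix_timestamp_seconds, counter_value).

def simple_rate(samples):
    """Return the per-second increase between the first and last sample."""
    if len(samples) < 2:
        return None  # a rate needs at least two samples
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)

# Hypothetical counter scraped every 15 seconds: total requests served so far.
window = [(0, 1000), (15, 1030), (30, 1075), (45, 1120), (60, 1180)]

requests_per_second = simple_rate(window)
print(requests_per_second)  # 3.0 -> 180 new requests over 60 seconds
```

The raw counter says "1180 requests so far", while the rate says "3 requests per second right now", which is usually the number you want on a dashboard.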

💡
If you're setting up Grafana with Prometheus, this guide on getting them to work together covers the essentials.

Why Use Rate Instead of Raw Counter Values?

Raw counter values keep increasing until they reset (usually when a service restarts or hits its maximum value). This creates several problems:

  1. Difficult pattern recognition: Constantly increasing lines make it hard to spot anomalies
  2. Scale issues: The y-axis must constantly expand as values grow
  3. Reset blindness: Counter resets create sudden drops that can mask real issues

The rate function solves these problems by showing you the change in values, which is typically what you actually care about. For example, knowing you've served 10 million requests since your app started isn't as useful as knowing you're currently handling 200 requests per second.

Rate Function Syntax in Grafana

Getting started with the rate function is straightforward. Here's the basic syntax to use in your Grafana queries:

PromQL (Prometheus)

rate(metric_name[time_range])

Example:

rate(http_requests_total{job="api-server"}[5m])

This gives you the per-second rate calculated over the last 5 minutes.

💡
If you're looking to improve your PromQL queries further, check out this guide on some helpful tricks: PromQL Tricks You Should Know.

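Graphite

perSecond(metric.path)

InfluxDB (Flux)

from(bucket: "metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "http_requests_total")
  |> derivative(unit: 1s, nonNegative: true)

Notice that neither data source has a function named rate. Graphite's perSecond() and InfluxDB's derivative() achieve what rate does in Prometheus, and with nonNegative: true a drop caused by a counter reset is not reported as a negative rate.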

💡
To make your Grafana dashboards more flexible and easier to explore, check out this guide on Grafana variables.

Common Mistakes When Using the Rate Function

Even experienced engineers sometimes stumble with rate calculations. Here are the top pitfalls to avoid:

1. Using Too Small a Time Window

# Too small - produces noisy graphs
rate(http_requests_total[10s])

# Better - smoother trend line
rate(http_requests_total[2m])

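A tiny time window makes your graph jumpy and hard to interpret, and a window that holds fewer than two samples returns no data at all. As a rule of thumb, use a window of at least 2x your scrape interval, and ideally 4x so a single missed scrape doesn't leave a gap.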
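The effect of the window size can be sketched in a few lines of Python. This is an illustration under simplifying assumptions (a hypothetical counter scraped every 15 seconds, no extrapolation, no reset handling), not how Prometheus implements rate().

```python
def windowed_rates(samples, window_seconds):
    """Per-second rate at each sample time, looking back window_seconds."""
    rates = []
    start = samples[0][0]
    for t_now, v_now in samples:
        if t_now - start < window_seconds:
            continue  # not enough history yet to fill the window
        # Oldest sample that still falls inside the window.
        t_old, v_old = next((t, v) for t, v in samples if t_now - t <= window_seconds)
        if t_old == t_now:
            continue  # only one sample in the window: no rate can be computed
        rates.append((v_now - v_old) / (t_now - t_old))
    return rates

# Hypothetical counter scraped every 15s; requests arrive in bursts every other scrape.
values = [0, 60, 60, 120, 120, 180, 180, 240, 240]
samples = [(i * 15, v) for i, v in enumerate(values)]

too_short = windowed_rates(samples, 10)  # shorter than the scrape interval
jumpy = windowed_rates(samples, 15)      # exactly one scrape interval
smooth = windowed_rates(samples, 60)     # four scrape intervals

print(too_short)  # [] -> never two samples in the window
print(jumpy)      # alternates between 4.0 and 0.0
print(smooth)     # steady 2.0
```

The same traffic produces an empty result, a sawtooth, or a flat line depending only on the window, which is why the window should be chosen relative to the scrape interval.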

2. Ignoring Counter Resets

Counter metrics occasionally reset to zero (like when a service restarts). The rate function handles this gracefully, but if you try to calculate rates manually, you might get huge negative spikes.
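Here is a rough Python sketch of the difference between a naive subtraction and a reset-aware calculation. It assumes the counter restarts from zero, and the function name and values are illustrative only.

```python
def increase_with_resets(values):
    """Total increase across samples, treating any drop as a counter reset."""
    total = 0
    for prev, curr in zip(values, values[1:]):
        if curr < prev:
            # Counter restarted from zero: everything counted since then is new.
            total += curr
        else:
            total += curr - prev
    return total

# Hypothetical counter values; the service restarted between 160 and 10.
values = [100, 130, 160, 10, 40]

naive = values[-1] - values[0]
corrected = increase_with_resets(values)

print(naive)      # -60 -> the negative spike a manual calculation produces
print(corrected)  # 100 -> 30 + 30 + 10 + 30
```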

3. Applying Rate to Gauge Metrics

The rate function is designed specifically for counter metrics. Applying it to gauge metrics (like CPU percentage or memory usage) will give you meaningless results.

4. Missing Labels in Grouping

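When using rate with aggregations, decide which labels you need to keep, and apply rate before sum so counter resets are detected per series:

# Collapses every series into one total - per-instance detail is lost
sum(rate(http_requests_total[5m]))

# Better approach - keeps one series per instance
sum by (instance) (rate(http_requests_total[5m]))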
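A related aggregation detail is the order of operations: taking the rate of each series first and then summing is not the same as summing the raw counters and then taking the rate. The Python sketch below uses two hypothetical instances, one of which restarts, and a simplified reset-aware increase to show why.

```python
def increase_with_resets(values):
    """Total increase, counting a drop as a restart from zero."""
    total = 0
    for prev, curr in zip(values, values[1:]):
        total += curr if curr < prev else curr - prev
    return total

instance_a = [100, 130, 160, 190]
instance_b = [500, 530, 10, 40]  # restarted between the 2nd and 3rd scrape

# Rate first, then sum: each series handles its own reset.
rate_then_sum = increase_with_resets(instance_a) + increase_with_resets(instance_b)

# Sum first, then rate: the reset is hidden inside the combined series.
summed = [a + b for a, b in zip(instance_a, instance_b)]
sum_then_rate = increase_with_resets(summed)

print(rate_then_sum)  # 160 -> 90 from instance_a + 70 from instance_b
print(sum_then_rate)  # 290 -> inflated, the drop to 170 is misread as a full reset
```

This is why the queries above wrap rate() inside sum() rather than the other way around.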

Advanced Rate Function Techniques

Once you're comfortable with the basics, these advanced techniques will take your monitoring to the next level.

irate vs. rate: When to Use Each

Prometheus offers a cousin to rate called irate:

irate(http_requests_total[5m])

While rate calculates the average rate over the time range, irate uses only the last two data points. This makes irate:

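  • More responsive to sudden changes
  • Better for graphing volatile, fast-moving counters
  • More susceptible to noise

Use irate for high-resolution graphs where you need to see short spikes, and rate for alerting, dashboards, and trend analysis. A brief dip in irate can reset an alert's "for" condition, so alerts are more stable when they are built on rate.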
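The difference between the two is easy to see in a small Python sketch. Both functions below are simplified stand-ins (no reset handling or extrapolation), and the sample data is hypothetical.

```python
def avg_rate(samples):
    """rate-style: average per-second increase across the whole window."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

def instant_rate(samples):
    """irate-style: per-second increase between the last two samples only."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Steady 2 req/s for 45 seconds, then a burst in the final 15-second interval.
samples = [(0, 0), (15, 30), (30, 60), (45, 90), (60, 420)]

print(avg_rate(samples))      # 7.0  -> the burst is averaged over the full minute
print(instant_rate(samples))  # 22.0 -> the burst is shown at full height
```

The instant value reacts immediately but would fall straight back to 2.0 on the next quiet scrape, which is the noise trade-off described above.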

Combining Rate With Other Functions

The real power comes when you combine rate with other functions:

Calculate request error percentage:

sum(rate(http_requests_total{status="500"}[5m])) / sum(rate(http_requests_total[5m])) * 100
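As a worked example of the arithmetic behind that query, the Python sketch below sums hypothetical per-series rates (keyed by instance and status) and divides errors by the total. The data and function name are assumptions for illustration.

```python
def error_percentage(rates_by_series):
    """Share of the total request rate that has status 500, as a percentage."""
    total = sum(rates_by_series.values())
    if total == 0:
        return None  # no traffic: the ratio is undefined
    errors = sum(r for (_, status), r in rates_by_series.items() if status == "500")
    return errors / total * 100

# Hypothetical per-second rates for each (instance, status) series.
rates = {
    ("api-1", "200"): 180.0,
    ("api-1", "500"): 2.0,
    ("api-2", "200"): 215.0,
    ("api-2", "500"): 3.0,
}

print(error_percentage(rates))  # about 1.25 (5 errors/s out of 400 requests/s)
```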

Find the 95th percentile of the request rate over the last hour:

quantile_over_time(0.95, rate(http_requests_total{endpoint=~"/api/.*"}[5m])[1h:])

Using Rate for Capacity Planning

Rate functions shine for capacity planning. Let's say you want to predict when you'll hit resource limits:

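# Forecast the CPU usage rate 24 hours from now based on its recent trend
predict_linear(rate(cpu_usage_total[5m])[6h:], 24 * 3600)

This forecasts CPU usage 24 hours into the future based on the last 6 hours of data. predict_linear expects a range vector, so the rate is wrapped in a subquery ([6h:]) rather than passed in directly.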
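Under the hood, this kind of forecast is a straight-line fit. The Python sketch below shows the idea with an ordinary least-squares regression; it is an approximation of the technique, not Prometheus's exact implementation, and the hourly CPU percentages are invented.

```python
def linear_forecast(samples, seconds_ahead):
    """Fit a least-squares line through (time, value) samples and extrapolate."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    t_future = samples[-1][0] + seconds_ahead
    return slope * t_future + intercept

# Hypothetical CPU usage (percent), one sample per hour for 6 hours,
# growing by 2 points per hour.
history = [(h * 3600, 40 + 2 * h) for h in range(7)]

forecast = linear_forecast(history, 24 * 3600)
print(round(forecast, 2))  # 100.0 -> at this growth, CPU saturates in about a day
```

A linear fit only makes sense while growth is roughly linear, so treat the result as an early warning rather than a precise prediction.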

💡
If you're working with Grafana in a containerized setup, this guide on Grafana and Docker might come in handy. It covers practical setup tips and helps avoid common missteps.

Real-World Use Cases for the Rate Function

Let's look at practical applications of the rate function in common monitoring scenarios.

Monitoring API Performance

For a typical REST API, you might create a dashboard with these queries:

  1. Request rate by endpoint:
sum by (endpoint) (rate(api_requests_total[5m]))
  2. Error rate:
sum(rate(api_requests_total{status=~"5.."}[5m])) / sum(rate(api_requests_total[5m]))
  3. Average request latency:
rate(api_request_duration_seconds_sum[5m]) / rate(api_request_duration_seconds_count[5m])

Database Query Monitoring

For database performance:

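  1. Query execution rate:
rate(database_queries_total[5m])
  2. Slow query rate (label matchers can't compare numbers, so this assumes query duration is exported as a histogram named database_query_duration_seconds with a 0.1s bucket):
rate(database_query_duration_seconds_count[5m]) - ignoring(le) rate(database_query_duration_seconds_bucket{le="0.1"}[5m])
  3. Connection utilization:
sum(database_connections_active) / sum(database_connections_max)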

Infrastructure Monitoring

For system-level metrics:

  1. CPU utilization rate:
rate(node_cpu_seconds_total{mode!="idle"}[2m])
  2. Disk I/O operations:
rate(node_disk_reads_completed_total[5m]) + rate(node_disk_writes_completed_total[5m])
  3. Network transmit throughput:
sum by (instance) (rate(node_network_transmit_bytes_total[10m]))
💡
Need more control over your Grafana setup? This guide to the Grafana API walks through how to automate dashboards and alerts with ease.

Troubleshooting Rate Function Issues

When your rate calculations don't look right, check for these common issues:

Missing Data Points

Gaps in your time series can cause rate calculations to return unexpected results or no data. Ensure your metrics collection is reliable and consider using a longer time window to bridge small gaps.

Counter Resets Not Handled Properly

Counter resets can cause spikes or dips in your rate graphs. If you're seeing these:

  1. Check if you're using rate for counter metrics and not for gauges
  2. Verify your time window is appropriate (too small can exaggerate reset effects)
  3. Consider using the reset-aware functions your data source provides, such as resets() in PromQL or nonNegativeDerivative() in Graphite

Scaling Issues

Sometimes rate calculations produce values in units that are awkward to read, such as millions of bytes per second. In these cases, scale your results:

# Convert bytes/second to MiB/second
rate(network_bytes_total[5m]) / (1024 * 1024)

Best Monitoring Tools for Rate-Based Metrics

While we're focusing on Grafana's rate function, several tools work exceptionally well with rate-based metrics:

Last9

Looking for a managed observability solution that balances performance with cost? Last9 might be just what you need. Our platform handles high-cardinality data with ease and integrates smoothly with OpenTelemetry and Prometheus, giving you a unified view of your metrics, logs, and traces.

Plus, with Last9 MCP, you can seamlessly bring real-time production context — logs, metrics, and traces — into your local environment to auto-fix code faster. We keep costs predictable with event-based pricing, so you’re never caught off guard.

Prometheus

The gold standard for time series metrics with native support for powerful rate functions. Prometheus excels at real-time monitoring with a pull-based architecture and a flexible query language (PromQL) that makes calculating rates straightforward. Its open-source nature and active community have made it the backbone of many modern monitoring stacks, especially in Kubernetes environments.

Grafana Mimir

When you need to scale your rate-based metrics to massive volumes, Mimir offers horizontally scalable, highly available, multi-tenant storage. Developed by Grafana Labs, Mimir can handle billions of active series while maintaining query performance, making it ideal for large enterprises tracking rate metrics across multiple teams and applications.

Graphite

A seasoned player in the monitoring space, Graphite provides its own approach to rate calculations with functions like nonNegativeDerivative(). Its strength lies in long-term data retention and aggregation, with a time-tested whisper database format that efficiently compresses historical rate data. Many teams appreciate its simple architecture and established reliability.

InfluxDB

This purpose-built time series database offers powerful derivative functions that accomplish similar results to Prometheus' rate function. InfluxDB's Flux language gives you fine-grained control over how rates are calculated, with built-in capabilities for handling irregular intervals and gaps in data. Its clustered architecture makes it suitable for high-throughput rate monitoring at scale.

How to Optimize Dashboards Using Rate Functions

Your dashboards will be more valuable with these rate function optimization tips:

1. Layer Multiple Rate Functions

Instead of separate panels, overlay related rate metrics on the same graph for better correlation:

rate(successful_requests[5m])
rate(failed_requests[5m])

2. Use Template Variables for Flexible Time Windows

Create a template variable for the rate time window:

rate(http_requests_total[$rate_window])

This lets dashboard users adjust the smoothness of the graphs based on their needs.

3. Smooth Rates With Moving Averages

Combine rate with moving averages to spot trends more easily:

avg_over_time(rate(http_requests_total[5m])[1h:5m])

This shows the average rate calculated on a sliding 1-hour window, sampled every 5 minutes.
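The smoothing step itself is just a sliding mean over rate values. The Python sketch below uses a 3-point window on invented data to keep the numbers easy to follow; the query above would average roughly twelve 5-minute points per hour instead.

```python
def moving_average(values, points):
    """Mean of each run of `points` consecutive values."""
    averages = []
    for i in range(points - 1, len(values)):
        window = values[i - points + 1 : i + 1]
        averages.append(sum(window) / points)
    return averages

# Hypothetical request rates (req/s), one value every 5 minutes.
rates = [10, 14, 9, 15, 11, 13]

smoothed = moving_average(rates, 3)
print(smoothed[0], smoothed[-1])  # 11.0 13.0 -> the upward trend is easier to see
```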

When Not to Use the Rate Function

Despite its usefulness, the rate function isn't right for every situation:

  1. Gauge metrics - For values that can go up and down naturally (like memory usage), use the raw values instead
  2. Low-frequency events - For rare events that happen less than once per time window, the rate will often return zero
  3. When total counts matter - Sometimes you specifically want to know the total count, not the rate (like total orders processed)

In these cases, consider alternatives like gauge metrics, increase functions, or cumulative sums.

Wrapping Up

The rate function might seem like a small piece of the monitoring puzzle, but mastering it will dramatically improve how you visualize and understand your systems' behavior.

💡
If you'd like to continue the conversation further or love talking about observability, metrics, and monitoring, join our Discord community to connect with other DevOps engineers and SREs.

FAQs

What does rate() do in Grafana?

The rate() function in Grafana calculates how fast a counter metric is increasing per second. It transforms counter metrics (which only increase over time) into more useful per-second rates that show the current pace of activity, making it easier to spot trends and anomalies.

What does the rate function do in Prometheus?

In Prometheus, the rate function calculates the per-second average rate of increase of a time series within a specified time window. It automatically handles counter resets and extrapolates based on the available data points, providing a smooth representation of how quickly a metric is changing.

What is the difference between rate and count in Grafana?

Rate shows how quickly a counter is increasing per second, while count (or raw counter metrics) shows the total accumulated value. For example, "rate(http_requests_total[5m])" shows requests per second, while the raw "http_requests_total" shows the total number of requests since the counter started.

What does =~ mean in Prometheus?

The =~ operator in Prometheus is a regular expression matcher. It's used to filter time series based on label values using regex patterns. For example, http_requests_total{path=~"/api/.*"} matches all request metrics where the path label starts with "/api/".

What is the difference between rate and increase in Prometheus?

Rate calculates the per-second increase, while increase calculates the total increase over the specified time period. For instance, rate(requests[1h]) shows requests/second, while increase(requests[1h]) shows the total number of new requests in that hour. Both handle counter resets, but they present the data in different units.
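The relationship between the two is simple arithmetic, as this small Python sketch shows. The helper names mirror the PromQL functions for readability only, and the simplified versions here skip reset handling and extrapolation; the samples are hypothetical.

```python
def increase(samples):
    """Total growth of the counter across the window."""
    return samples[-1][1] - samples[0][1]

def rate(samples):
    """Per-second growth: the increase divided by the window length."""
    return increase(samples) / (samples[-1][0] - samples[0][0])

# Hypothetical counter samples covering one hour (timestamps in seconds).
one_hour = [(0, 5000), (1800, 8600), (3600, 12200)]

print(increase(one_hour))   # 7200 new requests in the hour
print(rate(one_hour))       # 2.0 requests per second
print(rate(one_hour) * 60)  # 120.0 requests per minute
```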

What are Prometheus Functions?

Prometheus functions are the building blocks of PromQL (Prometheus Query Language) that transform time series data. They include rate, increase, histogram_quantile, and dozens more, and they work alongside aggregation operators such as sum, avg, and max. Together they let you perform calculations, transformations, and aggregations on your metrics to extract meaningful insights.

How does prometheus rate work with grafana?

Grafana acts as a visualization layer for Prometheus data. When you use a rate() function in a Grafana dashboard, it sends the PromQL query to your Prometheus server, which calculates the rates and returns the results. Grafana then renders these rate values as graphs, making it easy to visualize how metrics change over time.

What is a Time Series Metric?

A time series metric is a sequence of data points collected and ordered by time. Each point typically consists of a timestamp and a value. In monitoring contexts, time series metrics track system behaviors over time, such as CPU usage, request rates, or error counts, allowing you to observe patterns and trends.

Are there approaches for capturing spikes with PromQL?

Yes, several approaches can help capture spikes:

  1. Use irate() instead of rate() for more sensitivity to recent changes
  2. Apply max_over_time() to capture the highest values
  3. Use deriv() to identify rapid changes in gauge metrics
  4. Create recording rules that track rates at higher frequencies
  5. Implement percentile-based alerting using histogram_quantile()

Is there a way to adjust the PromQL query so it returns the correct data based on the selected time range?

Yes, you can use Grafana's built-in $__rate_interval and $__range variables to adjust your PromQL queries based on the selected time range. For a single average over the whole dashboard range:

rate(http_requests_total[$__range])

Or, for graphs that stay stable as you zoom:

rate(http_requests_total[$__rate_interval])

PromQL does not accept arithmetic inside the square brackets, so the window has to come from a variable like these. They ensure your rate calculations use appropriate time windows as users zoom in or out on dashboards.
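Grafana also ships a built-in $__rate_interval variable intended for rate() windows. As described in Grafana's documentation, it resolves to the larger of ($__interval + scrape interval) and (4 x scrape interval). The Python sketch below reproduces that formula as I understand it; the function and parameter names are my own.

```python
def rate_interval(interval_seconds, scrape_interval_seconds):
    """Approximate the window Grafana picks for $__rate_interval."""
    return max(
        interval_seconds + scrape_interval_seconds,
        4 * scrape_interval_seconds,
    )

# Zoomed in: the panel's step ($__interval) equals a 15s scrape interval.
print(rate_interval(15, 15))   # 60 -> never below four scrape intervals

# Zoomed out: the panel's step grows to 5 minutes.
print(rate_interval(300, 15))  # 315 -> the window follows the step
```

The floor of four scrape intervals is what keeps the window from shrinking below the point where rate() can find enough samples.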

Why am I not getting any data when using rate() function in Grafana?

Common reasons include:

  1. The rate window is too small to contain at least two samples (it should span several scrape intervals)
  2. Your counter metric hasn't had enough data points yet
  3. The metric name or label selectors might be incorrect
  4. The underlying counter might have just reset
  5. There might be no data in the selected time period

Try increasing your time window or checking the raw counter values first to troubleshoot.

How do I use the rate function in Grafana to calculate metrics per second?

To calculate metrics per second in Grafana:

  1. In your Prometheus data source panel, enter a query using rate()
  2. Specify an appropriate time window: rate(metric_name[5m])
  3. The result is automatically presented as per-second values
  4. For metrics per minute, multiply by 60: rate(metric_name[5m]) * 60

How can I use the rate function in Grafana to monitor data over time?

To effectively monitor data over time:

  1. Create time series panels in your Grafana dashboard
  2. Use rate() for counter metrics that track events or activities
  3. Add comparison periods to spot changes (this week vs. last week)
  4. Set up alerts based on rate thresholds
  5. Use template variables to allow switching between different services or instances
  6. Group related rate metrics in the same panel for correlation

How can I use the rate() function to analyze time-series data in Grafana?

To analyze time-series data:

  1. Apply rate() to counter metrics to see their change velocity
  2. Compare rates across different services by using multiple queries on one graph
  3. Calculate ratios between related rates (like errors/total requests)
  4. Use transformations to aggregate rate data by time periods
  5. Create heatmaps of rate distributions to identify patterns
  6. Correlate rate changes with system events using annotations

This transforms raw counters into actionable insights about system behavior.

Authors
Anjali Udasi

Helping to make the tech a little less intimidating. I love breaking down complex concepts into easy-to-understand terms.
