Prometheus is a powerful and flexible tool for observability and monitoring, with the rate() function standing out as a key feature for tracking system behavior over time.
This guide will walk you through how to use the rate() function effectively, covering its mechanics, use cases, and best practices to boost your monitoring and observability efforts, helping you build more reliable systems.
Understanding the Prometheus Rate Function
The rate() function is a key component of PromQL (Prometheus Query Language) used for analyzing the rate of change in counter metrics over time. At its core, rate() calculates the per-second average rate of increase of time series in a range vector.
rate() helps answer critical questions such as:
- "What is the current request rate for a service?"
- "How rapidly is CPU usage increasing over a given period?"
- "What's the rate of errors occurring in our application?"
These insights help in understanding system performance and behavior.
How Scrape Intervals Influence Your Monitoring Setup
The scrape interval in Prometheus defines how frequently metrics are collected. This interval plays a crucial role in how the rate() function works and affects your results.
When using rate(counter[time_range]), a common rule of thumb is to make the time range at least 4 times the scrape interval. This leaves enough data points for an accurate calculation even if a scrape or two fails, while smoothing out irregularities.
For example, if your scrape interval is 15 seconds, your time range should be at least 1 minute (rate(counter[1m])). For metrics scraped every minute, consider using a 5-minute range (rate(counter[5m])).
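The 4x rule of thumb is simple arithmetic, and a minimal Python sketch makes it concrete. The function names here are illustrative only and are not part of Prometheus:

```python
def min_rate_window(scrape_interval_s: float, factor: int = 4) -> float:
    """Smallest rate() window suggested by the 4x rule of thumb, in seconds."""
    return factor * scrape_interval_s


def samples_in_window(window_s: float, scrape_interval_s: float) -> int:
    """Rough number of scrapes that land inside a window of the given length."""
    return int(window_s // scrape_interval_s)


for interval in (15, 60):
    window = min_rate_window(interval)
    print(f"scrape every {interval}s -> window >= {window}s "
          f"(~{samples_in_window(window, interval)} samples)")
```

For a 60-second scrape interval the minimum works out to 240 seconds, so the 5-minute range suggested above sits comfortably over it.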
Inadequate time ranges relative to scrape intervals can lead to:
- Noisy and spiky graphs
- Misleading trends
- Gaps in graphs where a window holds fewer than two samples
The Importance of the Rate Function in Monitoring
Understanding rates of change is crucial in monitoring systems for a few key reasons:
- Performance Monitoring: It helps identify sudden spikes or drops in system performance, allowing for quick detection of anomalies or potential issues.
- Capacity Planning: Analyzing trends in rates allows for predicting future resource needs and planning for scaling accordingly.
- Alerting: Rate-based alerts can catch issues before they become critical, enabling proactive problem-solving.
- SLO/SLA Tracking: Rates are often key components of service-level objectives and agreements, making them crucial for ensuring compliance and maintaining service quality.
- Trend Analysis: Rates provide valuable insights into long-term trends, helping in strategic decision-making and system optimization.
Using the Rate Function: Syntax and Basic Examples
The basic syntax of the rate() function is straightforward:
rate(metric[time_range])
Example:
Assume there's a counter metric, http_requests_total, which tracks the total number of HTTP requests to a service.
To calculate the rate of requests per second over the last 5 minutes, use the following query:
rate(http_requests_total[5m])
This query returns the per-second rate of increase for http_requests_total over the last 5 minutes.
Note: It's crucial to use rate() only with counter metrics. Applying it to gauge metrics will result in incorrect data.
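To build intuition for what this query returns, here is a deliberately simplified Python sketch of the calculation. It is not how Prometheus implements rate(): it skips counter-reset handling and extrapolation, and the sample values are made up.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second average increase between the first and last sample.

    Simplified on purpose: no counter-reset handling and no extrapolation
    to the window boundaries, both of which the real rate() performs.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


# Hypothetical http_requests_total samples, scraped every 60 s for 5 minutes.
samples = [(0, 1000), (60, 1120), (120, 1240), (180, 1390), (240, 1480), (300, 1600)]
print(simple_rate(samples))  # 600 requests over 300 s -> 2.0 per second
```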
The Two-Sample Minimum Requirement
The rate() function requires at least two data points within the specified time range to calculate a rate. This is a fundamental requirement because rate calculation is based on the change between points over time.
If your time range is too small relative to your scrape interval, or if there are gaps in your data collection, you may see empty spots in your graphs where rate() couldn't calculate values.
This requirement becomes especially important when:
- Working with sparse metrics
- Handling service outages or collection gaps
- Debugging missing data in dashboards
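The two-sample requirement can be illustrated with a small Python sketch. The window boundary handling and the function name are simplifying assumptions, not Prometheus internals:

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def rate_in_window(samples: List[Sample], now: float, window_s: float) -> Optional[float]:
    """Return a simplified rate, or None when the window holds fewer than two samples."""
    in_window = [(t, v) for t, v in samples if now - window_s <= t <= now]
    if len(in_window) < 2:
        return None  # this is the "empty spot" in a graph
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical counter scraped every 60 s.
samples = [(0, 100), (60, 160), (120, 220), (180, 280)]
print(rate_in_window(samples, now=180, window_s=30))   # None: only one sample in range
print(rate_in_window(samples, now=180, window_s=120))  # 1.0
```

A 30-second window on a 60-second scrape interval can never hold two samples, which is why the result is empty rather than zero.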

Rate vs. Irate: When to Use Each
Prometheus offers two functions for calculating rates from counter metrics, each with distinct purposes:
- rate(): Calculates the per-second average rate over the entire time range, smoothing out fluctuations.
- irate(): Calculates the instantaneous rate using only the last two data points, capturing rapid changes.
Choosing the Right Function:
- Use rate() for:
  - Dashboards and trend analysis
  - Capacity planning
  - Stable metrics where consistency matters
  - Alerting on sustained issues
- Use irate() for:
  - Highly variable metrics
  - Detecting sudden spikes
  - Real-time operational monitoring
  - When you need to see rapid changes
# Stable rate over 5 minutes
rate(http_requests_total[5m])
# Instant rate, captures rapid changes
irate(http_requests_total[5m])
The key difference is that rate() averages across your entire time range, providing stability at the cost of potentially masking brief spikes, while irate() is more responsive but produces noisier visualizations.
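The difference is easy to see on a made-up series with a late burst. The following Python sketch only mimics the two approaches (first versus last sample, and last two samples); it ignores resets and extrapolation:

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def avg_rate(samples: List[Sample]) -> float:
    """rate()-style: average over the whole window (first vs. last sample)."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def instant_rate(samples: List[Sample]) -> float:
    """irate()-style: only the last two samples in the window."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Steady 1 req/s for four minutes, then a burst in the final minute.
samples = [(0, 0), (60, 60), (120, 120), (180, 180), (240, 240), (300, 900)]
print(avg_rate(samples))      # 3.0: the burst is averaged over 5 minutes
print(instant_rate(samples))  # 11.0: the burst dominates
```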
In short:
Function | Purpose | Use When |
---|---|---|
rate() | Calculates per-second average rate over the entire time range | You need stable trends and want to smooth out spikes |
irate() | Calculates instantaneous rate using just the last two samples | You need to detect sudden spikes and rapid changes |

Advanced Usage: Combining rate() with Other Functions
The true potential of the rate() function in Prometheus comes when it's combined with other PromQL functions.
Below are a few advanced techniques to enhance your metrics analysis:
1. Calculating Request Error Rates
To calculate the ratio of 5xx errors to total requests:
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
This gives a clear view of your error rate by comparing 5xx errors to the total number of requests.
2. Using histogram_quantile() with Rates
To calculate the 95th percentile of request durations over the last 5 minutes:
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
This is especially useful for tracking performance SLOs by providing a detailed view of latency.
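It may help to see roughly how a quantile is estimated from cumulative bucket rates. The Python sketch below uses linear interpolation inside the matching bucket. It is a simplified approximation of that idea with made-up bucket values, and it leaves out edge cases that the real function handles:

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (le upper bound, cumulative per-second rate)


def bucket_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate a quantile by linear interpolation inside the matching bucket."""
    buckets = sorted(buckets)
    total = buckets[-1][1]            # the +Inf bucket counts every observation
    if total == 0:
        return math.nan               # no observations, nothing to estimate
    rank = q * total
    for i, (upper, cumulative) in enumerate(buckets):
        if cumulative >= rank:
            break
    if math.isinf(upper):
        return buckets[-2][0]         # cannot interpolate into +Inf
    lower, below = (0.0, 0.0) if i == 0 else buckets[i - 1]
    return lower + (upper - lower) * (rank - below) / (cumulative - below)


# Hypothetical bucket rates for http_request_duration_seconds_bucket.
buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 98.0), (math.inf, 100.0)]
print(bucket_quantile(0.95, buckets))  # about 0.8125 seconds
```

Because the estimate is interpolated, its accuracy depends on how closely the bucket boundaries bracket the latencies you care about.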
3. Smoothing Out Spikes with avg_over_time()
To smooth out short-term fluctuations by averaging the request rate over a longer period:
avg_over_time(rate(http_requests_total[5m])[1h:])
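This averages the 5-minute request rate over the last hour, helping to track longer-term trends. Because the subquery omits a resolution after the colon, the inner rate() is evaluated at the default evaluation interval.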
4. Comparing Rates Across Different Time Ranges
To detect sudden traffic spikes by comparing the short-term and long-term request rates:
rate(http_requests_total[5m]) / rate(http_requests_total[1h]) > 1.2
This query identifies when the short-term rate is 20% higher than the long-term rate.
5. Calculating the Rate of Change for a Gauge Metric
For calculating the rate of change in gauge metrics (like memory usage) using deriv():
deriv(process_resident_memory_bytes{job="app"}[1h])
This tracks how quickly memory usage is changing over an hour.
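deriv() estimates the per-second slope with a simple linear regression over the samples in the range. A minimal Python sketch of a least-squares slope, using made-up memory readings, looks like this:

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, gauge value)


def linear_slope(samples: List[Sample]) -> float:
    """Least-squares slope of value against time, in units per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    return cov / var


# Hypothetical memory readings (bytes) that grow by 1 MiB per minute.
MIB = 1024 * 1024
samples = [(0, 500 * MIB), (60, 501 * MIB), (120, 502 * MIB), (180, 503 * MIB)]
print(linear_slope(samples))  # roughly 17476 bytes per second (1 MiB / 60 s)
```

Unlike the first-versus-last approach used for counters, a regression uses every sample, which suits gauges that can move both up and down.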

How rate() Extrapolates Data
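An important but often overlooked aspect of rate() is how it handles extrapolation. Samples rarely line up exactly with the start and end of the time range, so the function doesn't simply divide the observed increase by the time range. Instead, it:
- Takes the increase between the first and last data points in the range
- Accounts for counter resets within the range by adding back the value lost at each drop
- Extrapolates that increase toward the edges of the range (within limits) and divides by the range length to get a per-second value
This extrapolation means rate() and increase() can return non-integer results even when the counter only moves in whole numbers. And because the result is an average over the whole range, a dramatic change in traffic near the end of the range is diluted by the older data points, potentially masking recent changes.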
Understanding this behavior is crucial when interpreting graphs and setting appropriate time ranges for your specific monitoring needs.
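Counter-reset handling is the easiest part of this to sketch. The Python example below treats any drop in value as a reset and adds back the value seen just before the drop. It leaves extrapolation out, and the sample values are made up:

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def increase_with_resets(samples: List[Sample]) -> float:
    """Increase between first and last sample, compensating for counter resets.

    A drop in value is treated as a reset: the counter is assumed to have
    restarted from zero, so the value seen just before the drop is added back.
    Extrapolation to the window edges is left out of this sketch.
    """
    correction = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr < prev:
            correction += prev
    return samples[-1][1] - samples[0][1] + correction


# The process restarts between t=120 and t=180, so the counter falls from 300 to 20.
samples = [(0, 100), (60, 200), (120, 300), (180, 20), (240, 120)]
increase = increase_with_resets(samples)
print(increase)                                     # 320.0
print(increase / (samples[-1][0] - samples[0][0]))  # about 1.33 per second
```

Without the correction, the naive difference (120 - 100) would suggest almost no traffic at all.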
Advanced Applications of rate()
Beyond basic monitoring, the rate() function enables sophisticated observability patterns that can transform your Prometheus metrics into actionable insights.
Common Advanced Patterns
Pattern | Example | Use Case |
---|---|---|
Multi-instance Aggregation | sum(rate(http_requests_total{job="api"}[5m])) | Measure total request rate across all API servers |
Error Percentage Calculation | sum(rate(http_requests_total{status="error"}[5m])) / sum(rate(http_requests_total[5m])) * 100 | Track error rates as percentage of total traffic |
Top Consumer Identification | topk(5, sum by(user) (rate(api_requests_total[10m]))) | Find users generating the most traffic |
Historical Comparison | rate(http_requests_total[5m]) / rate(http_requests_total[5m] offset 1d) | Compare current traffic to same time yesterday |
These patterns form the building blocks for comprehensive monitoring systems that can detect anomalies, track performance regressions, and identify optimization opportunities in your infrastructure and applications.

5 Common Pitfalls and How to Avoid Them
When working with the rate() function, keep an eye on these common issues:
- Too Short Time Ranges: A time range shorter than about twice the scrape interval often contains fewer than two samples, so rate() returns nothing or produces noisy results. A good rule of thumb is to use a time range at least 4x the scrape interval. For instance, if your scrape interval is 15 seconds, set the time range in your rate() function to at least 1 minute.
- Ignoring Counter Resets: The rate() function detects counter resets (e.g., when a service restarts) and compensates for them, but increments that happened between the last scrape and the restart are lost, so values around a restart can be slightly off. Be mindful of this when interpreting data, as the rate is calculated from the available information.
- Misunderstanding Aggregation: Apply rate() to each series first and aggregate afterward, as in sum(rate(http_requests_total[5m])). Summing the raw counters first hides individual counter resets from rate(), so one restarting instance can show up as a large false spike.
- Inappropriate Use with Gauges: The rate() function is specifically for counters. Using it with gauge metrics will yield incorrect results, so avoid this combination and use deriv() for gauges instead.
- Neglecting Label Changes: A change in a label value starts a new time series, so frequent label changes produce short-lived series with gaps, leading to inaccurate rate calculations. Always account for potential label changes when working with metrics.
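On aggregation order, a small Python simulation with made-up counter values suggests why rate() is normally applied to each series before summing. The reset handling here is a simplified stand-in for what rate() does:

```python
from typing import List


def increase_with_resets(values: List[float]) -> float:
    """Increase across a series, treating any drop as a counter reset."""
    correction = sum(prev for prev, curr in zip(values, values[1:]) if curr < prev)
    return values[-1] - values[0] + correction


# Two hypothetical instances scraped at the same five timestamps, 60 s apart.
instance_a = [1000, 1060, 1120, 1180, 1240]   # steady, 1 req/s
instance_b = [5000, 5060, 5120, 30, 90]       # restarts before the fourth scrape
window_s = 240

# Rate each series first, then sum the results.
rate_then_sum = (increase_with_resets(instance_a) + increase_with_resets(instance_b)) / window_s

# Sum the raw counters first, then take the rate of the total.
summed = [a + b for a, b in zip(instance_a, instance_b)]
sum_then_rate = increase_with_resets(summed) / window_s

print(rate_then_sum)  # 1.875
print(sum_then_rate)  # about 6.54, a false spike
```

When the counters are summed first, the restart of one instance looks like a reset of the entire total, so the whole combined value is added back instead of just that instance's share.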
Practical Example: Monitoring API Request Rates
Let's consider a scenario where you need to monitor request rates for different API endpoints. Here's how it can be set up:
- Instrument the API: Expose a counter metric, api_requests_total, with labels for the endpoint and method.
- Create a Grafana Dashboard: Use the following PromQL query to visualize the request rates:
sum by (endpoint) (rate(api_requests_total{job="api-server"}[5m]))
This query calculates the request rate for each endpoint over the last 5 minutes and sums up the rates for all methods.
- Set Up Alerts: Trigger an alert if the request rate for any endpoint exceeds 100 requests per second:
sum by (endpoint) (rate(api_requests_total{job="api-server"}[5m])) > 100
This setup provides a real-time view of API usage patterns, enabling quick identification of bottlenecks and helping optimize heavily used endpoints.
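To make the grouping step concrete, here is a small Python sketch of what sum by (endpoint) and the alert threshold do to per-series rates. The endpoints, methods, and values are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical per-series rates, keyed by (endpoint, method), in requests per second.
per_series_rate: Dict[Tuple[str, str], float] = {
    ("/users", "GET"): 80.0,
    ("/users", "POST"): 25.0,
    ("/orders", "GET"): 40.0,
}


def sum_by_endpoint(rates: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Mimic sum by (endpoint): add up rates that share an endpoint label."""
    totals: Dict[str, float] = defaultdict(float)
    for (endpoint, _method), value in rates.items():
        totals[endpoint] += value
    return dict(totals)


totals = sum_by_endpoint(per_series_rate)
alerts = {endpoint: value for endpoint, value in totals.items() if value > 100}
print(totals)  # {'/users': 105.0, '/orders': 40.0}
print(alerts)  # {'/users': 105.0}
```

Neither method on /users crosses the threshold by itself, but the endpoint as a whole does, which is the behavior the alert expression relies on.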
Best Practices for Using the rate() Function
To make the most of the rate() function in Prometheus, keep these best practices in mind:
- Choose the Right Time Range: Ensure the time range is long enough to capture meaningful trends while remaining short enough to respond to real-time changes.
- Alerting with Care: When using rate() in alerts, opt for longer time ranges to reduce the risk of false positives caused by short-term fluctuations.
- Use Other Functions: As highlighted in advanced examples, combining rate() with other PromQL functions can provide richer insights into your data.
- Know Your Metrics: Understand the nature of your data, such as how often metrics update and their variability, to ensure accurate monitoring.
- Test Thoroughly: Validate your queries with historical data to ensure they're reliable under different conditions and scenarios.
Best Practices for Selecting Time Ranges
Choosing the right time range for rate() involves balancing between smoothing and responsiveness:
- For high-frequency metrics (scraped every 15s), use 1-2 minutes for alerting and 5-10 minutes for dashboards
- For standard metrics (scraped every minute), use 5 minutes for alerting and 10-30 minutes for dashboards
- For slow-changing metrics, longer ranges (up to several hours) may be appropriate
Tips for selecting the optimal range:
- Start with 4x your scrape interval as a minimum
- Increase the range if graphs appear too noisy
- Decrease the range if you need to detect changes more quickly
- Consider using different ranges for alerting vs. dashboarding
- Test different ranges under various load conditions to find what works best
Remember that longer ranges provide more stability but mask short-term changes, while shorter ranges show more detail but may introduce noise.

Conclusion
The Prometheus rate() function is a fundamental tool for monitoring, offering versatility from tracking request rates to analyzing performance and error metrics. Its strength lies in revealing the rate of change across various metrics, making it vital for any observability strategy.
FAQs
Q: How does the Prometheus rate() function differ from increase()?
A: While rate() calculates the per-second average rate of increase, increase() calculates the total increase in the counter's value over the time range. rate() is generally more useful for ongoing monitoring, while increase() can help understand total change over a specific period.
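The two functions are closely related: over the same range, increase() is the rate() value multiplied by the number of seconds in the range. A one-function Python sketch of that relationship, with an illustrative function name:

```python
def increase_from_rate(rate_per_s: float, window_s: float) -> float:
    """increase() over a window equals rate() over the same window times its length."""
    return rate_per_s * window_s


# A rate of 2 requests per second over a 5-minute window (300 s).
print(increase_from_rate(2.0, 300))  # 600.0
```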
Q: How do you calculate request rates using the Prometheus rate function?
A: To calculate request rates, use a query like rate(http_requests_total[5m]). This will give the per-second rate of requests over the last 5 minutes. These rates can be summed or grouped as needed, e.g., sum(rate(http_requests_total[5m])) for the total request rate across all instances.
Q: Are there approaches for capturing spikes with PromQL?
A: Yes, max_over_time() can be used with rate() to capture spikes. For example, max_over_time(rate(http_requests_total[5m])[1h:]) will show the maximum rate observed in 5-minute windows over the last hour.
Q: How do you calculate the increase of a counter over time using Prometheus functions?
A: To calculate the total increase of a counter, use the increase() function. For example, increase(http_requests_total[1h]) will show the total increase in the number of requests over the last hour.
Q: Can rate() be used with all types of Prometheus metrics?
A: No, rate() should only be used with counter metrics. It doesn't make sense to use rate() with gauge metrics, as they don't represent cumulative values.