Sep 25th, ‘24 / 10 min read

Prometheus Rate Function: A Practical Guide to Using It

In this guide, we’ll walk you through the Prometheus rate function. You’ll discover how to analyze changes over time and use that information to enhance your monitoring strategy.

Prometheus is a powerful and flexible tool for observability and monitoring, with the rate() function standing out as a key feature for tracking system behavior over time.
This guide will walk you through how to use the rate() function effectively, covering its mechanics, use cases, and best practices to boost your monitoring and observability efforts, helping you build more reliable systems.

Understanding the Prometheus Rate Function

The rate() function is a key component of PromQL (Prometheus Query Language) used for analyzing the rate of change in counter metrics over time. At its core, rate() calculates the per-second average rate of increase of time series in a range vector.

rate() helps answer critical questions such as:

  • "What is the current request rate for a service?"
  • "How rapidly is CPU usage increasing over a given period?"
  • "What's the rate of errors occurring in our application?"

These insights help in understanding system performance and behavior.
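For example, each of those questions maps to a simple rate() query (the metric names here are illustrative assumptions about your setup; process_cpu_seconds_total is a standard client-library metric):

# Current request rate for a service
rate(http_requests_total[5m])

# How quickly CPU time is being consumed
rate(process_cpu_seconds_total[5m])

# Rate of 5xx errors in the application
rate(http_requests_total{status=~"5.."}[5m])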

How Scrape Intervals Influence Your Monitoring Setup

The scrape interval in Prometheus defines how frequently metrics are collected. This interval plays a crucial role in how the rate() function works and affects your results.

When using rate(counter[time_range]), Prometheus recommends that your time range should be at least 4 times the scrape interval. This ensures enough data points for accurate calculations while smoothing out irregularities.

For example, if your scrape interval is 15 seconds, your time range should be at least 1 minute (rate(counter[1m])). For metrics scraped every minute, consider using a 5-minute range (rate(counter[5m])).
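As a quick sketch of that guideline (using the http_requests_total counter from later examples purely for illustration):

# Scrape interval of 15s: range of at least 4 × 15s = 1m
rate(http_requests_total[1m])

# Scrape interval of 1m: a 5m range gives comfortable headroom
rate(http_requests_total[5m])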

Inadequate time ranges relative to scrape intervals can lead to:

  • Noisy and spiky graphs
  • Misleading trends
  • Missed counter resets

The Importance of the Rate Function in Monitoring

Understanding rates of change is crucial in monitoring systems for a few key reasons:

  1. Performance Monitoring: It helps identify sudden spikes or drops in system performance, allowing for quick detection of anomalies or potential issues.
  2. Capacity Planning: Analyzing trends in rates allows for predicting future resource needs and planning for scaling accordingly.
  3. Alerting: Rate-based alerts can catch issues before they become critical, enabling proactive problem-solving.
  4. SLO/SLA Tracking: Rates are often key components of service-level objectives and agreements, making them crucial for ensuring compliance and maintaining service quality.
  5. Trend Analysis: Rates provide valuable insights into long-term trends, helping in strategic decision-making and system optimization.

Using the Rate Function: Syntax and Basic Examples

The basic syntax of the rate() function is straightforward:

rate(metric[time_range])

Example:

Assume there's a counter metric, http_requests_total, which tracks the total number of HTTP requests to a service.

To calculate the rate of requests per second over the last 5 minutes, use the following query:

rate(http_requests_total[5m])

This query returns the per-second rate of increase for http_requests_total over the last 5 minutes.

Note: It's crucial to use rate() only with counter metrics. Applying it to gauge metrics will produce misleading results.

The Two-Sample Minimum Requirement

The rate() function requires at least two data points within the specified time range to calculate a rate. This is a fundamental requirement because rate calculation is based on the change between points over time.

If your time range is too small relative to your scrape interval, or if there are gaps in your data collection, you may see empty spots in your graphs where rate() couldn't calculate values.
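For instance, assuming a 15-second scrape interval, a 20-second window will often contain only a single sample and return nothing, while a 1-minute window has room for several:

# Frequently empty: usually only one sample falls in the window
rate(http_requests_total[20s])

# Returns a value: roughly four samples fall in the window
rate(http_requests_total[1m])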

This requirement becomes especially important when:

  • Working with sparse metrics
  • Handling service outages or collection gaps
  • Debugging missing data in dashboards

Rate vs. Irate: When to Use Each

Prometheus offers two functions for calculating rates from counter metrics, each with distinct purposes:

  • rate(): Calculates the per-second average rate over the entire time range, smoothing out fluctuations.
  • irate(): Calculates the instantaneous rate using only the last two data points, capturing rapid changes.

Choosing the Right Function:

  • Use rate() for:
    • Dashboards and trend analysis
    • Capacity planning
    • Stable metrics where consistency matters
    • Alerting on sustained issues
  • Use irate() for:
    • Highly variable metrics
    • Detecting sudden spikes
    • Real-time operational monitoring
    • When you need to see rapid changes
# Stable rate over 5 minutes
rate(http_requests_total[5m])

# Instant rate, captures rapid changes
irate(http_requests_total[5m])

The key difference is that rate() averages across your entire time range, providing stability at the cost of potentially masking brief spikes, while irate() is more responsive but produces noisier visualizations.

When to Use rate() vs. irate()

While both functions handle counter metrics, they serve different monitoring purposes:

Choosing the Right Function:

Function | Purpose | Use When
rate() | Calculates the per-second average rate over the entire time range | You need stable trends and want to smooth out spikes
irate() | Calculates the instantaneous rate using just the last two samples | You need to detect sudden spikes or rapid changes

Advanced Usage: Combining rate() with Other Functions

The true potential of the rate() function in Prometheus comes when it’s combined with other PromQL functions.

Below are a few advanced techniques to enhance your metrics analysis:

1. Calculating Request Error Rates

To calculate the ratio of 5xx errors to total requests:

sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

This gives a clear view of your error rate by comparing 5xx errors to the total number of requests.

2. Using histogram_quantile() with Rates

To calculate the 95th percentile of request durations over the last 5 minutes:

histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

This is especially useful for tracking performance SLOs by providing a detailed view of latency.

3. Smoothing Out Spikes with avg_over_time()

To smooth out short-term fluctuations by averaging the request rate over a longer period:

avg_over_time(rate(http_requests_total[5m])[1h:])

This averages the 5-minute request rate over the last hour (a subquery evaluated at the default resolution step), smoothing short-term fluctuations and helping to track longer-term trends.

4. Comparing Rates Across Different Time Ranges

To detect sudden traffic spikes by comparing the short-term and long-term request rates:

rate(http_requests_total[5m]) / rate(http_requests_total[1h]) > 1.2

This query flags moments when the short-term rate is more than 20% above the long-term rate.

5. Calculating the Rate of Change for a Gauge Metric

For calculating the rate of change in gauge metrics (like memory usage) using deriv():

deriv(process_resident_memory_bytes{job="app"}[1h])

This estimates, using linear regression over the last hour, how quickly memory usage is changing per second.


How rate() Extrapolates Data

An important but often overlooked aspect of rate() is how it handles extrapolation. The function doesn't simply divide the raw difference between the first and last samples by the time between them. Instead, it:

  1. Computes the increase between the first and last samples in the range, adjusting for any counter resets
  2. Extrapolates that increase out to the boundaries of the time range (when the samples lie close enough to those boundaries)
  3. Divides the extrapolated increase by the range's duration to produce a per-second value

This extrapolation behavior means that if your traffic pattern changes dramatically partway through the time range, the older samples at the start of the window still anchor the calculation, potentially diluting or masking recent changes.

Understanding this behavior is crucial when interpreting graphs and setting appropriate time ranges for your specific monitoring needs.
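A useful consequence of this shared logic is that increase() behaves like rate() multiplied by the range's length in seconds, so both are affected by extrapolation in the same way:

# These two expressions return (approximately) the same value
increase(http_requests_total[5m])
rate(http_requests_total[5m]) * 300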

Advanced Applications of rate()

Beyond basic monitoring, the rate() function enables sophisticated observability patterns that can transform your Prometheus metrics into actionable insights.

Common Advanced Patterns

Pattern | Example | Use Case
Multi-instance Aggregation | sum(rate(http_requests_total{job="api"}[5m])) | Measure total request rate across all API servers
Error Percentage Calculation | sum(rate(http_requests_total{status="error"}[5m])) / sum(rate(http_requests_total[5m])) * 100 | Track error rates as a percentage of total traffic
Top Consumer Identification | topk(5, sum by(user) (rate(api_requests_total[10m]))) | Find users generating the most traffic
Historical Comparison | rate(http_requests_total[5m]) / rate(http_requests_total[5m] offset 1d) | Compare current traffic to the same time yesterday

These patterns form the building blocks for comprehensive monitoring systems that can detect anomalies, track performance regressions, and identify optimization opportunities in your infrastructure and applications.


5 Common Pitfalls and How to Avoid Them

When working with the rate() function, keep an eye on these common issues:

  1. Too Short Time Ranges: Using a time range that's close to (or shorter than) the scrape interval leaves rate() with too few samples, producing gaps or noisy graphs. A good rule of thumb is to use a time range at least 4x the scrape interval. For instance, if your scrape interval is 15 seconds, set the time range in your rate() function to at least 1 minute.
  2. Ignoring Counter Resets: The rate() function can handle counter resets (e.g., when a service restarts), but these resets can still cause temporary spikes. Be mindful of this when interpreting data, as the rate is calculated from the available information.
  3. Misunderstanding Aggregation: Applying rate() to counters that have already been summed across series breaks counter-reset handling and can hide individual instance restarts. Apply rate() to each series first and then aggregate the results, e.g. sum(rate(...)); see the sketch after this list.
  4. Inappropriate Use with Gauges: The rate() function is specifically for counters. Using it with gauge metrics will yield incorrect results, so avoid this combination.
  5. Neglecting Label Changes: Frequent changes in label values can create gaps in your data, leading to inaccurate rate calculations. Always account for potential label changes when working with metrics.
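To make pitfall 3 concrete, here's a minimal sketch contrasting the two aggregation orders (using http_requests_total as an example):

# Correct: take the rate per series, then aggregate
sum(rate(http_requests_total[5m]))

# Problematic: aggregating first hides per-instance counter resets
rate(sum(http_requests_total)[5m:])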

Practical Example: Monitoring API Request Rates

Let's consider a scenario where you need to monitor request rates for different API endpoints. Here's how it can be set up:

  1. Instrument the API: Expose a counter metric, api_requests_total, with labels for the endpoint and method.
  2. Create a Grafana Dashboard: Use the following PromQL query to visualize the request rates:
sum by (endpoint) (rate(api_requests_total{job="api-server"}[5m]))

This query calculates the request rate for each endpoint over the last 5 minutes and sums up the rates for all methods.

  3. Set Up Alerts: Trigger an alert if the request rate for any endpoint exceeds 100 requests per second:
sum by (endpoint) (rate(api_requests_total{job="api-server"}[5m])) > 100

This setup provides a real-time view of API usage patterns, enabling quick identification of bottlenecks and helping optimize heavily used endpoints.

Best Practices for Using the rate() Function

To make the most of the rate() function in Prometheus, keep these best practices in mind:

  • Choose the Right Time Range: Ensure the time range is long enough to capture meaningful trends while remaining short enough to respond to real-time changes.
  • Alerting with Care: When using rate() in alerts, opt for longer time ranges to reduce the risk of false positives caused by short-term fluctuations.
  • Use Other Functions: As highlighted in advanced examples, combining rate() with other PromQL functions can provide richer insights into your data.
  • Know Your Metrics: Understand the nature of your data—such as how often metrics update and their variability—to ensure accurate monitoring.
  • Test Thoroughly: Validate your queries with historical data to ensure they're reliable under different conditions and scenarios.

Best Practices for Selecting Time Ranges

Choosing the right time range for rate() involves balancing between smoothing and responsiveness:

  • For high-frequency metrics (scraped every 15s), use 1-2 minutes for alerting and 5-10 minutes for dashboards
  • For standard metrics (scraped every minute), use 5 minutes for alerting and 10-30 minutes for dashboards
  • For slow-changing metrics, longer ranges (up to several hours) may be appropriate

Tips for selecting the optimal range:

  • Start with 4x your scrape interval as a minimum
  • Increase the range if graphs appear too noisy
  • Decrease the range if you need to detect changes more quickly
  • Consider using different ranges for alerting vs. dashboarding
  • Test different ranges under various load conditions to find what works best

Remember that longer ranges provide more stability but mask short-term changes, while shorter ranges show more detail but may introduce noise.
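As a sketch of that trade-off, the same metric might be queried with a shorter range for alerting and a longer one for dashboards (the ranges and threshold here are illustrative):

# Alert expression: shorter range, reacts faster
sum(rate(http_requests_total[2m])) > 100

# Dashboard panel: longer range, smoother trend
sum(rate(http_requests_total[10m]))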


Conclusion

The Prometheus rate() function is a fundamental tool for monitoring, offering versatility from tracking request rates to analyzing performance and error metrics. Its strength lies in revealing the rate of change across various metrics, making it vital for any observability strategy.

💡
And if you’d like to discuss anything further, our Discord community is open. We have a dedicated channel where you can share your use case and connect with other developers.

FAQs

Q: How does the Prometheus rate() function differ from increase()?
A: While rate() calculates the per-second average rate of increase, increase() calculates the total increase in the counter's value over the time range. rate() is generally more useful for ongoing monitoring, while increase() can help understand total change over a specific period.

Q: How do you calculate request rates using the Prometheus rate function?
A: To calculate request rates, use a query like rate(http_requests_total[5m]). This will give the per-second rate of requests over the last 5 minutes. These rates can be summed or grouped as needed, e.g., sum(rate(http_requests_total[5m])) for the total request rate across all instances.

Q: Are there approaches for capturing spikes with PromQL?
A: Yes, max_over_time() can be used with rate() to capture spikes. For example, max_over_time(rate(http_requests_total[5m])[1h:]) will show the maximum rate observed in 5-minute windows over the last hour.

Q: How do you calculate the increase of a counter over time using Prometheus functions?
A: To calculate the total increase of a counter, use the increase() function. For example, increase(http_requests_total[1h]) will show the total increase in the number of requests over the last hour.

Q: Can rate() be used with all types of Prometheus metrics?
A: No, rate() should only be used with counter metrics. It doesn't make sense to use rate() with gauge metrics, as they don't represent cumulative values; for gauges, use functions such as deriv() or delta() instead.

Authors

Anjali Udasi

Helping to make tech a little less intimidating. I love breaking down complex concepts into easy-to-understand terms.