Essential Prometheus Queries: Simple to Advanced

This guide covers practical Prometheus query examples from basic metric selection to advanced alerting and anomaly detection. Each example shows the query, what it returns, and how to adapt it for your own metrics.

What Makes Prometheus Queries So Valuable?

PromQL is Prometheus's query language. It lets you filter, aggregate, and compute over time series data to answer operational questions, build dashboards, and define alert conditions.

Whether you're tracking CPU usage across a Kubernetes cluster or monitoring API latency, PromQL gives you precise control over what you measure and how you alert on it.

Getting Started: Basic Prometheus Query Examples

Here are foundational queries that every monitoring stack should include:

Simple Metric Selection

http_requests_total

This query returns all time series with the metric name http_requests_total. It's as straightforward as it gets—simply naming the metric you want to see. You'll get back every label combination Prometheus has stored for this metric.

Filtering with Labels

http_requests_total{status="500"}

This query filters the http_requests_total metric to only show requests that resulted in a 500 error code. The curly braces let you filter by any label attached to your metrics, making it easy to zero in on exactly what you need.

Rate Function for Counter Metrics

rate(http_requests_total{job="api-server"}[5m])

This query calculates the per-second rate of HTTP requests to your API server over the last 5 minutes. The rate() function is your go-to for counter metrics that only increase over time. It helps you see velocity rather than just cumulative totals.

Intermediate Techniques to Level Up Your Monitoring

These intermediate examples will help you build more sophisticated dashboards:

Aggregating Metrics by Label

sum(rate(http_requests_total[5m])) by (status)

This query takes the rate of all HTTP requests over 5 minutes and adds them together, but keeps them separated by status code. The result? A clean view of your error rates compared to successful requests—perfect for spotting when things start to go sideways.
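To make the grouping step concrete, here is a minimal Python sketch of what an aggregation like sum(...) by (status) does to a set of series. The sample data and the sum_by helper are made up for illustration; this is not how Prometheus implements aggregation internally.

```python
from collections import defaultdict

def sum_by(samples, label):
    """Group samples by one label and add up their values, like sum(...) by (label)."""
    totals = defaultdict(float)
    for labels, value in samples:
        # A series without the label is grouped under the empty string.
        totals[labels.get(label, "")] += value
    return dict(totals)

# Hypothetical per-second request rates, one entry per time series.
rates = [
    ({"status": "200", "instance": "a"}, 40.0),
    ({"status": "200", "instance": "b"}, 35.0),
    ({"status": "500", "instance": "a"}, 0.5),
    ({"status": "500", "instance": "b"}, 1.5),
]

by_status = sum_by(rates, "status")
print(by_status)
```

Every label except status is dropped from the result, which is why four input series collapse into two.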

Finding Top Consumers

topk(5, sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance))

This query shows the 5 instances with the highest CPU usage. The topk() function ranks your results, making it easy to identify resource hogs at a glance.

Calculating Service Uptime

(sum(up{job="api-server"}) / count(up{job="api-server"})) * 100

This query returns the percentage of api-server targets that are up right now. It divides the number of targets whose last scrape succeeded by the total target count and multiplies by 100. Because it is a point-in-time value, average it over a window (for example with avg_over_time) before using it for SLO tracking.

Advanced PromQL: Become a Query Expert

These advanced queries cover capacity planning, error budget tracking, and anomaly detection:

Predicting Future Values

predict_linear(node_filesystem_free_bytes{mountpoint="/"}[6h], 4 * 3600)

This query predicts how much disk space you'll have in 4 hours based on the usage pattern over the last 6 hours. The predict_linear() function is your crystal ball for capacity planning—catch problems before they happen.

You can extend this to create early warning systems for disk capacity issues:

predict_linear(node_filesystem_free_bytes{mountpoint="/"}[6h], 24 * 3600) < 10 * 1024 * 1024 * 1024

This expression returns a result when the root filesystem on any instance is predicted to have less than 10 GiB free in 24 hours, which leaves time to add capacity or clean up before users notice any problems.
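If you are curious what the forecast is based on, the idea is a least-squares line fitted through the samples in the window and extended into the future. The sketch below is a simplified Python version under that assumption: the sample data is hypothetical, and it extrapolates from the last sample, whereas Prometheus extrapolates from the evaluation time.

```python
def predict_linear(samples, seconds_ahead):
    """Fit value = slope * t + intercept by least squares, then extrapolate.

    samples: list of (timestamp_seconds, value) pairs, oldest first.
    Returns the predicted value seconds_ahead after the last sample.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return slope * (last_t + seconds_ahead) + intercept

# Hypothetical free space in GiB, sampled hourly for 6 hours, shrinking by 1 GiB per hour.
history = [(hour * 3600, 100.0 - hour) for hour in range(7)]

predicted = predict_linear(history, 4 * 3600)
print(round(predicted, 3))
```

A straight-line fit works well for steady growth but will mislead you on bursty or cyclical usage, so pick a window long enough to smooth out short-lived jumps.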

Complex Alerting Conditions

sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.01

This query calculates your error ratio by dividing the rate of 5xx errors by the total request rate. The expression returns a result only when more than 1% of requests are failing, which makes it a natural condition for triggering alerts when things go south.

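For multi-window analysis to prevent alert flapping, apply the threshold to each window before combining them:

(
  sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.01
)
and
(
  sum(rate(http_requests_total{status=~"5.."}[1h])) / sum(rate(http_requests_total[1h])) > 0.01
)

This only triggers when both the 5-minute and 1-hour error rates exceed 1%, reducing false alarms from brief spikes while still catching persistent issues. The comparison has to sit inside each branch: the and operator keeps the left-hand values, so a single threshold applied after the and would test only the 5-minute ratio.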
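Where the threshold sits relative to the and operator matters more than it looks. Here is a small Python sketch of how and combines two instant vectors, modeled as dictionaries keyed by label set. The helper names and the ratios are hypothetical, chosen to show a brief spike with a healthy long window.

```python
def vector_and(left, right):
    """Keep left-hand samples whose label set also appears on the right."""
    return {labels: value for labels, value in left.items() if labels in right}

def greater_than(vector, threshold):
    """Comparison filter: drop samples that are not above the threshold."""
    return {labels: value for labels, value in vector.items() if value > threshold}

# Hypothetical error ratios; () is the empty label set left by sum() without "by".
ratio_5m = {(): 0.04}   # brief spike
ratio_1h = {(): 0.002}  # the long window is still healthy

# Threshold applied after "and": only the 5-minute value is compared.
late_filter = greater_than(vector_and(ratio_5m, ratio_1h), 0.01)

# Threshold applied to each window first: both must exceed it.
early_filter = vector_and(greater_than(ratio_5m, 0.01), greater_than(ratio_1h, 0.01))

print(late_filter)
print(early_filter)
```

With the threshold applied late, the spike alone is enough to fire; with it applied to each side first, nothing fires until the 1-hour ratio is also above 1%.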

Histogram Quantiles for Latency Monitoring

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

Want to know what your 95th percentile latency looks like? This query takes your request duration histogram and calculates exactly that. The histogram_quantile() function is essential for performance monitoring and SLO adherence.

For more detailed analysis, compare multiple percentiles side by side. The simplest way is one query per percentile, for example as separate series in the same dashboard panel:

# p50
histogram_quantile(0.5, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
# p90
histogram_quantile(0.9, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
# p95
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
# p99
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

This creates a complete latency profile that helps you distinguish between general slowness and outlier requests affecting a small percentage of users.
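It helps to know that histogram_quantile() estimates the percentile rather than measuring it: it finds the bucket the target rank falls into and interpolates linearly inside it. The Python sketch below illustrates that idea for classic cumulative buckets. The bucket boundaries and counts are invented, and edge cases that Prometheus handles (such as negative bounds) are left out.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by upper_bound,
    ending with the math.inf bucket.
    """
    if not 0 < q < 1:
        raise ValueError("this sketch only handles 0 < q < 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound

# Hypothetical cumulative bucket rates for le = 0.1, 0.5, 1.0 and +Inf.
buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 98.0), (math.inf, 100.0)]

p50 = histogram_quantile(0.5, buckets)
p95 = histogram_quantile(0.95, buckets)
p99 = histogram_quantile(0.99, buckets)
print(p50, p95, p99)
```

Notice that p99 lands in the +Inf bucket and is reported as 1.0, the highest finite bound. If your high percentiles keep sitting exactly on the top bucket boundary, the bucket layout is too coarse for the latencies you are seeing.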

Rate of Change Detection

deriv(process_resident_memory_bytes{job="api-server"}[30m]) > 1024 * 1024

This flags possible memory leaks by returning a result when resident memory grows faster than 1 MiB per second, measured as the slope of a linear fit over a 30-minute window. Lower the threshold to catch slower, more gradual resource exhaustion before it becomes critical.

Dynamic Baseline Comparison

sum(rate(http_requests_total[5m])) 
  < 
avg_over_time(sum(rate(http_requests_total[5m]))[7d:1h] offset 1d) * 0.7

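This detects traffic drops by comparing the current request rate against a baseline: the average rate over the 7-day window that ended one day ago, sampled hourly. It triggers when traffic falls below 70% of that baseline, which could indicate routing issues or upstream service failures. Because the baseline is a weekly average rather than a same-time-of-day value, it can fire during normal overnight lulls; for a time-of-day comparison, compare against the same expression with offset 1d or offset 1w.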

Practical Prometheus Query Scenarios

Here are queries for common production scenarios:

Monitoring Kubernetes Pod Resource Usage

sum(rate(container_cpu_usage_seconds_total{pod=~"api-.*"}[5m])) by (pod)

This query shows CPU usage rates for all API pods in your Kubernetes cluster. It uses regex matching (=~) to select pods whose names start with "api-", then groups the results by pod name. This helps you spot which specific pods might need more resources or might be experiencing unusual load patterns.

For more comprehensive Kubernetes monitoring, combine this with memory usage tracking:

sum(container_memory_working_set_bytes{pod=~"api-.*"}) by (pod) / (1024 * 1024)

This gives you memory usage in MiB per pod (the query divides bytes by 1024 * 1024), making it easy to identify memory leaks or pods approaching their limits. Pairing CPU and memory metrics gives you the full resource utilization picture.

Detecting Slow Database Queries

increase(mysql_global_status_slow_queries[1h])

This query shows how many new slow queries have been logged in the past hour. increase() returns the growth of a counter over the window and accounts for counter resets. Subtracting min_over_time() from max_over_time() does not: after a MySQL restart that subtraction spans the pre-restart peak and the post-restart low, so it can overstate the count.

You can extend this to monitor database connections and identify potential connection pool issues:

mysql_global_status_threads_connected / mysql_global_variables_max_connections * 100

This percentage tells you how close you are to maxing out your database connections – critical for avoiding application timeouts during traffic spikes.

Tracking API Errors by Endpoint

sum(rate(http_requests_total{status=~"5.."}[5m])) by (path) / sum(rate(http_requests_total[5m])) by (path) * 100

This query gives you the error percentage broken down by API endpoint. It divides the 5xx error rate by the total request rate for each path, multiplied by 100 to get a percentage. This helps you quickly pinpoint which specific endpoints are causing problems rather than just seeing an overall error rate spike.

Alerting on Service Level Objective (SLO) Breaches

sum(rate(http_request_duration_seconds_count{status!~"5.."}[5m])) by (service) / sum(rate(http_request_duration_seconds_count[5m])) by (service) < 0.995

This query alerts when your service availability drops below 99.5% (your SLO). It calculates the ratio of successful requests to total requests over a 5-minute window. Perfect for monitoring compliance with customer SLAs.

Network Traffic Anomaly Detection

abs(
  rate(node_network_transmit_bytes_total[5m])
  - avg_over_time(rate(node_network_transmit_bytes_total[5m])[1d:5m])
) / avg_over_time(rate(node_network_transmit_bytes_total[5m])[1d:5m]) > 0.3

This complex query detects when your current network traffic deviates more than 30% from the typical pattern over the past day. It's fantastic for catching unexpected data transfers that might indicate a security issue or misconfigured application.

How to Optimize Your Prometheus Queries

Even the most powerful query isn't helpful if it brings your monitoring system to its knees. Here are some tips for keeping things fast:

Use Time Ranges Wisely

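# Loads every raw sample in the 7-day window for each series
rate(http_requests_total[7d])

# Averages 5-minute rates over 7 days; the inner rate() is evaluated once per 5-minute step
avg_over_time(rate(http_requests_total[5m])[7d:5m])

The two queries answer slightly different questions: the first is the average rate across the whole week, the second is the mean of the 5-minute rates. The subquery is not automatically cheaper, because Prometheus still reads the same seven days of samples and evaluates the inner rate() at every step. The saving comes when the inner expression is precomputed by a recording rule, so the long-range query reads one stored sample per step instead of the raw data. Keep range selectors as short as the question allows, and time both forms on your own data before standardizing on one.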

Limit Cardinality

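# High cardinality - could be hundreds of thousands of series, one per user ID in the path
http_requests_total{path=~"/api/v1/users/.*/profile"}

# Lower cardinality - grouped by status code and method instead of individual paths
sum(http_requests_total) by (status, method)

The second query groups metrics by status code and method rather than returning every unique URL path, which can cut the result from potentially millions of series to a few dozen. Aggregation only shrinks the query result, though. The stored series count stays the same, so the lasting fix is to keep unbounded values such as user IDs out of label values at instrumentation time.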
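A quick back-of-the-envelope calculation shows why one unbounded label dominates cardinality. The Python sketch below multiplies the number of distinct values per label to get an upper bound on series for a single metric. The label counts are hypothetical.

```python
from math import prod

def max_series(label_value_counts):
    """Upper bound on series for one metric: the product of distinct values per label."""
    return prod(label_value_counts.values())

# Hypothetical label sizes for http_requests_total.
with_raw_paths = {"status": 10, "method": 5, "path": 50_000}  # one path per user ID
with_templates = {"status": 10, "method": 5, "path": 40}      # templated routes only

raw_bound = max_series(with_raw_paths)
templated_bound = max_series(with_templates)
print(raw_bound, templated_bound)
```

Real series counts are usually below this bound because not every combination occurs, but the bound is a useful sanity check before adding a new label.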

Pre-calculate Expensive Queries with Recording Rules

Instead of repeatedly running expensive queries in dashboards, create recording rules in your Prometheus configuration:

groups:
  - name: api_slos
    interval: 30s
    rules:
      - record: job:http_requests_total:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)

Then your dashboard can use the pre-calculated metric:

job:http_requests_total:rate5m{job="api-server"}

This approach can reduce dashboard load times by orders of magnitude and prevent your Prometheus server from becoming overwhelmed during peak usage.

Favor Subqueries Over Long Range Vectors

# Potentially expensive for high-cardinality metrics
max_over_time(http_requests_total{job="api"}[7d])

# Subquery form: take hourly maxima, then the maximum of those (about 168 values per series)
max_over_time(max_over_time(http_requests_total{job="api"}[1h])[7d:1h])

Breaking a long-range query into hourly chunks with a subquery pays off when the inner expression is stored by a recording rule: the outer query then reads about 168 precomputed points per series instead of seven days of raw samples. Without the recording rule, the subquery scans the same data as the plain range vector, so check the query duration before assuming it is faster.

Function Type	Common Use Cases	Performance Impact	Optimization Strategy
Aggregations (sum, avg, max)	Dashboard overviews, alerting	Low to medium	Use when filtering high-cardinality dimensions
Range vectors [1h]	Rate calculations, trends	Medium to high	Keep timespan as short as practical
Join operations	Cross-metric correlations	High	Pre-compute with recording rules
Regular expressions	Dynamic filtering	Very high	Replace with explicit label matching when possible
Subqueries	Long-term trends, forecasting	Very high	Use recording rules or federated metrics

Debugging Common PromQL Issues

Here are solutions to common PromQL issues:

No Data Points Issue

# Might return no data points
rate(some_counter[5m])

# Diagnostic: evaluate the same window shifted 5 minutes into the past
rate(some_counter[5m] offset 5m)

Adding an offset helps you see data that was collected before a recent gap in metrics collection. If the offset query returns a value and the plain one does not, the target stopped reporting. A counter that is still being scraped but has stopped increasing returns 0 rather than nothing.

For alerting scenarios, use the absent() function to detect missing metrics:

absent(up{job="api-server"})

This returns a 1 if the metric is missing entirely, making it perfect for alerting when a service stops reporting metrics completely—often a sign of more serious problems than just high error rates.

Counter Resets

# Vulnerable to counter resets: goes negative after a restart
http_requests_total - http_requests_total offset 1h

# Handles counter resets
increase(http_requests_total[1h])

Both rate() and increase() detect counter resets and compensate for them; increase() over 1h is equivalent to rate() over 1h multiplied by 3600. What breaks on a reset is raw subtraction of two counter values. This is crucial for containers and pods that frequently restart in orchestrated environments.
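Here is a simplified Python sketch of the reset handling that Prometheus applies to counters, next to plain subtraction for comparison. The sample values are hypothetical, and the sketch leaves out the extrapolation to the window boundaries that Prometheus also performs.

```python
def naive_delta(values):
    """Last value minus first value; wrong if the counter restarted in between."""
    return values[-1] - values[0]

def reset_aware_increase(values):
    """Sum the increases between consecutive samples, treating any drop as a reset to zero."""
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        if cur < prev:
            # Counter restarted: everything counted since the reset is new.
            total += cur
        else:
            total += cur - prev
    return total

# Hypothetical counter samples; the process restarts after the third scrape.
samples = [100.0, 150.0, 200.0, 10.0, 60.0]

print(naive_delta(samples))
print(reset_aware_increase(samples))
```

Plain subtraction reports a negative number here, while the reset-aware version recovers the 160 requests that were actually served. Anything counted between the last scrape before the restart and the restart itself is still lost.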

Dealing with Gaps in Time Series

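# Returns no series at all when nothing matches, so graphs show gaps
sum(rate(http_requests_total[5m]))

# Returns 0 instead of an empty result
sum(rate(http_requests_total[5m])) or vector(0)

The "or vector(0)" approach substitutes 0 whenever the left-hand side is empty, so your graphs stay continuous during brief service restarts or metric collection issues. It does not carry the last known value forward. It also works cleanly only on aggregations without a by clause: vector(0) has no labels, so next to sum(...) by (service) it shows up as an extra unlabeled series.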

Fixing "No Data Points" in Rate Calculations

# Might return nothing if the range holds fewer than two samples
rate(http_requests_total[1m])

# More tolerant of delayed or missed scrapes
rate(http_requests_total[5m])

rate() needs at least two samples inside its range to return a value. If your Prometheus scrapes every 15s, a 1m range normally holds four samples and is sufficient, but a 5m range provides better reliability, especially during high-load periods when scrapes might be delayed. A common rule of thumb is a range of at least four times the scrape interval.
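To see how much slack a given range leaves you, here is a rough Python sketch that counts the samples a range selector should contain and checks whether rate() would still have the two it needs. The function names are made up, and the count is approximate because it ignores how scrapes align with the window edges.

```python
def samples_in_range(range_seconds, scrape_interval_seconds, missed_scrapes=0):
    """Rough count of samples inside a range selector after some scrapes are missed."""
    expected = range_seconds // scrape_interval_seconds
    return max(expected - missed_scrapes, 0)

def rate_has_enough_data(range_seconds, scrape_interval_seconds, missed_scrapes=0):
    """rate() needs at least two samples in the window to return a value."""
    return samples_in_range(range_seconds, scrape_interval_seconds, missed_scrapes) >= 2

# 15s scrape interval, three consecutive scrapes missed.
one_minute_ok = rate_has_enough_data(60, 15, missed_scrapes=3)
five_minutes_ok = rate_has_enough_data(300, 15, missed_scrapes=3)
print(one_minute_ok, five_minutes_ok)
```

With a 15s interval, three missed scrapes are enough to empty out a 1m window, while a 5m window still has plenty of samples left.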

Debug Metric Existence and Dimensions

count({__name__=~"node_.*"}) by (__name__)

This meta-query helps you discover what metrics are available in your Prometheus instance. It's incredibly useful when working with a new exporter or trying to find the exact name of a metric you need.

count(node_cpu_seconds_total) by (mode, cpu)

This tells you what label combinations exist for a specific metric, helping you understand the dimensions available for filtering or aggregation.

Frequently Asked Questions

What is PromQL in Prometheus?

PromQL (Prometheus Query Language) is the query language built into Prometheus for selecting and aggregating time series data. It supports label filtering, mathematical operations, rate calculations, and aggregations like sum, avg, and topk. PromQL queries run against the Prometheus data store and return either instant vectors (a single value per time series) or range vectors (a set of values over a time window).

How do I calculate the rate of a counter metric in Prometheus?

Use the rate() function with a time range: rate(metric_name[5m]). This returns the per-second average rate of increase over the last 5 minutes. Both rate() and increase() handle counter resets when a process restarts; prefer rate() in alert expressions so that thresholds are expressed per second and keep their meaning when you change the range.

What is the difference between rate() and irate() in Prometheus?

rate() calculates the average per-second rate over the full time range you specify. irate() calculates the per-second rate using only the last two data points, making it more responsive to sudden spikes but noisier for dashboards. Use rate() for alerting and dashboards; use irate() only when you need to detect instantaneous spikes.
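The difference is easiest to see on a window that ends with a burst. This Python sketch is a simplification that assumes no counter resets and skips the boundary extrapolation Prometheus applies; the samples are hypothetical.

```python
def rate(samples):
    """Average per-second increase between the first and last sample in the window."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

def irate(samples):
    """Per-second increase between the last two samples only."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Hypothetical counter scraped every 15s: steady traffic, then a burst in the last interval.
window = [(0, 0.0), (15, 150.0), (30, 300.0), (45, 450.0), (60, 1500.0)]

print(rate(window))
print(irate(window))
```

rate() reports 25 requests per second for the whole minute, while irate() reports 70 for the last 15 seconds. The same sensitivity makes irate() jump around on a graph when traffic is uneven.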

How do I filter Prometheus metrics by label?

Add label matchers inside curly braces: http_requests_total{status="500", job="api"}. Prometheus supports four matchers: = (exact match), != (not equal), =~ (regex match), and !~ (regex not match). For example, http_requests_total{status=~"5.."} matches all 5xx status codes.
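The four matchers can be sketched in a few lines of Python. Two behaviors are worth knowing: regex matchers are anchored, so the pattern has to match the whole label value, and a missing label behaves like an empty string. The matches helper is hypothetical, and Python's re module stands in for the RE2 syntax Prometheus uses, so unusual patterns may behave differently.

```python
import re

def matches(labels, name, op, pattern):
    """Apply one PromQL-style label matcher to a series' labels."""
    value = labels.get(name, "")  # a missing label behaves like an empty string
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op == "=~":
        return re.fullmatch(pattern, value) is not None  # anchored at both ends
    if op == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown matcher operator: {op}")

series = {"status": "503", "job": "api"}

print(matches(series, "status", "=~", "5.."))  # whole value matches the pattern
print(matches(series, "status", "=~", "5"))    # a partial match is not enough
```

Because of the anchoring, status=~"5" does not match 503; you need status=~"5.." or status=~"5.*".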

How do I aggregate Prometheus metrics across multiple instances?

Use an aggregation operator with a by clause: sum(rate(http_requests_total[5m])) by (job). This sums the request rate across all instances, grouped by job label. Common aggregation operators are sum, avg, max, min, count, and topk.

How do I calculate latency percentiles in Prometheus?

Use histogram_quantile() with a histogram metric: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)). This gives you the 95th percentile latency. Your application must expose metrics in histogram format (with _bucket, _sum, and _count suffixes) for this to work.

How do I detect when a service goes down in Prometheus?

Use the up metric: up{job="your-service"} == 0. Prometheus sets up to 1 when a scrape succeeds and 0 when it fails. For services that stop reporting entirely, use absent(up{job="your-service"}), which returns a result only when the metric is missing from Prometheus altogether.

When should I use recording rules in Prometheus?

Use recording rules when a query is expensive to compute and used repeatedly in dashboards or alerts. Recording rules pre-compute the result on a schedule and store it as a new metric, so dashboards load the pre-computed value instead of re-running the full query. They are especially useful for high-cardinality aggregations and queries that span long time ranges.

Conclusion

The queries above cover the most common production patterns. Start with basic selectors and rate(), then layer in aggregations, histogram_quantile(), and recording rules as your system grows. For capacity planning, predict_linear() and dynamic baseline queries give you early warning before users notice problems.
