Last9

Mar 27th, ‘25 / 5 min read

21 PromQL Tricks Every Developer Should Know

Boost your PromQL skills with these 21 handy tricks—optimize queries, troubleshoot faster, and get deeper insights from your metrics.


So you've got Prometheus up and running, but now you're scratching your head looking at those queries.

PromQL (Prometheus Query Language) looks simple on the surface, but it packs some serious power once you know how to wield it.

Whether you're debugging production issues at 2 AM or building dashboards that actually tell you something useful, these PromQL tricks will upgrade your monitoring game.

What Makes PromQL Different?

PromQL isn't just another query language – it's built specifically for time series data, making it uniquely suited for monitoring metrics. Unlike SQL, PromQL thinks in vectors and instants, not rows and tables.

When you run a PromQL query, you're usually getting back one of these:

  • An instant vector (a set of time series, each with a single sample at the same timestamp)
  • A range vector (a set of time series with a range of samples over time)
  • A scalar (a simple numeric value)
  • A string (rarely used, but it's there)

Now let's get into the good stuff.

💡
If you're looking for a quick reference while working with PromQL, check out this PromQL cheat sheet for useful queries and tips.

Trick 1: Master the Rate Function

The rate() function is your bread and butter for counter metrics. It calculates the per-second average rate at which a counter increased over the time range you give it.

rate(http_requests_total[5m])

This gives you the per-second rate of HTTP requests over the last 5 minutes. But here's the clever part – rate() handles counter resets gracefully. If your application restarts and the counter goes back to zero, rate() still gives you accurate numbers.
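To make the reset handling concrete, here's a rough Python sketch of the idea (Python stands in for the math here; PromQL itself runs inside Prometheus). It assumes samples arrive as (timestamp, value) pairs and skips the extrapolation to the window edges that the real rate() performs, so treat it as a model, not Prometheus's implementation.

```python
def reset_aware_increase(samples):
    """Total increase of a counter, treating any drop in value as a reset.

    samples: list of (unix_seconds, counter_value) pairs sorted by time.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # Counter reset: the process restarted and counted up from 0 again,
            # so the whole current value counts as new increase.
            total += cur
    return total


def simple_rate(samples):
    """Per-second average rate between the first and last sample (no extrapolation)."""
    if len(samples) < 2:
        return None  # rate() also needs at least two samples
    elapsed = samples[-1][0] - samples[0][0]
    return reset_aware_increase(samples) / elapsed


# A counter that restarts between t=60s and t=120s (illustrative values)
samples = [(0, 100), (60, 160), (120, 10), (180, 70)]
print(reset_aware_increase(samples))  # 130.0
print(simple_rate(samples))           # ~0.722 per second
```

Without the reset branch, the drop from 160 to 10 would show up as a huge negative increase, which is exactly the distortion rate() protects you from.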

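Pro tip: Use a longer range (such as [1h]) when you want stable, smooth lines, and a shorter one (such as [1m]) when you need to spot sudden changes, at the cost of more noise. Either way, keep the range at least a few times your scrape interval so every window holds enough samples.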

Trick 2: Use increase() for Cleaner Numbers

Want to know how many requests you've received in the last hour without doing mental math? That's what increase() is for:

increase(http_requests_total[1h])

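This gives you the total increase in the counter over the specified time, which is often easier to reason about than per-second rates. Under the hood, increase() is rate() multiplied by the number of seconds in the range, and because it extrapolates to cover the whole window, you can get non-integer results even from counters that only count whole events.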
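Here's why those non-integer results show up, in a deliberately simplified Python model: it scales the observed increase up from the span actually covered by samples to the full window. Prometheus's real extrapolation also caps how far it extrapolates past the first and last samples, and this sketch assumes no counter resets, so read it as an illustration only.

```python
def simplified_increase(samples, window_seconds):
    """Approximate increase() by scaling the observed increase to the full window.

    samples: (unix_seconds, counter_value) pairs inside the window, no resets.
    """
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    observed = v_last - v_first
    sampled_span = t_last - t_first
    return observed * window_seconds / sampled_span


# Scrapes every 15s inside a 60s window: first at t=7s, last at t=52s.
samples = [(7, 10), (22, 12), (37, 13), (52, 15)]
print(simplified_increase(samples, 60))  # 6.666..., not a whole number
```

The counter only moved by 5 between the first and last scrape, but the samples cover 45 of the 60 seconds, so the estimate is stretched to 5 × 60 / 45.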

Trick 3: Turn Gauges into Rates When Needed

rate() is meant for counters only (it treats any drop in value as a counter reset), but you can still track how fast a gauge changes over time:

deriv(process_resident_memory_bytes[1h])

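deriv() returns the per-second slope of a simple linear regression over the samples in the range, so it shows how your memory usage is trending. A value that stays positive for hours is a classic sign of a slow memory leak.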
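Under the hood, deriv() estimates the slope with simple linear regression over the samples in the range. Here's a minimal least-squares version in Python; the function name and the memory values are made up for illustration:

```python
def least_squares_slope(samples):
    """Slope (units per second) of the least-squares line through (t, v) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    return cov / var


# Resident memory creeping up by 1 MB every 10 minutes (illustrative values).
samples = [(0, 500e6), (600, 501e6), (1200, 502e6), (1800, 503e6)]
print(least_squares_slope(samples))  # ~1666.7 bytes per second
```

Because the regression uses every sample in the range, a single spike moves the slope far less than it would move a simple first-minus-last difference.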

Trick 4: Label Filtering Shortcuts

Filter metrics like a boss with these shorthand tricks:

# Select only production environments
http_requests_total{env="production"}

# Select everything except production
http_requests_total{env!="production"}

# Regex match: all environments starting with "prod"
http_requests_total{env=~"prod.*"}

# Regex exclude: no testing environments
http_requests_total{env!~"test.*"}
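PromQL regex matchers are fully anchored, which is why env=~"prod.*" means "starts with prod" rather than "contains prod". Python's re.fullmatch behaves the same way for simple patterns like these (Prometheus uses Go's RE2 syntax, so advanced constructs can differ). A small sketch with hypothetical environment names:

```python
import re


def label_matches(value, op, pattern):
    """Mimic PromQL label matchers: =, !=, =~, !~ (regexes are fully anchored)."""
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op == "=~":
        return re.fullmatch(pattern, value) is not None
    if op == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown matcher {op!r}")


envs = ["production", "prod-eu", "preprod", "testing", "staging"]
print([e for e in envs if label_matches(e, "=~", "prod.*")])
# ['production', 'prod-eu']  ("preprod" does not match: the regex is anchored)
print([e for e in envs if label_matches(e, "!~", "test.*")])
# ['production', 'prod-eu', 'preprod', 'staging']
```

If you really want "contains prod", write the pattern as ".*prod.*".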

Trick 5: The Power of By and Without

Group metrics and clean up your results with by and without:

# Group request count by endpoint, dropping other labels
sum by(endpoint) (http_requests_total)

# Sum across methods, keeping every other label
sum without(method) (http_requests_total)

This keeps your graphs clean and your dashboards meaningful.
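If it helps to see what grouping actually does, this Python sketch models series as (labels, value) pairs (an assumption for illustration; real Prometheus also drops the metric name when aggregating) and implements both modifiers:

```python
from collections import defaultdict


def sum_by(series, keep):
    """sum by(keep...): group on the listed labels only."""
    out = defaultdict(float)
    for labels, value in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k in keep))
        out[key] += value
    return dict(out)


def sum_without(series, drop):
    """sum without(drop...): group on every label except the listed ones."""
    out = defaultdict(float)
    for labels, value in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        out[key] += value
    return dict(out)


series = [
    ({"endpoint": "/api", "method": "GET", "instance": "a"}, 10),
    ({"endpoint": "/api", "method": "POST", "instance": "a"}, 5),
    ({"endpoint": "/login", "method": "POST", "instance": "b"}, 2),
]
print(sum_by(series, {"endpoint"}))
# {(('endpoint', '/api'),): 15.0, (('endpoint', '/login'),): 2.0}
print(sum_without(series, {"method"}))  # keeps endpoint and instance
```

The two differ once new labels show up: by() keeps ignoring them, while without() starts splitting on them automatically.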

💡
If you want a deeper understanding of PromQL, check out this guide to Prometheus Query Language for a solid foundation.

Trick 6: Offset for Better Comparisons

Want to compare metrics to last week? Use offset:

# Current request rate
rate(http_requests_total[5m])

# Request rate one week ago
rate(http_requests_total[5m] offset 1w)

You can even calculate the difference directly:

rate(http_requests_total[5m]) - 
rate(http_requests_total[5m] offset 1w)

Trick 7: Use delta() for Gauge Changes

For gauge metrics, delta() shows how much the value changed over a period: the difference between the first and last samples in the range, extrapolated to cover the full window:

# How much did CPU temp change in the last 30m?
delta(cpu_temp_celsius[30m])

This works great for metrics that both increase and decrease.
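Unlike rate() and increase(), delta() doesn't treat a drop as a counter reset, so the result can be negative. A simplified Python sketch of the core calculation (it skips the extrapolation to the window edges), with made-up temperature readings:

```python
def simple_delta(samples):
    """Last value minus first value: the core of delta(), without extrapolation."""
    if len(samples) < 2:
        return None
    return samples[-1][1] - samples[0][1]


# CPU temperature readings over 30 minutes (illustrative values).
temps = [(0, 62.0), (600, 66.5), (1200, 64.0), (1800, 58.5)]
print(simple_delta(temps))  # -3.5: it cooled down overall
```

Note that the peak at 66.5 doesn't affect the result at all; delta() only compares the ends of the window.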

Trick 8: Alerting on Absent Metrics

What if your metric just... disappears? That's often worse than a bad value. Catch it with:

absent(up{job="api-server"})

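If no series match the selector, absent() returns a single series with value 1, carrying the job="api-server" label from the equality matcher. If any matching series exists, the result is empty, which makes it a natural alert condition.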

Trick 9: Convert Between Time Units

Need to see results in minutes rather than seconds? Just multiply:

# Request rate per minute instead of per second
rate(http_requests_total[5m]) * 60

Or for hours:

rate(http_requests_total[5m]) * 3600

Trick 10: Calculate Percentiles the Right Way

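Don't average percentiles that were already computed per instance; an average of percentiles isn't a percentile. Instead, let Prometheus estimate quantiles from histogram buckets:

histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

This estimates the 95th percentile latency from your histogram metrics. It interpolates linearly inside the bucket where the percentile falls, so precision depends on how well your bucket boundaries fit your latencies. To get a single percentile across all instances, aggregate the buckets first and keep the le label: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))).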
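To see why bucket boundaries matter, here's a simplified Python model of the estimation step. It assumes cumulative bucket counts (like the le buckets Prometheus stores), treats the lowest bucket as starting at 0, and interpolates linearly inside the bucket where the quantile lands. The real function handles many more edge cases, and the bucket numbers below are made up.

```python
import math


def bucket_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by upper_bound;
    the last bucket must be +Inf (Prometheus's le="+Inf").
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0  # assume the lowest bucket starts at 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Quantile falls in the +Inf bucket: use the highest finite bound.
                return lower_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Illustrative request-duration buckets (seconds) with cumulative counts.
buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 100), (math.inf, 100)]
print(bucket_quantile(0.95, buckets))  # 0.5
print(bucket_quantile(0.9, buckets))   # ~0.4167, interpolated between 0.25 and 0.5
```

Every request between 250 ms and 500 ms is assumed to be spread evenly across that bucket, so wide buckets around your target latency mean fuzzy percentiles.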

Trick 11: Binary Operators for Complex Comparisons

Mix and match metrics with binary operators:

# Find when error rates exceed 5% of total requests
rate(http_requests_error_total[5m]) > 0.05 * rate(http_requests_total[5m])
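A comparison between two instant vectors acts as a filter: series are paired up by identical label sets, and only the left-hand series whose condition holds are kept, with their left-hand value. Here's a Python sketch of that behavior, with hypothetical service labels:

```python
def filter_gt(left, right, factor):
    """Model of `left > factor * right`, matched one-to-one on identical label sets.

    left/right: dicts mapping a frozenset of (label, value) pairs to a sample value.
    Returns the left-hand series (with left-hand values) where the condition holds.
    """
    return {
        labels: value
        for labels, value in left.items()
        if labels in right and value > factor * right[labels]
    }


svc_a = frozenset({("service", "checkout")})
svc_b = frozenset({("service", "search")})
error_rate = {svc_a: 3.0, svc_b: 0.2}
total_rate = {svc_a: 40.0, svc_b: 50.0}
print(filter_gt(error_rate, total_rate, 0.05))  # only checkout: 3.0 > 0.05 * 40
```

The output is the error rate itself, not a true/false flag; add the bool modifier in PromQL if you want 0/1 values instead.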

Trick 12: Use Subqueries for Moving Averages

Smooth out noisy metrics with a moving average:

avg_over_time(rate(http_requests_total[5m])[1h:5m])

This gives you the average rate calculated over a sliding 1-hour window, sampled every 5 minutes.
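A subquery like [1h:5m] evaluates the inner expression at each 5-minute step inside the hour, and avg_over_time() then averages those points. Here's a minimal trailing-window sketch in Python, assuming the per-step rate() values are already computed. It treats the window as left-open; exact boundary handling has varied between Prometheus versions.

```python
def trailing_average(points, window_seconds, now):
    """Average of (timestamp, value) points with now - window < timestamp <= now."""
    in_window = [v for t, v in points if now - window_seconds < t <= now]
    if not in_window:
        return None
    return sum(in_window) / len(in_window)


# Hypothetical rate() results evaluated every 5 minutes (300 s) for 2 hours,
# alternating between 20 and 10 requests per second.
step = 300
points = [(i * step, 10.0 if i % 2 else 20.0) for i in range(25)]
print(trailing_average(points, 3600, now=24 * step))  # 15.0
```

The raw series zigzags between 10 and 20, but the hourly average sits steadily at 15, which is the smoothing effect you're after.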

💡
If you're setting up alerts in Prometheus, check out this guide on Prometheus Alertmanager to manage notifications effectively.

Trick 13: The Unless Operator

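The unless operator is your friend for filtering out expected cases: it drops every left-hand series that has a match on the right. Matching uses the full label set, so if your maintenance metric doesn't carry exactly the same labels as up, restrict the match with unless on(instance).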

# Find instances that are down, unless they're in maintenance mode
up == 0 unless maintenance == 1
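Here's a small Python model of that matching rule, using hypothetical instance and job labels. It shows how a label mismatch silently disables the filter and how restricting the match to instance fixes it:

```python
def unless(left, right, on=None):
    """Model of `left unless right` (or `unless on(...)`) for instant vectors.

    left/right: lists of (labels, value) with the metric name already left out.
    Without `on`, series match only if their full label sets are identical.
    """
    def key(labels):
        names = on if on is not None else labels.keys()
        # A missing label counts as an empty value, as in PromQL.
        return tuple(sorted((name, labels.get(name, "")) for name in names))

    right_keys = {key(labels) for labels, _ in right}
    return [(labels, value) for labels, value in left if key(labels) not in right_keys]


down = [({"instance": "db-1", "job": "node"}, 0), ({"instance": "db-2", "job": "node"}, 0)]
maintenance = [({"instance": "db-1", "job": "cmdb"}, 1)]

print(len(unless(down, maintenance)))              # 2: job labels differ, nothing matches
print(unless(down, maintenance, on=["instance"]))  # only db-2 is left
```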

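Trick 14: hour() for Time-Based Thresholds

Use time functions such as hour(), which by default reads the current evaluation time (time()), for time-based checks:

# Alert on disk usage only during business hours (09:00-17:00 UTC)
disk_used_percent > 80 and on() (hour() >= 9 and hour() < 17)

hour() returns a result with no labels, so on() is needed for the and to match your labelled disk series. Keep in mind that hour() works in UTC.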
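hour() reads the evaluation timestamp in UTC, so "business hours" in a query like this are UTC hours. This Python sketch mirrors that check for a single timestamp (the helper names are hypothetical):

```python
from datetime import datetime, timezone


def utc_hour(unix_seconds):
    """What hour() gives for a timestamp: the hour of day in UTC (0-23)."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).hour


def in_business_hours_utc(unix_seconds, start=9, end=17):
    """Mirror of (hour() >= 9 and hour() < 17) for one evaluation time."""
    return start <= utc_hour(unix_seconds) < end


print(utc_hour(1700000000))             # 22 (2023-11-14 22:13:20 UTC)
print(in_business_hours_utc(9 * 3600))  # True: 09:00 UTC on 1970-01-01
```

If your team works in another timezone, shift the hour window accordingly rather than expecting Prometheus to use local time.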

Trick 15: Create On-the-Fly Metrics with Vector Matching

Need a custom metric that doesn't exist? Create it by matching two metrics:

# Calculate error percentage on the fly
rate(http_requests_error_total[5m]) / 
ignoring(status) 
rate(http_requests_total[5m]) * 100

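The ignoring(status) part tells Prometheus to skip the status label when pairing series, which helps when labels don't perfectly match. If the error metric has several status values for each matching series on the right, that's a many-to-one match, and you'll also need group_left: ... / ignoring(status) group_left rate(http_requests_total[5m]).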
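Here's a Python sketch of that matching step, assuming the error counter carries a status label that the total counter lacks (hypothetical data). It models one-to-one matching only and raises where PromQL would demand group_left:

```python
def divide_ignoring(left, right, ignored):
    """Model of `left / ignoring(...) right * 100` with one-to-one matching.

    left/right: lists of (labels, value). Series are paired on all labels
    except those in `ignored`; left-hand series without a partner are dropped.
    """
    def key(labels):
        return tuple(sorted((k, v) for k, v in labels.items() if k not in ignored))

    right_by_key = {key(labels): value for labels, value in right}
    result, seen = [], set()
    for labels, value in left:
        k = key(labels)
        if k not in right_by_key:
            continue
        if k in seen:
            # Several left series share one right partner: PromQL needs group_left here.
            raise ValueError("many-to-one match; use group_left in PromQL")
        seen.add(k)
        result.append((dict(k), value / right_by_key[k] * 100))
    return result


errors = [({"instance": "a", "status": "500"}, 2.0)]
totals = [({"instance": "a"}, 40.0)]
print(divide_ignoring(errors, totals, {"status"}))  # instance "a" at a 5% error rate
```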

Trick 16: Sort and Limit for Top-N Queries

Focus on your biggest consumers with topk(), which keeps only the k series with the highest values:

# Top 5 memory-hungry pods
topk(5, container_memory_usage_bytes{namespace="production"})
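topk() picks the top k series independently at every evaluation step, so a graph over time can show more than k distinct series as the ranking shifts. A Python sketch with made-up pod memory numbers:

```python
import heapq


def topk_per_step(k, steps):
    """steps: list of {series_name: value} dicts, one per evaluation timestamp.

    Returns the names of the top-k series chosen at each step.
    """
    return [
        [name for name, _ in heapq.nlargest(k, sample.items(), key=lambda kv: kv[1])]
        for sample in steps
    ]


steps = [
    {"pod-a": 900, "pod-b": 700, "pod-c": 300},
    {"pod-a": 400, "pod-b": 650, "pod-c": 800},
]
chosen = topk_per_step(2, steps)
print(chosen)  # [['pod-a', 'pod-b'], ['pod-c', 'pod-b']]
print(sorted({name for names in chosen for name in names}))  # 3 pods appear with k=2
```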

Trick 17: Use predict_linear() for Trend Forecasting

Want to know if you'll run out of disk space in the next 24 hours?

# Predict disk free in 24 hours based on 6h of data
predict_linear(node_filesystem_free_bytes[6h], 86400) < 0

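The comparison filters the result down to the filesystems whose predicted free space drops below zero (keeping their predicted values) and returns nothing otherwise, which makes it perfect for alerting.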
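predict_linear() fits the same kind of least-squares line as deriv() and reads it off at a point in the future. A simplified Python sketch, with made-up disk numbers and a hypothetical helper name; it shows the idea rather than reproducing Prometheus's exact numerics:

```python
def predict_linear(samples, eval_time, seconds_ahead):
    """Fit a least-squares line to (t, v) samples; read it at eval_time + seconds_ahead."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / sum(
        (t - mean_t) ** 2 for t, _ in samples
    )
    intercept = mean_v - slope * mean_t  # value of the fitted line at t = 0
    return intercept + slope * (eval_time + seconds_ahead)


# Free disk bytes shrinking by 10 MB per hour over the last 2 hours (illustrative).
samples = [(0, 100e6), (3600, 90e6), (7200, 80e6)]
forecast = predict_linear(samples, eval_time=7200, seconds_ahead=86400)
print(forecast < 0)  # True: at this pace the disk fills up within 24 hours
```

With 80 MB left and 240 MB of projected usage over the next day, the line lands around -160 MB, so the alert expression keeps this series.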

Trick 18: Dealing with Counter Resets Manually

rate() hides counter resets from you, which is usually what you want. When you need to see them, count them explicitly:

# Flag counters that reset more than once in the last hour
resets(http_requests_total[1h]) > 1

This helps you identify when counters are being reset more often than expected.
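The functions changes() and resets() count different things: changes() counts every change in value, while resets() counts only decreases, which is what Prometheus treats as a counter reset. A quick Python sketch with made-up counter values shows the gap:

```python
def count_changes(values):
    """What changes() counts: any difference between consecutive samples."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur != prev)


def count_resets(values):
    """What resets() counts: only decreases, which a counter treats as resets."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)


# A busy counter that restarted once (illustrative values).
counter = [100, 150, 210, 5, 60, 120]
print(count_changes(counter))  # 5: every scrape saw a new value
print(count_resets(counter))   # 1: only the restart
```

For a busy counter, changes() is close to the number of scrapes, so it says almost nothing about restarts.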

💡
If you're running into scaling challenges with Prometheus, check out this guide on scaling Prometheus for practical tips and strategies.

Trick 19: Label_replace for Dynamic Relabeling

Transform your labels on the fly:

# Extract service name from a longer identifier
label_replace(metric_name, "service", "$1", "pod", "(.*)-[a-z0-9]+-[a-z0-9]+")
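This Python sketch mimics what label_replace() does to a single series' labels. It assumes Python's re.fullmatch is close enough to Prometheus's anchored RE2 matching for this pattern and only handles simple $1-style references. The pod names are hypothetical:

```python
import re


def label_replace(labels, dst, replacement, src, regex):
    """Model of PromQL label_replace() applied to one series' labels.

    The regex must match the whole source value (PromQL anchors it). On a
    match, $1-style references in `replacement` are filled from capture
    groups and the result is written to `dst`; otherwise labels are unchanged.
    """
    match = re.fullmatch(regex, labels.get(src, ""))
    if match is None:
        return dict(labels)
    value = re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1))) or "", replacement)
    return {**labels, dst: value}


pod = {"pod": "checkout-7d9f8c6b5-x2k4p", "namespace": "shop"}
print(label_replace(pod, "service", "$1", "pod", r"(.*)-[a-z0-9]+-[a-z0-9]+"))
# {'pod': 'checkout-7d9f8c6b5-x2k4p', 'namespace': 'shop', 'service': 'checkout'}
```

The two trailing -[a-z0-9]+ groups peel off the ReplicaSet hash and pod suffix, so multi-part names such as payment-api-5f6d7-abcde still come out as payment-api.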

Trick 20: Use clamp_min() and clamp_max() for Cleaner Graphs

Outliers can make graphs unreadable. Tame them with:

# Cap CPU usage visualization at 100%
clamp_max(cpu_usage_percent, 100)

# Ensure values don't go below zero
clamp_min(temperature_celsius, 0)

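Trick 21: Holt-Winters for Trend-Aware Smoothing

For smoothing that accounts for trends, not just the latest value:

holt_winters(rate(http_requests_total[1d])[30d:1d], 0.3, 0.3)

This is double exponential smoothing. The first factor (sf) sets how much weight goes to older data (lower values favor older data), the second (tf) sets how strongly trends are considered, and both must be between 0 and 1. It tracks level and trend but not seasonality. In Prometheus 3.0 and later, the function is named double_exponential_smoothing and requires the promql-experimental-functions feature flag.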
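Here's a Python sketch of the level-and-trend update behind this kind of smoothing, modeled on double exponential smoothing as Prometheus applies it (an illustration, not Prometheus's source). It follows the same parameter rule, both factors strictly between 0 and 1, and it returns a smoothed current value rather than a forecast:

```python
def double_exponential_smoothing(values, sf, tf):
    """Level-and-trend smoothing with no seasonal component.

    sf: smoothing factor, tf: trend factor, both strictly between 0 and 1.
    """
    if not (0 < sf < 1 and 0 < tf < 1):
        raise ValueError("sf and tf must be between 0 and 1")
    if len(values) < 2:
        return None
    level = values[0]
    trend = values[1] - values[0]  # initial trend from the first two points
    prev_level = level
    for i, value in enumerate(values[1:], start=1):
        if i > 1:
            # Blend the latest change in level with the previous trend.
            trend = tf * (level - prev_level) + (1 - tf) * trend
        prev_level = level
        # Blend the new observation with where the trend said we'd be.
        level = sf * value + (1 - sf) * (level + trend)
    return level


print(double_exponential_smoothing([0, 1, 2, 3, 4], 0.5, 0.5))  # 4.0
```

On a perfectly linear series the trend term keeps the smoothed value on the line instead of lagging behind it, which is the advantage over plain averaging.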

How to Put These PromQL Tricks to Work

The real power comes when you combine these techniques. For example:

Scenario | PromQL Query
Alert on Error Spike | rate(errors[5m]) > 3 * rate(errors[5m] offset 1h)
Track Weekly Patterns | rate(requests[1h]) / rate(requests[1h] offset 7d)
Forecast Resource Needs | predict_linear(cpu_usage[12h], 24 * 3600) / cpu_limit

These combinations help you build dashboards that tell stories, not just display numbers.

Metric Type | Best PromQL Function
Counters | rate(), increase()
Gauges | avg_over_time(), delta()
Histograms | histogram_quantile()

Wrapping Up

PromQL might seem strange at first if you're coming from SQL or other query languages, but its unique approach makes it incredibly powerful for time-series monitoring.

💡
Have any cool PromQL tricks of your own? Drop them in our Discord community; we're always looking for new monitoring hacks!

Authors
Preeti Dewani

Technical Product Manager at Last9
