Apr 23rd, 2026

Why Your PromQL Availability Query Returns Nothing When Services Are Healthy

Your SLI query shows 100% availability as No Data. Here's why PromQL returns empty results instead of zero — and the label-preserving fix.

Your availability dashboard looks great under load. The moment your services run clean — zero errors, everything healthy — the metrics disappear. Gaps in the chart. "No Data" where you expect 100%.

This isn't a configuration problem. It's how PromQL works, and once you understand it, the fix is a few lines.

The Setup

Say you're tracking availability for a set of microservices using trace-derived metrics from OpenTelemetry. A standard SLI approach computes availability as:

Availability = (1 - error_rate) × 100

Where error rate is the ratio of 5xx responses to total requests. A typical first attempt:

(1 - (
  # Numerator: 5xx request count per service
  sum by (service_name) (
    trace_endpoint_count{
      service_name=~"auth-service|billing-service|api-gateway|user-service",
      env="prod",
      span_kind="SPAN_KIND_SERVER",
      http_status_code!="",
      http_status_code=~"5.*"
    }
  )
  /
  (
    # Denominator: all requests per service, plus a tiny epsilon
    sum by (service_name) (
      trace_endpoint_count{
        service_name=~"auth-service|billing-service|api-gateway|user-service",
        env="prod",
        span_kind="SPAN_KIND_SERVER"
      }
    ) + 0.0000001
  )
)) * 100

The + 0.0000001 in the denominator avoids division by zero. Looks reasonable.

The Problem

This query works when services are throwing 5xx errors. The moment a service has zero 5xx responses (the good scenario), the numerator returns no data. Not a series with the value 0: an empty instant vector.

In PromQL, when no time series matches a selector, the result is an empty set. Dividing an empty vector by anything produces another empty vector. The entire expression for that service evaluates to nothing, and your dashboard shows a gap.

This is especially painful on SLI dashboards where a missing data point triggers alerts or unsettles stakeholders — precisely when the service is healthiest.
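To see which services are affected at a given moment, one option is to subtract the set of services that have 5xx series from the set of services that have any traffic. This is a minimal sketch that reuses the selectors from the query above; unless keeps only the left-hand series that have no match on the right.

```promql
# Services with traffic but no 5xx series at all.
# These are the ones the naive availability query drops.
sum by (service_name) (
  trace_endpoint_count{
    service_name=~"auth-service|billing-service|api-gateway|user-service",
    env="prod",
    span_kind="SPAN_KIND_SERVER"
  }
)
unless
sum by (service_name) (
  trace_endpoint_count{
    service_name=~"auth-service|billing-service|api-gateway|user-service",
    env="prod",
    span_kind="SPAN_KIND_SERVER",
    http_status_code=~"5.*"
  }
)
```

Every service this returns is one that shows up as a gap on the dashboard.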

Why PromQL Works This Way

PromQL is a set-based language. Every selector returns a set of time series. Arithmetic operators work on matching series across sets. If a series doesn't exist in one operand, there's nothing to match, so no result is produced.

This differs from SQL, where COUNT(*) on an empty result set returns 0. In PromQL, no matching series means no output. See the PromQL cheat sheet for a full reference on how vector matching works.
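The contrast with SQL is easy to check directly. The sketch below wraps the article's 5xx selector in count(); if no 5xx series exist, the query returns an empty result rather than a single series with the value 0.

```promql
# Returns nothing (not 0) when no series match the selector.
count(
  trace_endpoint_count{
    env="prod",
    span_kind="SPAN_KIND_SERVER",
    http_status_code=~"5.*"
  }
)
```

Aggregations only produce output for groups that have at least one input series, which is why the empty numerator propagates all the way up.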

The Fix: The * 0 Fallback Pattern

Ensure the 5xx selector always returns a series — even when there are no 5xx errors — by using or with a zero-valued version of a series you know exists (total requests):

(1 - (
  sum by (service_name) (
    # Primary: 5xx series, present only when errors occurred
    trace_endpoint_count{
      service_name=~"auth-service|billing-service|api-gateway|user-service",
      env="prod",
      span_kind="SPAN_KIND_SERVER",
      http_status_code!="",
      http_status_code=~"5.*"
    }
    or
    # Fallback: every request series for the same services, zeroed out,
    # so each service contributes at least one labeled series
    trace_endpoint_count{
      service_name=~"auth-service|billing-service|api-gateway|user-service",
      env="prod",
      span_kind="SPAN_KIND_SERVER"
    } * 0
  )
  /
  (
    sum by (service_name) (
      trace_endpoint_count{
        service_name=~"auth-service|billing-service|api-gateway|user-service",
        env="prod",
        span_kind="SPAN_KIND_SERVER"
      }
    ) + 0.0000001
  )
)) * 100

How it works

The key addition is:

or
trace_endpoint_count{...all services, env="prod", span_kind="SPAN_KIND_SERVER"} * 0

Step by step:

  1. 5xx errors exist: The first selector returns the 5xx series. or keeps them and drops any fallback series with an identical label set. The remaining fallback series (the non-5xx status codes) are added with a value of 0, so they don't change the sum. Normal path.
  2. No 5xx errors: The first selector returns nothing for that service, so or passes every fallback series through: total request count multiplied by zero. Each one carries the correct service_name label and a value of 0.
  3. sum by (service_name) collapses correctly, giving 0 for the error count.
  4. Final result: (1 - 0/total) * 100 = 100% — exactly right for a healthy service.
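The steps above can be traced by running just the numerator for one service. The values in the comments are hypothetical (a healthy auth-service with a single 200 series at 1200 requests); the query itself is the article's numerator narrowed to that one service.

```promql
# Hypothetical state: auth-service has one series,
# http_status_code="200" with value 1200, and no 5xx series.
#
# 5xx selector           -> empty
# fallback selector * 0  -> {service_name="auth-service", http_status_code="200", ...} 0
# primary or fallback    -> the fallback series (nothing on the left overlaps it)
# sum by (service_name)  -> {service_name="auth-service"} 0
sum by (service_name) (
  trace_endpoint_count{
    service_name="auth-service",
    env="prod",
    span_kind="SPAN_KIND_SERVER",
    http_status_code=~"5.*"
  }
  or
  trace_endpoint_count{
    service_name="auth-service",
    env="prod",
    span_kind="SPAN_KIND_SERVER"
  } * 0
)
```

With a numerator of 0 and a denominator of 1200 plus the epsilon, the full expression comes out to (1 - 0) * 100 = 100.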

Why not or vector(0)?

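vector(0) produces a single-element instant vector with no labels, not a scalar. Whether you put it inside or outside sum by (service_name), the fallback ends up as one series without a service_name label, which matches none of the denominator's per-service series. The query doesn't fail; the healthy services are simply still missing from the result.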

The * 0 pattern preserves the original label set. The fallback series carries the same service_name, env, and other labels as the real data, so all grouping and matching works correctly.
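For contrast, vector(0) is a reasonable fallback when nothing is grouped. The sketch below reuses the article's selectors without the service_name matcher (an assumption made for illustration) to compute a single error ratio across all services.

```promql
# Ungrouped ratio: both sides aggregate to one series with no labels,
# so the label-less vector(0) fallback lines up with the denominator.
(
  sum(
    trace_endpoint_count{
      env="prod",
      span_kind="SPAN_KIND_SERVER",
      http_status_code=~"5.*"
    }
  )
  or vector(0)
)
/
sum(
  trace_endpoint_count{
    env="prod",
    span_kind="SPAN_KIND_SERVER"
  }
)
```

As soon as either side is grouped by (service_name), the label sets no longer line up, and the * 0 pattern is the one that keeps working.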

Other Approaches

clamp_min(..., 0): Sets a floor value on existing series. Doesn't help when the series doesn't exist at all.

Recording rules: Pre-compute error count with a recording rule that handles the zero case. Works but adds operational overhead and another artifact to maintain.

absent() function: Returns a series with value 1 when the input is empty. You could construct (your_query or absent(your_query) * 0), but absent() only fires when the entire input vector is empty, not per service, and it can only derive output labels from equality matchers. With a regex matcher on service_name, the fallback carries no service_name label.

The * 0 pattern is the simplest approach that correctly handles labels without extra infrastructure.
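To make the absent() limitation concrete, here is a minimal sketch using an equality matcher on a single service. The selector is trimmed down from the article's for readability.

```promql
# If auth-service has no 5xx series, this returns
#   {service_name="auth-service"} 1
# because service_name is an equality matcher and is copied to the output.
# http_status_code uses a regex matcher, so it is not copied.
absent(
  trace_endpoint_count{
    service_name="auth-service",
    http_status_code=~"5.*"
  }
)
```

Swap the equality matcher for service_name=~"auth-service|billing-service" and the output loses its service_name label, and it only appears when none of the listed services has a 5xx series. Covering four services this way would take four separate absent() terms.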

The General Pattern

Any time you compute a ratio in PromQL where the numerator might legitimately be empty:

sum by (label) (
  metric_with_specific_filter{...}
  or
  metric_with_broader_filter{...} * 0
)
/
sum by (label) (
  metric_with_broader_filter{...}
)

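The broader filter should match a superset of what the specific filter matches: same labels, without the restrictive condition. * 0 gives you the right labels with a zero value. or matches per label set, so it adds a fallback series only where the primary selector has no series with identical labels. Those zero-valued series don't change the sum when errors exist, and they supply the 0 when none do.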
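The same shape applies to counter-based metrics queried through rate(). The sketch below assumes a hypothetical http_requests_total counter with job and code labels; those names are illustrative and not from the setup above.

```promql
# Error ratio per job; jobs with request series but no 5xx series get 0.
# http_requests_total, job, and code are assumed names.
sum by (job) (
  rate(http_requests_total{code=~"5.."}[5m])
  or
  rate(http_requests_total[5m]) * 0
)
/
sum by (job) (
  rate(http_requests_total[5m])
)
```

Note that a job whose request rate is exactly 0 over the window yields 0/0, which PromQL evaluates to NaN rather than dropping the series; add an epsilon to the denominator as in the availability query if that matters for your dashboard.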

Build SLIs That Hold Up in Production

Availability queries are the foundation of any practical SLO implementation. The * 0 pattern is one of several query-correctness issues that only surface in production — when traffic patterns hit edge cases your staging environment never saw.

If you're using OpenTelemetry trace-derived metrics, Last9 stores trace_endpoint_count natively from your OTLP pipeline — the query above works out of the box without recording rules or custom aggregations.

Get started with Last9 or check the OpenTelemetry integration docs.

About the authors
Prathamesh Sonpatki

Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.
