Why Your PromQL Availability Query Returns Nothing When Services Are Healthy

Your availability dashboard looks great under load. The moment your services run clean — zero errors, everything healthy — the metrics disappear. Gaps in the chart. "No Data" where you expect 100%.

This isn't a configuration problem. It's how PromQL works, and once you understand it, the fix is three lines.

The Setup

Say you're tracking availability for a set of microservices using trace-derived metrics from OpenTelemetry. A standard SLI approach computes availability as:

Availability = (1 - error_rate) × 100

Where error rate is the ratio of 5xx responses to total requests. A typical first attempt:

(1 - (
  sum by (service_name) (
    trace_endpoint_count{
      service_name=~"auth-service|billing-service|api-gateway|user-service",
      env="prod",
      span_kind="SPAN_KIND_SERVER",
      http_status_code!="",
      http_status_code=~"5.*"
    }
  )
  /
  (
    sum by (service_name) (
      trace_endpoint_count{
        service_name=~"auth-service|billing-service|api-gateway|user-service",
        env="prod",
        span_kind="SPAN_KIND_SERVER"
      }
    ) + 0.0000001
  )
)) * 100

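The + 0.0000001 in the denominator guards against a zero total. PromQL does not raise an error on 0/0; it returns NaN, and the small constant turns that case into a ratio of 0 instead. Looks reasonable.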

The Problem

This query works while services are throwing 5xx errors. The moment a service has no series with a 5xx status code (the good scenario), the numerator returns no data for that service. Not 0. An empty result.

In PromQL, when no time series matches a selector, the result is an empty set. Dividing an empty vector by anything produces another empty vector. The entire expression for that service evaluates to nothing, and your dashboard shows a gap.
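One way to confirm this is to run the numerator on its own. A sketch, reusing the 5xx selector from the query above unchanged:

```promql
# Numerator only: the 5xx count per service.
# Services with no 5xx series do not show up with a value of 0;
# they are missing from the result entirely.
sum by (service_name) (
  trace_endpoint_count{
    service_name=~"auth-service|billing-service|api-gateway|user-service",
    env="prod",
    span_kind="SPAN_KIND_SERVER",
    http_status_code!="",
    http_status_code=~"5.*"
  }
)
```

If this returns fewer series than the denominator does, every service missing here will also be missing from the final ratio.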

This is especially painful on SLI dashboards where a missing data point triggers alerts or unsettles stakeholders — precisely when the service is healthiest.

Why PromQL Works This Way

PromQL is a set-based language. Every selector returns a set of time series. Arithmetic operators work on matching series across sets. If a series doesn't exist in one operand, there's nothing to match, so no result is produced.

This differs from SQL, where COUNT(*) on an empty result set returns 0. In PromQL, no matching series means no output. See the PromQL cheat sheet for a full reference on how vector matching works.
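The matching rule can be seen without any real metrics. The two expressions below are a minimal sketch that uses only vector() and a comparison filter, so they should behave the same way on any Prometheus-compatible backend. Run each one separately:

```promql
# The right-hand side keeps its one element (1 > 0 is true),
# so the division has a pair to work on and returns one sample with value 1.
vector(1) / (vector(1) > 0)

# The right-hand side is filtered down to an empty vector (1 > 2 is false).
# There is nothing to pair with, so the result is empty, not 0.
vector(1) / (vector(1) > 2)
```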

The Fix: The `* 0` Fallback Pattern

Ensure the 5xx selector always returns a series — even when there are no 5xx errors — by using or with a zero-valued version of a series you know exists (total requests):

(1 - (
  sum by (service_name) (
    trace_endpoint_count{
      service_name=~"auth-service|billing-service|api-gateway|user-service",
      env="prod",
      span_kind="SPAN_KIND_SERVER",
      http_status_code!="",
      http_status_code=~"5.*"
    }
    or
    trace_endpoint_count{
      service_name=~"auth-service|billing-service|api-gateway|user-service",
      env="prod",
      span_kind="SPAN_KIND_SERVER"
    } * 0
  )
  /
  (
    sum by (service_name) (
      trace_endpoint_count{
        service_name=~"auth-service|billing-service|api-gateway|user-service",
        env="prod",
        span_kind="SPAN_KIND_SERVER"
      }
    ) + 0.0000001
  )
)) * 100

How it works

The key addition is:

or
trace_endpoint_count{...all services, env="prod", span_kind="SPAN_KIND_SERVER"} * 0

Step by step:

5xx errors exist: The first selector returns the 5xx series. or keeps all of them and adds only those fallback series whose label sets are not already present on the left. Those extras are the non-5xx series multiplied by zero, so they add nothing to the sum. Normal path.
No 5xx errors: The first selector returns nothing for that service. or passes the fallback through: the total request series multiplied by zero. That yields series with the correct service_name label and a value of 0.
sum by (service_name) collapses these per service, giving 0 for the error count.
Final result: (1 - 0/total) * 100 = 100%, exactly right for a healthy service.
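To check the fallback in isolation, the numerator can be run by itself. The sample values in the trailing comments are hypothetical and only illustrate the shape of the result:

```promql
# Note: * binds more tightly than or, so "* 0" applies only to the
# second selector, not to the result of the or.
sum by (service_name) (
  trace_endpoint_count{
    service_name=~"auth-service|billing-service|api-gateway|user-service",
    env="prod",
    span_kind="SPAN_KIND_SERVER",
    http_status_code!="",
    http_status_code=~"5.*"
  }
  or
  trace_endpoint_count{
    service_name=~"auth-service|billing-service|api-gateway|user-service",
    env="prod",
    span_kind="SPAN_KIND_SERVER"
  } * 0
)
# Hypothetical result when only auth-service has 5xx series:
#   {service_name="auth-service"}     3
#   {service_name="api-gateway"}      0
#   {service_name="billing-service"}  0
#   {service_name="user-service"}     0
```

Every service that has at least one series in the broader selector now gets a row, which is what the division needs.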

Why not `or vector(0)`?

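vector(0) produces a single-element instant vector with no labels. Inside sum by (service_name), that element lands in a group of its own with no service_name. The group has no partner among the labeled denominator series, so the division drops it. You get no error, but services without 5xx series are still missing from the result.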

The * 0 pattern preserves the original label set. The fallback series carries the same service_name, env, and other labels as the real data, so all grouping and matching works correctly.
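For comparison, here is roughly what the vector(0) variant of the numerator looks like, with the outcome described in comments. The sample values are hypothetical:

```promql
sum by (service_name) (
  trace_endpoint_count{
    service_name=~"auth-service|billing-service|api-gateway|user-service",
    env="prod",
    span_kind="SPAN_KIND_SERVER",
    http_status_code!="",
    http_status_code=~"5.*"
  }
  or
  vector(0)
)
# Hypothetical result when only auth-service has 5xx series:
#   {service_name="auth-service"}  3
#   {}                             0
# The label-less 0 forms its own group. The healthy services still
# have no numerator series, so they stay absent after the division.
```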

Other Approaches

clamp_min(..., 0): Sets a floor on the values of existing series. It cannot create a series that doesn't exist.

Recording rules: Pre-compute the error count with a recording rule that handles the zero case. This works, but it adds operational overhead and another artifact to maintain.

absent() function: Returns a single series with value 1 only when the entire input vector is empty. You could construct (your_query or absent(your_query) * 0), but absent() takes its labels only from equality matchers in the selector, so a regex matcher such as service_name=~"..." yields no service_name label. It also never fires when just some of the services are error-free.

The * 0 pattern is the simplest approach that correctly handles labels without extra infrastructure.

The General Pattern

Any time you compute a ratio in PromQL where the numerator might legitimately be empty:

sum by (label) (
  metric_with_specific_filter{...}
  or
  metric_with_broader_filter{...} * 0
)
/
sum by (label) (
  metric_with_broader_filter{...}
)

The broader filter should match a superset of what the specific filter matches: same labels, without the restrictive condition. * 0 gives you the right labels with a zero value. or keeps every series from the primary selector and adds only the fallback series whose label sets are missing from it. Because those carry a value of 0, they never change the sum.
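As one more instance of the pattern, here is a sketch for counter-style metrics wrapped in rate(). The metric name http_requests_total and the job and code labels are assumptions chosen for illustration, not something taken from the setup above:

```promql
# Hypothetical metric and labels; substitute your own.
sum by (job) (
  rate(http_requests_total{job="api", code=~"5.."}[5m])
  or
  rate(http_requests_total{job="api"}[5m]) * 0
)
/
sum by (job) (
  rate(http_requests_total{job="api"}[5m])
)
```

The fallback and the denominator use the same matchers and range, so every group in the denominator has a numerator partner.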

Build SLIs That Hold Up in Production

Availability queries are the foundation of any practical SLO implementation. The * 0 pattern is one of several query-correctness issues that only surface in production — when traffic patterns hit edge cases your staging environment never saw.

If you're using OpenTelemetry trace-derived metrics, Last9 stores trace_endpoint_count natively from your OTLP pipeline — the query above works out of the box without recording rules or custom aggregations.

Get started with Last9 or check the OpenTelemetry integration docs.
