GPU at 80%.
But which pod?
DCGM tells you the device is busy. It doesn't tell you which Kubernetes pod, namespace, or Slurm job is responsible. That's the gap. l9gpu fills it.
- 1 DaemonSet per node, no sidecars
- Vendor-neutral: NVIDIA · AMD · Gaudi
- Kubernetes + Slurm, both supported
- OTLP out, any backend
The problem
You have Prometheus.
You have Grafana.
You have DCGM.
You still can't answer this question.
"Complete black box. Zero visibility into which pod is actually eating up the VRAM and compute utilization on those slices."
"All pods show identical values with GPU time-slicing. namespace/pod/container labels missing on MIG GPU."
"DCGM + Prometheus + Grafana — 4 moving parts solving what should be one question."
"We have observability for CPU and memory and APM for code — but nothing for the GPU and inferencing part."
"We have A100s reserved through 2026 that barely hit 20% utilization. Finance treats them like insurance, not infrastructure."
The attribution layer — the join between GPU hardware metrics and Kubernetes or Slurm workload identity — is what's missing. l9gpu is that layer.
How it works
One agent. Attribution at collection time.
l9gpu runs as a DaemonSet on every GPU node. It reads directly from NVML and DCGM, enriches each metric with the workload consuming that device, and ships OTLP to whatever backend you already have. No PromQL joins. No brittle label pipelines.
- Hardware source: NVML / DCGM / amdsmi / hl-smi
- l9gpu node agent (DaemonSet): the attribution layer, OTLP source
- k8sprocessor / slurmprocessor: pod · namespace · job enrichment
- OTLP export: Prometheus · Grafana · any backend
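The mechanics of that join are worth seeing. Below is a minimal sketch of collection-time attribution on an NVIDIA node, using the pynvml bindings; the cgroup regex and the printed sample shape are illustrative assumptions, not l9gpu's actual internals:

import re
from pathlib import Path

import pynvml  # nvidia-ml-py bindings

# Illustrative pattern for the pod UID embedded in kubepods cgroup paths;
# real paths vary by cgroup driver (systemd vs cgroupfs).
POD_UID_RE = re.compile(r"pod([0-9a-f_-]{36})")


def pod_uid_for_pid(pid: int) -> str | None:
    """Map a GPU-bound PID to its Kubernetes pod UID via /proc/<pid>/cgroup."""
    try:
        cgroup = Path(f"/proc/{pid}/cgroup").read_text()
    except OSError:
        return None  # process exited between samples
    match = POD_UID_RE.search(cgroup)
    return match.group(1).replace("_", "-") if match else None


pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu

# One enriched sample per process holding a compute context on GPU 0.
for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
    print({
        "gpu": 0,
        "gpu_utilization": util,
        "pid": proc.pid,
        "used_memory_bytes": proc.usedGpuMemory,
        "pod_uid": pod_uid_for_pid(proc.pid),
    })
pynvml.nvmlShutdown()

The pod UID recovered from the cgroup path is then resolved to pod, namespace, and deployment through the Kubernetes API. The point: the join happens on the node at sample time, not afterwards in PromQL.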
Before, from DCGM alone:

DCGM_FI_DEV_GPU_UTIL{
  gpu="0",
  device="nvidia0",
  modelName="A100"
} 83

After, from l9gpu:

gpu_utilization{
  gpu="0",
  pod="inference-api-7f9d",
  namespace="production",
  deployment="inference-api",
  node="gpu-node-03",
  cluster="ml-cluster-us"
} 83

Platform support
Works wherever your GPUs are.
GPU Hardware
- NVIDIA: NVML + DCGM · A100, H100/H200, B200/GB200, T4, A10, L4
- AMD: amdsmi · MI300X, MI325X
- Intel Gaudi: hl-smi · Gaudi 2, Gaudi 3

Workload Orchestration
- Kubernetes: pod · namespace · deployment · node · cluster · cloud metadata
- Slurm: job ID · user · account · partition · QoS (attribution sketch below)
- Bare metal: process-level attribution, systemd service
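On Slurm and bare metal there is no kubepods cgroup to parse, but the GPU-bound PID still carries its identity in its environment. A minimal sketch of the idea, again an assumption about the approach rather than a description of the slurmprocessor internals:

from pathlib import Path


def slurm_job_for_pid(pid: int) -> str | None:
    """Best-effort read of SLURM_JOB_ID from a process's environment."""
    try:
        environ = Path(f"/proc/{pid}/environ").read_bytes()
    except OSError:
        return None  # process gone, or insufficient privileges
    for entry in environ.split(b"\x00"):
        if entry.startswith(b"SLURM_JOB_ID="):
            return entry.split(b"=", 1)[1].decode()
    return None

Slurm exports SLURM_JOB_ID (and related variables for account, partition, and user) into every task it launches, so the PIDs NVML reports per device are enough to recover the job.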
Inference Engines
Per-engine GPU metrics, not just per-device aggregates
What's included
Useful on day one.
Not after three weeks of setup.
- 17 pre-built alert rules
  Across 3 PrometheusRule CRDs: GPU temperature, throttling, ECC errors, XID events, and idle utilization. Pod and namespace appear on every fired alert via k8sprocessor enrichment.

- Grafana dashboards
  Multi-cluster fleet view, per-pod workload attribution, DCGM profiling, inference-engine breakdown, and health/reliability panels.

- XID + ECC at job level
  When XID errors increase on gpu03, you know which Slurm job or Kubernetes pod was running, not just which node.

- GPU chargeback, ready to query
  Team A consumed 340 GPU-hours in April; Team B consumed 60. Every metric is labeled with namespace and deployment, so cost queries are trivial (see the query sketch after this list).

- Works with your existing stack
  OTLP out: Prometheus, Grafana Cloud, Datadog, any OTLP-compatible backend. Nothing proprietary, no lock-in, and a one-config path to Last9 if you want it.

- Derived from Meta's GCM
  Built on the same foundation Meta uses to monitor hundreds of thousands of GPUs in production AI research clusters.
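Because namespace and deployment ride on every sample, chargeback is an ordinary Prometheus query. A hedged sketch against the Prometheus HTTP API, assuming the gpu_utilization metric shown earlier and a Prometheus reachable at PROM_URL; the exact metric names l9gpu exports may differ:

import requests

PROM_URL = "http://prometheus.monitoring:9090"  # adjust for your setup

# Rough GPU-hours per namespace over a 30-day window: each gpu_utilization
# series is one GPU/pod pair reporting 0-100, so mean utilization / 100
# times 720 hours approximates utilized GPU-hours for that series.
# (Rough because avg_over_time ignores gaps, overweighting short-lived pods.)
QUERY = "sum by (namespace) (avg_over_time(gpu_utilization[30d]) / 100 * 720)"

resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": QUERY}, timeout=30)
resp.raise_for_status()
for sample in resp.json()["data"]["result"]:
    ns = sample["metric"].get("namespace", "<none>")
    print(f"{ns}: {float(sample['value'][1]):.1f} GPU-hours")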
Install
Running in 60 seconds.
One Helm install for Kubernetes. One pip install for bare metal. Metrics start flowing to your OTLP endpoint immediately.
View full docs on GitHub.

Sending GPU metrics to Last9? Follow the GPU Telemetry integration guide for the OTLP endpoint, auth headers, and dashboard import.
Kubernetes:

kubectl create secret generic l9gpu-otlp \
  -n monitoring \
  --from-literal=OTEL_EXPORTER_OTLP_ENDPOINT=https://your-backend/v1/metrics

helm repo add l9gpu https://last9.github.io/gpu-telemetry
helm install l9gpu l9gpu/l9gpu \
  -n monitoring \
  --create-namespace \
  --set otlpSecretName=l9gpu-otlp

Bare metal:

pip install l9gpu
export OTEL_EXPORTER_OTLP_ENDPOINT=https://your-backend/v1/metrics
l9gpu nvml_monitor \
  --sink otel \
  --cluster my-cluster

After install you get:
- Per-pod GPU utilization in Prometheus, labeled immediately
- 17 alert rules active — no PromQL to write
- Grafana dashboard importable with one click
- Slurm job → GPU attribution visible within one collection cycle
What you can finally answer
Questions finance and engineering leads actually ask.
- Which team consumed the most GPU hours this month?
- Which pod caused that utilization spike at 2am?
- What was running on gpu03 when XID errors fired?
- Which vLLM instance is burning GPU without serving requests?
- Is our H100 utilization 5% because of idle GPUs or bad workload scheduling?
- Which Slurm job account should we bill for this training run?
Stop guessing which pod is burning your GPU budget.
MIT licensed. One DaemonSet. Metrics with workload identity in under 60 seconds.
Start observing for free. No lock-in.