Last9 Named a Gartner® Cool Vendor in AI for SRE and Observability
Gartner recognizes Last9 in its latest Cool Vendor report for its unified telemetry platform and agentic SDK, moving teams from reactive monitoring to proactive ops.
Nishant Modak
Fixing Broken Traces in GCP Cloud Run: A Custom OpenTelemetry Propagator
GCP's load balancer silently rewrites your traceparent header, orphaning spans in any OTLP backend. Here's the custom propagator that fixes it.
Prathamesh Sonpatki
Why Your PromQL Availability Query Returns Nothing When Services Are Healthy
Your SLI query shows 100% availability as No Data. Here's why PromQL returns empty results instead of zero — and the label-preserving fix.
Prathamesh Sonpatki
Instrumenting WordPress with OpenTelemetry: PHP Tracing, Browser RUM, and Error Capture in Production
WordPress powers 40% of the web but has no native observability story. Here's how to instrument it end-to-end with OpenTelemetry - PHP, browser RUM, and errors.
Prathamesh Sonpatki
10,000 GPUs, One TSDB: Cardinality at GPU Scale
1,000 nodes × 8 GPUs × 60 metrics = 480K time series - before you add pod names or Slurm job IDs. GPU monitoring is a cardinality problem disguised as a metrics problem. How to design for it before production OOMs your Prometheus.
Shekhar
From GPU Silicon to Business Metrics: The 8 Layers of GPU Observability
GPU observability isn't one thing - it's eight connected layers from silicon to cost. See why correlation across layers is what cuts debugging from 2 hours to 2 minutes, and why most teams instrument only one or two.
Shekhar
The GPU Metrics That Actually Matter
Most teams monitor three GPU metrics - utilization, temperature, memory. There are 50+ that matter, and the ones you skip cause your worst outages. A vendor-neutral guide across NVIDIA, AMD, and Intel Gaudi.
Shekhar
Your LLM Is Slower Than You Think
60% GPU utilization and 3-second response times? GPU utilization is the wrong signal for LLM inference. Here's why TTFT, KV-cache pressure, and queue depth - not utilization - predict user-facing latency.
Shekhar
Predicting GPU Failures Before They Cost You
Predict GPU hardware failures 48–72 hours in advance. A guide to the five rate-based signals — ECC error trends, XID events, thermal ramp, row remap exhaustion, PCIe downtraining — and how to combine them into a composite health score.
Shekhar
Every Token Has a Price: Per-Request GPU Cost Attribution
Flat per-token pricing is wrong by 10–50× per request. Prefill vs decode, batch sharing, and cache effects break the math. How to attribute real GPU cost - compute, energy, and dollars - to each inference request.
Shekhar