LLM

All articles tagged 'LLM'

10,000 GPUs, One TSDB: Cardinality at GPU Scale

1,000 nodes × 8 GPUs × 60 metrics = 480K time series - before you even add pod names or Slurm job IDs. GPU monitoring is a cardinality problem disguised as a metrics problem. How to design for it before production OOMs your Prometheus.

Shekhar
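
A back-of-the-envelope sketch of that math, in Python. The base figures come from the teaser above; the pod and job churn multipliers are illustrative assumptions, not numbers from the article:

```python
# Rough cardinality math for a GPU fleet in a Prometheus-style TSDB.
nodes, gpus_per_node, metrics_per_gpu = 1_000, 8, 60

base_series = nodes * gpus_per_node * metrics_per_gpu
print(f"base: {base_series:,}")  # 480,000

# Labels multiply, they don't add. Suppose each GPU cycles through
# ~3 pods and ~2 Slurm jobs within one retention window (assumed values):
pods_per_gpu, jobs_per_gpu = 3, 2
with_labels = base_series * pods_per_gpu * jobs_per_gpu
print(f"with churn labels: {with_labels:,}")  # 2,880,000
```

Each high-churn label is a multiplier on every series beneath it, which is why the count explodes far past what the raw metric count suggests.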

From GPU Silicon to Business Metrics: The 8 Layers of GPU Observability

GPU observability isn't one thing - it's eight connected layers from silicon to cost. See why correlation across layers cuts debugging from 2 hours to 2 minutes, and why most teams instrument only one or two of them.

Shekhar

The GPU Metrics That Actually Matter

Most teams monitor three GPU metrics - utilization, temperature, memory. There are 50+ that matter, and the ones you skip cause your worst outages. A vendor-neutral guide across NVIDIA, AMD, and Intel Gaudi.

Shekhar

Your LLM Is Slower Than You Think

60% GPU utilization, yet 3-second response times? GPU utilization is the wrong signal for LLM inference. Here's why TTFT, KV-cache pressure, and queue depth - not utilization - predict user-facing latency.

Shekhar
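
A minimal sketch of the signals this argues for, assuming you can timestamp three events per request; the `RequestTiming` type and its field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class RequestTiming:
    arrived_at: float      # request entered the queue (seconds)
    first_token_at: float  # first output token streamed back
    done_at: float         # last output token streamed back
    output_tokens: int

def latency_signals(t: RequestTiming) -> dict:
    """Split user-facing latency into the parts utilization hides."""
    ttft = t.first_token_at - t.arrived_at   # queue wait + prefill
    decode = t.done_at - t.first_token_at    # token-by-token generation
    tps = t.output_tokens / decode if decode > 0 else 0.0
    return {"ttft_s": ttft, "decode_s": decode, "tokens_per_s": tps}

# A request that queued and prefilled for 1.8 s, then generated
# 120 tokens over the next 2.5 s:
print(latency_signals(RequestTiming(0.0, 1.8, 4.3, output_tokens=120)))
```

TTFT surfaces queueing and prefill delay directly, while decode throughput surfaces pressure on the KV cache and batch scheduler - which is why these track what users feel when aggregate utilization doesn't.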

Predicting GPU Failures Before They Cost You

Predict GPU hardware failures 48–72 hours in advance. A guide to the five rate-based signals - ECC error trends, XID events, thermal ramp, row remap exhaustion, PCIe downtraining - and how to combine them into a composite health score.

Shekhar
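
A hedged sketch of one way to fold those five signals into a single score; the weights and the 0-to-1 normalization are invented for illustration, not the article's values:

```python
# Combine per-GPU risk signals, each pre-normalized to 0..1 (1 = worst
# observed), into one health score. Weights are placeholder assumptions.
WEIGHTS = {
    "ecc_error_trend": 0.30,       # correctable ECC error rate rising
    "xid_event_rate": 0.25,        # driver XID events per hour
    "thermal_ramp": 0.15,          # temperature slope under steady load
    "row_remap_exhaustion": 0.20,  # fraction of spare rows consumed
    "pcie_downtraining": 0.10,     # link width/speed drops
}

def health_score(signals: dict) -> float:
    """0.0 = healthy, 1.0 = failing; missing signals count as 0."""
    return sum(w * min(max(signals.get(name, 0.0), 0.0), 1.0)
               for name, w in WEIGHTS.items())

risky = {"ecc_error_trend": 0.8, "row_remap_exhaustion": 0.9,
         "xid_event_rate": 0.3}
print(f"{health_score(risky):.2f}")  # 0.50 - a candidate for draining
```

The point of rate-based signals is that absolute counts mislead: a GPU that accumulated errors slowly over two years is healthier than one whose error rate doubled this week.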

Every Token Has a Price: Per-Request GPU Cost Attribution

Flat per-token pricing is wrong by 10–50× per request. Prefill vs decode, batch sharing, and cache effects break the math. How to attribute real GPU cost - compute, energy, and dollars - to each inference request.

Shekhar
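
A toy illustration of why the flat rate breaks, assuming simple prefill/decode timing and an even split of GPU time across a batch; every rate below is made up:

```python
# Attribute GPU dollars to one request: prefill and decode consume GPU
# time differently, and batching divides that time across requests.
# (Even splitting is a simplification; real schedulers share unevenly.)
GPU_DOLLARS_PER_SECOND = 2.50 / 3600  # assume a ~$2.50/hr GPU

def request_cost(prefill_s: float, decode_s: float, batch_size: int) -> float:
    """Dollar cost of the GPU-seconds this request actually consumed."""
    shared_seconds = (prefill_s + decode_s) / batch_size
    return shared_seconds * GPU_DOLLARS_PER_SECOND

# Two requests with similar output length, very different real cost:
solo = request_cost(prefill_s=2.0, decode_s=6.0, batch_size=1)     # long prompt, alone
shared = request_cost(prefill_s=0.1, decode_s=6.0, batch_size=16)  # short prompt, batched
print(f"solo: ${solo:.5f}  shared: ${shared:.5f}  ratio: {solo / shared:.0f}x")
```

Even this crude model shows a roughly 21× spread between the two requests, before cache hits or interconnect effects enter the picture.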