On October 2, 2025, Gartner published its report “Cool Vendors in AI for SRE and Observability.” The report identifies vendors “using AI to enhance site reliability engineering (SRE) practices, improve reliability, and reduce the cognitive load on engineering teams.”
We’re honored that Last9 has been named a Gartner Cool Vendor in this category — recognized for our unified telemetry platform and agentic SDK that move engineering teams from reactive monitoring to proactive operations.
Act 1 — Fix the Foundation: Unified Data at Scale
You can’t build intelligent operations on fragmented, expensive, low-cardinality data. We learned this in early 2020, when we built our first intelligent platform on top of existing data systems: it simply did not work. Teams were observing 25% of production with 100% of the budget and had no headroom to improve data fidelity. That’s when we set out to make telemetry data affordable at scale first.
In the last year, cloud and AI-native teams betting on scale have converged on a single telemetry fabric. What they’ve discovered is simple: when data stops fragmenting, reliability gets affordable.
Today, that fabric powers more than a trillion events every day — the quiet backbone behind their velocity.
For us: cheap, unified data is table stakes. The real breakthrough comes when that data becomes actionable.
Act 2 — Agentic Ops: From Automation to Autonomy
Observability is only one side of the coin. Action based on that data is the other — yet it has seen little real progress in the past decade. The reason was structural: the primitives for context, correlation, cardinality and unified telemetry simply didn’t exist.
With the rise of large language models and context engineering — and the unified data layer established in Act 1 — those missing pieces are finally in place. That’s exactly where we’ve focused.
Three months ago, we quietly gave a few customers early access to something we’d been building: an agents SDK that lets engineering teams program their own Ops Agents.
These aren’t chatbots that summarize logs. They’re programmable agents that embed telemetry intelligence directly into your delivery pipeline — across detection, investigation, and remediation.
Here’s what teams have built with it:
- RCA Memory: Builds a per-service fault library; on a new incident, it surfaces the most similar past fix, with a confidence score and the exact commands or PR that resolved it.
- Auto-remediation agents: Detect cascading failures early and trigger predefined runbooks, especially for databases such as Postgres, Redis, and MySQL.
- Control Plane AI: Sits in front of storage and turns raw telemetry into reliable features. It drops what’s redundant, reshapes high-cardinality fields, and up-levels logs into metrics, all driven by policies and feedback loops, with explainable diffs. The result: 40–70% less storage, faster queries, and cleaner signals for your agents, without breaking observability use cases.
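To make the Control Plane idea more concrete, here is a minimal Python sketch of policy-driven reshaping. It is not Last9’s SDK or API: `Policy`, `templatize`, and `apply_policy` are hypothetical names, and it assumes log records arrive as flat dictionaries. It drops fields marked redundant, collapses numeric IDs in a high-cardinality field into a template, counts the reshaped records into a log-derived metric, and keeps a tally of every change as a simple explainable diff.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical policy object: field names are illustrative, not Last9's schema.
@dataclass
class Policy:
    drop_fields: set[str]         # fields judged redundant
    template_fields: set[str]     # high-cardinality fields to reshape
    metric_keys: tuple[str, ...]  # labels for the log-derived counter

# Matches a purely numeric path segment, e.g. "/42" in "/users/42/orders".
_NUMERIC_SEGMENT = re.compile(r"/\d+(?=/|$)")

def templatize(value: str) -> str:
    """Collapse numeric IDs: "/users/42/orders" becomes "/users/{id}/orders"."""
    return _NUMERIC_SEGMENT.sub("/{id}", value)

def apply_policy(records: list[dict], policy: Policy):
    """Return (reshaped records, log-derived counter, tally of changes)."""
    reshaped: list[dict] = []
    metric: Counter = Counter()
    diff: Counter = Counter()
    for record in records:
        new: dict = {}
        for key, value in record.items():
            if key in policy.drop_fields:
                diff[f"dropped {key}"] += 1
                continue
            if key in policy.template_fields and isinstance(value, str):
                templated = templatize(value)
                if templated != value:
                    diff[f"templated {key}"] += 1
                value = templated
            new[key] = value
        reshaped.append(new)
        # Up-level the log line into a counter series keyed by the chosen labels.
        metric[tuple(new.get(k) for k in policy.metric_keys)] += 1
    return reshaped, metric, diff

records = [
    {"service": "api", "level": "error", "path": "/users/42/orders", "raw": "GET /users/42/orders 500"},
    {"service": "api", "level": "error", "path": "/users/7/orders", "raw": "GET /users/7/orders 500"},
    {"service": "api", "level": "info", "path": "/health", "raw": "GET /health 200"},
]
policy = Policy({"raw"}, {"path"}, ("service", "level", "path"))
reshaped, metric, diff = apply_policy(records, policy)
print(dict(metric))
print(dict(diff))
```

Collapsing `/users/42/orders` and `/users/7/orders` into one template is what lets two log lines land in a single counter series. A production policy would also need the feedback loops mentioned above to decide which fields are actually safe to drop.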
These agents learn from the same telemetry graph that powers your dashboards, but they don’t stop at visibility. They act.
Why This Matters Now
Gartner projects that by 2029, 70% of organizations will require explainable AI for agentic SRE decisions, up from just 5% today. That shift, from dashboards to decisions, demands trustworthy, programmable systems.
Over the past few years, building AI-SRE tools taught us three hard lessons:
- The Data Fragmentation Problem: You can’t build reliable automation on unreliable data constrained by per-GB pricing. We solved this with a unified schema and a telemetry data platform.
- The Context Problem: Generic AI models don’t understand your architecture. That’s why you program your own agents: they capture your team’s tribal knowledge.
- The Trust Problem: Engineers ignore opaque automation. Our SDK surfaces confidence scores and approval gates so humans stay in control.
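The confidence-and-approval pattern behind the trust lesson can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Last9’s SDK: `ProposedAction`, `gate`, and the runbook names are hypothetical, confidence is assumed to be a number between 0 and 1, and the function only returns a decision rather than running any runbook. High-confidence actions proceed automatically, while anything below the threshold or on an always-ask list goes to a human approver.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ProposedAction:
    runbook: str       # hypothetical runbook identifier
    confidence: float  # agent's confidence in its diagnosis, 0.0 to 1.0
    reason: str        # explanation shown to the engineer who approves

def gate(
    action: ProposedAction,
    threshold: float,
    approve: Callable[[ProposedAction], bool],
    always_ask: frozenset[str] = frozenset(),
) -> str:
    """Decide how a proposed action proceeds; does not execute anything itself."""
    if not 0.0 <= action.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Risky runbooks and low-confidence diagnoses always go to a human.
    if action.runbook in always_ask or action.confidence < threshold:
        return "approved" if approve(action) else "rejected"
    return "auto-approved"

def on_call_approver(action: ProposedAction) -> bool:
    # Stand-in for a real approval channel (chat prompt, ticket, and so on).
    print(f"Approve {action.runbook}? confidence={action.confidence:.2f}: {action.reason}")
    return True

risky = frozenset({"failover-postgres-primary"})
print(gate(ProposedAction("restart-redis-replica", 0.93, "matches past fault"), 0.9, on_call_approver, risky))
print(gate(ProposedAction("failover-postgres-primary", 0.97, "primary unreachable"), 0.9, on_call_approver, risky))
```

Keeping an explicit always-ask list means even a very confident agent cannot fail over a primary database on its own; the threshold only governs routine, reversible actions.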
This isn’t AI replacing engineers — it’s AI amplifying reliability and feedback loops across every system.
About the Gartner Recognition
Last9 was named a Cool Vendor in Gartner’s October 2025 report “Cool Vendors in AI for SRE and Observability.” The report highlights vendors using AI to enhance SRE practices and reduce the cognitive load on engineering teams.