Recently, our Developer Evangelist Prathamesh Sonpatki gave a talk at a ClickHouse meetup titled "Less War, More Room: Confessions of a Reformed Alert Hoarder." As he described the all-too-familiar "3 AM War Room Bingo," there were knowing nods throughout the audience.
His presentation sparked further conversations with customers that I wanted to share as we explore how observability challenges impact teams across the industry.
Listen to Prathamesh's talk from the ClickHouse meetup
The 3 AM War Room Reality
Prathamesh opened his talk with a scenario all engineering teams have lived through: the dreaded 3 AM alert.
First, you dismiss it as "probably just a blip." Then, as more alerts arrive, you find yourself asking, "What changed?" Soon, you're running increasingly desperate commands while Slack notifications pile up.
By 3:30 AM, five people are on the emergency call, and someone inevitably suggests that Kubernetes is either the problem or the solution (depending on your current infrastructure). This isn't just an inconvenient awakening – it's a signal that your observability approach has fundamental gaps.
Descending into "Patal Log"
"Patal Log" perfectly captures the logging hell that teams unwittingly create for themselves. For those unfamiliar, it's a wordplay on Patal Lok, referring to the netherworld or underworld in South Asian cosmology.
Even in conversations with folks after the talk, I heard the same pain described repeatedly, just as we have so many times before – the gap between logging ideals and reality:
Log Everything (Ideal):
- Structured, consistent logging across services
- High-cardinality data that adds meaningful context
- Events that tell a coherent story about system behavior
Log Anything (Reality):
- Random `console.log("here")` statements scattered throughout codebases
- Unstructured text that's nearly impossible to parse
- Thoughtless severity levels that create noise
This divide between what you actually need and what you end up with invariably leads to future you cursing past you at 3 AM, when you can't find what you need in that underworld of meaningless log entries.
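To make the divide concrete, here's a minimal TypeScript sketch. The `logEvent` helper and the field names (service, tenant_id, order_id) are illustrative, not a prescribed schema; the point is the difference between a bare breadcrumb and a structured event that carries context.

```typescript
// "Log anything": a breadcrumb only its author understands, and only for a week.
console.log("here");

// "Log everything": a structured event with consistent fields and high-cardinality
// context, so future-you can filter and correlate at 3 AM. Field names are illustrative.
function logEvent(event: {
  level: "debug" | "info" | "warn" | "error";
  message: string;
  [attribute: string]: unknown;
}): void {
  console.log(JSON.stringify({ timestamp: new Date().toISOString(), ...event }));
}

logEvent({
  level: "warn",
  message: "payment provider retry",
  service: "checkout",
  tenant_id: "acme-co",
  order_id: "ord_8123",
  attempt: 3,
});
```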
The Observability Gap
Prathamesh highlighted how this gap is not just reflected but amplified in the distance between current observability practices and what teams actually need.
Current Reality:
- Multiple dashboards across different tools
- Telemetry data scattered across systems
- Configuration sprawl that no one fully understands
- Alert fatigue from noisy notifications
What Teams Actually Need:
- A single source of truth for operational data
- Unified telemetry across metrics, logs, and traces
- Centralized control over data processing
- Clear signal that cuts through the noise
Teams end up with different monitoring tools for various reasons — and many times, it's also because they're optimizing for cost and splitting telemetry across tools based on criticality.
This proliferation of tools without integration creates what Prathamesh aptly called "operation silos": so many tools, and yet teams still struggle to quickly identify the root cause during incidents.

Breaking the Alert Hoarding Cycle
What became clear from Prathamesh's talk is that most teams are stuck in a vicious cycle of alert hoarding: insufficient visibility leads to more alerts, which create more noise, which makes underlying issues harder to see, which leads to even more alerts as compensation.
We've observed this pattern repeatedly across organizations of all sizes. They know their observability is broken but don't see a practical path forward that doesn't involve rewriting applications or undergoing massive organizational change.
Technology Foundations Make a Difference
At Last9, our telemetry data platform leverages technologies like OpenTelemetry and ClickHouse as part of its foundation. Prathamesh touched on these technologies in his talk, particularly in the context of the ClickHouse meetup.
OpenTelemetry's standardization capabilities and ClickHouse's performance characteristics offer powerful building blocks for modern observability. OTel adoption has been ramping up rapidly and is now the second most active CNCF project — it allows teams to be vendor-neutral, brings in standardization by using the same agent across sources, and enables telemetry correlation.
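As a rough sketch of what that standardization looks like in application code, here's how a Node.js service might emit a span with the OpenTelemetry API. The service name, attribute keys, and the chargeCard function are illustrative assumptions, and a real setup would also wire up an SDK and exporter.

```typescript
// Sketch only: assumes @opentelemetry/api is installed and an SDK + exporter are
// configured elsewhere. Names ("checkout", "tenant.id", chargeCard) are illustrative.
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("checkout");

async function chargeCard(orderId: string, tenantId: string): Promise<void> {
  await tracer.startActiveSpan("charge-card", async (span) => {
    // Attributes become queryable, correlatable telemetry instead of free-text logs.
    span.setAttribute("order.id", orderId);
    span.setAttribute("tenant.id", tenantId);
    try {
      // ... call the payment provider here ...
      span.setStatus({ code: SpanStatusCode.OK });
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      // The same instrumentation works no matter which backend receives the data.
      span.end();
    }
  });
}
```

Because the instrumentation is vendor-neutral, pointing the data at a different backend is an exporter change, not an application change.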
ClickHouse, used by teams at Cloudflare, Spotify, Lyft, and more, is one of the best data stores when it comes to speed and performance per dollar. Its engines for handling different telemetry types, control over schemas, and native SQL support make it a great option for bringing metrics, logs, and traces into one place.
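To give a flavor of the SQL side, here's a hedged sketch using the official Node.js client (@clickhouse/client). The otel_logs table and its columns are assumptions about a typical OTel-style log schema, not a fixed layout.

```typescript
// Sketch only: assumes a local ClickHouse server and a hypothetical OTel-style
// logs table (otel_logs) with ServiceName / SeverityText / Timestamp columns.
import { createClient } from "@clickhouse/client";

async function errorCountsByService(): Promise<void> {
  const client = createClient(); // defaults to http://localhost:8123
  const result = await client.query({
    query: `
      SELECT ServiceName, count() AS errors
      FROM otel_logs
      WHERE SeverityText = 'ERROR'
        AND Timestamp > now() - INTERVAL 1 HOUR
      GROUP BY ServiceName
      ORDER BY errors DESC
      LIMIT 10
    `,
    format: "JSONEachRow",
  });
  console.table(await result.json()); // plain SQL over logs, no bespoke query DSL
  await client.close();
}
```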
However, as we've seen with customers, these technologies alone don't solve the alert hoarding problem if teams still can't easily process and transform their existing telemetry data.
The Control Plane: Where Transformation Happens
Connecting back to "what teams actually need," the most engaging portion of Prathamesh's talk focused on the concept of a Control Plane for observability data — the layer that enhances, routes, and processes telemetry in transit without requiring application or instrumentation changes.
In subsequent customer discussions, I've repeatedly heard how transformative this approach has been. Here are the Last9 Control Plane capabilities that teams have found most valuable:
Extract & Remap
A media streaming customer recently shared how they transformed and standardized their CDN logs by extracting tenant IDs from request paths of one source and query parameters of another at the Control Plane level.
This made tenant-specific monitoring possible without modifying their CDN configuration or application code, allowing them to identify and address customer-specific issues before they became widespread.
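In the Control Plane this is configuration rather than code you deploy, but conceptually the remap amounts to something like the sketch below. The CdnLogRecord shape, the path layout, and the query parameter are hypothetical.

```typescript
// Conceptual sketch of an extract-and-remap step on CDN access logs. This is not
// Last9's API; it illustrates the transform applied in transit.
interface CdnLogRecord {
  source: "cdn-a" | "cdn-b";
  url: string;
  attributes: Record<string, string>;
}

function extractTenantId(record: CdnLogRecord): CdnLogRecord {
  const url = new URL(record.url, "https://placeholder.invalid");
  let tenantId: string | undefined;

  if (record.source === "cdn-a") {
    // cdn-a encodes the tenant in the request path, e.g. /t/acme-co/assets/logo.png
    tenantId = url.pathname.split("/")[2];
  } else {
    // cdn-b passes it as a query parameter, e.g. /assets/logo.png?tenant=acme-co
    tenantId = url.searchParams.get("tenant") ?? undefined;
  }

  return tenantId
    ? { ...record, attributes: { ...record.attributes, tenant_id: tenantId } }
    : record;
}
```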
Drop & Filter
Another customer discovered that a significant share of their logging volume consisted of DEBUG-level logs that were rarely queried but were costing them heavily in storage and processing.
By implementing filtering at the control plane, they maintained the ability to re-enable these logs when needed while dramatically reducing their baseline costs and noise.
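Again, this is expressed as a rule in the Control Plane rather than code, but the logic boils down to a predicate like the hypothetical one below, with a toggle so DEBUG logs can be re-enabled for a service while it's under investigation.

```typescript
// Conceptual sketch of a drop/filter rule, not Last9's actual configuration.
// DEBUG records are dropped in transit by default, but can be re-enabled per service.
interface LogRecord {
  service: string;
  severity: "DEBUG" | "INFO" | "WARN" | "ERROR";
  body: string;
}

const debugEnabledFor = new Set<string>(); // e.g. debugEnabledFor.add("checkout")

function shouldKeep(record: LogRecord): boolean {
  if (record.severity !== "DEBUG") return true;
  return debugEnabledFor.has(record.service);
}

// Applied before storage, so dropped records never add cost or noise downstream.
function filterBatch(batch: LogRecord[]): LogRecord[] {
  return batch.filter(shouldKeep);
}
```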
Forward & Rehydrate
Compliance requirements often force teams to retain certain logs for years. For one customer, their previous observability solution made this prohibitively expensive.
The ability to automatically forward specific data to cold storage while maintaining the option to rehydrate it when needed allowed them to meet compliance requirements without compromising their operational visibility or budget.
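Conceptually, the routing decision is small; the sketch below is only an illustration of the idea. The "retention.class" attribute and the tier names are assumptions, not Last9's configuration.

```typescript
// Conceptual routing sketch, not Last9's API: compliance-tagged records go to cheap
// cold object storage; everything else stays in the hot, queryable tier.
type StorageTier = "hot" | "cold-archive";

interface RoutableRecord {
  attributes: Record<string, string>;
}

function routeForRetention(record: RoutableRecord): StorageTier {
  return record.attributes["retention.class"] === "compliance" ? "cold-archive" : "hot";
}

// Rehydration is the reverse path: an archived time range is pulled back into the
// hot tier only when an audit or investigation actually needs to query it.
```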
Context & Correlation
Perhaps the most powerful capability, as Prathamesh emphasized, is seeing "what changed" alongside symptoms. A customer shared that before implementing a control plane approach, their average incident resolution time was 97 minutes.
After gaining the ability to standardize telemetry and extract its attributes with Last9, and to correlate metrics with Change Events such as deployments, configuration changes, and infrastructure scaling, they reduced that to 24 minutes: a 75% improvement.
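The correlation itself is conceptually simple: line the alert up against the change events that landed just before it. The sketch below is illustrative only; the ChangeEvent shape and the 30-minute lookback window are assumptions, not how Last9 implements it.

```typescript
// Conceptual sketch: given an alert timestamp, surface change events (deploys,
// config changes, scaling actions) that landed shortly before it.
interface ChangeEvent {
  kind: "deployment" | "config_change" | "infra_scaling";
  service: string;
  at: Date;
  description: string;
}

function whatChanged(
  alertAt: Date,
  events: ChangeEvent[],
  lookbackMinutes = 30
): ChangeEvent[] {
  const windowStart = alertAt.getTime() - lookbackMinutes * 60_000;
  return events
    .filter((e) => e.at.getTime() >= windowStart && e.at.getTime() <= alertAt.getTime())
    .sort((a, b) => b.at.getTime() - a.at.getTime()); // most recent change first
}
```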
Meeting Teams Where They Are
Never underestimate how important workflow compatibility is to successful observability. The best technical solution fails if it doesn't fit how teams actually work.
Take this example of a typical customer with three distinct technical teams:
- Their SRE team lives in Grafana dashboards they'd built over the years
- Their backend team prefers SQL-based analysis
- Their frontend team thinks in terms of user journeys and request flows
Without the flexibility to support each of these approaches, any attempt to consolidate on a single observability tool would have failed, because each team would lose capabilities it considered essential.
Last9 supports multiple interfaces (both a native UI and an embedded Grafana) and query languages (SQL, PromQL, LogQL, and TraceQL), making it easier to achieve unified observability without workflow disruption.
From War Rooms to Restful Nights
The journey from alert hoarding to intentional observability isn't about achieving perfection — it's about having just enough of the right information when it matters. It's about creating more room for thoughtful analysis and less war-room firefighting.
By breaking down operational silos and building bridges between disparate data sources, teams are finding they can understand what changed, why it matters, and how to fix it — often before anyone gets paged at 3 AM.
And that's a transformation worth making.