When should I start thinking of observability?

When you’re a 5-person engineering team, observability tooling is the least of your concerns. But putting it on the back burner only delays the inevitable. Here are two questions I get asked often:

“At which point in the maturity of an engineering org should you focus on reliability tooling?”

“We’re flying blind for now, but diagnosing incidents is becoming harder — should I start considering tooling right away?”

I decided to write a whitepaper on how engineering organizations should build reliability and which factors to consider. And here we are.

As an engineering organization scales, its metrics grow across three dimensions:

Retention
Unique metric types
Cardinality, or instances

Modern time series databases (TSDBs) grow on all three axes at once, driven by the rate of ingestion and exploration. 💥
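To make the three axes concrete, here is a rough back-of-the-envelope sketch of how they multiply. Every number in it is hypothetical, and the 15-second scrape interval is an assumption rather than a figure from the whitepaper. It only illustrates that doubling metric types, cardinality, and retention together grows stored samples about eightfold, not twofold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricFootprint:
    """Hypothetical sizing inputs; none of these numbers come from the article."""

    metric_types: int            # unique metric names
    series_per_metric: int       # cardinality: label combinations per metric
    retention_days: int          # how long samples are kept
    scrape_interval_s: int = 15  # assumed collection interval

    def active_series(self) -> int:
        # Unique metric types multiplied by cardinality per metric.
        return self.metric_types * self.series_per_metric

    def stored_samples(self) -> int:
        # Each series stores one sample per scrape for the whole retention window.
        samples_per_series = self.retention_days * 86_400 // self.scrape_interval_s
        return self.active_series() * samples_per_series


# Two made-up snapshots of the same organization, before and after growth.
small = MetricFootprint(metric_types=200, series_per_metric=50, retention_days=15)
grown = MetricFootprint(metric_types=400, series_per_metric=100, retention_days=30)

growth = grown.stored_samples() / small.stored_samples()
print(f"{small.active_series():,} -> {grown.active_series():,} active series")
print(f"stored samples grow {growth:.0f}x")
```

Real storage cost also depends on compression and churn, which this sketch ignores.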

So, how do you build reliability tooling across org tiers? We’ve classified organizations into three types based on the following:

👩‍💻 Engineers
😍 Customers
📀 Services

Based on this classification, it’s easier to understand needs across teams. From my conversations, one point stood out: data querying needs grow across teams as organizations mature. So this is not only an engineering endeavor. Business, Finance, Product, and Customer Success teams all have what it takes to bring your Prometheus down, and that has broad ramifications for any organization.

Without further ado, here’s a link to the whitepaper: 👇

You can also get started by signing up here.

Hit us up on Discord 😍 with any queries about Last9. You can also ping me on Twitter for brickbats, suggestions, and questions: @realmeson10.

