Why your monitoring costs are high

After three years of being obsessed with the DevOps and SRE space, I can now confidently say that companies are paying more to monitor their systems than to run their production-level code.

I wasn’t confident about this before Last9. I assumed the cases I saw were anomalies, and the claim seemed a tad outrageous. But now, I’m near certain about it, more so for large companies that have struggled to keep up with the Digital Transformation wave. And the ones that have migrated from legacy systems have been hit with containerization, Kubernetes and a microservices nightmare that has only increased overall monitoring costs.

This short rant is being written after a conversation with an industry veteran who joked to me, “…maybe I should create an AI company that will port this microservices mess to a monolith and make millions of dollars, because monoliths are so back baby. After all, monitoring is anyway a massive mess and no one wants to touch it”.

If you’re clued into all things tech, this would seem funny at first, but if you remember how Amazon’s Prime Video team did something similar to save costs, it’d probably get you thinking. 😛

For those wondering what I’m talking about, the Prime team moved from microservices to a monolith and reduced costs by about 90%. It’s riveting stuff. Read here.

As a system becomes more complex and more diverse, monitoring costs become the biggest problem to tackle. And yet, the status quo persists. Why aren’t more people jumping ship and making hard decisions about these exorbitant costs?

The IBM doctrine

No one ever got fired for buying IBM. The tech industry adage has stood the test of time, and it’s particularly entrenched in the DevOps world. This is also precisely why companies like Datadog are able to post massive earnings and continue doing so well. After all, migrations are hard, and the decision paralysis around them is what keeps monitoring costs at all-time highs.

Transferring historical data and configurations from a locked-in vendor might seem like a daunting task, but if your monitoring costs more than running your production-level code, then your tech stack is working against you.

The ‘Open Source is cheap’ fallacy

On the flip side of the IBM doctrine is the fallacy that ‘Open Source is cheap’. This couldn’t be further from the truth. For startups that haven’t witnessed scale, it may hold, but for large companies or even startups that are witnessing a spurt in growth, open-source monitoring tools present a plethora of challenges, and it’s not just costs.

My CTO captures this well 👇

Open Source is not cheap in #Observability.
Most folks simply forget to add engineering salaries, time spent, and the toil associated with managing a tsdb and mapping #o11y for an org.
— Piyush Verma (@realmeson10) October 3, 2023

The biggest lie ever sold in the monitoring world is that in-house monitoring is cheap. Well, it’s not.

Managing your own Prometheus comes with engineering overheads and toil that are usually not accounted for. And when you have churn in teams, knowledge transfers get far trickier, given how much of monitoring runs on tribal knowledge.

Outsource the headache

Good monitoring needs technical expertise. An organization needs to keep up to date with innovation in the space. But given how complex this problem is, why manage this mess internally? Why are dedicated teams doing this when an engineering team should be focusing on the main product?

We don’t manage server racks for storage, so why are people doing the equivalent for monitoring? Leave this to the experts.

For a large chunk of organizations, it all boils down to figuring out how to tame High Cardinality data. This is one of the primary reasons why costs are high and monitoring is weak: you drop a label to keep cardinality down, and then realize that label was actually critical.
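To make the cardinality trade-off concrete, here is a rough sketch. It assumes the worst case, where every combination of label values shows up, so the number of time series for one metric is the product of the distinct values per label. The label names and counts are made up for illustration; they are not from any real system.

```python
from math import prod


def series_count(label_cardinalities: dict[str, int]) -> int:
    """Worst-case number of time series for a single metric.

    Assumes every combination of label values occurs, so the count is
    the product of the number of distinct values for each label.
    """
    return prod(label_cardinalities.values())


# Hypothetical labels and distinct-value counts, for illustration only.
labels = {"service": 50, "endpoint": 20, "status_code": 5, "pod": 200}

full = series_count(labels)

# Dropping the "pod" label shrinks the series count dramatically...
without_pod = series_count({k: v for k, v in labels.items() if k != "pod"})

print(f"with pod label:    {full:,} series")
print(f"without pod label: {without_pod:,} series")
```

In this made-up setup, dropping `pod` cuts the worst case from a million series to five thousand, which is exactly why it is tempting. It is also why it hurts: the day a single pod misbehaves, the label that would have pointed at it is gone.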

Outsource this problem to the experts who focus on taming High Cardinality. There are companies with dedicated teams and resources who do this as a full-time job!

If none of this resonates or makes sense, hit me up; I don’t mind building an AI tool that can potentially migrate an org’s microservices to monoliths 😛 I’m kidding. Or maybe not.

Active on Twitter/X mostly — @aniket_rao

If you want to understand how Levitate differs from the rest and how it can manage what we call the “Cricket Scale”, please feel free to DM me. You can also schedule a demo.
