Last9

May 3rd, ‘23 / 4 min read

Observability—OSS vs Paid vs Managed OSS

The Reliability industry needs a managed answer, free of vendor lock-in, to spiraling costs, high cardinality, and the toil of managing a time series database (TSDB)

The Observability space sits at two extremes today. It’s split between Open Source Software (OSS) practitioners and paid vendors with lock-in. Both come with their own advantages and drawbacks. What's abundantly clear is that the lack of a viable middle ground is hurting most companies. This pain comes in different forms: spiraling costs, a lack of the data visibility needed to keep systems up, and a missed opportunity for business teams to explore data and make meaningful strategic decisions that help build the business.

The last point is particularly noteworthy. Armed with the right data, business and product teams can make meaningful decisions to understand users, observe patterns, and open up new avenues to help the bottom line. But data exploration is so hard that most teams simply don’t get a chance to derive learnings from their own data. Not only does this affect the business today, it could very well be its death knell in the long term.

As one CTO put it, “Not only are the costs of data exploration so high, there’s the stark reality of how these much-needed experiments could bring my infra down. So we just defer these experiments knowing wholly well we could be sitting on a trove of insights that could very well add millions of dollars to our business.”

But why is this so hard?

The Weltanschauung of Observability

| Type | Open/Blackbox | Community | Protocols | Support | Costs |
| --- | --- | --- | --- | --- | --- |
| OSS | Multiple solutions, free to explore, choose and decide | A chance to learn from the community, experiment and tweak solutions that best work for your org | Open protocols | No support, you’re at the whim of the community; could be disastrous during downtimes, urgent needs | Cheap, easy to get, and usually works for smaller orgs |
| Paid | Blackbox, at the behest of a vendor | Complete vendor lock-in. You will get what is being offered | Proprietary protocols | Complete support to debug issues, but comes at a cost. | Costs spiral out of control, hampering multiple teams |

The more you look at the table above, the more you’ll realize how polarised the current options are. This chasm has given birth to a whole legion of companies focusing on Observability. The industry has exploded with startups trying to dethrone the incumbents. Datadog, at a market cap of about $22 billion, leads the space, but there are half a dozen listed companies, and more than two dozen others chipping away at that behemoth.

💡 There’s also an interesting shift underway: companies are moving away from typical Application Performance Management (APM) and toward service-level monitoring. This fundamental shift in thinking radically changes how Observability practices are being defined.

There needs to be a managed non-lock-in answer to this problem.

"Let’s build it." — The Tax of Truck Factors

So far I’ve left out a dying third set of folks in this space: the ones who attempt to build their own Observability toolkit. Over time, many have tried, and most have failed. I’ve worked with a number of companies, large and small, that attempted this, and most of them failed too.

💡 There was a time when folks were discussing build vs. buy for cloud storage. Now, that conversation is redundant. Time series storage is at that point — it makes no sense to maintain your own time series database.

Apart from the most obvious reasons, the biggest one is the prohibitive cost of doing this yourself. You have to hire multiple engineers, given the truck factor associated with the tool's upkeep. Once you do, there’s the tax of understanding, learning, and keeping yourself up to date on how someone else envisioned and built this Observability tool. That tax comes due one day. 😉

I remember a fairly large company building its own tooling over 14 months. The project head left midway, but the project was salvaged because it had a 6-person team. Then it became a 3-person team. Another lead came in and changed the whole architecture and design. 2 more left…

The hardest part was that, midway through, folks realized a few things were already outdated. Keeping up with industry innovation became a harder problem. Some of the more bespoke integrations had to be reworked multiple times. The lack of awareness and knowledge became crippling over time. Ultimately, the project was abandoned because of the sheer cost of doing it.

The sweet spot — What next?

Two important factors sum up the chasm between OSS and Paid vendors in my mind:

  1. Costs
  2. Support

The conventional pricing model for Observability tooling needs to change. The costs are so prohibitive at scale that they dissuade most organizations from taking bold calls. This is the single most important concern in all my conversations. There are few alternatives, and the ones that exist are priced out of the equation at scale.

Support for Observability tooling is poor in most cases. A lack of knowledge and awareness means tools fall short of customer needs. In all fairness, this works both ways. Engineering support also comes at a price, and this hampers resolution time for both parties. The multiple browser tabs, the poor UI/UX, and the general taxonomy of how support is offered all need a rethink.

Dare I say, this is what we’re trying to solve at Last9 — a better alternative to Observability tooling, with stellar support and the ability to manage scale.

The middle ground looks like this:

| Type | Open/Blackbox | Community | Protocols | Support | Costs |
| --- | --- | --- | --- | --- | --- |
| The ideal managed Observability partner | Managed service with optionality; interoperable | Can continue plugging in OSS tools since there is no vendor lock-in | Compatible with open protocols; can slice and dice from best practices, open innovation | Offers full support, given it’s a managed service. Costs are baked in beforehand | Cheap, flexible, and works on cutting costs, because you’re not being charged to store/query more data. |

CTOs need a managed and open platform for their Reliability mandate. This keeps vendors on their toes, negates an org’s truck factor, reins in costs, and fundamentally alters how Observability is built in an org.


The obvious plug: We reduce your Total Cost of Ownership around everything to do with Observability by about 50%. If you’re looking for alternatives, please chat with us here.

Want to know more about Last9 and our products? Check out last9.io; we’re building Reliability tools to make running systems at scale, fun, and embarrassingly easy.
