Deep Dives

Explore our deep-dive blogs for an in-depth look at various observability and reliability topics! We break down complex ideas and share valuable insights to help you understand observability and related concepts better.

Think Data Warehouse, NOT Database.

The software monitoring world is broken because of a TSDB. We deserve a TSDW

Aniket Rao

The most important aspect of software monitoring

The single most important thing to get better at in your software monitoring journey

Aniket Rao

What needs to change in software monitoring?

A wishlist of things that need to change in the world of software monitoring

Aniket Rao

How We Cut Monitoring Costs and Deprecated Thanos at Replit

Winning Replit over by taming High Cardinality data and deprecating Thanos

Prathamesh Sonpatki

Back to the Future: The R-C-A of alerting

Dissecting the RCA of Alerting - Reliability, Correlations, Actionability

Aditya Godbole

Launching Alert Studio

Modern monitoring systems depend heavily on ‘Alerting’ to reduce the Mean Time to Detect (MTTD) faulty systems. But alerting hasn’t evolved to meet the demands of modern architectures. We’re changing that with Alert Studio.

Aditya Godbole

Everything in software monitoring is dead, apparently

Chasing shiny new toys, as always ;)

Aniket Rao

Software Monitoring — Stuck in the 00s

A short history of software monitoring, from the 00s. What has changed? Why are things so arcane?

Piyush Verma

A checklist to choose a monitoring system

A detailed checklist of points you should consider before choosing a monitoring system

Prathamesh Sonpatki

Why your monitoring costs are high

If you want to bring down your monitoring costs, you need to shake up a decision paralysis in engineering

Aniket Rao

The unresolved cost of High Cardinality

Fulfill all your food delivery orders this December 31st by taming High Cardinality data with Levitate 😉

Prathamesh Sonpatki

Why you need a Time Series Data Warehouse

What is a Time Series Data Warehouse? How does it help in your monitoring journey? How does it differ from a Time Series Database? That and more

Rishi Agrawal

Building Logs to Metrics pipelines with Vector

How to build a pipeline that converts logs to metrics and ships them to long-term Prometheus storage like Levitate.

Aniket Rao

This arctic winter — time to repay your tech debt

We're in a peak tech winter. What should engineering teams focus on when product velocity dwindles?

Ajey Gore

A case for Observability outside engineering teams

Observability is being built by engineers for engineers. In reality, o11y is for all.

Aniket Rao

Understanding the Rasmussen model for failures

What does the Rasmussen model teach us about Site Reliability Engineering?

Nishant Modak

1979, a nuclear accident and SRE

Deep diving into the 'Normal accident' theory by Charles Perrow, and what it means for SREs

Aniket Rao

OpenTelemetry for dummies: ELI5

What is OpenTelemetry? Why is it important? Do SREs need to adopt OTel? An Explain It Like I'm 5.

Mohan Dutt Parashar

What Site Reliability Engineering needs — A swarm of rogue bees

If all companies are software companies, all companies need better Observability to understand how performant their software is

Aniket Rao

Take back control of your Monitoring

Take back control of your Monitoring with Levitate - a managed time series data warehouse

Nishant Modak

Observability is a practice, not a job

Engineering organizations that ship fast have Observability as part of their core DNA.

Aniket Rao

High Cardinality for Dummies: ELI5

High Cardinality woes are far & frequent in today's modern cloud-native environment. What does it mean, & why is it such a pressing problem?

Mohan Dutt Parashar

Who should define Reliability — Engineering, or Product?

Whoever owns Reliability should define its parameters. But who owns the Reliability of a Product? Engineering? Product Management? Or the Customer success team?

Piyush Verma

What do self-driving cars tell us about Site Reliability Engineering?

From Robocars to Reliability — SRE with self-driving cars; mapping out where the Observability space is in conjunction with self-driving cars

Mohan Dutt Parashar

Observability—OSS vs Paid vs Managed OSS

The Reliability industry needs a managed answer, free of vendor lock-in, to spiraling costs, high cardinality, and the toil of managing a TSDB

Satyajeet Jadhav

High Cardinality? No Problem! Stream Aggregation FTW

Managing high cardinality in time series data is tough but crucial. Learn how Levitate’s streaming aggregations can help tackle it efficiently.

Piyush Verma

The neglected tech arctic winter — Internal SaaS expenses

The current tech winter reveals a hard truth: spending on internal tools for tech infrastructure is bloated—and this isn't just a passing cycle.

Nishant Modak

Understanding “Cricket Scale”

How does a DevOps/Site Reliability Engineer plan for "Cricket scale"? How do you warm up systems that are about to witness 30+ million concurrent users?

Aniket Rao

Reliability Engineering for Dummies: ELI5

Explaining Reliability Engineering to a 5-year-old.

Mohan Dutt Parashar

When should I start thinking of observability?

How does one scale metrics maturity in a cloud-native world — A guide on observability tooling as your engineering org scales.

Piyush Verma

The importance of structured communication in the world of SRE

How you communicate helps build your 9s. In the world of Site Reliability Engineering, this is crucial. How do you do it?

Saurabh Hirani

The difference between DevOps, SRE, and Platform Engineering

In reliability engineering, three concepts keep coming up: DevOps, SRE, and Platform Engineering. How do they differ?

Prathamesh Sonpatki

How to improve Prometheus remote write performance at scale

A deep dive into improving Prometheus remote write performance at scale, based on real-life experiences

Saurabh Hirani

India vs Pakistan: SRE and the Shannon Limit

How does one ‘detect change’ in a complex infrastructure, so you don’t lose out on critical revenues — A short SRE story

Satyajeet Jadhav

Why MTTR should be a ‘business’ metric

A key challenge is aligning engineering health metrics with business goals. How can business measure engineering, and engineering show its value?

Sidu Ponnappa

Observability - That Last 9

TL;DR: A stitch in time saves 9. A discussion on the key building blocks of observability.

Akash Saxena

How we won Dukaan over

5 meetings. 1 month. Subhash and his team’s velocity on decision-making, speed of execution, and radical candor are a breath of fresh air in the Indian startup ecosystem.

Aniket Rao

Getting the big picture with Log Analysis

How to get the most out of your logs!

Jayesh Bapu Ahire

Microservices - Tracking Dependencies

A quick primer on microservices architecture and the importance of tracking dependencies

Akshay Chugh, Jayesh Bapu Ahire

Infrastructure-As-Code-As-Software

We ran a poll on Twitter and on Reddit: “Do you care about the quality of your infrastructure code?” The result was an approximate and staggering 60–30–10 split. What do you think the response would be if the poll asked, “Do you care about the quality of your product code?” We asked a follow-up question to find out why ~30% are in the “Somewhat but mostly no” category, and gleaned these reasons from Twitter and Reddit: 1. Someone manually created the legacy infrastructure. No one questioned th…

Piyush Verma