Deep Dives

This arctic winter — time to repay your tech debt
We're in a peak tech winter. What should engineering teams focus on when product velocity dwindles?
Ajey Gore

A case for Observability outside engineering teams
Observability is being built by engineers for engineers. In reality, o11y is for all.
Aniket Rao

Understanding the Rasmussen model for failures
What does the Rasmussen model teach us about Site Reliability Engineering?
Nishant Modak

1979, a nuclear accident and SRE
Deep diving into the 'Normal accident' theory by Charles Perrow, and what it means for SREs
Aniket Rao

OpenTelemetry for dummies: ELI5
What is OpenTelemetry? Why is it important? Do SREs need to adopt OTel? An Explain It Like I'm 5.
Mohan Dutt Parashar

What Site Reliability Engineering needs — A swarm of rogue bees
If all companies are software companies, all companies need better Observability to understand how performant their software is
Aniket Rao

Take back control of your Monitoring
Take back control of your Monitoring with Levitate - a managed time series data warehouse
Nishant Modak

Observability is a practice, not a job
Engineering organizations that ship fast have Observability as part of their core DNA.
Aniket Rao

High Cardinality for Dummies: ELI5
High Cardinality woes are frequent in today's cloud-native environments. What does it mean, & why is it such a pressing problem?
Mohan Dutt Parashar

Who should define Reliability — Engineering, or Product?
Whoever owns Reliability should define its parameters. But who owns the Reliability of a Product? Engineering? Product Management? Or the Customer success team?
Piyush Verma

What do self-driving cars tell us about Site Reliability Engineering?
From Robocars to Reliability: SRE with self-driving cars, and mapping out where the Observability space stands in relation to self-driving cars
Mohan Dutt Parashar

Observability—OSS vs Paid vs Managed OSS
The Reliability industry needs a managed answer, free of vendor lock-in, to spiraling costs, high cardinality and the toil of managing a TSDB
Satyajeet Jadhav

High Cardinality? No Problem! Stream Aggregation FTW
High cardinality in time series data is challenging to manage. But it is necessary to unlock meaningful answers. Learn how streaming aggregations can rein in high cardinality using Levitate.
Piyush Verma

The neglected tech arctic winter — Internal SaaS expenses
The current tech winter has a number of glaring stories. Cyclical as they may be, one truth has been glossed over more than the rest: the money spent on internal software tools to support tech infrastructure is bloated. And there’s nothing cyclical about this infrastructure spending.
Nishant Modak

Understanding “Cricket Scale”
How does a DevOps/Site Reliability Engineer plan for "Cricket scale"? How do you warm up systems about to witness 30+ million concurrent users?
Aniket Rao

Reliability Engineering for Dummies: ELI5
Explaining Reliability Engineering to a 5-year-old.
Mohan Dutt Parashar

When should I start thinking of observability?
How does one scale metrics maturity in a cloud-native world — A guide on observability tooling as your engineering org scales.
Piyush Verma

The importance of structured communication in the world of SRE
How you communicate helps build your 9s. In the world of Site Reliability Engineering, this is crucial. How do you do it?
Saurabh Hirani

The difference between DevOps, SRE, and Platform Engineering
In reliability engineering, three concepts keep getting talked about - DevOps, SRE and Platform Engineering. How do they differ?
Prathamesh Sonpatki

How to improve Prometheus remote write performance at scale
Deep dive into how to improve the performance of Prometheus Remote Write at Scale based on real-life experiences
Saurabh Hirani

India vs Pakistan, Site Reliability Engineering, and Shannon Limit
How does one ‘detect change’ in a complex infrastructure, so you don’t lose out on critical revenues — A short SRE story
Satyajeet Jadhav

Why MTTR should be a ‘business’ metric
One of the many pitfalls of friction between engineering and business is the lack of fundamental measurements on the health of engineering. But how does business measure engineering efficacy, and how does engineering posit its standing to business?
Sidu Ponnappa

Observability - That Last 9
TL;DR: A stitch in time saves 9. A discussion on the key blocks of observability.
Akash Saxena

How we won Dukaan over
5 meetings. 1 month. From introductions to a demo, and ultimately to winning Dukaan over. Subhash and his team’s fast decision-making and radical candor are a breath of fresh air in the Indian startup ecosystem.
Aniket Rao

Getting the big picture with Log Analysis
How to get the most out of your logs!
Jayesh Bapu Ahire

Microservices - Tracking Dependencies
Quick primer into microservices architecture and the importance of tracking dependencies
Akshay Chugh, Jayesh Bapu Ahire

Infrastructure-As-Code-As-Software
We ran a poll on Twitter and on Reddit: “Do you care about the quality of your infrastructure code?” That’s an approximate and staggering 60–30–10 split. What do you think the response would be if the poll were “Do you care about the quality of your product code?” We asked a follow-up question to understand why ~30% are in the “Somewhat but mostly no” category, and gleaned these reasons from Twitter and Reddit: 1. Someone manually created the legacy infrastructure. No one questioned t
Piyush Verma