Deep dives

All articles tagged 'Deep dives'

Think Data Warehouse, NOT Database.

The software monitoring world is broken because of a TSDB. We deserve a TSDW

Aniket Rao

The most important aspect of software monitoring

The single most important thing to get better at in your software monitoring journey

Aniket Rao

What needs to change in software monitoring?

A wishlist of things that need to change in the world of software monitoring

Aniket Rao

How We Cut Monitoring Costs and Deprecated Thanos at Replit

Winning Replit over by taming High Cardinality data and deprecating Thanos

Prathamesh Sonpatki

Back to the Future: The R-C-A of alerting

Dissecting the RCA of Alerting - Reliability, Correlations, Actionability

Aditya Godbole

Launching Alert Studio

Modern monitoring systems depend heavily on ‘Alerting’ to reduce the Mean Time to Detect (MTTD) faulty systems. But, alerting hasn’t evolved to meet the demands of modern architectures. We’re changing that with Alert Studio.

Aditya Godbole

Everything in software monitoring is dead, apparently

Chasing shiny new toys, as always ;)

Aniket Rao

Software Monitoring — Stuck in the 00s

A short history of software monitoring, from the 00s. What has changed? Why are things so arcane?

Piyush Verma

A checklist to choose a monitoring system

A detailed checklist of points you should consider before choosing a monitoring system

Prathamesh Sonpatki

Why your monitoring costs are high and how you can reduce them with Levitate

Why your monitoring costs are high

If you want to bring down your monitoring costs, you need to break the decision paralysis in engineering

Aniket Rao

Deliver all your orders this December 31st 😉

The unresolved cost of High Cardinality

Fulfill all your food delivery orders this December 31st by taming High Cardinality data with Levitate 😉

Prathamesh Sonpatki

A Time Series Data Warehouse vs A Time Series Database

Why you need a Time Series Data Warehouse

What is a Time Series Data Warehouse? How does it help in your monitoring journey? How does it differ from a Time Series Database? That and more

Rishi Agrawal

Building Logs to Metrics pipelines with Vector

How to build a pipeline that converts logs to metrics and ships them to long-term Prometheus storage such as Levitate.

Aniket Rao

Repaying your tech debt during the tech arctic winter

This arctic winter — time to repay your tech debt

We're in a peak tech winter. What should engineering teams focus on when product velocity dwindles?

Ajey Gore

A case for Observability outside engineering teams

Observability is being built by engineers for engineers. In reality, o11y is for all.

Aniket Rao

Understanding the Rasmussen model for failures

What does the Rasmussen model teach us about Site Reliability Engineering?

Nishant Modak

1979, a nuclear accident and SRE

Deep diving into the 'Normal accident' theory by Charles Perrow, and what it means for SREs

Aniket Rao

What Site Reliability Engineering needs — A swarm of rogue bees

What Site Reliability Engineering Needs: A Swarm of Bees

If all companies are software companies, all companies need better Observability to understand how performant their software is

Aniket Rao

Take back control of your Monitoring with Levitate

Take back control of your Monitoring

Take back control of your Monitoring with Levitate - a managed time series data warehouse

Nishant Modak

Observability is a practice, not a job

Engineering organizations that ship fast have Observability as part of their core DNA.

Aniket Rao

Who should define Reliability — Engineering, or Product?

Whoever owns Reliability should define its parameters. But who owns the Reliability of a Product? Engineering? Product Management? Or the Customer success team?

Piyush Verma

What do self-driving cars tell us about Site Reliability Engineering?

From Robocars to Reliability — SRE with self-driving cars; mapping out where the Observability space is in conjunction with self-driving cars

Mohan Dutt Parashar

OSS vs Paid vs Managed OSS — Picking what works for your Observability journey

Observability—OSS vs Paid vs Managed OSS

The Reliability industry needs a managed answer, free of vendor lock-in, to spiraling costs, high cardinality, and the toil of managing a TSDB

Satyajeet Jadhav

The neglected tech arctic winter — Internal SaaS expenses

The current tech winter reveals a hard truth: spending on internal tools for tech infrastructure is bloated—and this isn't just a passing cycle.

Nishant Modak

What does "Cricket scale" mean for a Site Reliability Engineer?

Understanding “Cricket Scale”

How does a DevOps/Site Reliability Engineer plan for "Cricket scale"? How do you warm up systems that are about to see 30+ million concurrent users?

Aniket Rao

Reliability Engineering for Dummies: ELI5

Explaining Reliability Engineering to a 5-year-old.

Mohan Dutt Parashar

Complete Organizational Intelligence

When should I start thinking of observability?

How does one scale metrics maturity in a cloud-native world — A guide on observability tooling as your engineering org scales.

Piyush Verma

The importance of structured communication in the world of SRE

How you communicate helps build your 9s. In the world of Site Reliability Engineering, this is crucial. How do you do it?

Saurabh Hirani

The difference between DevOps, SRE, and Platform Engineering

In reliability engineering, three concepts keep getting talked about - DevOps, SRE and Platform Engineering. How do they differ?

Prathamesh Sonpatki

How to improve Prometheus remote write performance at scale

Deep dive into how to improve the performance of Prometheus Remote Write at Scale based on real-life experiences

Saurabh Hirani

India vs Pakistan: SRE and the Shannon Limit

How does one ‘detect change’ in a complex infrastructure, so you don’t lose out on critical revenues — A short SRE story

Satyajeet Jadhav

Why MTTR should be a ‘business’ metric

A key challenge is aligning engineering health metrics with business goals. How can business measure engineering, and engineering show its value?

Sidu Ponnappa

Observability - That Last 9

TL;DR: A stitch in time saves 9. A discussion of the key building blocks of observability.

Akash Saxena

How we won Dukaan over

5 meetings. 1 month. Subhash and his team’s velocity in decision-making, moving fast, and radical candor are a breath of fresh air in the Indian startup ecosystem.

Aniket Rao

Getting the big picture with Log Analysis

How to get the most out of your logs!

Jayesh Bapu Ahire

Microservices - Tracking Dependencies

A quick primer on microservices architecture and the importance of tracking dependencies

Akshay Chugh

Jayesh Bapu Ahire

Infrastructure-As-Code-As-Software

Explore how Infrastructure-as-Code-as-Software combines coding practices with automation to streamline infrastructure management and enhance scalability.

Piyush Verma