Blog
Stories, guides, and lessons from the world of observability
Follow us on X
MTTF vs MTBF vs MTTD vs MTTR
This article covers questions such as what are MTTF, MTBF, MTTD, and MTTR, their differences, how to adopt them, and their use cases.


The neglected tech arctic winter — Internal SaaS expenses
The current tech winter reveals a hard truth: spending on internal tools for tech infrastructure is bloated—and this isn't just a passing cycle.


Recap of SRECon Americas 2023
SRECon is a conference hosted by USENIX and is focused on site reliability, distributed systems, and systems engineering at scale. A Recap of SRECon Americas 2023.


Understanding “Cricket Scale”
How does a DevOps/Site Reliability Engineer plan for "Cricket scale"? How do you warm systems' about to witness 30+ million concurrent users?


What is MTBI?
Everything you need to know about Mean Time Between Incidents (MTBI) and how it can help Site Reliability Engineers


Reliability Engineering for Dummies: ELI5
Explaining Reliability Engineering to a 5-year-old.


SLA vs SLO vs SLI - What's the difference
SLAs, SLOs, and SLIs—what’s the difference? For DevOps folks, understanding these nuances is key. Here's a quick guide to each term.


Rethinking Anomaly Detection: Focus on business outcomes
From the trenches at Games24x7 — Sanjay, on how Reliability engineering should drive core business metrics


Interesting talks on Observability from Fosdem 2023
A recap of the talks from the Observability and Monitoring dev room at Fosdem 2023.
