What do self-driving cars tell us about Site Reliability Engineering?
From Robocars to Reliability: SRE through the lens of self-driving cars, mapping out where the observability space stands in relation to them
Mohan Dutt Parashar
Understanding “Cricket Scale”
How does a DevOps/Site Reliability Engineer plan for "Cricket scale"? How do you warm up systems that are about to witness 30+ million concurrent users?
Reliability Engineering for Dummies: ELI5
Explaining Reliability Engineering to a 5-year-old.
Mohan Dutt Parashar
Self-managed Prometheus vs Managed Prometheus
What are the differences between self-managed Prometheus and managed Prometheus? How do you choose what works for you?
Battling Alert Fatigue
What is alert fatigue, and which techniques help reduce it?
How to calculate HTTP content-length metrics on the CLI
A simple guide to crunching numbers to understand overall HTTP content-length metrics.
We’ve raised an $11M Series A led by Sequoia Capital India!
Change is the only constant in a cloud environment. The number of microservices is constantly growing, and each is being deployed several times a day or week, all hosted on ephemeral servers. A typical customer request depends on at least three internal and one external service. It’s a densely connected web of systems. Any change in such a connected system usually introduces a ripple. It’s tough to understand these impacts. Alert fatigue, tribal knowledge of failures, and manual correlation acro
The origin of Service Level Objectives
An obscure term - Service Level Objectives - rules the software industry. But where does it come from? Strap on your seat belts, this is going to be a bumpy one (pun intended :p)
Akshay Chugh, Piyush Verma
Strace – A Hidden Superpower
As with any operating system, it’s not uncommon to encounter issues while running Linux and associated applications. This is especially true while using closed-source programs since granular code inspection isn’t possible.
Akshat Goyal, Prathamesh Sonpatki
We ran a poll on Twitter: “Do you care about the quality of your infrastructure code?” We ran it on Reddit too. That’s an approximate and staggering 60–30–10 split. What do you think the response would be if the poll were “Do you care about the quality of your product code?” Reasons: We asked a follow-up question to find out why ~30% are in the “Somewhat but mostly no” category, and gleaned these reasons from Twitter and Reddit: 1. Someone manually created the legacy infrastructure. No one questioned t