SRE
Blog posts related to Site Reliability Engineering
![How we reduced monitoring costs and deprecated Thanos for Replit](https://last9.ghost.io/content/images/2024/06/blog-5.png)
How we reduced monitoring costs and deprecated Thanos for Replit
Winning Replit over by taming High Cardinality data and deprecating Thanos
Prathamesh Sonpatki
![Cricket Scale e01 — Ashutosh Agrawal](https://last9.ghost.io/content/images/2024/03/Nishant-Ashutosh-talk--blog-post.jpg)
Cricket Scale e01 — Ashutosh Agrawal
Unpacking "Cricket Scale" with the person behind the scenes at JioCinema
Prathamesh Sonpatki
![MTTF vs MTBF vs MTTD vs MTTR](https://last9.ghost.io/content/images/2023/04/MTTD-vs-MTTF-vs-MTBF-vs-MTTR-1.gif)
MTTF vs MTBF vs MTTD vs MTTR
This article covers questions such as what are MTTF, MTBF, MTTD, and MTTR, their differences, how to adopt them, and their use cases.
Last9
![Recap of SRECon Americas 2023](https://last9.ghost.io/content/images/2023/03/srecon-recap.jpeg)
Recap of SRECon Americas 2023
SRECon is a conference hosted by USENIX and is focused on site reliability, distributed systems, and systems engineering at scale. A Recap of SRECon Americas 2023.
Last9
![Introducing Levitate: ‘uplifting’ your metrics woes because self-management sucks like gravity](https://last9.ghost.io/content/images/2023/01/Glitch-creative-2-copy.jpg)
Introducing Levitate: ‘uplifting’ your metrics woes because self-management sucks like gravity
Managing your own time series database is painful. We’ve moved from servers to services, and yet, monitoring metrics data is primitive. Our managed time series database powers mission-critical workloads for monitoring, at a fraction of the cost.
Nishant Modak
![The importance of structured communication in the world of SRE](https://last9.ghost.io/content/images/2022/12/The-importance-of-structured-copy.jpg)
The importance of structured communication in the world of SRE
How you communicate helps build your 9s. In the world of Site Reliability Engineering, this is crucial. How do you do it?
Saurabh Hirani
![Thanos vs Cortex](https://last9.ghost.io/content/images/2022/12/Cortex-vs-Thanos.jpg)
Thanos vs Cortex
In-depth comparison of Cortex and Thanos, what specifically they help teams do, challenges in implementing both, and how to think about what’s right for your team.
Sahil Khan
![Static Threshold vs. Dynamic Threshold Alerting](https://last9.ghost.io/content/images/2022/10/Static-Threshold-vs.-Dynamic-Threshold-Alerting-copy.jpg)
Static Threshold vs. Dynamic Threshold Alerting
What's the difference between Static Threshold vs Dynamic Threshold Alerting? Do you really know when and how to use each threshold type?
Last9
![Sample vs Metrics vs Cardinality](https://last9.ghost.io/content/images/2022/08/Cube-Creative.jpg)
Sample vs Metrics vs Cardinality
When dealing with Time Series databases, I always got confused with Sample vs Metrics vs Cardinality. Here’s an explanation as I have understood it.
Piyush Verma
Why Service Level Objectives?
Understanding how to measure the health of your service, the benefits of using SLOs, how to set compliances, and much more...
Piyush Verma
Best Practices for Postmortems: A guide
The ins and outs of conducting an effective postmortem. Ready templates and examples from leading organizations around the world!
Prathamesh Sonpatki
![Choosing Effective SLIs](https://last9.ghost.io/content/images/2023/06/photo-1604582703892-ae39b465fedc-_1_-_1_.webp)
Choosing Effective SLIs
Practical advice to choose an effective SLI.
Akshay Chugh
![The origin of Service Level Objectives](https://last9.ghost.io/content/images/2022/12/photo-1465447142348-e9952c393450--1-.jpeg)
The origin of Service Level Objectives
An obscure term - Service Level Objectives - rules the Software industry. But where does it come from? Strap on your seat belts, this is going to be a bumpy one (pun intended :p)
Akshay Chugh, Piyush Verma
Running a Database on EC2 is Slowing It Down
Learn everything about the advantages of EC2, its use cases, and how to optimize EC2 further.
Jayesh Bapu Ahire, Akshay Chugh
Deployment Readiness Checklists
A ready-made, comprehensive checklist of the steps and activities involved in deploying your application.
Prathamesh Sonpatki
The most interesting talks from SRECon 2021!
SRECon is a conference hosted by USENIX and is focused on site reliability, distributed systems, and systems engineering at scale. Learn about some of the most interesting talks from SRECon 2021.
Akshay Chugh
Doing SRE the Right Way!
A well-thought-out approach to SRE, which will help site reliability engineers and software engineers develop and maintain a useful, consistent, and effective SRE strategy for their products!
Piyush Verma
![Microservices - Tracking Dependencies](https://last9.ghost.io/content/images/2022/12/photo-1621906066952-580204e716a0-3.jpg)
Microservices - Tracking Dependencies
A quick primer on microservices architecture and the importance of tracking dependencies.
Akshay Chugh, Jayesh Bapu Ahire
SLOs eased
You can either love running or hate running, but you will definitely love this analogy - take a fresh look at SLOs!
Piyush Verma, Saurabh Hirani
![AWS security groups: canned answers and exploratory questions](https://last9.ghost.io/content/images/2021/11/joshua-earle-C6duwascOEA-unsplash.jpg)
AWS security groups: canned answers and exploratory questions
While using a Terraform lifecycle rule, what do you do when you get a canned response from a security group?
Saurabh Hirani
![If it ain't broke...](https://last9.ghost.io/content/images/2021/11/eberhard-grossgasteiger-xC7Ho08RYF4-unsplash-1.jpg)
If it ain't broke...
A Terraform lifecycle rule in the right place can help prevent a deadlock. But the same lifecycle rule in the wrong place?
Saurabh Hirani
![mv aws-security-group shoot-foot](https://last9.ghost.io/content/images/2021/11/cesar-couto-sKuVjm0xyLY-unsplash.jpg)
mv aws-security-group shoot-foot
How can you run into unplanned downtime while making a seemingly harmless change like renaming an AWS security group through Terraform?
Saurabh Hirani
![Much That We Have Gotten Wrong About SRE](https://last9.ghost.io/content/images/2021/10/image-24.png)
Much That We Have Gotten Wrong About SRE
An illustrated summary of Developers ➡ DevOps ➡ SRE
Piyush Verma