Learnings integrating jmxtrans
JMX metrics give solid insight into the workings of your application. Integrating them with Levitate (our time series data warehouse) required us to jump through some hoops with vmagent.
A practical guide for implementing SLO
A three-step guide to setting Service Level Objectives
Prathamesh Sonpatki, Saurabh Hirani
The importance of structured communication in the world of SRE
How you communicate helps build your 9s. In the world of Site Reliability Engineering, this is crucial. How do you do it?
How to improve Prometheus remote write performance at scale
A deep dive into improving the performance of Prometheus remote write at scale, based on real-life experience
How to calculate HTTP content-length metrics on cli
A simple guide to crunching the numbers behind overall HTTP content-length metrics.
You can either love running or hate running, but you will definitely love this analogy - take a fresh look at SLOs!
Piyush Verma, Saurabh Hirani
AWS security groups: canned answers and exploratory questions
While using a Terraform lifecycle rule, what do you do when you get a canned response from a security group?
If it ain't broke...
A Terraform lifecycle rule in the right place can help prevent a deadlock. But the same lifecycle rule in the wrong place?
mv aws-security-group shoot-foot
How can you run into unplanned downtime while making a seemingly harmless change like renaming an AWS security group through Terraform?