All articles on Sre ⏤ Last9

systemctl logs: A Guide to Managing Logs in Linux

Learn how to manage and view systemctl logs in Linux with this guide, covering essential commands and best practices for troubleshooting.

Read

Faiz Shaikh

A Guide to Database Optimization for High Traffic

Learn how to optimize your database for high traffic, ensuring performance, scalability, and reliability under heavy load.

Read

Prathamesh Sonpatki

SRECon EMEA 2024 - Day 1

Here’s a quick rundown of the standout talks, big ideas, and memorable moments that kicked things off in SRECon EMEA Dublin 2024!

Read

Prathamesh Sonpatki

How We Cut Monitoring Costs and Deprecated Thanos at Replit

Winning Replit over by taming High Cardinality data and deprecating Thanos

Read

Prathamesh Sonpatki

Cricket Scale e01 — Ashutosh Agrawal

Unpacking "Cricket Scale" with the person behind the scenes at JioCinema

Read

Prathamesh Sonpatki

MTTF vs MTBF vs MTTD vs MTTR

This article covers questions such as what are MTTF, MTBF, MTTD, and MTTR, their differences, how to adopt them, and their use cases.

Read

Last9

Recap of SRECon Americas 2023

SRECon is a conference hosted by USENIX and is focused on site reliability, distributed systems, and systems engineering at scale. A Recap of SRECon Americas 2023.

Read

Last9

Introducing Levitate: Uplift Your Metrics Management

Managing time series databases is hard. We've evolved to services, yet monitoring lags. Our solution powers critical workloads at a lower cost.

Read

Nishant Modak

The importance of structured communication in the world of SRE

How you communicate helps build your 9s. In the world of Site Reliability Engineering, this is crucial. How do you do it?

Read

Saurabh Hirani

Thanos vs Cortex

In-depth comparison of Cortex and Thanos, what specifically they help teams do, challenges in implementing both, and how to think about what’s right for your team.

Read

Sahil Khan

Static Threshold vs. Dynamic Threshold Alerting

What's the difference between Static Threshold vs Dynamic Threshold Alerting? Do you really know when and how to use each threshold type?

Read

Last9

Sample vs Metrics vs Cardinality

When dealing with Time Series databases, I always got confused with Sample vs Metrics vs Cardinality. Here’s an explanation as I have understood it.

Read

Piyush Verma

Why Service Level Objectives?

Understanding how to measure the health of your servcie, benefits of using SLOs, how to set compliances and much more...

Read

Piyush Verma

Best Practices for Postmortems: A guide

The ins and outs of conducting an effective postmortem. Ready templates and examples from leading organizations around the world!

Read

Prathamesh Sonpatki

Choosing Effective SLIs

Practical advice to choose an effective SLI.

Read

Akshay Chugh

The origin of Service Level Objectives

Service Level Objectives (SLOs) dominate the software industry, but where did they come from?

Read

Akshay Chugh

Piyush Verma

Running a Database on EC2 is Slowing It Down

Learn everything about the advantages of EC2, it's use cases and how to optimize EC2 further.

Read

Jayesh Bapu Ahire

Akshay Chugh

Deployment Readiness Checklists

A ready checklist of a comprehensive list of steps and activities involved in the deployment of your application.

Read

Prathamesh Sonpatki

The most interesting talks from SRECon 2021!

SRECon, hosted by USENIX, focuses on site reliability and systems engineering at scale. Discover highlights from the most interesting talks at SRECon 2021.

Read

Akshay Chugh

Doing SRE the Right Way!

A well-thought-out approach to SRE, which will help site reliability engineers and software engineers develop and maintain a useful, consistent, and effective SRE strategy for their products!

Read

Piyush Verma