Reliability

All articles tagged 'Reliability'

Reliability vs Availability: A Simple Breakdown

Reliability and availability are crucial concepts in DevOps. Here's a simple breakdown to help you understand their key differences and importance.

Anjali Udasi

Cricket Scale e01 — Ashutosh Agrawal

Unpacking "Cricket Scale" with the person behind the scenes at JioCinema

Prathamesh Sonpatki

Radar and Black Box for Software Observability

Observability is an often misunderstood and misused term; by now it has come to mean both everything and nothing. Read on to see how observability can be viewed through the lens of a radar and a black box.

Nishant Modak

A case for Observability outside engineering teams

Observability is being built by engineers, for engineers. In reality, observability (o11y) is for everyone.

Aniket Rao

How we tame High Cardinality by Sharding a stream

Using sharding to tame high-cardinality data in Levitate, our time series data warehouse.

Piyush Verma

MTTF vs MTBF vs MTTD vs MTTR

This article covers what MTTF, MTBF, MTTD, and MTTR are, how they differ, how to adopt them, and their use cases.

Last9

The neglected tech arctic winter — Internal SaaS expenses

The current tech winter reveals a hard truth: spending on internal tools for tech infrastructure is bloated—and this isn't just a passing cycle.

Nishant Modak

Introducing Levitate: Uplift Your Metrics Management

Managing time series databases is hard. Architectures have moved to services, yet monitoring has lagged behind. Levitate powers critical workloads at a lower cost.

Nishant Modak

The difference between DevOps, SRE, and Platform Engineering

In reliability engineering, three concepts keep coming up: DevOps, SRE, and Platform Engineering. How do they differ?

Prathamesh Sonpatki

India vs Pakistan: SRE and the Shannon Limit

How do you ‘detect change’ in a complex infrastructure so that you don’t lose critical revenue? A short SRE story.

Satyajeet Jadhav

Battling Alert Fatigue

What alert fatigue is, and techniques to reduce it.

Last9

Why MTTR should be a ‘business’ metric

A key challenge is aligning engineering health metrics with business goals. How can the business measure engineering, and how can engineering show its value?

Sidu Ponnappa

Sample vs Metrics vs Cardinality

When working with time series databases, I always got confused by samples vs. metrics vs. cardinality. Here’s an explanation as I understand it.

Piyush Verma

Reliability Tools

A guide to the most popular DevOps and SRE tools for building your reliability stack.

Abhi Puranam

Best Practices for Postmortems: A guide

The ins and outs of conducting an effective postmortem, with ready-to-use templates and examples from leading organizations around the world!

Prathamesh Sonpatki

Choosing Effective SLIs

Practical advice on choosing effective SLIs.

Akshay Chugh

Deployment Readiness Checklists

A ready-to-use, comprehensive checklist of the steps and activities involved in deploying your application.

Prathamesh Sonpatki

The most interesting talks from SRECon 2021!

SRECon, hosted by USENIX, focuses on site reliability and systems engineering at scale. Discover highlights from the most interesting talks at SRECon 2021.

Akshay Chugh

Doing SRE the Right Way!

A well-thought-out approach to SRE that helps site reliability engineers and software engineers develop and maintain a useful, consistent, and effective SRE strategy for their products!

Piyush Verma

Getting the big picture with Log Analysis

How to get the most out of your logs!

Jayesh Bapu Ahire

Microservices - Tracking Dependencies

A quick primer on microservices architecture and the importance of tracking dependencies.

Akshay Chugh
Jayesh Bapu Ahire

Components in Designing Effective SLOs

A primer on how to design and implement effective Service Level Objectives (SLOs).

Akshat Goyal

Sleep Friendly Alerting

We've all been woken up by that dreaded Slack notification at an ungodly hour, only to realise the alert was all smoke and no fire: the perfect recipe for dread and alert fatigue.

Akshat Goyal