All Authors / Piyush Verma
Software Monitoring — Stuck in the 00s
A short history of software monitoring, from the 00s. What has changed? Why are things so arcane?
Piyush Verma
How we tame High Cardinality by Sharding a stream
Using 'Sharding' to tame High Cardinality data for Levitate - Our Time Series Data Warehouse
Piyush Verma
How we tame high cardinality in time series databases
Engineering innovation to solve high cardinality with Levitate - a multi-part series
Piyush Verma, Swati Modi
Who should define Reliability — Engineering, or Product?
Whoever owns Reliability should define its parameters. But who owns the Reliability of a Product? Engineering? Product Management? Or the Customer success team?
Piyush Verma
High Cardinality? No Problem! Stream Aggregation FTW
Managing high cardinality in time series data is tough but crucial. Learn how Levitate’s streaming aggregations can help tackle it efficiently.
Piyush Verma
When should I start thinking of observability?
How does one scale metrics maturity in a cloud-native world — A guide on observability tooling as your engineering org scales.
Piyush Verma
Sample vs Metrics vs Cardinality
When dealing with Time Series databases, I always got confused with Sample vs Metrics vs Cardinality. Here’s an explanation as I have understood it.
Piyush Verma
Why Service Level Objectives?
Understanding how to measure the health of your service, benefits of using SLOs, how to set compliances and much more...
Piyush Verma
The origin of Service Level Objectives
Service Level Objectives (SLOs) dominate the software industry, but where did they come from?
Akshay Chugh, Piyush Verma
Doing SRE the Right Way!
A well-thought-out approach to SRE, which will help site reliability engineers and software engineers develop and maintain a useful, consistent, and effective SRE strategy for their products!
Piyush Verma
SLOs eased
You can either love running or hate running, but you will definitely love this analogy - take a fresh look at SLOs!
Piyush Verma, Saurabh Hirani
Latency SLO
How do you set latency-based alerts? A common approach is 95% of requests completed in 350ms, but is it really that simple?
Piyush Verma
Services; not Server
Gone are the days of yore when we named our servers Etsy, Betsy, and Momo, fed them fish, and cleaned their poop.
Nishant Modak, Piyush Verma
Systems Observability
Observability is not just about being able to ask questions of your systems. It's also about getting those answers in minutes and not hours.
Nishant Modak, Piyush Verma
Much That We Have Gotten Wrong About SRE
An illustrated summary of Developers ➡ DevOps ➡ SRE
Piyush Verma
Infrastructure-As-Code-As-Software
We ran a poll on Twitter: “Do you care about the quality of your infrastructure code?” And on Reddit. That’s an approximate and staggering 60–30–10 split. What do you think the response would be if the poll was: “Do you care about the quality of your product code?” Reasons: We asked a follow-up question to reason why ~30% are in the "Somewhat but mostly no" category and gleaned these reasons from Twitter and Reddit: 1. Someone manually created the legacy infrastructure. No one questioned th
Piyush Verma
SLOs That Lie
Understanding how SLOs can help improve your performance and how to set the right Service Level Objectives for your application
Piyush Verma
Latency Percentiles are Incorrect P99 of the Times
What are P90, P95, and P99 latency? Why are they incorrect P99 of the times? Latency is for a unit of time and the preferred aggregate is percentile.
Piyush Verma
SRE Tooling – the Clever Hans fallacy
Chef or Ansible? Terraform or Pulumi? Python or Ruby? Last9 or Last9? Discover how building new tools links to the tale of a horse that could do math!
Piyush Verma
Root Cause Analysis For Reliability: A Case Study
Let's explore the importance of RCAs in Site Reliability Engineering, why use RCAs, and our take on what constitutes a “good” RCA.
Piyush Verma