Blog
Stories, guides, and lessons from the world of observability
Infrastructure-As-Code-As-Software
Explore how Infrastructure-as-Code-as-Software combines coding practices with automation to streamline infrastructure management and enhance scalability.

SLOs That Lie
Understand how SLOs can help improve your performance, and how to set the right Service Level Objectives for your application.

Latency Percentiles are Incorrect P99 of the Times
What are P90, P95, and P99 latency? Why are they incorrect P99 of the times? Latency is measured over a unit of time, and the preferred aggregate is a percentile.

SRE Tooling – the Clever Hans fallacy
Chef or Ansible? Terraform or Pulumi? Python or Ruby? Last9 or Last9? Discover how building new tools links to the tale of a horse that could do math!

Root Cause Analysis For Reliability: A Case Study
Let's explore the importance of RCAs in Site Reliability Engineering, why to use them, and our take on what constitutes a “good” RCA.