Piyush Verma gave a talk at SRE Day 2023 on Unwiring High Cardinality. The conference was held in London on September 14-15.
The conference included talks on various topics, including real-time stream processing, running the SRE team's incident management, and how Thanos proved costly for specific organizations.
Here is the outline of the talk.
Metrics are a crucial part of observability: they provide a cost-effective, fast way to answer questions about the SDLC and software health that are otherwise hard to answer.
With metrics, inevitably, you hit High Cardinality problems. While searching for profound insights from the systems, we often face restrictions due to the cardinality limitations of the observability tools. But what makes high cardinality significant, and why is it an inevitable challenge when monitoring systems on a vast scale?
Piyush delved into the anatomy of a metric and issues that high cardinality can help resolve, from combating Noisy Neighbors to battling in the Streaming Wars and dealing with the pulse of High Cardinality.
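To make the anatomy of a metric more concrete, here is a minimal sketch of why cardinality grows so quickly. It assumes a Prometheus-style metric whose unique time series are defined by combinations of label values. The label names and counts are hypothetical and are not figures from the talk.

```python
from math import prod

# Hypothetical labels on a single metric (for example, a request counter).
# Each value is an assumed number of distinct values that label can take.
label_values = {
    "service": 50,
    "endpoint": 20,
    "status_code": 5,
    "pod": 100,
}


def worst_case_cardinality(labels: dict[str, int]) -> int:
    """Upper bound on unique time series: the product of distinct values per label."""
    return prod(labels.values())


baseline = worst_case_cardinality(label_values)

# Adding a single per-customer label multiplies the bound by its value count.
with_customer = worst_case_cardinality({**label_values, "customer_id": 1_000})

print(baseline)       # 500000
print(with_customer)  # 500000000
```

This is an upper bound: real systems rarely emit every combination, but one extra high-variety label is enough to push a metric from thousands of series into millions.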
However, modern systems' limitations make cardinality an unsolved problem. To find the best solution for cardinality, it is crucial to understand the Metric Lifecycle. Lastly, Piyush defined the workflows that enable scaling cardinality to millions, not just thousands.
When software is in production, it's crucial to have telemetry and instrumentation to troubleshoot issues. Unfortunately, setting this up can be a time-consuming and costly process. Often, we resort to generic solutions that may not address the unique needs of our specific system. This can lead to missed opportunities for improvement and wasted time looking for answers elsewhere. Most importantly, the workflows from the talk should allow architects and engineering leaders to keep things SIMPLE and reach that 9 with much less pain.
https://last9.io/blog/high-cardinality-no-problem-stream-aggregation-ftw/
Here is the YouTube video of the talk.
Here are blog posts showcasing how Levitate is built to tackle high cardinality.