
Jul 5th, ’24 · 4 min read

The most important aspect of software monitoring

The single most important thing to get better at in your software monitoring journey


Most folks I speak to point to culture and collaboration as the key ingredients of successful monitoring. Then come the right tools: can a product handle high cardinality, for example? Is it transparent about pricing? But culture and collaboration come first.

There are about half a dozen things I can think of when it comes to culture and collaboration. But one point stands out. It's the most vital of all, and a good chunk of folks in junior roles name it immediately, because they're at the forefront of the pain associated with software monitoring.

Instrumentation.

Good instrumentation improves culture and collaboration because it strengthens every tenet of how a system is observed. Instrumentation is the process of adding the code and tools needed to capture detailed information about an application's behavior. This data is essential for understanding, diagnosing, and optimizing an application's performance and reliability.
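To make that definition concrete, here is a minimal, stdlib-only Python sketch of hand-rolled instrumentation: a decorator that records call counts, error counts, and cumulative latency for a function. The `instrumented` decorator, the in-memory `METRICS` store, and the `charge_card` function are all hypothetical; in practice you would export these numbers to a metrics backend rather than keep them in a dict.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory store; a real setup would export to a metrics backend.
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def instrumented(func):
    """Record call count, error count and cumulative latency for func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        stats = METRICS[func.__qualname__]
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            stats["errors"] += 1
            raise
        finally:
            # Runs on success and on failure, so every call's latency is recorded.
            stats["calls"] += 1
            stats["total_seconds"] += time.perf_counter() - start
    return wrapper


@instrumented
def charge_card(amount):
    # Hypothetical business function used only to generate telemetry.
    if amount <= 0:
        raise ValueError("amount must be positive")
    return f"charged {amount}"


charge_card(100)
try:
    charge_card(-5)
except ValueError:
    pass

print(dict(METRICS["charge_card"]))
```

Even this toy version shows the three signals the next section talks about: how often code runs, how often it fails, and how long it takes.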

Why is instrumentation so critical?

Instrumentation is the first line of defense: it provides insight into how an application is functioning. If architected well, it highlights areas of high latency, error-prone code, and heavy resource consumption.

Detailed instrumentation helps in pinpointing the exact location and context of issues, making it easier to debug and resolve problems quickly.

My colleague wrote a piece about a tool we recently built: Auto-discovery.

tl;dr: As a managed SaaS partner, we help our customers reduce costs, and this is also a Key Responsibility Area for the customer-facing engineers in our org. Auto-discovery addresses the problem of monitoring coverage. It provides a framework that discovers the details of any component, and of its instances running in a system, with little to no human intervention, which makes coverage both effective and resilient.
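The internals of Auto-discovery aren't described here, so the following is only a minimal sketch of the idea behind measuring monitoring coverage: take an inventory of what is running (however it was discovered), subtract what is already monitored, and whatever remains is a blind spot. The `Component` type, the `coverage_gaps` helper, and the inventory below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so they fit in sets
class Component:
    kind: str      # e.g. "postgres", "redis", "service"
    instance: str  # host:port or pod name


def coverage_gaps(discovered, monitored):
    """Return discovered components that nothing is monitoring, in a stable order."""
    return sorted(set(discovered) - set(monitored), key=lambda c: (c.kind, c.instance))


# Hypothetical inventory: what a discovery pass found vs. what has dashboards/alerts.
discovered = [
    Component("postgres", "db-1:5432"),
    Component("postgres", "db-2:5432"),
    Component("redis", "cache-1:6379"),
]
monitored = [Component("postgres", "db-1:5432")]

for gap in coverage_gaps(discovered, monitored):
    print(f"unmonitored {gap.kind} instance: {gap.instance}")
```

The hard part in practice is producing the `discovered` list automatically and keeping it fresh; the set difference itself is trivial.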

Tools like these solve instrumentation at its very core, fostering collaboration and improving culture across multiple teams, not just engineering.

What is currently missing in instrumentation

Despite its importance, several common issues and gaps exist in current instrumentation practices:

  1. Incomplete coverage: Many applications lack comprehensive instrumentation, leaving blind spots where issues go undetected. This hurts most when you least expect it. In fact, we built Auto-discovery after watching some of our customers endure this pain at critical junctures.
  2. Manual effort and inconsistency: Instrumentation often requires significant manual effort, which leads to inconsistencies and omissions. Humans are... human, and prone to error. Automating away manual effort is key to better instrumentation.
  3. Lack of standardization: Different parts of the application may use different approaches and tools for instrumentation, making it hard to integrate and correlate data. This is an organizational culture problem, one that gets nipped in the bud when leadership ruthlessly prioritizes best practices.
  4. Absence of contextual data: Instrumentation may lack contextual information, making it hard to correlate metrics, logs, and traces across components. This usually stems from poor instrumentation at the kick-off stage.
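Contextual data is largely a matter of propagating identifiers. As a minimal stdlib-only Python sketch (not how any particular vendor does it), a `contextvars` variable can hold the current request's trace ID, and a logging filter can stamp it onto every log line, so logs can later be joined with traces and metrics that carry the same ID. `current_trace_id`, `TraceIdFilter`, `JsonFormatter`, and `handle_request` are hypothetical names; libraries such as OpenTelemetry can inject trace context into logs for you.

```python
import contextvars
import json
import logging
import uuid

# Holds the trace ID of the request currently being handled (None outside a request).
current_trace_id = contextvars.ContextVar("trace_id", default=None)


class TraceIdFilter(logging.Filter):
    """Attach the active trace ID to every log record passing through."""
    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """Emit structured JSON so the trace ID is a queryable field, not free text."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": record.trace_id,
        })


logger = logging.getLogger("checkout")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
handler.addFilter(TraceIdFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # avoid duplicate lines if the root logger is configured


def handle_request():
    # Hypothetical: a real service would reuse the trace ID propagated by its caller.
    token = current_trace_id.set(uuid.uuid4().hex)
    try:
        logger.info("payment authorized")
        logger.warning("inventory low")
        return current_trace_id.get()
    finally:
        current_trace_id.reset(token)  # don't leak context into the next request


handle_request()
```

Both log lines emitted inside `handle_request` carry the same `trace_id`, which is exactly the join key that goes missing when context is absent.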

Getting Better at Instrumentation

To improve instrumentation practices, here are some key points:

  1. Adopt Automatic Instrumentation Tools:
    Use frameworks and libraries that provide automatic instrumentation capabilities. For example, OpenTelemetry offers automatic instrumentation for various programming languages and frameworks.
    These tools reduce the manual effort required and ensure more consistent and comprehensive coverage.
  2. Standardize Instrumentation Practices:
    Establish and enforce standards for how instrumentation should be implemented across the organization. Use consistent naming conventions, metrics, and logging formats to facilitate easier data aggregation and analysis.
  3. Keep Instrumentation Lightweight:
    Use lightweight instrumentation techniques to minimize performance overhead. Regularly review and optimize instrumentation code to ensure it doesn't degrade the application's performance. This takes teams dedicated to doing the boring grunt work, but it is immensely helpful in the org's larger context.
  4. Continuous Monitoring and Improvement:
    Treat instrumentation as an ongoing process rather than a one-time task. Software monitoring is a journey without a destination, so you have to regularly audit instrumentation coverage to identify and address gaps.
    Or, outsource this headache to folks dedicated to making this work — someone like… well… me 😛
  5. Training and Education:
    This is hard when you're a startup, but there's no excuse for enterprises not to invest in Learning and Development. With most aspiring DevOps folks I talk to, I notice too little emphasis on learning, and on investing time to form opinions and frameworks around observability.
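Standards stick better when they are checked automatically. As one hypothetical example for point 2, a small lint step (for instance in CI) can reject metric names that break a house convention. The convention below (lowercase snake_case, at least two segments, a unit or `_total` suffix) is an assumption loosely modeled on Prometheus naming guidance, and `naming_violations` is a hypothetical helper.

```python
import re

# Hypothetical house convention: lowercase snake_case with at least two segments...
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)+")
# ...ending in a base-unit suffix, or "_total" for counters.
ALLOWED_SUFFIXES = ("_total", "_seconds", "_bytes", "_ratio")


def naming_violations(name):
    """Return the reasons a metric name breaks the convention (empty list if none)."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("must be lowercase snake_case with at least two segments")
    if not name.endswith(ALLOWED_SUFFIXES):
        problems.append("must end with one of " + ", ".join(ALLOWED_SUFFIXES))
    return problems


for metric in ["checkout_requests_total", "CheckoutLatencyMs", "checkout_latency_seconds"]:
    issues = naming_violations(metric)
    print(metric, "OK" if not issues else issues)
```

Failing the build on a non-empty list is a cheap way to turn a naming guideline into an enforced standard.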

I also want to call this out explicitly: automatic instrumentation comes at a price. You may end up with too much telemetry. Beyond a certain org size, using automatic instrumentation should be a conscious choice, because you do not want to pay for ease of instrumentation with the excess data and cost that come with it. This is precisely why you need experts who obsess over this problem.
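One common way to keep auto-instrumented telemetry in check is sampling. Below is a minimal Python sketch of deterministic, trace-ID-based head sampling, similar in spirit to OpenTelemetry's trace-ID-ratio sampler but not its actual implementation. `keep_trace`, the 10% ratio, and the synthetic trace IDs are assumptions for illustration.

```python
import hashlib


def keep_trace(trace_id: str, ratio: float) -> bool:
    """Deterministically keep roughly `ratio` of traces, keyed on the trace ID.

    Hashing the ID means every service that sees the same trace makes the
    same keep/drop decision, so sampled traces stay complete end to end.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    digest = hashlib.sha256(trace_id.encode()).digest()
    # Take 53 bits so the division is exact and the result is always < 1.0.
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < ratio


# Hypothetical trace IDs standing in for real traffic.
trace_ids = [f"trace-{i}" for i in range(10_000)]
kept = sum(keep_trace(t, 0.10) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace ID, the kept count lands near 10% and repeated calls for the same ID always agree. The trade-off is that rare error traces can be dropped along with routine traffic, which is one more reason the choice should be deliberate.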

This is also why large orgs are wary of adopting such solutions wholesale. On the flip side, if you auto-instrument when you are a small org, you will probably never add manual overrides, because no one instruments twice 😉.

Engineering is about trade-offs, and everything has a price, ultimately if not immediately. My advice: talk to the experts, and never manage your own monitoring; it's a hassle that distracts you from building your core product.


Feel free to chat with us on our Discord or reach out to me if you want to discuss DevOps/SRE.

You can also book a demo with us to understand Last9, or even give us feedback, suggestions et al. ✌️


Authors

Aniket Rao

http://1x.engineer @last9io 💻 Programmer | 🌌 Astrophile | 🎮 FIFA Player |🏌️‍♂️Amateur Golfer
