
Instrumentation: Getting Signals In

See how instrumentation in OpenTelemetry helps you track app issues, learn the difference between auto and manual approaches, and understand when to use each.

Jun 2nd, ‘25

In the first part of this series, we covered what OpenTelemetry is and how it helps connect the dots in modern distributed systems. This piece will focus on instrumentation in OpenTelemetry — auto vs manual, and when to use each approach.

When something breaks in production, like a sudden spike in checkout latency, the root cause often isn’t obvious. Metrics might look fine. Logs don’t tell the full story. And tracing the request path across services? That’s usually where it falls apart.

Without clear visibility into how requests move through your system, you’re left guessing. This is where instrumentation plays a critical role.

What is Instrumentation?

Instrumentation is the backbone of observability—it’s what lets your app send out telemetry data like logs, metrics, and traces so you can understand what’s happening inside.

A simple way to think about instrumentation is adding observability hooks into your code. These hooks generate data points that help you track requests, measure how long things take, and catch failures with the right context.

In OpenTelemetry, instrumentation produces three key types of telemetry:

  • Traces — Follow a request as it moves across services, showing where time is spent or where things break.
  • Metrics — Track numbers like request rates, latency, or memory usage to measure performance.
  • Logs — Record events with context that come in handy during debugging.
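
To make these signals concrete, here's a minimal Python sketch using the OpenTelemetry API. The names ("checkout-service", "checkout.requests", order_id) are illustrative, and without an SDK configured these calls fall back to no-op implementations:

import logging
from opentelemetry import trace, metrics

tracer = trace.get_tracer("checkout-service")
meter = metrics.get_meter("checkout-service")
checkout_counter = meter.create_counter("checkout.requests")

def handle_checkout(order_id):
    # Trace: record this operation as a span in the request's trace
    with tracer.start_as_current_span("handle_checkout") as span:
        span.set_attribute("order.id", order_id)
        # Metric: count how many checkouts we handle
        checkout_counter.add(1)
        # Log: capture an event with context for debugging
        logging.info("checkout processed for order %s", order_id)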

This data comes from instrumentation libraries and agents. Some automatically attach to popular frameworks and libraries, while others require you to add specific code.

There are two main ways to instrument with OpenTelemetry:

  • Auto-instrumentation — Automatically hooks into your app’s frameworks and libraries with little or no code changes.

  • Manual instrumentation — Lets you decide exactly what to measure and where, by using OpenTelemetry’s SDKs.

Language Support in OpenTelemetry

Instrumentation looks different depending on your codebase, and OpenTelemetry supports a wide range of languages. Each library follows the same basic ideas but is built to fit naturally with its language.

Here’s the current list of supported languages:

  • Java
  • Python
  • Go
  • JavaScript / Node.js
  • .NET
  • Ruby
  • PHP
  • Rust
  • Swift
  • Erlang / Elixir
  • C++

This approach makes it easier to collect and work with telemetry data across services, even if they use different languages.

Getting Started with Auto Instrumentation in OpenTelemetry

Once you’ve picked the right OpenTelemetry library for your language, the next step is setting up instrumentation—how your app will emit telemetry data. Auto instrumentation is often the easiest way to begin because it hooks into common frameworks and libraries automatically, requiring little to no change in your code.

Take Java as an example. You can add the OpenTelemetry Java agent at startup like this:

java -javaagent:opentelemetry-javaagent.jar -jar your-app.jar

That single step instruments many popular libraries out of the box. This includes web frameworks like Spring MVC and JAX-RS, HTTP and RPC clients such as OkHttp and gRPC, database drivers like JDBC and MongoDB, messaging systems like Kafka and RabbitMQ, and cloud SDKs for AWS and Google Cloud.

With no code changes, you’ll start seeing telemetry like:

  • Request and error rates across endpoints
  • Database query performance
  • Latency of external API calls
  • Time spent in each layer of a request flow

Other languages offer similar setups with plugins for popular frameworks. For example:

  • Python Flask:

    pip install opentelemetry-instrumentation-flask
  • Node.js Express:

    npm install @opentelemetry/instrumentation-express
  • Ruby on Rails:

    gem install opentelemetry-instrumentation-rails
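
As a rough sketch of how one of these plugins gets wired up, here's the Flask case in Python. The app and route are placeholders; many teams instead launch their app through the opentelemetry-instrument wrapper so no code changes are needed at all:

from flask import Flask
from opentelemetry.instrumentation.flask import FlaskInstrumentor

app = Flask(__name__)
# attaches request/response hooks so each HTTP request becomes a span
FlaskInstrumentor().instrument_app(app)

@app.route("/checkout")
def checkout():
    return "ok"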

How It Works Behind the Scenes

OpenTelemetry agents use different techniques depending on the language:

  • In Java, they modify compiled classes at runtime using bytecode instrumentation to insert tracing logic.

  • In Python and Node.js, they use monkey patching — wrapping existing functions so they can capture telemetry during execution.
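
Here's a stripped-down illustration of the monkey-patching idea in Python: wrapping an existing function so timing can be captured around it. This is only a conceptual sketch, not what the instrumentation libraries actually do internally; they record spans and propagate context rather than printing.

import time
import requests

_original_get = requests.get

def _traced_get(url, **kwargs):
    # wrap the original call so we can measure it
    start = time.perf_counter()
    try:
        return _original_get(url, **kwargs)
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        # a real agent would start and end a span here instead of printing
        print(f"GET {url} took {elapsed_ms:.1f} ms")

# replace the library function with the wrapped version
requests.get = _traced_get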

That said, auto-instrumentation isn’t without its quirks. Each language brings its own edge cases and trade-offs:

| Language | Instrumentation Method | Common Pitfalls |
| --- | --- | --- |
| Java | Bytecode instrumentation | Can conflict with other agents like New Relic or AppDynamics, sometimes leading to odd behavior. |
| Python | Monkey patching | May break during hot reloads — for example, when running Flask in development mode. |
| Node.js | Monkey patching | Doesn’t always behave well with ES Modules and can lose async context in older runtimes. |
| Go | Beta-stage auto-instrumentation | Still evolving. Many teams stick to manual spans until support becomes more stable. |

Easy-to-Miss Settings That Cause Big Issues

Once auto-instrumentation is set up, the next step is tuning it, and this part often gets overlooked. A few environment variables and config tweaks can make the difference between clean, usable telemetry and a firehose of noise (or worse, no data at all).

Here are some key settings that deserve your attention:

Sampling Rate

In high-throughput systems, collecting every single trace is a fast way to overwhelm your storage and slow down your observability stack. OpenTelemetry lets you define how much data to keep using sampling. For example:

OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1

This keeps 10% of the traces—enough to spot trends and troubleshoot, without flooding your backend.
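
If you prefer setting this in code rather than environment variables, the Python SDK exposes the same sampler. A small sketch, assuming the opentelemetry-sdk package is installed:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# sample 10% of new traces, but follow the parent's decision for child spans
sampler = ParentBased(TraceIdRatioBased(0.1))
trace.set_tracer_provider(TracerProvider(sampler=sampler))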

Export Endpoint

Telemetry data doesn’t magically arrive at your backend. You need to tell OpenTelemetry where to send it. If this is misconfigured (or missing), your data just vanishes into the void.

OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4317

Set it right so your traces, metrics, and logs go to the collector, backend, or observability tool you’re using.
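
The equivalent in Python code looks roughly like this, assuming the opentelemetry-exporter-otlp package is installed and a Collector is listening on that gRPC port:

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider()
# batch spans and ship them to the collector endpoint
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://collector:4317", insecure=True))
)
trace.set_tracer_provider(provider)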

Service Name

Without a proper OTEL_SERVICE_NAME, your telemetry data is just noise. It’s the label that ties traces and metrics to a specific service, critical for filtering, debugging, and understanding what’s going on across your system.

OTEL_SERVICE_NAME=payment-service

Make it meaningful, and keep it consistent.
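
In code, the service name is part of the SDK's Resource. A minimal Python sketch using the same illustrative name:

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

# every span from this provider will carry service.name = "payment-service"
resource = Resource.create({"service.name": "payment-service"})
trace.set_tracer_provider(TracerProvider(resource=resource))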

Attribute Filtering

Telemetry attributes add context, but too many of them can blow up your storage bill and create high-cardinality issues. Filtering early lets you control the signal-to-noise ratio before the data even leaves your app.
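
There's no single setting shown here because filtering is usually done either in your code (only set the attributes you need) or in the Collector. The OpenTelemetry spec does define coarse SDK limits you can set via environment variables, though support varies by language SDK; for example:

OTEL_ATTRIBUTE_COUNT_LIMIT=64
OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT=256

These cap how many attributes a span carries and how long their values can be. Anything finer-grained, such as dropping specific keys or redacting values, is typically handled by an attributes processor in the OpenTelemetry Collector.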

Getting these settings right helps you avoid silent failures, keep costs in check, and ensure your instrumentation is useful.

Benefits of Auto Instrumentation

Here are some key benefits:

  • No code changes: Instrumentation happens behind the scenes, so you avoid risky or time-consuming code updates.

  • Covers popular libraries by default: Common frameworks like Spring, Flask, Express, JDBC, Kafka, and gRPC are supported out of the box, giving you instant traces and metrics.

  • Works across environments: Whether you run locally, in containers, or on Kubernetes, auto instrumentation fits in smoothly. For Kubernetes, the OpenTelemetry Operator can automatically inject agents for you.

  • Quick visibility: Within minutes, you get insights into request paths, latencies, error rates, and dependencies—helping you build a baseline for system health.

This approach lets you get monitoring up fast while still covering the most important parts of your application.

When Auto Instrumentation Isn’t Enough

Auto instrumentation sometimes doesn’t tell the whole story.

Imagine your /checkout endpoint is running slow. Auto instrumentation can show you the latency spike and which services were involved. But it won’t reveal who the user was, what they were trying to buy, or why that particular request mattered.

You might ask questions like:

  • Was the slowdown affecting premium customers only?
  • Was it tied to high-value orders?
  • Did a specific product or coupon trigger the issue?

These are business-context questions that auto instrumentation can’t answer on its own.

To fill those gaps, you can:

  • Add manual instrumentation in high-value paths. Capture attributes like user.tier, order.value, or product.id directly in your code, close to where those values are known.

  • Use context propagation and span enrichment to carry that business metadata across services. You can do this in your app code or enrich spans downstream using the OpenTelemetry Collector by pulling data from external systems like user or billing services.

  • Correlate with logs. Include trace IDs and key business attributes in your logs so you can link traces and logs during debugging.

This is where manual instrumentation shines—it gives you the flexibility to add the detail that helps you understand not just what broke, but why.

Manual Instrumentation: Adding Context Where It Matters

Auto instrumentation is useful for quick setup, but it only covers what libraries expose by default. When you need to track details specific to your business, manual instrumentation helps fill in the gaps.

With OpenTelemetry SDKs, you add spans, metrics, and logs directly in your code. This gives you full control over what data gets collected and where.

For example, you can:

  • Add business details like customer tier, subscription type, or cart size to traces and logs.

  • Measure how long important processes take, such as pricing calculations or fraud checks.

  • Include extra error details—what was happening, which user was affected, and what input was involved.

  • Follow how users move through your app, seeing what features they use and where they drop off.

Here’s a simple example of a checkout trace with added business info:

[Trace] /api/checkout
├── [Span] validate_user_session (22ms)
│   └── user.tier = "premium"
├── [Span] validate_coupon (45ms)
│   ├── coupon.code = "SUMMER20"
│   ├── coupon.valid = true
│   └── coupon.discount_amount = 25.50
├── [Span] process_payment (112ms)
│   ├── payment.method = "credit_card"
│   ├── payment.amount = 127.50
│   └── payment.success = true
└── [Span] create_order (89ms)
    ├── order.id = "ORD-12345"
    └── order.items_count = 3

This detail helps your team understand not just what failed, but why it happened and who was affected.

Where Manual Instrumentation Works Best

Manual instrumentation is most helpful when you want to:

  • Track important flows like onboarding, checkout, or retry attempts
  • Capture key details like user roles, order amounts, or subscription types
  • Bring together logs, traces, and metrics so debugging makes sense

It takes a bit more work, but it helps your telemetry match how your product works and how users experience it.

Why Use Both Auto and Manual Instrumentation?

Auto and manual instrumentation each have their strengths. Using both gives you a fuller, clearer picture of what’s going on.

Auto instrumentation gets you broad, system-level visibility fast. It captures things like HTTP requests, database calls, and outbound API traffic with minimal setup.

Manual instrumentation fills in the gaps—tracking your business logic, custom workflows, and user-specific details that auto can’t see.

Here’s a quick guide to help you decide when to use which:

| When you want to… | Use Auto Instrumentation | Use Manual Instrumentation |
| --- | --- | --- |
| Get quick, system-wide visibility | ✔️ | |
| Track standard things like HTTP or DB calls | ✔️ | |
| Add business-specific info (user tiers, order amounts) | | ✔️ |
| Capture custom application logic | | ✔️ |
| Connect traces across services | ✔️ | ✔️ |
| Correlate logs with traces and add extra context | | ✔️ |

How to use this in practice:

Start with auto instrumentation to get an overview quickly. Then, identify important business flows or places where you need more detail. Add manual instrumentation there to capture the extra context that auto misses.

Keep manual instrumentation focused. Don’t try to tag everything—just what helps explain the why behind issues.

Example: Debugging a Booking Failure

Imagine a travel booking platform running into trouble. Auto instrumentation shows calls to the payment API are timing out. Manual instrumentation reveals it’s only happening for high-value, business-class bookings.

Put those pieces together, and you find the problem started after introducing a new payment processor. Plus, it happens right after their daily maintenance window.

That kind of insight only comes when you combine system-level telemetry with business context.

Where It All Comes Together

All telemetry—auto or manual—needs a destination. OpenTelemetry supports:

  • OTLP: The standard protocol for telemetry data
  • Third-party tools: Exporters for platforms like Last9, Jaeger, Prometheus, and more
  • File outputs: Useful for testing and local analysis

Most teams rely on the OpenTelemetry Collector. It receives telemetry, applies processing like filtering or enrichment, and forwards it to the right backend(s).

How to Approach Instrumentation Strategically

Collecting data with OpenTelemetry isn’t about tracking everything blindly—it’s about choosing what matters for your system and users. Here’s a practical way to do that:

  1. Start with auto instrumentation
    Enable OpenTelemetry agents or plugins for your language and framework. This gives you quick visibility into core parts like HTTP requests, database queries, and messaging, without changing your code.

  2. Identify gaps in visibility
    Look for areas where you can’t easily understand what’s going wrong during incidents. Focus on critical user flows like checkout, login, or onboarding that directly impact your business.

  3. Add manual spans where necessary
    Use OpenTelemetry SDKs to instrument your code for important business operations. For example, wrap coupon validation or payment processing with spans that include relevant details.

  4. Standardize across teams
    Agree on how to name spans, tag attributes, and decide what gets instrumented. This keeps your telemetry data consistent and easier to analyze.

  5. Make instrumentation part of your development cycle
    Include instrumentation reviews in code reviews, update it during refactors, and automate checks in your CI/CD pipelines.

How to Standardize and Manage Instrumentation Data

Good observability combines auto-instrumentation for broad coverage with manual instrumentation for the details. But managing all that data from different teams and services can get messy fast.

Last9 helps by bringing together OpenTelemetry traces, metrics, and logs into one clear, unified view. More importantly, it helps standardize your telemetry with features like Control Plane Remapping. This means consistent naming and tagging across your entire stack, no matter which team produced the data.

That kind of standardization makes your telemetry easier to search, filter, and analyze, saving you time and headaches when you’re troubleshooting or trying to spot patterns.

With Last9, instrumentation becomes less about juggling noise and more about understanding what’s happening in your system.

Next up, we’ll dig into the OpenTelemetry Collector — exploring its architecture, pipelines, extensions, processors, and everything in between.
