OpenTelemetry Agents: A Production Guide for Zero-Code Instrumentation

Discover how OpenTelemetry agents collect, process, and export telemetry data—plus how to set them up and avoid common pitfalls.

Feb 27th, ‘25

OpenTelemetry agents automatically instrument your application at runtime without requiring code changes — you attach them at startup, and they inject instrumentation for traces, metrics, and logs using bytecode manipulation or eBPF.

If you're running production services and want observability without refactoring every microservice, agents are the fastest path. They hook into your runtime (JVM, .NET CLR, Python interpreter, Node.js V8) and intercept framework calls, database queries, HTTP requests, and more. The tradeoff is performance overhead and less control compared to manual SDK instrumentation.

Here's what you need to know: agents typically add 5-10% CPU overhead and 20-50MB memory depending on your language and traffic volume. For most teams, that's acceptable. If you're running latency-sensitive services at extreme scale, you'll want to benchmark first. But for legacy apps, third-party services, or rapid rollouts across dozens of microservices, agents are usually the right call.

This blog covers how agents work, when to use them vs the SDK, installation patterns for Java, .NET, Python, and Node.js, performance impact, common production issues, and how to route telemetry to backends that won't explode your costs.

How OpenTelemetry Agents Work

OpenTelemetry agents run inside your application process and modify your code at runtime. They don't change your source files — they intercept function calls and inject instrumentation logic dynamically.

The mechanics vary by language:

  • Java: Uses the -javaagent JVM flag to load bytecode transformers that rewrite classes as they're loaded. Hooks into Spring, Tomcat, JDBC, gRPC, etc.
  • .NET: Uses the CLR profiling API (CORECLR_ENABLE_PROFILING) to rewrite method IL as the runtime JIT-compiles it.
  • Python: Wraps framework entry points using Python's import hooks and function decorators. Works with Flask, Django, FastAPI, etc.
  • Node.js: Uses Node's --require flag to preload instrumentation modules before your app starts. Patches HTTP, Express, and database drivers by intercepting module loads.
  • Go: No traditional agent — Go's static compilation prevents runtime bytecode manipulation. Instead, you instrument with libraries at build time or use the (still-maturing) eBPF-based Go auto-instrumentation project.

Once attached, the agent automatically creates spans for incoming requests, outgoing HTTP calls, database queries, and message queue operations. It exports telemetry via OTLP (OpenTelemetry Protocol) to your backend.

💡
Agents create spans automatically, but understanding how OpenTelemetry spans and events work helps you interpret the trace data you're collecting and know when to add manual instrumentation for business logic that agents can't capture.

What Gets Instrumented Automatically

Most agents cover:

  • HTTP servers and clients
  • Database drivers (PostgreSQL, MySQL, MongoDB, Redis)
  • gRPC and messaging systems (Kafka, RabbitMQ, SQS)
  • Framework-specific logic (Spring, Django, Express, ASP.NET Core)

The exact coverage depends on the language. Java has the most mature agent with 100+ supported libraries. Python and .NET are close behind. Node.js coverage is improving, but it still has gaps for newer frameworks.

If your app uses an unsupported library, the agent won't instrument it automatically. You'll need to add manual spans using the OpenTelemetry SDK or submit a contribution to the agent's instrumentation registry.
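
If you do end up adding a manual span, it's a small amount of code. Here's a minimal sketch in Python — the tracer name, function, and attribute are placeholders, not a real integration:

from opentelemetry import trace

# The agent (or opentelemetry-instrument) has already configured the global
# tracer provider, so this tracer exports through the same pipeline.
tracer = trace.get_tracer("billing")  # hypothetical component name

def charge_card(client, amount_cents):
    # Wrap the unsupported client call so it appears in traces alongside
    # the agent's auto-generated spans.
    with tracer.start_as_current_span("billing.charge_card") as span:
        span.set_attribute("billing.amount_cents", amount_cents)
        return client.charge(amount_cents)  # placeholder third-party call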

OpenTelemetry Agent vs SDK: Which Should You Use?

Use the agent if:

  • You need instrumentation across legacy apps with no time for code changes
  • You're rolling out observability to dozens of services quickly
  • Your team doesn't have the bandwidth to manually instrument every endpoint
  • You're okay with ~5-10% overhead and limited control over span semantics

Use the SDK if:

  • You're building a new service and want full control over what gets traced
  • You need custom attributes, span events, or fine-grained sampling logic
  • Your app is latency-sensitive, and you can't tolerate agent overhead
  • You want to control exactly which operations create spans

Here's a quick comparison:

| Feature | Agent | SDK |
|---|---|---|
| Code changes required | None | Yes (manual instrumentation) |
| Performance overhead | 5-10% CPU, 20-50MB memory | <1% (depends on your code) |
| Framework support | Automatic for popular frameworks | Manual for everything |
| Custom attributes | Limited (via env vars) | Full control |
| Sampling control | Basic (env vars, config) | Advanced (custom samplers) |
| Debugging complexity | Higher (agent internals are opaque) | Lower (you control the code) |

Example:

If you're instrumenting a legacy Java monolith running Spring Boot, use the agent. Retrofitting manual SDK calls into thousands of controllers and services isn't worth it.

If you're building a Go microservice from scratch and need to track specific business logic (like payment processing stages), use the SDK. Go doesn't have a runtime agent anyway, and you'll want control over span naming and attributes.

If you're running a polyglot system with services in Java, Python, Node.js, and .NET, start with agents for consistency. You can always mix in SDK instrumentation later for critical paths.

How to Install OpenTelemetry Agents in Production

Agent installation varies by language, but the pattern is always the same: download the agent artifact, pass it to the runtime at startup, and configure exporters via environment variables.

Java

Download the latest OpenTelemetry Java agent JAR from the GitHub releases page. Then add it to your JVM startup:

java -javaagent:/path/to/opentelemetry-javaagent.jar \
  -Dotel.service.name=my-service \
  -Dotel.exporter.otlp.endpoint=http://localhost:4318 \
  -jar my-application.jar

Key environment variables:

  • OTEL_SERVICE_NAME — Service identifier in traces
  • OTEL_EXPORTER_OTLP_ENDPOINT — Where to send telemetry (OTLP backend)
  • OTEL_TRACES_EXPORTER — Set to otlp (default) or none to disable
  • OTEL_METRICS_EXPORTER — Set to otlp or none
  • OTEL_LOGS_EXPORTER — Set to otlp or none

Production note: Don't use the latest tag in production. Pin a specific agent version (e.g., 1.32.0) to avoid unexpected behavior from auto-updates.

.NET

Install the OpenTelemetry .NET Automatic Instrumentation via script or NuGet. Then set CLR profiling environment variables:

export CORECLR_ENABLE_PROFILING=1
export CORECLR_PROFILER={918728DD-259F-4A6A-AC2B-B85E1B658318}
export CORECLR_PROFILER_PATH=/path/to/OpenTelemetry.AutoInstrumentation.Native.so
export OTEL_DOTNET_AUTO_HOME=/path/to/otel-dotnet-auto
export OTEL_SERVICE_NAME=my-service
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

dotnet MyApp.dll

Docker example:

FROM mcr.microsoft.com/dotnet/aspnet:8.0
WORKDIR /app
COPY --from=ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:latest /autoinstrumentation /otel-auto
ENV CORECLR_ENABLE_PROFILING=1 \
    CORECLR_PROFILER={918728DD-259F-4A6A-AC2B-B85E1B658318} \
    CORECLR_PROFILER_PATH=/otel-auto/linux-x64/OpenTelemetry.AutoInstrumentation.Native.so \
    OTEL_DOTNET_AUTO_HOME=/otel-auto \
    OTEL_SERVICE_NAME=my-dotnet-service \
    OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
COPY ./publish /app
ENTRYPOINT ["dotnet", "MyApp.dll"]

Python

Install the OpenTelemetry Python auto-instrumentation packages:

pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install

Then wrap your application startup:

export OTEL_SERVICE_NAME=my-service
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
opentelemetry-instrument python my_app.py

For Flask apps:

# No code changes needed — just run with the wrapper
opentelemetry-instrument flask run

Note: The opentelemetry-instrument wrapper automatically detects installed frameworks (Flask, Django, FastAPI, SQLAlchemy) and instruments them. If you use a niche library, you may need to manually instrument it with the SDK.

Node.js

Install the OpenTelemetry Node.js SDK and auto-instrumentation packages:

npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-http

Create a tracing.js file:

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

const sdk = new NodeSDK({
  serviceName: 'my-service',
  traceExporter: new OTLPTraceExporter({
    url: 'http://localhost:4318/v1/traces',
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

Then require it before your app starts:

node --require ./tracing.js app.js

Note: If you're using TypeScript with ts-node, use --require with the compiled JS version of tracing.js, not the TS source. Otherwise, you'll hit module resolution errors.
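
A quick sketch of that flow, assuming your TypeScript build emits compiled files into dist/:

npx tsc                                       # compiles tracing.ts and app.ts into dist/
node --require ./dist/tracing.js dist/app.js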

💡
If you want to understand the technical details of how bytecode transformation and runtime hooks actually work behind the scenes, our guide on how OpenTelemetry auto-instrumentation works breaks down the instrumentation pipeline step by step.

Kubernetes Deployment with OpenTelemetry Operator

The easiest way to deploy agents in Kubernetes is the OpenTelemetry Operator, which injects agents as init containers automatically. The operator is part of the CNCF OpenTelemetry project and is widely used in production.

Install the operator:

kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

Create an Instrumentation resource:

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  exporter:
    endpoint: http://otel-collector:4318
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "1.0"
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
  dotnet:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:latest
  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest

Annotate your deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-java: "true"
    spec:
      containers:
      - name: app
        image: my-app:latest

The operator will inject the agent as an init container, mount it into your pod, and configure the necessary environment variables. Your app starts with instrumentation already attached.

Note: Pin operator and agent image versions in production. Using latest can introduce breaking changes during pod restarts.

💡
If you're evaluating deployment patterns beyond agents, check out our comparison of OpenTelemetry sidecars vs agents to understand the tradeoffs between running collectors as sidecars versus embedding instrumentation directly in your application process.

OpenTelemetry Agent Performance Overhead

Agent overhead varies by language, traffic volume, and how many libraries you're instrumenting. Based on OpenTelemetry's official performance documentation and community benchmarks, here's what production deployments typically observe:

| Language | CPU Overhead | Memory Overhead | Latency Impact |
|---|---|---|---|
| Java | 5-10% | 30-50MB heap | +1-3ms per request |
| .NET | 5-8% | 20-40MB | +1-2ms per request |
| Python | 8-12% | 15-30MB | +2-5ms per request |
| Node.js | 6-10% | 20-35MB | +1-4ms per request |

These numbers come from running agents on apps handling 5k-10k requests per second. Your mileage will vary based on your stack.

When Overhead Becomes a Problem

Agents add overhead in two ways:

  1. Instrumentation hooks — Every intercepted function call runs agent code (span creation, context propagation, attribute collection)
  2. Telemetry export — Agents batch and send spans to the backend, which consumes CPU and network bandwidth

If you're running high-throughput services (50k+ req/sec) or latency-sensitive APIs (p99 < 10ms), agent overhead can become noticeable. In those cases:

  • Use head-based sampling — Only trace 1% or 10% of requests to reduce span volume
  • Disable telemetry you don't need — Set OTEL_METRICS_EXPORTER=none or OTEL_LOGS_EXPORTER=none if you already collect those signals elsewhere
  • Tune batch sizes — Increase batch sizes and export intervals to reduce network calls (see the sketch after this list)
  • Profile agent impact — Use JFR (Java), dotTrace (.NET), or py-spy (Python) to see where the agent is spending time
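
Batch tuning uses the standard BatchSpanProcessor environment variables. The values below are illustrative starting points, not official recommendations:

# Larger batches and a longer delay mean fewer, bigger export calls
export OTEL_BSP_MAX_EXPORT_BATCH_SIZE=1024   # default 512
export OTEL_BSP_SCHEDULE_DELAY=10000         # milliseconds between exports (default 5000)
export OTEL_BSP_MAX_QUEUE_SIZE=4096          # default 2048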

Troubleshooting OpenTelemetry Agents

Agents fail silently more often than they should. Here's how to debug common issues.

Spans Not Showing Up in Your Backend

Check the exporter endpoint:

# Make sure the endpoint is reachable
curl -X POST http://localhost:4318/v1/traces -H "Content-Type: application/json" -d '{"resourceSpans":[]}'

If the endpoint isn't responding, your agent can't send telemetry. Check firewall rules, network policies, or service mesh configs.

Enable agent debug logging:

For Java:

-Dotel.javaagent.debug=true

For .NET:

export OTEL_LOG_LEVEL=debug

For Python:

export OTEL_LOG_LEVEL=debug

For Node.js:

const { diag, DiagConsoleLogger, DiagLogLevel } = require('@opentelemetry/api');
diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG);

Debug logs will show you if the agent is attaching correctly, which libraries it's instrumenting, and whether spans are being exported.

High CPU Usage from the Agent

If your app's CPU spikes after enabling the agent, you're probably tracing too much.

Disable instrumentation for specific libraries:

For Java:

-Dotel.instrumentation.[library-name].enabled=false
# Example: disable Kafka client instrumentation
-Dotel.instrumentation.kafka-clients.enabled=false

For Python:

export OTEL_PYTHON_DISABLED_INSTRUMENTATIONS=flask,sqlalchemy

For Node.js, exclude instrumentations in your tracing.js:

instrumentations: [
  getNodeAutoInstrumentations({
    '@opentelemetry/instrumentation-fs': { enabled: false },
  }),
]

Reduce sampling rate:

export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1  # Sample 10% of traces

ClassLoader Conflicts (Java)

If your Java app throws ClassNotFoundException or NoClassDefFoundError after attaching the agent, you've hit a version conflict between the agent's bundled libraries and your app's dependencies.

Fix:

  1. Check which library is conflicting (usually SLF4J, Guava, or OkHttp)
  2. Exclude it from the agent:
-Dotel.javaagent.exclude-classes=com.google.common.*,org.slf4j.*
  3. If that doesn't work, upgrade your app's dependency to match the agent's bundled version (check the agent's POM file for exact versions)

Agent Not Instrumenting a Custom Framework

If you're using a niche web framework or database driver, the agent might not have instrumentation for it yet.

Check the OpenTelemetry instrumentation registry to see whether support already exists. If your library isn't listed, you have two options:

  1. Manually instrument it using the SDK — Add span creation calls around the library's entry points
  2. Contribute instrumentation to the OpenTelemetry project — Write a plugin and submit a PR

Configure Sampling and Exporters

Agents export telemetry to backends via OTLP. By default, they send 100% of spans, which can get expensive at scale.

Head-Based Sampling

Head-based sampling decides at the start of a trace whether to record it. If a trace is sampled, all spans in that trace are kept. If not, the entire trace is dropped (OpenTelemetry sampling documentation).

Set sampling rate via environment variables:

export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1  # 10% sampling

Sampling strategies:

  • always_on — Sample every trace (the agent default is parentbased_always_on)
  • always_off — Drop every trace (useful for disabling tracing)
  • traceidratio — Sample based on trace ID hash (deterministic, distributed-friendly)
  • parentbased_traceidratio — Respect the parent sampling decision; otherwise, use the trace ID ratio

For most production systems, 10% sampling gives you enough trace coverage without overwhelming your backend. If you need more granular control (e.g., sample 100% of errors but 1% of successful requests), you'll need tail-based sampling, which requires an OpenTelemetry Collector.
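
As a rough sketch of what that Collector-side logic can look like (the policy names and thresholds are illustrative), the tail_sampling processor keeps every trace containing an error and a fraction of the rest:

processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors              # keep any trace with an error span
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-the-rest          # keep ~10% of everything else
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

A trace is kept if any policy decides to sample it, so errors are always retained even when the probabilistic policy would have dropped them.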

Exporter Configuration

Agents support multiple exporters. The most common is OTLP over HTTP:

export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_EXPORTER_OTLP_HEADERS="x-api-key=YOUR_API_KEY"

For OTLP over gRPC:

export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc

Note: Use an OpenTelemetry Collector as a sidecar or central gateway rather than sending telemetry directly from agents to your backend. The collector can buffer spans, apply tail-based sampling, enrich data with metadata, and route to multiple backends. It also isolates your app from backend downtime — if your observability vendor goes down, the collector will queue spans locally until the backend recovers.
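
A minimal sketch of that buffering behavior on the Collector's exporter side — the endpoint is a placeholder, and the queue and retry settings are the standard exporterhelper options rather than tuned values:

exporters:
  otlphttp:
    endpoint: http://my-backend:4318       # placeholder backend endpoint
    retry_on_failure:
      enabled: true
      max_elapsed_time: 300s               # keep retrying for up to 5 minutes
    sending_queue:
      enabled: true
      queue_size: 5000                     # requests buffered while the backend is unreachable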

💡
For production setups, consider deploying an OpenTelemetry Gateway between your agents and backend—it acts as a central collection point that can buffer spans, apply tail-based sampling, and route telemetry to multiple destinations without coupling your apps to specific vendors.

Why Last9 Works Well with OpenTelemetry Agents

Once you've deployed OpenTelemetry agents, you're sending spans, metrics, and logs to a backend. The challenge: agent-generated telemetry is high-cardinality and high-volume. Auto-instrumentation creates spans for every HTTP request, database query, and cache hit — often with dozens of attributes per span.

Traditional observability backends struggle with this. They force you to either:

  1. Drop 95% of spans via aggressive sampling — You lose trace coverage and can't debug tail latency or rare errors
  2. Pay exponentially more as cardinality grows — Costs spike when you instrument more services or add custom attributes
  3. Hit query timeouts — Backends built for metrics can't handle complex trace queries at scale

Last9 solves this by treating high-cardinality telemetry as a first-class problem, not an edge case.

No Sampling Required at the Agent Level

You can send 100% of spans from your agents without exploding storage costs or query performance. That means:

  • No blind spots — You see every error, every slow query, every outlier
  • Accurate percentiles — p99 latency calculations aren't skewed by sampling artifacts
  • Better root cause analysis — You can trace individual requests end-to-end, even if they failed 0.01% of the time

Faster Queries on Messy Trace Data

Agents generate messy telemetry: auto-generated span names, attribute explosion, and inconsistent tagging across services. Last9's query engine is optimized for this reality. You can filter, aggregate, and visualize trace data without hitting timeouts or needing to pre-aggregate everything.

For example, querying "show me all spans where http.status_code >= 500 and db.latency > 100ms across the last 7 days" comes back fast, even when it touches millions of spans. Traditional backends either can't run this query or require you to pre-aggregate it into a custom metric.

Cost Predictability

Unlike backends that charge per span or per GB ingested, Last9's pricing model doesn't penalize you for instrumenting everything. You're not forced to choose between observability and budget.

OpenTelemetry agents emit detailed context — routes, users, flags, builds, and more. That level of detail is where high-cardinality data shines. Last9 is designed to handle it end-to-end, so you can keep that instrumentation intact as your systems scale.

You can try Last9 free with 100M events per month and bring in your existing OpenTelemetry data in under 5 minutes.

FAQs

What's the difference between the OpenTelemetry agent and the SDK?

The agent instruments your app automatically at runtime without code changes. The SDK requires manual instrumentation in your code. Use the agent for legacy apps or quick rollouts; use the SDK for fine-grained control over spans, attributes, and sampling.

How much overhead does an OpenTelemetry agent add?

Typically 5-10% CPU and 20-50MB memory, depending on language and traffic volume. Java agents have the highest overhead; eBPF-based agents (for compiled languages like C++) have the lowest. Always benchmark in staging before rolling out to production.

Can I use OpenTelemetry agents in Kubernetes?

Yes. The easiest way is the OpenTelemetry Operator, which injects agents as init containers. Alternatively, bake the agent into your Docker image and set environment variables in your deployment YAML. Both approaches work — the operator is just more convenient for managing instrumentation at scale.

Do OpenTelemetry agents work with proprietary APM tools?

Yes, if the APM tool supports OTLP (OpenTelemetry Protocol). Last9, Datadog, New Relic, and Dynatrace all accept OTLP. You just point the agent's exporter to their endpoint. Some vendors also provide their own OpenTelemetry distributions with vendor-specific optimizations.

What happens if the OpenTelemetry agent crashes?

The agent runs in the same process as your app, so a crash can take down the app. In practice, agents are stable — the OpenTelemetry project has extensive test coverage and production usage at scale. Still, test in staging first and monitor agent-specific errors (e.g., ClassLoader conflicts in Java) in production.

Should I sample traces at the agent level or use tail-based sampling?

Use head-based sampling at the agent if you're okay with simple probabilistic sampling (e.g., "trace 10% of all requests"). Use tail-based sampling if you need smarter logic (e.g., "trace 100% of errors and slow requests, but only 1% of fast successful requests"). Tail-based sampling requires an OpenTelemetry Collector, which adds operational complexity but gives you a much better signal-to-noise ratio.

Can I use OpenTelemetry agents alongside manual SDK instrumentation?

Yes. Agents and SDK instrumentation can coexist in the same app. The agent handles auto-instrumentation for frameworks and libraries, while you use the SDK to add custom spans, attributes, and events for business logic. Just make sure both are configured to use the same exporter and sampling settings.

Authors
Prathamesh Sonpatki

Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.
