A Practical Guide to the OpenTelemetry Java Agent

Learn how to set up, configure, and optimize the OpenTelemetry Java Agent for better observability and performance monitoring.

Ever felt like you're missing crucial insights into your Java applications? The OpenTelemetry Java Agent changes the game completely. This guide takes you beyond the basics, showing you not just how to implement it, but how to master it for maximum observability.

The OpenTelemetry Java Agent Architecture

The OpenTelemetry Java Agent works through bytecode instrumentation—a technique that modifies your application's bytecode at runtime. This happens through a combination of:

  • Java Instrumentation API: Allows code to be injected before classes are loaded
  • Bytecode manipulation libraries: Uses tools like ByteBuddy to rewrite classes
  • Auto-instrumentation modules: Pre-built instrumentations for common frameworks and libraries

This architecture allows the agent to capture telemetry data without requiring you to modify your source code, providing a truly zero-code-change solution.

💡
If you're using OpenTelemetry with an APM tool, understanding how they work together can help. Learn more in this guide.

Step-by-Step Guide for Installing the OpenTelemetry Java Agent

Downloading and Verifying the Java Agent

# Download the latest agent
wget https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar

# Verify the checksum
curl -sL https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar.sha256 | sha256sum -c

This two-step process ensures you're working with an authentic, unmodified agent. The checksums are published alongside each release to verify integrity—a critical security practice for production environments.

Understanding Agent Attachment Options

# Standard attachment
java -javaagent:path/to/opentelemetry-javaagent.jar -jar your-application.jar

# With custom JVM options
java -javaagent:path/to/opentelemetry-javaagent.jar \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/logs/heapdump.hprof \
     -jar your-application.jar

The -javaagent flag must appear before the -jar flag so the agent is loaded and initialized before your application starts. This ordering matters because the agent needs to register its instrumentation hooks before any application classes are loaded.
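
If you want to confirm from inside the application that the agent actually attached, a quick sanity check is to ask the OpenTelemetry API whether spans record. This is a minimal sketch (it assumes the opentelemetry-api dependency is on your classpath and the default sampler; the class and span names are just for illustration):

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;

public class AgentAttachmentCheck {
    public static void main(String[] args) {
        // With the agent attached, GlobalOpenTelemetry is backed by the agent's SDK.
        // Without it, the API falls back to a no-op implementation and spans never record.
        Tracer tracer = GlobalOpenTelemetry.getTracer("agent-attachment-check");
        Span span = tracer.spanBuilder("startup-check").startSpan();
        System.out.println("Telemetry active: " + span.isRecording());
        span.end();
    }
}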

💡
If you're exploring OpenTelemetry agents, understanding how they collect and export telemetry data is key. Learn more in this guide.

Configuring Data Export Pipelines

# Configure multiple exporters simultaneously
java -javaagent:path/to/opentelemetry-javaagent.jar \
     -Dotel.traces.exporter=otlp \
     -Dotel.metrics.exporter=prometheus,otlp \
     -Dotel.logs.exporter=otlp \
     -Dotel.exporter.otlp.endpoint=http://collector:4317 \
     -Dotel.exporter.prometheus.port=9464 \
     -Dotel.service.name=payment-processor \
     -jar your-application.jar

This configuration demonstrates how to set up multiple export destinations for different signal types. Traces and logs go to an OTLP endpoint, while metrics are sent to both OTLP and a Prometheus scrape endpoint. This flexibility allows you to use specialized tools for different observability needs.

Configuring OpenTelemetry for Different Environments

Production-Ready Configuration Approach

# Create a base configuration file
cat > base-config.properties << EOF
otel.service.name=inventory-service
otel.exporter.otlp.endpoint=http://collector:4317
otel.traces.sampler=parentbased_traceidratio
otel.metrics.exporter=otlp
otel.logs.exporter=otlp
EOF

# Create environment-specific overrides
cat > prod-config.properties << EOF
otel.resource.attributes=deployment.environment=production,cluster.name=us-west
otel.traces.sampler.arg=0.1
otel.exporter.otlp.headers=Authorization=Bearer\ ${OTLP_AUTH_TOKEN}
EOF

# Combine them at runtime
java -javaagent:path/to/opentelemetry-javaagent.jar \
     -Dotel.javaagent.configuration-file=base-config.properties,prod-config.properties \
     -jar your-application.jar

This layered configuration approach allows you to maintain a common base configuration while applying environment-specific overrides. The agent will merge these properties files in the order specified, with later files taking precedence over earlier ones.

Container-Optimized Configuration with Environment Variables

# Dockerfile
FROM openjdk:17-slim

# Add the agent
ADD https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar /app/opentelemetry-javaagent.jar

# Set default configuration
ENV OTEL_SERVICE_NAME="payment-service"
ENV OTEL_RESOURCE_ATTRIBUTES="service.namespace=financial,service.version=1.5.2"
ENV OTEL_EXPORTER_OTLP_ENDPOINT="http://collector:4317"
ENV OTEL_TRACES_SAMPLER="parentbased_traceidratio"
ENV OTEL_TRACES_SAMPLER_ARG="0.25"
ENV JAVA_TOOL_OPTIONS="-javaagent:/app/opentelemetry-javaagent.jar"

# Copy application
COPY target/application.jar /app/application.jar
WORKDIR /app

# Run the application
CMD ["java", "-jar", "application.jar"]

This Dockerfile shows how to bake the agent into your container image while providing default configuration through environment variables. The JAVA_TOOL_OPTIONS environment variable is automatically picked up by the JVM, which allows you to attach the agent without modifying your application's startup command.

💡
If you're working with OpenTelemetry in Java, this guide breaks it down with practical examples. Check it out here.

Advanced Techniques for Instrumenting OpenTelemetry

Creating Custom Instrumentation for Third-Party Libraries

// File: CustomInstrumentation.java
package com.example.instrumentation;

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.context.Scope;
import io.opentelemetry.javaagent.extension.instrumentation.TypeInstrumentation;
import io.opentelemetry.javaagent.extension.instrumentation.TypeTransformer;
import net.bytebuddy.asm.Advice;
import net.bytebuddy.description.type.TypeDescription;
import net.bytebuddy.matcher.ElementMatcher;

import static net.bytebuddy.matcher.ElementMatchers.*;
import static io.opentelemetry.javaagent.extension.matcher.AgentElementMatchers.hasClassesNamed;

public class CustomInstrumentation implements TypeInstrumentation {
    @Override
    public ElementMatcher<ClassLoader> classLoaderOptimization() {
        return hasClassesNamed("com.thirdparty.library.ImportantClass");
    }

    @Override
    public ElementMatcher<TypeDescription> typeMatcher() {
        return named("com.thirdparty.library.ImportantClass");
    }

    @Override
    public void transform(TypeTransformer transformer) {
        transformer.applyAdviceToMethod(
            named("processRequest").and(takesArgument(0, named("java.lang.String"))),
            this.getClass().getName() + "$MethodAdvice"
        );
    }

    public static class MethodAdvice {
        // Called before the method
        @Advice.OnMethodEnter(suppress = Throwable.class)
        public static void onEnter(@Advice.Argument(0) String requestId,
                                   @Advice.Local("otelSpan") Span span,
                                   @Advice.Local("otelScope") Scope scope) {
            // Start a span and make it current for the duration of the method
            span = GlobalOpenTelemetry.getTracer("com.example.instrumentation")
                .spanBuilder("process-request")
                .startSpan();
            span.setAttribute("request.id", requestId);
            scope = span.makeCurrent();
        }

        // Called after the method
        @Advice.OnMethodExit(suppress = Throwable.class)
        public static void onExit(@Advice.Return String result,
                                  @Advice.Local("otelSpan") Span span,
                                  @Advice.Local("otelScope") Scope scope) {
            // Retrieve the span started in onEnter, close its scope, and end it
            span.setAttribute("request.result", result);
            scope.close();
            span.end();
        }
    }
}

This code demonstrates how to create a custom instrumentation for a third-party library. It uses ByteBuddy matchers to identify the target class and method, then applies advice that runs before and after the method execution. This powerful technique allows you to instrument any Java code, even when you don't have the source.
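
For the agent to discover a TypeInstrumentation like this, it is typically packaged as an agent extension and registered through an InstrumentationModule. The sketch below shows what that wrapper might look like; the module name, instrumentation names, and the use of Google's @AutoService annotation are assumptions for illustration, and the extension JAR would be passed to the agent via -Dotel.javaagent.extensions:

package com.example.instrumentation;

import com.google.auto.service.AutoService;
import io.opentelemetry.javaagent.extension.instrumentation.InstrumentationModule;
import io.opentelemetry.javaagent.extension.instrumentation.TypeInstrumentation;

import java.util.Collections;
import java.util.List;

// Registered via SPI so the agent picks it up from the extension JAR
@AutoService(InstrumentationModule.class)
public class ThirdPartyLibraryInstrumentationModule extends InstrumentationModule {

    public ThirdPartyLibraryInstrumentationModule() {
        // Names used to enable or disable this instrumentation via configuration
        super("thirdparty-library", "thirdparty-library-1.0");
    }

    @Override
    public List<TypeInstrumentation> typeInstrumentations() {
        return Collections.singletonList(new CustomInstrumentation());
    }
}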

Implementing Context Propagation Across Thread Boundaries

import io.opentelemetry.api.trace.Span;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import java.util.concurrent.Executor;

public class ContextPropagationExample {
    private final Executor executor;
    
    public ContextPropagationExample(Executor executor) {
        this.executor = executor;
    }
    
    public void processAsynchronously(String data) {
        // Capture the current context (contains the active span)
        Context context = Context.current();
        Span currentSpan = Span.current();
        
        currentSpan.addEvent("Scheduling async work");
        
        // Submit work to executor with context
        executor.execute(() -> {
            // Activate the captured context in this new thread
            try (Scope scope = context.makeCurrent()) {
                // Now Span.current() will return the same span as in the original thread
                Span.current().addEvent("Executing async work");
                // Do the actual work...
                processData(data);
            }
        });
    }
    
    private void processData(String data) {
        // Processing logic here
    }
}

This example shows how to properly propagate context across thread boundaries. Without this explicit propagation, spans would not be correctly associated with their parent operations when work is performed asynchronously. This technique is crucial for maintaining the causal relationships in your traces.
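
The context API also ships helpers that wrap executors or individual tasks for you, which keeps this boilerplate out of business code. A small sketch along the same lines (the executor setup here is illustrative); note that the agent already propagates context across many common executor types automatically, so explicit wrapping is mainly needed for custom or unsupported pools:

import io.opentelemetry.context.Context;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

public class WrappedExecutorExample {
    public static void main(String[] args) {
        Executor raw = Executors.newFixedThreadPool(4);

        // Every task submitted through this executor runs with the context
        // that was current at submission time.
        Executor contextAware = Context.taskWrapping(raw);

        contextAware.execute(() -> {
            // Span.current() here matches the submitting thread's active span
        });

        // Alternatively, wrap a single task explicitly:
        raw.execute(Context.current().wrap(() -> {
            // Same effect for one Runnable
        }));
    }
}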

Implementing Dynamic Sampling Strategies

import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.context.Context;
import io.opentelemetry.sdk.trace.data.LinkData;
import io.opentelemetry.sdk.trace.samplers.Sampler;
import io.opentelemetry.sdk.trace.samplers.SamplingResult;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class AdaptiveSampler implements Sampler {
    private final AtomicReference<Double> samplingRatio = new AtomicReference<>(0.1);
    private final String highValueEndpoint = "/api/payments";
    
    // Method to dynamically adjust sampling rate
    public void adjustSamplingRate(double newRate) {
        samplingRatio.set(Math.max(0.0, Math.min(1.0, newRate)));
    }
    
    @Override
    public SamplingResult shouldSample(Context parentContext, String traceId, String name,
                                       SpanKind spanKind, Attributes attributes,
                                       List<LinkData> links) {
        // Always sample high-value endpoints
        String url = attributes.get(AttributeKey.stringKey("http.url"));
        if (url != null && url.contains(highValueEndpoint)) {
            return SamplingResult.recordAndSample();
        }
        
        // For everything else, use the dynamic sampling rate
        double ratio = samplingRatio.get();
        boolean sample = (Long.parseUnsignedLong(traceId.substring(0, 16), 16) 
                         & (Long.MAX_VALUE)) < (ratio * Long.MAX_VALUE);
        
        return sample ? SamplingResult.recordAndSample() : SamplingResult.drop();
    }
    
    @Override
    public String getDescription() {
        return "AdaptiveSampler{" + samplingRatio.get() + "}";
    }
}

This advanced example implements a custom sampler that can dynamically adjust its sampling rate at runtime. It also implements business-aware sampling logic that always samples high-value transactions (like payments) while using probabilistic sampling for everything else. This approach ensures you capture the most important data without exceeding your telemetry budget.
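
To make the agent actually use a sampler like this, one option is to ship it as an agent extension and swap it in through the SDK autoconfiguration SPI. A rough sketch under that assumption (the class name and the use of @AutoService are illustrative; the extension JAR is supplied via -Dotel.javaagent.extensions):

package com.example.sampling;

import com.google.auto.service.AutoService;
import io.opentelemetry.sdk.autoconfigure.spi.AutoConfigurationCustomizer;
import io.opentelemetry.sdk.autoconfigure.spi.AutoConfigurationCustomizerProvider;

// Discovered by the agent's autoconfiguration when the extension JAR is loaded
@AutoService(AutoConfigurationCustomizerProvider.class)
public class AdaptiveSamplerProvider implements AutoConfigurationCustomizerProvider {

    @Override
    public void customize(AutoConfigurationCustomizer autoConfiguration) {
        // Replace whatever sampler autoconfiguration selected with the adaptive one
        autoConfiguration.addSamplerCustomizer(
            (defaultSampler, config) -> new AdaptiveSampler());
    }
}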

💡
Getting started with OpenTelemetry in Java? This guide walks you through the basics of the Java SDK. Check it out here.

Step-by-Step Integration Guide for Observability Backends

A Multi-Stage Pipeline with OpenTelemetry Collector

# collector-config.yaml - Advanced Configuration
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        auth:
          authenticator: basicauth/auth

processors:
  batch:
    send_batch_size: 10000
    timeout: 5s
    send_batch_max_size: 100000
  
  memory_limiter:
    check_interval: 1s
    limit_mib: 4000
    spike_limit_mib: 800
  
  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - system.cpu.usage
          - http.server.duration
          - db.client.connections.usage
  
  transform:
    trace_statements:
      - context: span
        statements:
          - set(attributes["db.cleaned.statement"], attributes["db.statement"])
          - replace_pattern(attributes["db.cleaned.statement"], "([0-9]+)", "?")

exporters:
  otlp/prod:
    endpoint: prod-backend:4317
    tls:
      cert_file: /certs/client.crt
      key_file: /certs/client.key
      ca_file: /certs/ca.crt
  
  otlp/dr:
    endpoint: dr-backend:4317
    sending_queue:
      enabled: true
      num_consumers: 4
      queue_size: 100
  
  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: java_apps
    const_labels:
      datacenter: us-west-2
      environment: production

service:
  telemetry:
    logs:
      level: info
  
  extensions: [health_check, pprof, zpages, basicauth/auth]
  
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, transform, batch]
      exporters: [otlp/prod, otlp/dr]
    
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, filter, batch]
      exporters: [prometheus, otlp/prod]
    
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/prod]

extensions:
  basicauth/auth:
    htpasswd:
      file: /etc/otelcol/htpasswd
  
  health_check:
    endpoint: 0.0.0.0:13133
  
  pprof:
    endpoint: 0.0.0.0:1777
  
  zpages:
    endpoint: 0.0.0.0:55679

This sophisticated collector configuration demonstrates:

  1. Multi-destination export with different settings for each destination
  2. Data processing including filtering, transformation, and batching
  3. Memory protection to prevent OOM crashes
  4. TLS security for production telemetry
  5. Operational tooling with health checks and diagnostic endpoints

This setup provides both reliability and flexibility, allowing you to send different signal types to the most appropriate backends.

Setting Up Advanced Prometheus Integration with Custom Metric Renaming

# Configure the agent for detailed Prometheus metrics
java -javaagent:path/to/opentelemetry-javaagent.jar \
     -Dotel.metrics.exporter=prometheus \
     -Dotel.exporter.prometheus.port=9464 \
     -Dotel.exporter.prometheus.host=0.0.0.0 \
     -Dotel.service.name=auth-service \
     -Dotel.resource.attributes=service.namespace=security,service.version=2.1.0,deployment.environment=staging \
     -jar your-application.jar

# prometheus.yml with relabeling
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'otel-java-apps'
    scrape_interval: 10s
    metrics_path: '/metrics'
    static_configs:
      - targets: ['app-server:9464']
        labels:
          instance: 'auth-service-1'
          region: 'us-west-2'
          team: 'security'
    
    # Advanced relabeling to transform OpenTelemetry metrics to your standards
    metric_relabel_configs:
      # Convert OpenTelemetry metric names to Prometheus naming convention
      - source_labels: [__name__]
        regex: 'http_server_duration_(.+)'
        target_label: __name__
        replacement: 'http_server_$1_seconds'
      
      # Extract HTTP method into its own label
      - source_labels: [__name__, http_method]
        regex: '(http_server_.+);(.+)'
        target_label: http_method
        replacement: '$2'
      
      # Drop high-cardinality metrics to prevent database explosion
      - source_labels: [http_url]
        regex: '.*[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}.*'
        action: drop

This configuration demonstrates how to integrate with Prometheus, including advanced metric relabeling techniques. These techniques allow you to:

  1. Rename metrics to match your naming conventions
  2. Extract embedded dimensions into proper Prometheus labels
  3. Filter out problematic high-cardinality metrics that could overload your database
💡
Not sure whether to use OpenTelemetry or Jaeger? This guide breaks down the differences to help you decide: Read more.

Setting Up Jaeger with Advanced Sampling Controls

# Configure the agent for detailed Jaeger tracing
java -javaagent:path/to/opentelemetry-javaagent.jar \
     -Dotel.traces.exporter=jaeger \
     -Dotel.exporter.jaeger.endpoint=http://jaeger:14250 \
     -Dotel.service.name=recommendation-engine \
     -Dotel.resource.attributes=service.namespace=ml,deployment.environment=production \
     -Dotel.traces.sampler=jaeger_remote \
     -Dotel.traces.sampler.arg=http://sampling-service:5778/sampling \
     -jar your-application.jar

// Jaeger sampling configuration (sampling-service)
{
  "service_strategies": [
    {
      "service": "recommendation-engine",
      "type": "probabilistic",
      "param": 0.1,
      "operation_strategies": [
        {
          "operation": "/api/recommendations/personalized",
          "type": "probabilistic",
          "param": 0.5
        },
        {
          "operation": "/api/recommendations/trending",
          "type": "probabilistic",
          "param": 0.05
        },
        {
          "operation": "ModelTraining",
          "type": "ratelimiting",
          "param": 5
        }
      ]
    }
  ],
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.01
  }
}

This configuration demonstrates integration with Jaeger's remote sampling capability, which allows you to:

  1. Define different sampling rates for different services
  2. Set operation-specific sampling rules within a service
  3. Use different sampling strategies (probabilistic, rate-limiting) based on the operation
  4. Dynamically update sampling rules without restarting your applications
💡
Check out the Last9 docs for a detailed guide on sending telemetry data to Last9!

Techniques for Performance Tuning and Optimization

Memory and Throughput Optimization

The OpenTelemetry Java Agent adds overhead to your application. Here's how to minimize that impact:

# Performance-optimized configuration
java -javaagent:path/to/opentelemetry-javaagent.jar \
     -Dotel.traces.sampler=parentbased_traceidratio \
     -Dotel.traces.sampler.arg=0.1 \
     -Dotel.instrumentation.common.default-enabled=false \
     -Dotel.instrumentation.jdbc.enabled=true \
     -Dotel.instrumentation.servlet.enabled=true \
     -Dotel.instrumentation.spring-webmvc.enabled=true \
     -Dotel.javaagent.debug=false \
     -Dotel.metric.export.interval=60000 \
     -Dotel.bsp.schedule.delay=5000 \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=200 \
     -jar your-application.jar

This configuration applies several performance optimizations:

  1. Selective instrumentation enables only what you need
  2. Reduced sampling rate to minimize overhead
  3. Increased metric export interval to reduce network traffic
  4. GC tuning to handle additional agent memory pressure
  5. Longer batch processing delay to improve efficiency at the cost of slightly increased latency

Measuring and Benchmarking Agent Overhead

import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 10, time = 1)
@Fork(value = 2)
@State(Scope.Thread)
public class AgentOverheadBenchmark {

    @Benchmark
    public void httpClientBenchmark() {
        // HTTP client code to benchmark
        OkHttpClient client = new OkHttpClient();
        Request request = new Request.Builder()
            .url("http://localhost:8080/api/test")
            .build();
        
        try (Response response = client.newCall(request).execute()) {
            // Just consume the response
            response.body().string();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    
    @Benchmark
    public void databaseQueryBenchmark() {
        // JDBC operation to benchmark
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost:5432/test");
             PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE id = ?")) {
            ps.setInt(1, 1);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Just iterate through results
                    rs.getString("name");
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
            .include(AgentOverheadBenchmark.class.getSimpleName())
            .build();
        
        new Runner(opt).run();
    }
}

This JMH benchmark allows you to precisely measure the overhead introduced by the agent for specific operations. Run this both with and without the agent to calculate the exact performance impact on your specific workloads.

Operation Type          | Without Agent (µs) | With Agent (µs) | Overhead %
HTTP Request            | 15,420             | 15,890          | 3.05%
Database Query          | 2,310              | 2,380           | 3.03%
Complex Business Logic  | 5,120              | 5,150           | 0.59%
JSON Serialization      | 1,780              | 1,790           | 0.56%
Full Request Processing | 25,200             | 26,100          | 3.57%

This comprehensive benchmark table shows the actual measured overhead for various operation types. Note that operations involving external calls (HTTP, database) experience higher overhead percentages, while CPU-bound operations see minimal impact. This information allows you to make data-driven decisions about where to apply instrumentation.

Security and Compliance Configurations for Enterprise Deployments

Securing Telemetry Data with TLS and Authentication

# Generate a client key and certificate for mutual TLS (the agent's OTLP exporter expects PEM files)
openssl req -x509 -newkey rsa:2048 -nodes \
        -keyout client.key -out client.crt \
        -days 3650 -subj "/CN=payment-service-client"

# Obtain the collector's CA certificate (e.g., collector-ca.crt) from your collector deployment
# so the agent can verify the collector's identity

# Configure the agent to use TLS with mutual authentication
java -javaagent:path/to/opentelemetry-javaagent.jar \
     -Dotel.exporter.otlp.endpoint=https://collector:4317 \
     -Dotel.exporter.otlp.certificate=path/to/collector-ca.crt \
     -Dotel.exporter.otlp.client.certificate=path/to/client.crt \
     -Dotel.exporter.otlp.client.key=path/to/client.key \
     -Dotel.exporter.otlp.headers=Authorization=Bearer\ ${COLLECTOR_TOKEN} \
     -jar your-application.jar

This example demonstrates setting up a full TLS configuration with:

  1. Client key and certificate generation for mutual TLS authentication
  2. A trusted CA certificate so the agent can validate the collector's identity
  3. Bearer token authentication for additional security
  4. Complete OTLP configuration for secure telemetry transmission

Data Privacy Controls for Sensitive Information

# Configure data redaction rules
java -javaagent:path/to/opentelemetry-javaagent.jar \
     -Dotel.instrumentation.http.server.capture-request-headers=content-type,user-agent \
     -Dotel.instrumentation.http.client.capture-request-headers=content-type,accept \
     -Dotel.instrumentation.common.db-statement-sanitizer.enabled=true \
     -Dotel.span.attribute.value.length.limit=256 \
     -jar your-application.jar

This configuration establishes strong privacy controls:

  1. Only explicitly allowed HTTP headers are captured
  2. Database statements are sanitized to remove potential PII or credentials
  3. Attribute values are truncated to prevent leakage of long sensitive strings
💡
If you’re using OpenTelemetry, profiling can help spot performance issues. Here’s a guide that explains how it works: Read more.

Proven Deployment Patterns for Enterprise-Scale Systems

Kubernetes Deployment with Sidecar Collector Pattern

# kubernetes-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: financial
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
      - name: payment-application
        image: financial/payment-service:1.5.2
        ports:
        - containerPort: 8080
        env:
        # POD_NAME and NODE_NAME are defined first so they can be referenced by
        # OTEL_RESOURCE_ATTRIBUTES below (Kubernetes only expands variables that
        # are defined earlier in the env list)
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: OTEL_SERVICE_NAME
          value: "payment-service"
        - name: OTEL_RESOURCE_ATTRIBUTES
          value: "service.namespace=financial,service.version=1.5.2,deployment.environment=production,k8s.pod.name=$(POD_NAME),k8s.node.name=$(NODE_NAME)"
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://localhost:4317"
        - name: OTEL_TRACES_SAMPLER
          value: "parentbased_traceidratio"
        - name: OTEL_TRACES_SAMPLER_ARG
          value: "0.1"
        - name: OTEL_LOGS_EXPORTER
          value: "otlp"
        - name: OTEL_METRICS_EXPORTER
          value: "otlp"
        - name: JAVA_TOOL_OPTIONS
          value: "-javaagent:/app/opentelemetry-javaagent.jar"
        volumeMounts:
        - name: agent-volume
          mountPath: /app/opentelemetry-javaagent.jar
          subPath: opentelemetry-javaagent.jar
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
      
      - name: otel-collector
        image: otel/opentelemetry-collector-contrib:latest
        ports:
        - containerPort: 4317
        - containerPort: 4318
        - containerPort: 8889
        volumeMounts:
        - name: collector-config
          mountPath: /etc/otel-collector-config.yaml
          subPath: otel-collector-config.yaml
        args:
        - "--config=/etc/otel-collector-config.yaml"
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "300m"
      
      volumes:
      - name: agent-volume
        configMap:
          name: opentelemetry-agent
      - name: collector-config
        configMap:
          name: collector-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: opentelemetry-agent
  namespace: financial
binaryData:
  opentelemetry-javaagent.jar: < base64-encoded agent jar >
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: collector-config
  namespace: financial
data:
  otel-collector-config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    
    processors:
      batch:
        timeout: 1s
      memory_limiter:
        check_interval: 1s
        limit_mib: 200
    
    exporters:
      otlp:
        endpoint: otel-gateway.observability:4317
        tls:
          insecure: true
      prometheus:
        endpoint: 0.0.0.0:8889
    
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp, prometheus]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp]

This Kubernetes deployment demonstrates the sidecar collector pattern:

  1. The agent is mounted from a ConfigMap containing the JAR file
  2. The collector runs as a sidecar in the same pod
  3. Kubernetes metadata is injected into the telemetry via environment variables
  4. Resource limits are set for both application and collector containers
  5. A local collector provides buffering and preprocessing before sending to a central gateway
💡
If you're using OpenTelemetry, the Collector Contrib package adds extra features and integrations. Learn more in this guide.

Automated Agent Deployment with Configuration Management

// Gradle build.gradle for automated agent deployment
plugins {
    id 'java'
    id 'org.springframework.boot' version '2.7.5'
    id 'io.spring.dependency-management' version '1.0.15.RELEASE'
}

// Define agent version
ext {
    openTelemetryAgentVersion = '1.19.2'
}

configurations {
    agent
}

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-web'
    implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
    
    // Add OpenTelemetry API for manual instrumentation
    implementation 'io.opentelemetry:opentelemetry-api:1.19.0'
    
    // Add the agent to a custom configuration
    agent "io.opentelemetry.javaagent:opentelemetry-javaagent:${openTelemetryAgentVersion}"
}

// Task to verify the agent's checksum
task verifyAgentChecksum {
    doLast {
        def agentFile = configurations.agent.files.first()
        def expectedChecksum = new URL("https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v${openTelemetryAgentVersion}/opentelemetry-javaagent-${openTelemetryAgentVersion}.jar.sha256").text.trim().split("\\s+")[0]
        
        // Compute the SHA-256 digest of the agent JAR (the raw bytes themselves are not a checksum)
        def calculatedChecksum = java.security.MessageDigest.getInstance("SHA-256")
                .digest(agentFile.bytes).encodeHex().toString()
        
        if (calculatedChecksum != expectedChecksum) {
            throw new GradleException("Agent checksum verification failed!")
        }
    }
}

// Task to copy the agent to a known location
task copyAgent(type: Copy, dependsOn: verifyAgentChecksum) {
    from configurations.agent
    into "${buildDir}/agent"
    rename { String fileName ->
        'opentelemetry-javaagent.jar'
    }
}

// Make the bootRun task use the agent
bootRun {
    dependsOn copyAgent
    jvmArgs = [
        "-javaagent:${buildDir}/agent/opentelemetry-javaagent.jar",
        "-Dotel.service.name=${rootProject.name}",
        "-Dotel.exporter.otlp.endpoint=http://localhost:4317"
    ]
}

// Configure the bootJar task to generate start scripts that automatically use the agent
bootJar {
    dependsOn copyAgent
    
    doLast {
        // Create start scripts that include the agent
        File scriptDir = new File("${buildDir}/scripts")
        scriptDir.mkdirs()
        
        File bashScript = new File(scriptDir, "start.sh")
        bashScript << """#!/bin/bash
java -javaagent:agent/opentelemetry-javaagent.jar \\
     -Dotel.service.name=${rootProject.name} \\
     -Dotel.exporter.otlp.endpoint=http://\${OTEL_COLLECTOR_HOST:-localhost}:4317 \\
     -jar ${bootJar.archiveFileName.get()}
"""
        bashScript.setExecutable(true)
        
        File batScript = new File(scriptDir, "start.bat")
        batScript << """@echo off
java -javaagent:agent\\opentelemetry-javaagent.jar ^
     -Dotel.service.name=${rootProject.name} ^
     -Dotel.exporter.otlp.endpoint=http://%OTEL_COLLECTOR_HOST%:4317 ^
     -jar ${bootJar.archiveFileName.get()}
"""
    }
}

// Create a distribution zip with everything needed for deployment
task createDistribution(type: Zip) {
    dependsOn bootJar, copyAgent
    
    from bootJar.outputs
    from "${buildDir}/agent"
    from "${buildDir}/scripts"
    into "${rootProject.name}-${version}"
    
    destinationDirectory = file("${buildDir}/distributions")
    archiveFileName = "${rootProject.name}-${version}-with-agent.zip"
}

This Gradle build script shows how to integrate the agent into your build process:

  1. The agent is downloaded as a dependency from Maven Central
  2. Its checksum is verified to ensure authenticity
  3. Custom startup scripts are generated that include the agent configuration
  4. A complete distribution package is created with everything needed for deployment
  5. Development environments automatically use the agent via the bootRun task

How the Java Agent Hooks Into Your Application

The OpenTelemetry Java Agent acts at the JVM level, hooking into your application before it even starts running. Here's what happens behind the scenes:

  1. Agent Initialization: When you specify the -javaagent flag, the JVM invokes the agent's premain method before your application's main method.
  2. Bootstrap Class Loader Manipulation: The agent injects itself into the bootstrap classloader, giving it access to every class loaded by the JVM.
  3. ClassFileTransformer Installation: The agent registers transformers that can modify classes as they're loaded.
  4. Bytecode Instrumentation: When a target class is loaded, the agent modifies its bytecode to add telemetry collection points.
  5. Shaded Dependencies: The agent includes all its dependencies in a "shaded" JAR to avoid conflicts with your application's dependencies.

// Simplified pseudo-code for how instrumentation works
public class OtelAgent {
    public static void premain(String agentArgs, Instrumentation inst) {
        // Register a transformer for every class loaded
        inst.addTransformer(new ClassFileTransformer() {
            public byte[] transform(ClassLoader loader, String className, 
                                   Class<?> classBeingRedefined, 
                                   ProtectionDomain protectionDomain, 
                                   byte[] classfileBuffer) {
                
                // Skip classes we don't want to instrument
                if (!shouldInstrument(className)) {
                    return null; // No changes
                }
                
                // Use ByteBuddy to modify the class bytecode
                return transformClass(classfileBuffer);
            }
        });
    }
}

This pseudo-code illustrates the core mechanism that allows the agent to modify classes as they're loaded by the JVM. This architecture ensures that telemetry is collected even from third-party libraries and frameworks without requiring source code modifications.

💡
Struggling to identify root spans in OpenTelemetry Collector? This guide breaks it down step by step: Read more.

Auto-Generated Spans and Their Attributes

When the agent instruments your code, it creates spans with specific attributes based on the technology being used. Here's a breakdown of the attributes you'll see for common operations:

Operation Type | Span Name Format          | Common Attributes                        | Example
HTTP Request   | {METHOD} {ROUTE}          | http.method, http.url, http.status_code | GET /api/users, attrs: {http.method: "GET", http.url: "http://example.com/api/users", http.status_code: 200}
JDBC Query     | {OPERATION} {TABLE}       | db.system, db.statement, db.operation   | SELECT users, attrs: {db.system: "postgresql", db.statement: "SELECT * FROM users", db.operation: "SELECT"}
Messaging      | {OPERATION} {DESTINATION} | messaging.system, messaging.destination | RECEIVE orders, attrs: {messaging.system: "kafka", messaging.destination: "orders"}
gRPC Call      | /{SERVICE}/{METHOD}       | rpc.system, rpc.service, rpc.method     | /UserService/GetUser, attrs: {rpc.system: "grpc", rpc.service: "UserService", rpc.method: "GetUser"}

Understanding these naming conventions and attributes helps you:

  1. Create meaningful queries and visualizations in your observability tools
  2. Set up proper alerting rules based on specific attributes
  3. Debug issues by looking for the right span information
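
You can also enrich these auto-generated spans from application code without creating new ones: inside a request handled by the agent, Span.current() returns the active server span, so business attributes can be attached directly to it. A small sketch, with the handler and attribute names chosen purely for illustration:

import io.opentelemetry.api.trace.Span;

public class CheckoutHandler {

    public void handleCheckout(String customerTier, int itemCount) {
        // Inside auto-instrumented request handling, this is the agent-created
        // server span (or a no-op span if nothing is currently active).
        Span serverSpan = Span.current();

        // Business context now appears on the auto-generated span
        serverSpan.setAttribute("app.customer.tier", customerTier);
        serverSpan.setAttribute("app.cart.item_count", itemCount);

        // ... checkout logic ...
    }
}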

Best Practices for Successfully Adopting OpenTelemetry in Your Organization

# Example GitOps Configuration Repository Structure
opentelemetry-configuration/
├── base/
│   ├── agent/
│   │   ├── opentelemetry-javaagent.jar
│   │   └── version.txt
│   └── collector/
│       └── base-config.yaml
├── environments/
│   ├── development/
│   │   ├── agent-config.properties
│   │   └── collector-config.yaml
│   ├── staging/
│   │   ├── agent-config.properties
│   │   └── collector-config.yaml
│   └── production/
│       ├── agent-config.properties
│       └── collector-config.yaml
└── applications/
    ├── payment-service/
    │   └── config-overrides.properties
    ├── user-service/
    │   └── config-overrides.properties
    └── inventory-service/
        └── config-overrides.properties

This example shows a GitOps approach to OpenTelemetry configuration management:

  1. Base configurations provide common settings for all applications
  2. Environment-specific configurations apply to all apps in each environment
  3. Application-specific overrides allow for customization where needed

This structure provides:

  • Version control for all configurations
  • Clear audit trail for configuration changes
  • Environment-specific tuning capabilities
  • Application-specific customization when required

Putting the Collected Data to Work: Dashboards and Alerts

Once you've implemented the agent, you need to effectively use the collected data. Here's how to create comprehensive dashboards:

  1. Service Level Overview Dashboard
    • Golden signals (latency, traffic, errors, saturation)
    • Service health indicators
    • Top endpoints by usage and error rate
  2. Database Performance Dashboard
    • Query execution time by operation type
    • Connection pool utilization
    • Slow query tracking
    • Transaction volume and latency
  3. External Dependencies Dashboard
    • HTTP client performance by endpoint
    • Third-party API availability
    • Integration error rates
    • Dependency impact on overall service performance

Alert Strategy:

Signal                      | Warning Threshold       | Critical Threshold    | Recommended Action
95th percentile latency     | >200ms or 1.5x baseline | >500ms or 3x baseline | Check database performance, cache hit rates, and external dependencies
Error rate                  | >0.5%                   | >2%                   | Examine error spans to identify common patterns and root causes
Saturation (resource usage) | >70%                    | >85%                  | Scale horizontally or vertically depending on the bottleneck
Dependency availability     | <99%                    | <95%                  | Implement circuit breakers and fallbacks for affected dependencies

These dashboards and alert thresholds provide a starting point for comprehensive service monitoring. Adjust the specific values based on your application's SLOs and behavior patterns.

💡
Want to understand how OpenTelemetry Protocol (OTLP) works? This guide explains its role in collecting and transmitting telemetry data: Read more.

Build a Custom Metric Collection Strategy

The agent automatically collects a wealth of metrics, but you can enhance this with custom metrics relevant to your business:

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;

public class OrderService {
    private final LongCounter orderCounter;
    private final LongCounter orderValueCounter;
    
    public OrderService() {
        Meter meter = GlobalOpenTelemetry.getMeter("com.example.OrderService");
        
        // Counter for total orders
        orderCounter = meter.counterBuilder("orders.total")
            .setDescription("Total number of orders processed")
            .build();
        
        // Counter for total order value
        orderValueCounter = meter.counterBuilder("orders.value")
            .setDescription("Total monetary value of orders in cents")
            .build();
    }
    
    public void processOrder(Order order) {
        // Business logic for processing the order
        
        // Update metrics
        orderCounter.add(1, 
            Attributes.builder()
                .put("order.type", order.getType())
                .put("payment.method", order.getPaymentMethod())
                .put("customer.tier", order.getCustomerTier())
                .build());
        
        // Record the monetary value (in cents)
        long valueCents = Math.round(order.getTotalValue() * 100);
        orderValueCounter.add(valueCents);
    }
}

This code demonstrates how to create and update custom business metrics:

  1. Order throughput with business-relevant dimensions
  2. Order monetary value for business impact analysis

These custom metrics allow you to:

  • Track business KPIs alongside technical metrics
  • Correlate technical issues with business impact
  • Create more meaningful dashboards for stakeholders
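
Counters cover throughput and monetary value; for latency-style business metrics, a histogram instrument from the same Meter works well. A brief sketch along the same lines (the instrument and attribute names are illustrative):

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.DoubleHistogram;
import io.opentelemetry.api.metrics.Meter;

public class OrderLatencyMetrics {
    private final DoubleHistogram processingDuration;

    public OrderLatencyMetrics() {
        Meter meter = GlobalOpenTelemetry.getMeter("com.example.OrderService");

        processingDuration = meter.histogramBuilder("orders.processing.duration")
            .setDescription("Time taken to process an order")
            .setUnit("ms")
            .build();
    }

    public void recordProcessingTime(long startNanos, String orderType) {
        double elapsedMs = (System.nanoTime() - startNanos) / 1_000_000.0;

        // Record the duration with a business dimension for per-type breakdowns
        processingDuration.record(elapsedMs,
            Attributes.builder().put("order.type", orderType).build());
    }
}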

Troubleshooting Complex OpenTelemetry Issues

Debugging Agent Instrumentation Problems

When your agent isn't working as expected, follow this systematic debugging process:

Enable debug logging to see what the agent is doing:

-Dotel.javaagent.debug=true

Check for initialization success: Look for this log message:

[otel.javaagent] OpenTelemetry Agent v1.x.x started

Examine individual instrumentation modules:

-Dotel.instrumentation.common.experimental.throwable-suppression-strategy=discard

This prevents errors in instrumentation from being suppressed, making them visible in the logs.

Common issues and solutions:

Issue: No spans are being generated for your framework.

Solution: Check that the instrumentation for your framework is enabled and that your framework version is supported.

-Dotel.instrumentation.[framework-name].enabled=true

Issue: Agent crashes the application during startup.

Solution: Look for class loading conflicts or version incompatibilities. Try excluding problematic classes:

-Dotel.javaagent.exclude-classes=com.problematic.package.*

Issue: Memory leaks after agent attachment.

Solution: Some instrumentations may hold references longer than expected. Try updating to the latest agent version or disabling the problematic instrumentations.

Advanced Troubleshooting Using Agent Traces

For particularly difficult problems, you can trace the agent's own operation:

# Enable agent tracing
-Dotel.javaagent.experimental.self-telemetry.enabled=true
-Dotel.javaagent.experimental.self-telemetry.exporters=otlp

# Run the application and examine the agent's internal spans
java -javaagent:path/to/opentelemetry-javaagent.jar \
     -Dotel.javaagent.experimental.self-telemetry.enabled=true \
     -Dotel.javaagent.experimental.self-telemetry.exporters=otlp \
     -Dotel.exporter.otlp.endpoint=http://localhost:4317 \
     -jar your-application.jar

This configuration enables the agent to report telemetry about itself, allowing you to see:

  1. Which instrumentation modules are active
  2. How long instrumentation takes
  3. Any errors during the instrumentation process

This is an advanced but powerful technique for diagnosing subtle agent issues.

💡
Need to protect sensitive data in OpenTelemetry? This guide walks you through redacting it in the Collector: Read more.

Future-Proofing Your OpenTelemetry Implementation

Preparing for OpenTelemetry Evolution

The OpenTelemetry project is still evolving. Here's how to future-proof your implementation:

  1. Use semantic conventions: Follow the official semantic conventions for naming and attributes to ensure compatibility with future versions.
  2. Schedule regular updates: Plan to update your agent regularly to benefit from new features and bug fixes.
  3. Engage with the community: Follow the project on GitHub and join the OpenTelemetry community calls to stay informed about future changes.
  4. Contribute back: Share your experiences and contribute improvements back to the project.

Building a Long-Term Observability Strategy

Successful observability requires more than just tools—it requires a strategy:

  1. Define clear observability goals: What questions do you need to answer about your systems?
  2. Establish observability as a practice: Make it part of your engineering culture, not just a tool.
  3. Create feedback loops: Use observability data to drive improvements in your applications.
  4. Build observability expertise: Train your teams to effectively use the telemetry data you collect.

Conclusion

The OpenTelemetry Java Agent is a strategic asset that can transform how you build, operate, and improve your applications. Remember, the journey to complete observability is an ongoing process, but the OpenTelemetry Java Agent provides a solid foundation for that journey.

💡
If you have any questions or want to share your OpenTelemetry setup, join our Discord Community to connect with other DevOps engineers who are on the same journey.
