Ever felt like you’re missing crucial insights into your Java applications? The OpenTelemetry Java Agent changes that game completely. This comprehensive guide takes you beyond the basics, showing you not just how to implement it, but how to master it for maximum observability.
The OpenTelemetry Java Agent Architecture
The OpenTelemetry Java Agent works through bytecode instrumentation—a technique that modifies your application’s bytecode at runtime. This happens through a combination of:
- Java Instrumentation API: Allows code to be injected before classes are loaded
- Bytecode manipulation libraries: Uses tools like ByteBuddy to rewrite classes
- Auto-instrumentation modules: Pre-built instrumentations for common frameworks and libraries
This architecture allows the agent to capture telemetry data without requiring you to modify your source code, providing a truly zero-code-change solution.
Step-by-Step Guide for Installing the OpenTelemetry Java Agent
Downloading and Verifying the Java Agent
# Download the latest agent
wget https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar

# Verify the checksum
curl -sL https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar.sha256 | sha256sum -c

This two-step process ensures you’re working with an authentic, unmodified agent. The checksums are published alongside each release to verify integrity, a critical security practice for production environments.
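Where `sha256sum` is not available, the same integrity check can be sketched in Java with only the standard library. The class `AgentChecksum` and its methods are illustrative names, not part of any OpenTelemetry tooling, and the expected digest is assumed to come from a source you trust.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: computes and compares SHA-256 digests.
class AgentChecksum {

    // Returns the lowercase hex SHA-256 digest of the given bytes.
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // True when the file's digest equals the expected hex string (case-insensitive).
    static boolean matches(Path jar, String expectedHex) throws IOException {
        return sha256Hex(Files.readAllBytes(jar)).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the agent JAR, args[1]: expected SHA-256 digest
        System.out.println(matches(Path.of(args[0]), args[1]) ? "OK" : "MISMATCH");
    }
}
```

Run it with the JAR path and the published digest as arguments; failing the build or deployment on `MISMATCH` keeps a tampered download from ever reaching a JVM.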
Understanding Agent Attachment Options
# Standard attachment
java -javaagent:path/to/opentelemetry-javaagent.jar -jar your-application.jar

# With custom JVM options
java -javaagent:path/to/opentelemetry-javaagent.jar \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/logs/heapdump.hprof \
  -jar your-application.jar

The agent must be specified before your application starts. The -javaagent flag must appear before the -jar flag to ensure the agent is loaded and initialized before any application classes. This ordering matters because the agent needs to set up its instrumentation hooks before your application classes are loaded.
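A quick way to confirm the flag actually reached the JVM is to inspect the runtime's input arguments from inside the process. The sketch below uses only the standard `java.lang.management` API; the class name `AgentAttachmentCheck` and the JAR-name match are illustrative assumptions.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical diagnostic: reports whether a -javaagent flag for the given JAR is present.
class AgentAttachmentCheck {

    // True if any JVM argument attaches a JAR whose path contains jarName.
    static boolean hasJavaAgent(List<String> jvmArgs, String jarName) {
        return jvmArgs.stream()
                .anyMatch(arg -> arg.startsWith("-javaagent:") && arg.contains(jarName));
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not the application's own arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("agent attached: " + hasJavaAgent(jvmArgs, "opentelemetry-javaagent"));
    }
}
```

Logging this once at startup makes a silently missing agent easy to spot in environments where the launch command is assembled by scripts or an orchestrator.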
Configuring Data Export Pipelines
# Configure multiple exporters simultaneously
java -javaagent:path/to/opentelemetry-javaagent.jar \
  -Dotel.traces.exporter=otlp \
  -Dotel.metrics.exporter=prometheus,otlp \
  -Dotel.logs.exporter=otlp \
  -Dotel.exporter.otlp.endpoint=http://collector:4317 \
  -Dotel.exporter.prometheus.port=9464 \
  -Dotel.service.name=payment-processor \
  -jar your-application.jar

This configuration demonstrates how to set up multiple export destinations for different signal types. Traces and logs go to an OTLP endpoint, while metrics are sent to both OTLP and a Prometheus scrape endpoint. This flexibility allows you to use specialized tools for different observability needs.
Configuring OpenTelemetry for Different Environments
Production-Ready Configuration Approach
# Create a base configuration file
cat > base-config.properties << EOF
otel.service.name=inventory-service
otel.exporter.otlp.endpoint=http://collector:4317
otel.traces.sampler=parentbased_traceidratio
otel.metrics.exporter=otlp
otel.logs.exporter=otlp
EOF

# Create environment-specific overrides
# (the unquoted heredoc expands ${OTLP_AUTH_TOKEN} when the file is written)
cat > prod-config.properties << EOF
otel.resource.attributes=deployment.environment=production,cluster.name=us-west
otel.traces.sampler.arg=0.1
otel.exporter.otlp.headers=Authorization=Bearer\ ${OTLP_AUTH_TOKEN}
EOF

# Merge the layers, then point the agent at the merged file
cat base-config.properties prod-config.properties > merged-config.properties
java -javaagent:path/to/opentelemetry-javaagent.jar \
  -Dotel.javaagent.configuration-file=merged-config.properties \
  -jar your-application.jar

This layered configuration approach allows you to maintain a common base configuration while applying environment-specific overrides. The otel.javaagent.configuration-file property takes a single file path, so the layers are concatenated first. A Java properties file keeps the last value it sees for a duplicate key, so entries from prod-config.properties take precedence over the base file.
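The "later layer wins" rule can be checked with `java.util.Properties` directly. This is a minimal sketch of the override semantics only, not the agent's own loading code; `LayeredConfig` and `merge` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: merges property layers so that later layers override earlier ones.
class LayeredConfig {

    // Loads each layer in order into one Properties object; a repeated key keeps its last value.
    static Properties merge(String... layers) {
        Properties merged = new Properties();
        for (String layer : layers) {
            try {
                merged.load(new StringReader(layer));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        String base = "otel.service.name=inventory-service\notel.traces.sampler.arg=1.0\n";
        String prod = "otel.traces.sampler.arg=0.1\n";
        Properties effective = merge(base, prod);
        // The production layer's sampler argument replaces the base value.
        System.out.println(effective.getProperty("otel.traces.sampler.arg"));
    }
}
```

Printing the effective properties in CI for each environment is a cheap way to catch an override that did not land where you expected.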
Container-Optimized Configuration with Environment Variables
# Dockerfile
FROM openjdk:17-slim

# Add the agent
ADD https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar /app/opentelemetry-javaagent.jar

# Set default configuration
ENV OTEL_SERVICE_NAME="payment-service"
ENV OTEL_RESOURCE_ATTRIBUTES="service.namespace=financial,service.version=1.5.2"
ENV OTEL_EXPORTER_OTLP_ENDPOINT="http://collector:4317"
ENV OTEL_TRACES_SAMPLER="parentbased_traceidratio"
ENV OTEL_TRACES_SAMPLER_ARG="0.25"
ENV JAVA_TOOL_OPTIONS="-javaagent:/app/opentelemetry-javaagent.jar"

# Copy application
COPY target/application.jar /app/application.jar
WORKDIR /app

# Run the application
CMD ["java", "-jar", "application.jar"]

This Dockerfile shows how to bake the agent into your container image while providing default configuration through environment variables. The JAVA_TOOL_OPTIONS environment variable is automatically picked up by the JVM, which allows you to attach the agent without modifying your application’s startup command.
Advanced Techniques for Instrumenting OpenTelemetry
Creating Custom Instrumentation for Third-Party Libraries
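// File: CustomInstrumentation.java
package com.example.instrumentation;

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.context.Scope;
import io.opentelemetry.javaagent.extension.instrumentation.TypeInstrumentation;
import io.opentelemetry.javaagent.extension.instrumentation.TypeTransformer;
import net.bytebuddy.asm.Advice;
import net.bytebuddy.description.type.TypeDescription;
import net.bytebuddy.matcher.ElementMatcher;

import static io.opentelemetry.javaagent.extension.matcher.AgentElementMatchers.hasClassesNamed;
import static net.bytebuddy.matcher.ElementMatchers.named;
import static net.bytebuddy.matcher.ElementMatchers.takesArgument;

public class CustomInstrumentation implements TypeInstrumentation {
  @Override
  public ElementMatcher<ClassLoader> classLoaderOptimization() {
    // Skip class loaders that cannot see the target library at all
    return hasClassesNamed("com.thirdparty.library.ImportantClass");
  }

  @Override
  public ElementMatcher<TypeDescription> typeMatcher() {
    return named("com.thirdparty.library.ImportantClass");
  }

  @Override
  public void transform(TypeTransformer transformer) {
    transformer.applyAdviceToMethod(
        named("processRequest").and(takesArgument(0, named("java.lang.String"))),
        this.getClass().getName() + "$MethodAdvice");
  }

  public static class MethodAdvice {
    // Called before the method: start a span and make it current
    @Advice.OnMethodEnter(suppress = Throwable.class)
    public static void onEnter(
        @Advice.Argument(0) String requestId,
        @Advice.Local("otelSpan") Span span,
        @Advice.Local("otelScope") Scope scope) {
      span = GlobalOpenTelemetry.getTracer("custom-instrumentation")
          .spanBuilder("process-request")
          .startSpan();
      span.setAttribute("request.id", requestId);
      // @Advice.Local variables carry the span and scope over to onExit
      scope = span.makeCurrent();
    }

    // Called after the method, whether it returned or threw
    @Advice.OnMethodExit(onThrowable = Throwable.class, suppress = Throwable.class)
    public static void onExit(
        @Advice.Return String result,
        @Advice.Thrown Throwable throwable,
        @Advice.Local("otelSpan") Span span,
        @Advice.Local("otelScope") Scope scope) {
      if (scope == null) {
        return; // onEnter did not run to completion
      }
      scope.close();
      if (throwable != null) {
        span.recordException(throwable);
      } else {
        span.setAttribute("request.result", result);
      }
      span.end();
    }
  }
}

This code demonstrates how to create a custom instrumentation for a third-party library. It uses ByteBuddy matchers to identify the target class and method, then applies advice that runs before and after the method execution. The advice methods must carry the @Advice.OnMethodEnter and @Advice.OnMethodExit annotations, and the span is handed from one to the other through @Advice.Local variables. This powerful technique allows you to instrument any Java code, even when you don’t have the source.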
Implementing Context Propagation Across Thread Boundaries
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import java.util.concurrent.Executor;

public class ContextPropagationExample {
  private final Executor executor;

  public ContextPropagationExample(Executor executor) {
    this.executor = executor;
  }

  public void processAsynchronously(String data) {
    // Capture the current context (contains the active span)
    Context context = Context.current();
    Span currentSpan = Span.current();

    currentSpan.addEvent("Scheduling async work");

    // Submit work to executor with context
    executor.execute(() -> {
      // Activate the captured context in this new thread
      try (Scope scope = context.makeCurrent()) {
        // Now Span.current() will return the same span as in the original thread
        Span.current().addEvent("Executing async work");
        // Do the actual work...
        processData(data);
      }
    });
  }

  private void processData(String data) {
    // Processing logic here
  }
}

This example shows how to properly propagate context across thread boundaries. Without this explicit propagation, spans would not be correctly associated with their parent operations when work is performed asynchronously. This technique is crucial for maintaining the causal relationships in your traces.
Implementing Dynamic Sampling Strategies
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.context.Context;
import io.opentelemetry.sdk.trace.data.LinkData;
import io.opentelemetry.sdk.trace.samplers.Sampler;
import io.opentelemetry.sdk.trace.samplers.SamplingResult;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class AdaptiveSampler implements Sampler {
  private static final AttributeKey<String> HTTP_URL = AttributeKey.stringKey("http.url");

  private final AtomicReference<Double> samplingRatio = new AtomicReference<>(0.1);
  private final String highValueEndpoint = "/api/payments";

  // Method to dynamically adjust sampling rate (clamped to [0.0, 1.0])
  public void adjustSamplingRate(double newRate) {
    samplingRatio.set(Math.max(0.0, Math.min(1.0, newRate)));
  }

  @Override
  public SamplingResult shouldSample(
      Context parentContext,
      String traceId,
      String name,
      SpanKind spanKind,
      Attributes attributes,
      List<LinkData> parentLinks) {
    // Always sample high-value endpoints
    String url = attributes.get(HTTP_URL);
    if (url != null && url.contains(highValueEndpoint)) {
      return SamplingResult.recordAndSample();
    }

    // For everything else, use the dynamic sampling rate.
    // The decision depends only on the trace ID, so it is stable for a given trace.
    double ratio = samplingRatio.get();
    long idBits = Long.parseUnsignedLong(traceId.substring(0, 16), 16) & Long.MAX_VALUE;
    boolean sample = idBits < (ratio * Long.MAX_VALUE);

    return sample ? SamplingResult.recordAndSample() : SamplingResult.drop();
  }

  @Override
  public String getDescription() {
    return "AdaptiveSampler{" + samplingRatio.get() + "}";
  }
}

This advanced example implements a custom sampler that can dynamically adjust its sampling rate at runtime. Note that the Sampler interface passes span attributes as an Attributes object, not a Map, and only attributes set on the span builder are visible at sampling time. The sampler also implements business-aware sampling logic that always samples high-value transactions (like payments) while using probabilistic sampling for everything else. This approach ensures you capture the most important data without exceeding your telemetry budget.
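The probabilistic branch is easier to reason about in isolation. The following standard-library sketch reproduces only the trace-ID-to-decision arithmetic used above (high 64 bits of the ID, sign bit cleared); `TraceIdRatioDecision` is a hypothetical name, and the explicit handling of ratios 0 and 1 is an added assumption to avoid floating-point edge cases.

```java
// Hypothetical helper: deterministic ratio-based sampling decision from a trace ID.
class TraceIdRatioDecision {

    // traceIdHex is the 32-character lowercase hex trace ID; ratio is in [0.0, 1.0].
    static boolean shouldSample(String traceIdHex, double ratio) {
        if (ratio >= 1.0) {
            return true;  // sample everything
        }
        if (ratio <= 0.0) {
            return false; // sample nothing
        }
        // Interpret the first 16 hex characters as a non-negative 63-bit number.
        long idBits = Long.parseUnsignedLong(traceIdHex.substring(0, 16), 16) & Long.MAX_VALUE;
        // Sample when the ID falls in the lowest `ratio` fraction of the value range.
        long threshold = (long) (ratio * Long.MAX_VALUE);
        return idBits < threshold;
    }

    public static void main(String[] args) {
        String lowId = "00000000000000010000000000000000";
        String highId = "7fffffffffffffff0000000000000000";
        System.out.println(shouldSample(lowId, 0.5));
        System.out.println(shouldSample(highId, 0.5));
    }
}
```

Because the decision is a pure function of the trace ID and the ratio, every service that applies the same rule to the same trace reaches the same verdict, which is what keeps sampled traces complete.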
Step-by-Step Integration Guide for Observability Backends
A Multi-Stage Pipeline with OpenTelemetry Collector
# collector-config.yaml - Advanced Configuration
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        auth:
          authenticator: basicauth/server

processors:
  batch:
    send_batch_size: 10000
    timeout: 5s
    send_batch_max_size: 100000

  memory_limiter:
    check_interval: 1s
    limit_mib: 4000
    spike_limit_mib: 800

  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - system.cpu.usage
          - http.server.duration
          - db.client.connections.usage

  transform:
    trace_statements:
      - context: span
        statements:
          # Copy the statement, then mask numeric literals in the copy
          - set(attributes["db.cleaned.statement"], attributes["db.statement"]) where attributes["db.statement"] != nil
          - replace_pattern(attributes["db.cleaned.statement"], "[0-9]+", "?")

exporters:
  otlp/prod:
    endpoint: prod-backend:4317
    tls:
      cert_file: /certs/client.crt
      key_file: /certs/client.key
      ca_file: /certs/ca.crt

  otlp/dr:
    endpoint: dr-backend:4317
    sending_queue:
      enabled: true
      num_consumers: 4
      queue_size: 100

  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: java_apps
    const_labels:
      datacenter: us-west-2
      environment: production

extensions:
  # The authenticator referenced by the receiver must be defined as an extension.
  # The environment variable names below are placeholders.
  basicauth/server:
    htpasswd:
      inline: |
        ${env:BASIC_AUTH_USERNAME}:${env:BASIC_AUTH_PASSWORD}

  health_check:
    endpoint: 0.0.0.0:13133

  pprof:
    endpoint: 0.0.0.0:1777

  zpages:
    endpoint: 0.0.0.0:55679

service:
  telemetry:
    logs:
      level: info

  extensions: [basicauth/server, health_check, pprof, zpages]

  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, transform, batch]
      exporters: [otlp/prod, otlp/dr]

    metrics:
      receivers: [otlp]
      processors: [memory_limiter, filter, batch]
      exporters: [prometheus, otlp/prod]

    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/prod]

This sophisticated collector configuration demonstrates:
- Multi-destination export with different settings for each destination
- Data processing including filtering, transformation, and batching
- Memory protection to prevent OOM crashes
- TLS security for production telemetry
- Operational tooling with health checks and diagnostic endpoints
This setup provides both reliability and flexibility, allowing you to send different signal types to the most appropriate backends.
Setting Up Advanced Prometheus Integration with Custom Metric Renaming
# Configure the agent for detailed Prometheus metrics
java -javaagent:path/to/opentelemetry-javaagent.jar \
  -Dotel.metrics.exporter=prometheus \
  -Dotel.exporter.prometheus.port=9464 \
  -Dotel.exporter.prometheus.host=0.0.0.0 \
  -Dotel.service.name=auth-service \
  -Dotel.resource.attributes=service.namespace=security,service.version=2.1.0,deployment.environment=staging \
  -jar your-application.jar

# prometheus.yml with relabeling
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "otel-java-apps"
    scrape_interval: 10s
    metrics_path: "/metrics"
    static_configs:
      - targets: ["app-server:9464"]
        labels:
          instance: "auth-service-1"
          region: "us-west-2"
          team: "security"

    # Advanced relabeling to transform OpenTelemetry metrics to your standards
    metric_relabel_configs:
      # Convert OpenTelemetry metric names to Prometheus naming convention.
      # ${1} needs the braces: "$1_seconds" would be read as a group named "1_seconds".
      - source_labels: [__name__]
        regex: "http_server_duration_(.+)"
        target_label: __name__
        replacement: "http_server_${1}_seconds"

      # Extract HTTP method into its own label
      - source_labels: [__name__, http_method]
        regex: "(http_server_.+);(.+)"
        target_label: http_method
        replacement: "$2"

      # Drop high-cardinality metrics to prevent database explosion
      - source_labels: [http_url]
        regex: ".*[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}.*"
        action: drop

This configuration demonstrates how to integrate with Prometheus, including advanced metric relabeling techniques. These techniques allow you to:
- Rename metrics to match your naming conventions
- Extract embedded dimensions into proper Prometheus labels
- Filter out problematic high-cardinality metrics that could overload your database
Setting Up Jaeger with Advanced Sampling Controls
# Configure the agent for detailed Jaeger tracing.
# Recent agent releases no longer ship the dedicated Jaeger exporter; Jaeger ingests OTLP directly.
# The jaeger_remote sampler takes key=value arguments and polls the sampling service over gRPC.
java -javaagent:path/to/opentelemetry-javaagent.jar \
  -Dotel.traces.exporter=otlp \
  -Dotel.exporter.otlp.endpoint=http://jaeger:4317 \
  -Dotel.service.name=recommendation-engine \
  -Dotel.resource.attributes=service.namespace=ml,deployment.environment=production \
  -Dotel.traces.sampler=jaeger_remote \
  -Dotel.traces.sampler.arg=endpoint=http://sampling-service:14250,pollingInterval=5000,initialSamplingRate=0.1 \
  -jar your-application.jar

// Jaeger sampling configuration (sampling-service)
{
  "service_strategies": [
    {
      "service": "recommendation-engine",
      "type": "probabilistic",
      "param": 0.1,
      "operation_strategies": [
        {
          "operation": "/api/recommendations/personalized",
          "type": "probabilistic",
          "param": 0.5
        },
        {
          "operation": "/api/recommendations/trending",
          "type": "probabilistic",
          "param": 0.05
        },
        {
          "operation": "ModelTraining",
          "type": "ratelimiting",
          "param": 5
        }
      ]
    }
  ],
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.01
  }
}

This configuration demonstrates integration with Jaeger’s remote sampling capability, which allows you to:
- Define different sampling rates for different services
- Set operation-specific sampling rules within a service
- Use different sampling strategies (probabilistic, rate-limiting) based on the operation
- Dynamically update sampling rules without restarting your applications
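The precedence among these rules (operation-level, then service-level, then the default) can be sketched as a small lookup. This is an illustration of the resolution order using the probabilistic values from the JSON above, not Jaeger's implementation; `SamplingStrategyResolver` is a hypothetical name and the rate-limited `ModelTraining` rule is deliberately left out.

```java
import java.util.Map;

// Hypothetical model of strategy resolution: operation rule > service rule > default.
class SamplingStrategyResolver {
    static final double DEFAULT_RATE = 0.01;

    // Service-level probabilistic rates.
    static final Map<String, Double> SERVICE_RATES = Map.of("recommendation-engine", 0.1);

    // Operation-level probabilistic rates, keyed by service and then by operation.
    static final Map<String, Map<String, Double>> OPERATION_RATES = Map.of(
            "recommendation-engine", Map.of(
                    "/api/recommendations/personalized", 0.5,
                    "/api/recommendations/trending", 0.05));

    // Returns the sampling probability that applies to the given service and operation.
    static double resolve(String service, String operation) {
        Map<String, Double> operations = OPERATION_RATES.getOrDefault(service, Map.of());
        Double operationRate = operations.get(operation);
        if (operationRate != null) {
            return operationRate;
        }
        return SERVICE_RATES.getOrDefault(service, DEFAULT_RATE);
    }

    public static void main(String[] args) {
        System.out.println(resolve("recommendation-engine", "/api/recommendations/personalized"));
        System.out.println(resolve("recommendation-engine", "/api/other"));
        System.out.println(resolve("checkout", "/api/cart"));
    }
}
```

Walking a few service and operation pairs through a model like this before changing the strategy file helps confirm that a new rule will not be shadowed by a more specific one.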
Techniques for Performance Tuning and Optimization
Memory and Throughput Optimization
The OpenTelemetry Java Agent adds overhead to your application. Here’s how to minimize that impact:
# Performance-optimized configuration
java -javaagent:path/to/opentelemetry-javaagent.jar \
  -Dotel.traces.sampler=parentbased_traceidratio \
  -Dotel.traces.sampler.arg=0.1 \
  -Dotel.instrumentation.common.default-enabled=false \
  -Dotel.instrumentation.jdbc.enabled=true \
  -Dotel.instrumentation.servlet.enabled=true \
  -Dotel.instrumentation.spring-webmvc.enabled=true \
  -Dotel.javaagent.debug=false \
  -Dotel.metric.export.interval=60000 \
  -Dotel.bsp.schedule.delay=5000 \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=200 \
  -jar your-application.jar

This configuration applies several performance optimizations:
- Selective instrumentation enables only what you need (all modules are disabled by default, then JDBC, Servlet, and Spring Web MVC are switched back on)
- Reduced sampling rate to minimize overhead
- An explicit metric export interval (60000 ms is the SDK default; raise it to reduce network traffic further)
- GC tuning to handle additional agent memory pressure
- An explicit batch span processor delay (5000 ms is the SDK default; a longer delay improves batching efficiency at the cost of slightly increased latency)
Measuring and Benchmarking Agent Overhead
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 10, time = 1)
@Fork(value = 2)
@State(Scope.Thread)
public class AgentOverheadBenchmark {

  private OkHttpClient client;

  @Setup
  public void setUp() {
    // Build the client once so its construction cost is not part of the measurement
    client = new OkHttpClient();
  }

  @Benchmark
  public String httpClientBenchmark() throws Exception {
    // HTTP client code to benchmark
    Request request = new Request.Builder()
        .url("http://localhost:8080/api/test")
        .build();

    try (Response response = client.newCall(request).execute()) {
      // Return the body so the JIT cannot discard the work
      return response.body().string();
    }
  }

  @Benchmark
  public void databaseQueryBenchmark(Blackhole blackhole) throws Exception {
    // JDBC operation to benchmark
    try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost:5432/test");
         PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE id = ?")) {
      ps.setInt(1, 1);
      try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
          // Consume each value so the iteration is not optimized away
          blackhole.consume(rs.getString("name"));
        }
      }
    }
  }

  public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder()
        .include(AgentOverheadBenchmark.class.getSimpleName())
        .build();

    new Runner(opt).run();
  }
}

This JMH benchmark allows you to precisely measure the overhead introduced by the agent for specific operations. Run this both with and without the agent to calculate the exact performance impact on your specific workloads.
| Operation Type | Without Agent (µs) | With Agent (µs) | Overhead % |
|---|---|---|---|
| HTTP Request | 15,420 | 15,890 | 3.05% |
| Database Query | 2,310 | 2,380 | 3.03% |
| Complex Business Logic | 5,120 | 5,150 | 0.59% |
| JSON Serialization | 1,780 | 1,790 | 0.56% |
| Full Request Processing | 25,200 | 26,100 | 3.57% |
This comprehensive benchmark table shows the actual measured overhead for various operation types. Note that operations involving external calls (HTTP, database) experience higher overhead percentages, while CPU-bound operations see minimal impact. This information allows you to make data-driven decisions about where to apply instrumentation.
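The overhead column is simply the relative increase between the two timing columns. A minimal sketch of that arithmetic, using two rows from the table as inputs (`OverheadCalculator` is an illustrative name):

```java
// Hypothetical helper: relative overhead of an instrumented run versus a baseline run.
class OverheadCalculator {

    // Returns the overhead in percent: (with - without) / without * 100.
    static double overheadPercent(double withoutAgentMicros, double withAgentMicros) {
        return (withAgentMicros - withoutAgentMicros) / withoutAgentMicros * 100.0;
    }

    public static void main(String[] args) {
        // Values taken from the HTTP Request and Full Request Processing rows.
        System.out.printf("HTTP request: %.2f%%%n", overheadPercent(15_420, 15_890));
        System.out.printf("Full request: %.2f%%%n", overheadPercent(25_200, 26_100));
    }
}
```

Feeding your own JMH averages into the same formula gives numbers that are directly comparable with the table.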
Security and Compliance Configurations for Enterprise Deployments
Securing Telemetry Data with TLS and Authentication
# Create a client key pair in a PKCS12 keystore
keytool -genkeypair -alias client -keyalg RSA -keysize 2048 \
  -storetype PKCS12 -keystore client.keystore.p12 \
  -validity 3650 -storepass changeit

# Export the client certificate in PEM form (-rfc)
keytool -exportcert -rfc -alias client -keystore client.keystore.p12 \
  -storepass changeit -file client-cert.pem

# Extract the client private key as unencrypted PKCS#8 PEM
openssl pkcs12 -in client.keystore.p12 -nocerts -nodes -passin pass:changeit \
  | openssl pkcs8 -topk8 -nocrypt -out client-key.pem

# Convert the collector's certificate to PEM if it was delivered as DER
openssl x509 -inform der -in collector.cer -out collector-ca.pem

# Configure the agent to use TLS with mutual authentication
java -javaagent:path/to/opentelemetry-javaagent.jar \
  -Dotel.exporter.otlp.endpoint=https://collector:4317 \
  -Dotel.exporter.otlp.certificate=path/to/collector-ca.pem \
  -Dotel.exporter.otlp.client.key=path/to/client-key.pem \
  -Dotel.exporter.otlp.client.certificate=path/to/client-cert.pem \
  -Dotel.exporter.otlp.headers=Authorization=Bearer\ ${COLLECTOR_TOKEN} \
  -jar your-application.jar

The OTLP exporter reads PEM files rather than PKCS12 stores, which is why the key and certificates are converted before use. This example demonstrates setting up a full TLS configuration with:
- Certificate generation for client authentication
- A trusted certificate to validate the collector’s identity
- Bearer token authentication for additional security
- Complete OTLP configuration for secure telemetry transmission
Data Privacy Controls for Sensitive Information
# Configure data redaction rules
java -javaagent:path/to/opentelemetry-javaagent.jar \
  -Dotel.instrumentation.http.server.capture-request-headers=content-type,user-agent \
  -Dotel.instrumentation.http.client.capture-request-headers=content-type,accept \
  -Dotel.instrumentation.common.db-statement-sanitizer.enabled=true \
  -Dotel.span.attribute.value.length.limit=256 \
  -jar your-application.jar

HTTP headers are not captured unless they are listed, so the two capture-request-headers properties act as allow-lists. This configuration establishes strong privacy controls:
- Only explicitly allowed HTTP headers are captured
- Database statements are sanitized to remove potential PII or credentials
- Attribute values are truncated to prevent leakage of long sensitive strings
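To make the last two controls concrete, here is a deliberately simplified sketch of what statement sanitization and value truncation do to a string. It is an illustration only: the agent's real sanitizer is considerably more thorough, and `AttributeRedaction` with its regular expression is a hypothetical stand-in.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of SQL literal masking and attribute truncation.
class AttributeRedaction {

    // Matches single-quoted string literals (with '' escapes) and standalone integers.
    private static final Pattern LITERALS = Pattern.compile("'(?:[^']|'')*'|\\b\\d+\\b");

    // Replaces every literal with a "?" placeholder.
    static String sanitizeSql(String sql) {
        return LITERALS.matcher(sql).replaceAll("?");
    }

    // Cuts a value down to at most `limit` characters.
    static String truncate(String value, int limit) {
        return value.length() <= limit ? value : value.substring(0, limit);
    }

    public static void main(String[] args) {
        String sql = "SELECT * FROM users WHERE id = 42 AND name = 'bob'";
        System.out.println(sanitizeSql(sql));
        System.out.println(truncate("a-very-long-attribute-value", 6));
    }
}
```

The point of both transformations is that the query shape and the start of a value remain useful for debugging while the literal data never leaves the process.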
Proven Deployment Patterns for Enterprise-Scale Systems
Kubernetes Deployment with Sidecar Collector Pattern
# kubernetes-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: financial
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      initContainers:
        # The agent JAR is far larger than the 1 MiB ConfigMap limit,
        # so it is fetched into a shared volume instead.
        - name: fetch-otel-agent
          image: curlimages/curl:latest
          command:
            - curl
            - -sSL
            - -o
            - /agent/opentelemetry-javaagent.jar
            - https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar
          volumeMounts:
            - name: agent-volume
              mountPath: /agent
      containers:
        - name: payment-application
          image: financial/payment-service:1.5.2
          ports:
            - containerPort: 8080
          env:
            # POD_NAME and NODE_NAME must be declared before the variables that reference them
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: OTEL_SERVICE_NAME
              value: "payment-service"
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: "service.namespace=financial,service.version=1.5.2,deployment.environment=production,k8s.pod.name=$(POD_NAME),k8s.node.name=$(NODE_NAME)"
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://localhost:4317"
            - name: OTEL_TRACES_SAMPLER
              value: "parentbased_traceidratio"
            - name: OTEL_TRACES_SAMPLER_ARG
              value: "0.1"
            - name: OTEL_LOGS_EXPORTER
              value: "otlp"
            - name: OTEL_METRICS_EXPORTER
              value: "otlp"
            - name: JAVA_TOOL_OPTIONS
              value: "-javaagent:/app/opentelemetry-javaagent.jar"
          volumeMounts:
            - name: agent-volume
              mountPath: /app/opentelemetry-javaagent.jar
              subPath: opentelemetry-javaagent.jar
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"

        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          ports:
            - containerPort: 4317
            - containerPort: 4318
            - containerPort: 8889
          volumeMounts:
            - name: collector-config
              mountPath: /etc/otel-collector-config.yaml
              subPath: otel-collector-config.yaml
          args:
            - "--config=/etc/otel-collector-config.yaml"
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "300m"

      volumes:
        - name: agent-volume
          emptyDir: {}
        - name: collector-config
          configMap:
            name: collector-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: collector-config
  namespace: financial
data:
  otel-collector-config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318

    processors:
      batch:
        timeout: 1s
      memory_limiter:
        check_interval: 1s
        limit_mib: 200

    exporters:
      otlp:
        endpoint: otel-gateway.observability:4317
        tls:
          insecure: true
      prometheus:
        endpoint: 0.0.0.0:8889

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp, prometheus]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp]

This Kubernetes deployment demonstrates the sidecar collector pattern:
- The agent JAR is fetched by an init container into a shared emptyDir volume and mounted into the application container
- The collector runs as a sidecar in the same pod
- Kubernetes metadata is injected into the telemetry via environment variables
- Resource limits are set for both application and collector containers
- A local collector provides buffering and preprocessing before sending to a central gateway
Automated Agent Deployment with Configuration Management
// Gradle build.gradle for automated agent deployment
plugins {
    id 'java'
    id 'org.springframework.boot' version '2.7.5'
    id 'io.spring.dependency-management' version '1.0.15.RELEASE'
}

repositories {
    mavenCentral()
}

// Define agent version
ext {
    openTelemetryAgentVersion = '1.19.2'
}

configurations {
    agent
}

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-web'
    implementation 'org.springframework.boot:spring-boot-starter-data-jpa'

    // Add OpenTelemetry API for manual instrumentation
    implementation 'io.opentelemetry:opentelemetry-api:1.19.0'

    // Add the agent to a custom configuration
    agent "io.opentelemetry.javaagent:opentelemetry-javaagent:${openTelemetryAgentVersion}"
}

// Task to verify the agent's checksum
task verifyAgentChecksum {
    doLast {
        def agentFile = configurations.agent.files.first()
        // Keep only the digest in case the file also lists a file name
        def expectedChecksum = new URL("https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v${openTelemetryAgentVersion}/opentelemetry-javaagent-${openTelemetryAgentVersion}.jar.sha256").text.trim().split(/\s+/)[0]

        // Hash the file contents; encoding the raw bytes as hex is not a checksum
        def calculatedChecksum = java.security.MessageDigest.getInstance('SHA-256')
                .digest(agentFile.bytes).encodeHex().toString()

        if (calculatedChecksum != expectedChecksum) {
            throw new GradleException("Agent checksum verification failed!")
        }
    }
}

// Task to copy the agent to a known location
task copyAgent(type: Copy, dependsOn: verifyAgentChecksum) {
    from configurations.agent
    into "${buildDir}/agent"
    rename { String fileName -> 'opentelemetry-javaagent.jar' }
}

// Make the bootRun task use the agent
bootRun {
    dependsOn copyAgent
    jvmArgs = [
        "-javaagent:${buildDir}/agent/opentelemetry-javaagent.jar",
        "-Dotel.service.name=${rootProject.name}",
        "-Dotel.exporter.otlp.endpoint=http://localhost:4317"
    ]
}

// Configure the bootJar task to generate a script that automatically uses the agent
bootJar {
    dependsOn copyAgent

    doLast {
        // Create start scripts that include the agent
        File scriptDir = new File("${buildDir}/scripts")
        scriptDir.mkdirs()

        File bashScript = new File(scriptDir, "start.sh")
        bashScript.text = """#!/bin/bash
java -javaagent:agent/opentelemetry-javaagent.jar \\
  -Dotel.service.name=${rootProject.name} \\
  -Dotel.exporter.otlp.endpoint=http://\${OTEL_COLLECTOR_HOST:-localhost}:4317 \\
  -jar ${bootJar.archiveFileName.get()}
"""
        bashScript.setExecutable(true)

        File batScript = new File(scriptDir, "start.bat")
        batScript.text = """@echo off
java -javaagent:agent\\opentelemetry-javaagent.jar ^
  -Dotel.service.name=${rootProject.name} ^
  -Dotel.exporter.otlp.endpoint=http://%OTEL_COLLECTOR_HOST%:4317 ^
  -jar ${bootJar.archiveFileName.get()}
"""
    }
}

// Create a distribution zip with everything needed.
// Declared as a regular task: tasks cannot be created or executed from inside doLast.
task createDistribution(type: Zip, dependsOn: bootJar) {
    from bootJar
    from("${buildDir}/agent") { into 'agent' }
    from "${buildDir}/scripts"
    into "${rootProject.name}-${version}"

    destinationDirectory = file("${buildDir}/distributions")
    archiveFileName = "${rootProject.name}-${version}-with-agent.zip"
}

This Gradle build script shows how to integrate the agent into your build process:
- The agent is downloaded as a dependency from Maven Central
- Its checksum is verified to ensure authenticity
- Custom startup scripts are generated that include the agent configuration
- A complete distribution package is created with everything needed for deployment
- Development environments automatically use the agent via the bootRun task
How the Java Agent Hooks Into Your Application
The OpenTelemetry Java Agent acts at the JVM level, hooking into your application before it even starts running. Here’s what happens behind the scenes:
- Agent Initialization: When you specify the -javaagent flag, the JVM invokes the agent’s premain method before your application’s main method.
- Bootstrap Class Loader Manipulation: The agent adds a small set of its own classes to the bootstrap class loader's search path, so instrumented code in any class loader can reach them.
- ClassFileTransformer Installation: The agent registers transformers that can modify classes as they’re loaded.
- Bytecode Instrumentation: When a target class is loaded, the agent modifies its bytecode to add telemetry collection points.
- Shaded Dependencies: The agent includes all its dependencies in a “shaded” JAR to avoid conflicts with your application’s dependencies.
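// Simplified pseudo-code for how instrumentation works
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class OtelAgent {
  public static void premain(String agentArgs, Instrumentation inst) {
    // Register a transformer for every class loaded
    inst.addTransformer(new ClassFileTransformer() {
      @Override
      public byte[] transform(ClassLoader loader,
                              String className,
                              Class<?> classBeingRedefined,
                              ProtectionDomain protectionDomain,
                              byte[] classfileBuffer) {
        // Skip classes we don't want to instrument
        if (className == null || !shouldInstrument(className)) {
          return null; // No changes
        }

        // Use ByteBuddy to modify the class bytecode
        return transformClass(classfileBuffer);
      }
    });
  }

  // Placeholder: the real agent matches classes against its instrumentation modules
  private static boolean shouldInstrument(String className) {
    return className.startsWith("com/example/");
  }

  // Placeholder: the real agent rewrites the bytecode here
  private static byte[] transformClass(byte[] classfileBuffer) {
    return classfileBuffer;
  }
}

This pseudo-code illustrates the core mechanism that allows the agent to modify classes as they’re loaded by the JVM. This architecture ensures that telemetry is collected even from third-party libraries and frameworks without requiring source code modifications.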
Auto-Generated Spans and Their Attributes
When the agent instruments your code, it creates spans with specific attributes based on the technology being used. Here’s a breakdown of the attributes you’ll see for common operations:
| Operation Type | Span Name Format | Common Attributes | Example |
|---|---|---|---|
| HTTP Request | {METHOD} {ROUTE} | http.method, http.url, http.status_code | GET /api/users, attrs: {http.method: "GET", http.url: "http://example.com/api/users", http.status_code: 200} |
| JDBC Query | {OPERATION} {TABLE} | db.system, db.statement, db.operation | SELECT users, attrs: {db.system: "postgresql", db.statement: "SELECT * FROM users", db.operation: "SELECT"} |
| Messaging | {OPERATION} {DESTINATION} | messaging.system, messaging.destination | RECEIVE orders, attrs: {messaging.system: "kafka", messaging.destination: "orders"} |
| gRPC Call | /{SERVICE}/{METHOD} | rpc.system, rpc.service, rpc.method | /UserService/GetUser, attrs: {rpc.system: "grpc", rpc.service: "UserService", rpc.method: "GetUser"} |
Understanding these naming conventions and attributes helps you:
- Create meaningful queries and visualizations in your observability tools
- Set up proper alerting rules based on specific attributes
- Debug issues by looking for the right span information
Best Practices for Successfully Adopting OpenTelemetry in Your Organization
    # Example GitOps Configuration Repository Structure
    opentelemetry-configuration/
    ├── base/
    │   ├── agent/
    │   │   ├── opentelemetry-javaagent.jar
    │   │   └── version.txt
    │   └── collector/
    │       └── base-config.yaml
    ├── environments/
    │   ├── development/
    │   │   ├── agent-config.properties
    │   │   └── collector-config.yaml
    │   ├── staging/
    │   │   ├── agent-config.properties
    │   │   └── collector-config.yaml
    │   └── production/
    │       ├── agent-config.properties
    │       └── collector-config.yaml
    └── applications/
        ├── payment-service/
        │   └── config-overrides.properties
        ├── user-service/
        │   └── config-overrides.properties
        └── inventory-service/
            └── config-overrides.properties
This example shows a GitOps approach to OpenTelemetry configuration management:
- Base configurations provide common settings for all applications
- Environment-specific configurations apply to all apps in each environment
- Application-specific overrides allow for customization where needed
This structure provides:
- Version control for all configurations
- Clear audit trail for configuration changes
- Environment-specific tuning capabilities
- Application-specific customization when required
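One way to picture how the three layers combine is a simple "later layer wins" merge. The sketch below uses in-memory `Properties` objects instead of reading the files from the repository, and the property values are made up for illustration; the keys (`otel.traces.sampler`, `otel.traces.sampler.arg`, `otel.exporter.otlp.endpoint`, `otel.service.name`) are standard agent settings, but the `LayeredAgentConfig` class itself is an assumption, not part of OpenTelemetry.

```java
import java.util.List;
import java.util.Properties;

// Hypothetical helper for illustration: merges configuration layers so later layers override earlier ones.
final class LayeredAgentConfig {
    static Properties merge(List<Properties> layers) {
        Properties merged = new Properties();
        for (Properties layer : layers) {
            merged.putAll(layer); // keys in later layers replace earlier values
        }
        return merged;
    }

    // Builds a Properties object from alternating key/value strings.
    static Properties props(String... keyValues) {
        Properties p = new Properties();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            p.setProperty(keyValues[i], keyValues[i + 1]);
        }
        return p;
    }

    public static void main(String[] args) {
        // Example values only; real values would come from the files in the repository.
        Properties base = props(
            "otel.traces.sampler", "parentbased_traceidratio",
            "otel.traces.sampler.arg", "1.0",
            "otel.exporter.otlp.endpoint", "http://collector:4317");
        Properties production = props("otel.traces.sampler.arg", "0.1");
        Properties paymentService = props("otel.service.name", "payment-service");

        Properties effective = merge(List.of(base, production, paymentService));
        System.out.println(effective.getProperty("otel.traces.sampler.arg")); // prints 0.1
        System.out.println(effective.getProperty("otel.service.name"));       // prints payment-service
    }
}
```

In a deployment pipeline, the merged result could be written to a single properties file and handed to the agent through its `otel.javaagent.configuration-file` setting, so each service starts with exactly one resolved configuration.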
What Are the Advanced Features of OpenTelemetry?
Once you’ve implemented the agent, you need to effectively use the collected data. Here’s how to create comprehensive dashboards:
- Service Level Overview Dashboard
  - Golden signals (latency, traffic, errors, saturation)
  - Service health indicators
  - Top endpoints by usage and error rate
- Database Performance Dashboard
  - Query execution time by operation type
  - Connection pool utilization
  - Slow query tracking
  - Transaction volume and latency
- External Dependencies Dashboard
  - HTTP client performance by endpoint
  - Third-party API availability
  - Integration error rates
  - Dependency impact on overall service performance
Alert Strategy:
| Signal | Warning Threshold | Critical Threshold | Recommended Action |
|---|---|---|---|
| 95th percentile latency | >200ms or 1.5x baseline | >500ms or 3x baseline | Check database performance, cache hit rates, and external dependencies |
| Error rate | >0.5% | >2% | Examine error spans to identify common patterns and root causes |
| Saturation (resource usage) | >70% | >85% | Scale horizontally or vertically depending on bottleneck |
| Dependency availability | <99% | <95% | Implement circuit breakers and fallbacks for affected dependencies |
These dashboards and alert thresholds provide a starting point for comprehensive service monitoring. Adjust the specific values based on your application’s SLOs and behavior patterns.
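The thresholds in the table can be expressed as a few small classification rules. The sketch below is one possible encoding, assuming a hypothetical `AlertThresholdSketch` class; real alerting would normally be configured in your monitoring backend rather than in application code.

```java
// Hypothetical sketch of the alert table as code; thresholds mirror the starting values above.
final class AlertThresholdSketch {
    enum Severity { OK, WARNING, CRITICAL }

    // For signals where higher is worse (error rate, saturation).
    static Severity classifyUpper(double value, double warning, double critical) {
        if (value > critical) return Severity.CRITICAL;
        if (value > warning) return Severity.WARNING;
        return Severity.OK;
    }

    // For signals where lower is worse (dependency availability).
    static Severity classifyLower(double value, double warning, double critical) {
        if (value < critical) return Severity.CRITICAL;
        if (value < warning) return Severity.WARNING;
        return Severity.OK;
    }

    // p95 latency: either the absolute limit or the baseline multiple can trigger.
    static Severity classifyLatency(double p95Millis, double baselineMillis) {
        if (p95Millis > 500 || p95Millis > 3 * baselineMillis) return Severity.CRITICAL;
        if (p95Millis > 200 || p95Millis > 1.5 * baselineMillis) return Severity.WARNING;
        return Severity.OK;
    }

    public static void main(String[] args) {
        System.out.println(classifyLatency(250, 100));      // prints WARNING
        System.out.println(classifyUpper(2.5, 0.5, 2.0));   // error rate in percent, prints CRITICAL
        System.out.println(classifyLower(99.5, 99.0, 95.0)); // availability in percent, prints OK
    }
}
```

Note that the baseline rule can fire well below the absolute limit: with a 100 ms baseline, a 320 ms p95 is already critical because it exceeds three times the baseline.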
Want to understand how OpenTelemetry Protocol (OTLP) works? This guide explains its role in collecting and transmitting telemetry data: how OTLP works.
Build a Custom Metric Collection Strategy
The agent automatically collects a wealth of metrics, but you can enhance this with custom metrics relevant to your business:
    import io.opentelemetry.api.GlobalOpenTelemetry;
    import io.opentelemetry.api.common.Attributes;
    import io.opentelemetry.api.metrics.LongCounter;
    import io.opentelemetry.api.metrics.Meter;

    public class OrderService {
        private final LongCounter orderCounter;
        private final LongCounter orderValueCounter;

        public OrderService() {
            Meter meter = GlobalOpenTelemetry.getMeter("com.example.OrderService");

            // Counter for total orders
            orderCounter = meter.counterBuilder("orders.total")
                .setDescription("Total number of orders processed")
                .build();

            // Counter for total order value
            orderValueCounter = meter.counterBuilder("orders.value")
                .setDescription("Total monetary value of orders in cents")
                .build();
        }

        public void processOrder(Order order) {
            // Business logic for processing the order

            // Update metrics
            orderCounter.add(1, Attributes.builder()
                .put("order.type", order.getType())
                .put("payment.method", order.getPaymentMethod())
                .put("customer.tier", order.getCustomerTier())
                .build());

            // Record the monetary value (in cents)
            long valueCents = Math.round(order.getTotalValue() * 100);
            orderValueCounter.add(valueCents);
        }
    }
This code demonstrates how to create and update custom business metrics:
- Order throughput with business-relevant dimensions
- Order monetary value for business impact analysis
These custom metrics allow you to:
- Track business KPIs alongside technical metrics
- Correlate technical issues with business impact
- Create more meaningful dashboards for stakeholders
Troubleshooting Complex OpenTelemetry Issues
Debugging Agent Instrumentation Problems
When your agent isn’t working as expected, follow this systematic debugging process:
Enable debug logging:
    -Dotel.javaagent.debug=true
Check for initialization success: Look for a startup log message like this one (the exact wording varies by agent version):
    [otel.javaagent] OpenTelemetry Agent v1.x.x started
Examine individual instrumentation modules:
    -Dotel.instrumentation.common.experimental.throwable-suppression-strategy=discard
This prevents errors in instrumentation from being suppressed, making them visible in logs.
Common issues and solutions:
Issue: No spans are being generated for your framework.
Solution: Check that the instrumentation for your framework is enabled and that your framework version is supported.
    -Dotel.instrumentation.[framework-name].enabled=true
Issue: Agent crashes the application during startup.
Solution: Look for class loading conflicts or version incompatibilities. Try excluding problematic classes:
    -Dotel.javaagent.exclude-classes=com.problematic.package.*
Issue: Memory leaks after agent attachment.
Solution: Some instrumentations may hold references longer than expected. Try updating to the latest agent version or disabling the problematic instrumentations.
Advanced Troubleshooting Using Agent Traces
For particularly difficult problems, you can trace the agent’s own operation:
    # Enable agent tracing
    -Dotel.javaagent.experimental.self-telemetry.enabled=true
    -Dotel.javaagent.experimental.self-telemetry.exporters=otlp

    # Run the application and examine the agent's internal spans
    java -javaagent:path/to/opentelemetry-javaagent.jar \
      -Dotel.javaagent.experimental.self-telemetry.enabled=true \
      -Dotel.javaagent.experimental.self-telemetry.exporters=otlp \
      -Dotel.exporter.otlp.endpoint=http://localhost:4317 \
      -jar your-application.jar
This configuration enables the agent to report telemetry about itself, allowing you to see:
- Which instrumentation modules are active
- How long instrumentation takes
- Any errors during the instrumentation process
This is an advanced but powerful technique for diagnosing subtle agent issues.
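Before digging into agent internals, it can be worth confirming from inside the JVM that a `-javaagent` flag was passed at all. The sketch below reads the JVM input arguments through the standard `RuntimeMXBean`; the `AgentAttachmentCheck` class and the jar-name hint are assumptions for illustration, and the check only sees flags that appear in the JVM's reported input arguments.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical diagnostic helper; it inspects JVM arguments and does not talk to the agent itself.
final class AgentAttachmentCheck {
    // Returns true if any argument is a -javaagent flag whose path contains the given hint.
    static boolean hasJavaAgentArg(List<String> jvmArgs, String jarNameHint) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-javaagent:") && arg.contains(jarNameHint)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        boolean present = hasJavaAgentArg(jvmArgs, "opentelemetry-javaagent");
        System.out.println("OpenTelemetry agent flag present: " + present);
    }
}
```

A `false` result here points to a launch-script or container-configuration problem rather than an instrumentation problem, which narrows the search considerably.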
Need to protect sensitive data in OpenTelemetry? This guide walks you through redacting sensitive data in the Collector.
Future-Proofing Your OpenTelemetry Implementation
Preparing for OpenTelemetry Evolution
The OpenTelemetry project is still evolving. Here’s how to future-proof your implementation:
- Use semantic conventions: Follow the official semantic conventions for naming and attributes to ensure compatibility with future versions.
- Schedule regular updates: Plan to update your agent regularly to benefit from new features and bug fixes.
- Engage with the community: Follow the project on GitHub and join the OpenTelemetry community calls to stay informed about future changes.
- Contribute back: Share your experiences and contribute improvements back to the project.
Building a Long-Term Observability Strategy
Successful observability requires more than just tools—it requires a strategy:
- Define clear observability goals: What questions do you need to answer about your systems?
- Establish observability as a practice: Make it part of your engineering culture, not just a tool.
- Create feedback loops: Use observability data to drive improvements in your applications.
- Build observability expertise: Train your teams to effectively use the telemetry data you collect.
Conclusion
The OpenTelemetry Java Agent is a strategic asset that can transform how you build, operate, and improve your applications. Remember, the journey to complete observability is an ongoing process, but the OpenTelemetry Java Agent provides a solid foundation for that journey.
If you have any questions or want to share your OpenTelemetry setup, join our Discord Community to connect with other DevOps engineers who are on the same journey.
