Setting up robust observability for your Spring applications is essential for maintaining reliable, high-performing systems. This guide walks you through implementing Spring OpenTelemetry with practical advice for common challenges.
Understanding the Core Components of Spring OpenTelemetry
Spring OpenTelemetry provides comprehensive observability for Spring Boot applications by collecting three primary data types:
- Traces: Complete request paths that flow through your distributed services
- Metrics: Quantitative measurements of your application's performance and behavior
- Logs: Contextual information about application events and activities
This observability framework allows you to monitor, troubleshoot, and optimize your Spring applications with greater precision and context.
Key Business Benefits of Implementing Spring OpenTelemetry
Implementing Spring OpenTelemetry offers several tangible benefits:
- Precise root cause analysis: Quickly identify the exact source of production issues
- End-to-end request visibility: Track how requests move through your microservices architecture
- Proactive monitoring: Detect potential issues before they impact your users
- Reduced mean time to resolution (MTTR): Solve problems faster with better contextual information
Many organizations report significant improvements in their incident response times after implementing Spring OpenTelemetry, with some reducing their MTTR by up to 60%.
Step-by-Step Spring OpenTelemetry Implementation Guide
Let's walk through the complete setup process for adding OpenTelemetry to your Spring application.
Step 1: Adding Required Dependencies to Your Spring Boot Project
For Gradle projects, update your build.gradle file:
dependencies {
implementation 'org.springframework.boot:spring-boot-starter-web'
// Core OpenTelemetry dependencies
implementation 'io.opentelemetry.instrumentation:opentelemetry-spring-boot-starter:1.26.0-alpha'
implementation 'io.opentelemetry:opentelemetry-exporter-otlp:1.26.0'
// Additional instrumentation for common libraries
implementation 'io.opentelemetry.instrumentation:opentelemetry-jdbc:1.26.0-alpha'
implementation 'io.opentelemetry.instrumentation:opentelemetry-hibernate-6.0:1.26.0-alpha'
implementation 'io.opentelemetry.instrumentation:opentelemetry-spring-webmvc-6.0:1.26.0-alpha'
implementation 'io.opentelemetry.instrumentation:opentelemetry-spring-webflux-5.3:1.26.0-alpha'
// SDK auto-configuration (reads the otel.* properties and OTEL_* environment variables)
implementation 'io.opentelemetry:opentelemetry-sdk-extension-autoconfigure:1.26.0'
}
For Maven projects, add these dependencies to your pom.xml:
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- Core OpenTelemetry dependencies -->
<dependency>
<groupId>io.opentelemetry.instrumentation</groupId>
<artifactId>opentelemetry-spring-boot-starter</artifactId>
<version>1.26.0-alpha</version>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-exporter-otlp</artifactId>
<version>1.26.0</version>
</dependency>
<!-- Additional instrumentation -->
<dependency>
<groupId>io.opentelemetry.instrumentation</groupId>
<artifactId>opentelemetry-jdbc</artifactId>
<version>1.26.0-alpha</version>
</dependency>
<dependency>
<groupId>io.opentelemetry.instrumentation</groupId>
<artifactId>opentelemetry-hibernate-6.0</artifactId>
<version>1.26.0-alpha</version>
</dependency>
<dependency>
<groupId>io.opentelemetry.instrumentation</groupId>
<artifactId>opentelemetry-spring-webmvc-6.0</artifactId>
<version>1.26.0-alpha</version>
</dependency>
<dependency>
<groupId>io.opentelemetry.instrumentation</groupId>
<artifactId>opentelemetry-spring-webflux-5.3</artifactId>
<version>1.26.0-alpha</version>
</dependency>
<!-- SDK auto-configuration (reads the otel.* properties and OTEL_* environment variables) -->
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-sdk-extension-autoconfigure</artifactId>
<version>1.26.0</version>
</dependency>
</dependencies>
Step 2: Configuring OpenTelemetry Properties in Your Spring Application
Create or update your application.properties or application.yml file with the necessary OpenTelemetry configuration:
application.properties:
# Service identification
otel.service.name=your-service-name
otel.resource.attributes=service.namespace=your-namespace,service.version=${project.version}
# Exporter configuration
otel.traces.exporter=otlp
otel.metrics.exporter=otlp
otel.logs.exporter=otlp
otel.exporter.otlp.endpoint=http://your-collector:4317
otel.exporter.otlp.protocol=grpc
# Sampling configuration
otel.traces.sampler=parentbased_traceidratio
otel.traces.sampler.arg=1.0
# Metrics configuration
# Note the singular "metric" in these property names (values are in milliseconds)
otel.metric.export.interval=60000
otel.metric.export.timeout=30000
# Propagation
otel.propagators=tracecontext,baggage
application.yml:
otel:
  service:
    name: your-service-name
  resource:
    attributes: service.namespace=your-namespace,service.version=${project.version}
  traces:
    exporter: otlp
    sampler: parentbased_traceidratio
    sampler.arg: 1.0
  metrics:
    exporter: otlp
  # Note the singular "metric" for the export settings (values are in milliseconds)
  metric:
    export:
      interval: 60000
      timeout: 30000
  logs:
    exporter: otlp
  exporter:
    otlp:
      endpoint: http://your-collector:4317
      protocol: grpc
  propagators: tracecontext,baggage
Creating Environment-Specific OpenTelemetry Configurations
For better management, create separate configurations for different environments:
For development (application-dev.properties):
# Lower sampling rate for development
otel.traces.sampler.arg=0.3
# Local collector
otel.exporter.otlp.endpoint=http://localhost:4317
For production (application-prod.properties):
# Full sampling in production may be too expensive, adjust as needed
otel.traces.sampler.arg=0.5
# Production collector endpoint
otel.exporter.otlp.endpoint=http://prod-collector.internal:4317
# Authenticate to the collector (use an https:// endpoint so the token is not sent in clear text)
otel.exporter.otlp.headers=Authorization=Bearer ${OTEL_AUTH_TOKEN}
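To make the otel.traces.sampler.arg values above more concrete, here is a simplified sketch of how a trace-ID-ratio decision can be made. It is not the SDK's own sampler: RatioSamplerSketch and shouldSample are hypothetical names, and the sketch assumes a trace ID is 32 lowercase hex characters whose last 16 characters are effectively random. The parentbased_ prefix in the configuration additionally means that only root spans are decided this way, while child spans follow their parent's decision.

```java
// Hypothetical sketch of ratio-based head sampling; not the OpenTelemetry SDK class.
final class RatioSamplerSketch {
    private final double ratio;
    private final long idUpperBound;

    RatioSamplerSketch(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be between 0.0 and 1.0");
        }
        this.ratio = ratio;
        // Scale the ratio onto the range of non-negative long values.
        this.idUpperBound = (long) (ratio * Long.MAX_VALUE);
    }

    /** Decides from the trace ID alone, so every service reaches the same decision. */
    boolean shouldSample(String traceIdHex) {
        if (traceIdHex.length() != 32) {
            throw new IllegalArgumentException("trace ID must be 32 hex characters");
        }
        if (ratio == 1.0) {
            return true; // full sampling keeps every trace
        }
        // Use the last 16 hex characters (64 bits) as the pseudo-random part
        // and clear the sign bit so the value is never negative.
        long randomPart = Long.parseUnsignedLong(traceIdHex.substring(16), 16) & Long.MAX_VALUE;
        return randomPart < idUpperBound;
    }

    public static void main(String[] args) {
        String traceId = "0af7651916cd43dd8448eb211c80319c";
        System.out.println(new RatioSamplerSketch(0.1).shouldSample(traceId));  // kept at 10%
        System.out.println(new RatioSamplerSketch(0.01).shouldSample(traceId)); // dropped at 1%
    }
}
```

Because the decision depends only on the trace ID, lowering the ratio in one environment changes how many traces are kept without producing half-sampled traces.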
Step 3: Fine-Tuning Spring Boot OpenTelemetry Auto-configuration
Create a configuration class to fine-tune OpenTelemetry initialization:
package com.yourcompany.config;
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator;
import io.opentelemetry.context.propagation.ContextPropagators;
import io.opentelemetry.context.propagation.TextMapPropagator;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.resources.Resource;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;
import io.opentelemetry.sdk.trace.export.SpanExporter;
import io.opentelemetry.semconv.resource.attributes.ResourceAttributes;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class OpenTelemetryConfig {
@Value("${spring.application.name}")
private String applicationName;
@Value("${spring.profiles.active:default}")
private String activeProfile;
@Bean
public Resource otelResource() {
return Resource.getDefault()
.merge(Resource.create(Attributes.of(
ResourceAttributes.SERVICE_NAME, applicationName,
ResourceAttributes.SERVICE_NAMESPACE, "com.yourcompany",
ResourceAttributes.DEPLOYMENT_ENVIRONMENT, activeProfile
)));
}
@Bean
public TextMapPropagator textMapPropagator() {
return W3CTraceContextPropagator.getInstance();
}
}
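The merge call in otelResource() combines the SDK's default attributes with the custom ones, and on a key collision the resource passed to merge takes precedence. The following sketch imitates that rule with plain maps; ResourceMergeSketch is a hypothetical name, and the attribute values (including the application name order-service) are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of resource merging with plain maps; not the SDK's Resource class.
final class ResourceMergeSketch {
    /** Returns base plus updates; when both define a key, the value from updates wins. */
    static Map<String, String> merge(Map<String, String> base, Map<String, String> updates) {
        Map<String, String> merged = new LinkedHashMap<>(base);
        merged.putAll(updates);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("service.name", "unknown_service:java"); // placeholder default
        defaults.put("telemetry.sdk.language", "java");

        Map<String, String> custom = new LinkedHashMap<>();
        custom.put("service.name", "order-service");           // hypothetical application name
        custom.put("service.namespace", "com.yourcompany");

        // The custom service.name replaces the placeholder; untouched defaults survive.
        System.out.println(merge(defaults, custom));
    }
}
```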
Advanced Instrumentation Techniques for Business-Specific Telemetry
Beyond the basic setup, you can add custom instrumentation to capture business-specific telemetry.
Implementing Custom Trace Context Management for Business Operations
This example shows how to create and manage a custom span for tracking business operations:
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import org.springframework.stereotype.Service;
@Service
public class OrderProcessingService {
private final Tracer tracer;
public OrderProcessingService(OpenTelemetry openTelemetry) {
this.tracer = openTelemetry.getTracer("com.yourcompany.order.processing");
}
public void processOrder(String orderId, String customerId, Double amount) {
// Create a span for the entire order processing operation
Span orderSpan = tracer.spanBuilder("process-order")
.setSpanKind(SpanKind.INTERNAL)
.setAttribute("order.id", orderId)
.setAttribute("customer.id", customerId)
.setAttribute("order.amount", amount)
.startSpan();
// Make the span current for this execution context
try (Scope scope = orderSpan.makeCurrent()) {
// Log events within the span
orderSpan.addEvent("order-validation-started");
try {
// Validate order
validateOrder(orderId);
orderSpan.addEvent("order-validation-completed");
// Process payment in a sub-span
processPayment(orderId, amount);
// Fulfill order
fulfillOrder(orderId);
// Set span status to success
orderSpan.setStatus(StatusCode.OK);
} catch (Exception e) {
// Record error information
orderSpan.setStatus(StatusCode.ERROR, e.getMessage());
orderSpan.recordException(e, Attributes.of(
AttributeKey.stringKey("exception.type"), e.getClass().getName(),
AttributeKey.stringKey("exception.stacktrace"), getStackTraceAsString(e)
));
throw e;
}
} finally {
// Always end the span
orderSpan.end();
}
}
private void processPayment(String orderId, Double amount) {
// Create a child span for the payment processing
Span paymentSpan = tracer.spanBuilder("process-payment")
.setParent(Context.current())
.setAttribute("order.id", orderId)
.setAttribute("payment.amount", amount)
.startSpan();
try (Scope scope = paymentSpan.makeCurrent()) {
// Payment processing logic
Thread.sleep(100); // Simulate payment processing
paymentSpan.addEvent("payment-confirmed");
} catch (Exception e) {
paymentSpan.setStatus(StatusCode.ERROR, e.getMessage());
paymentSpan.recordException(e);
throw new RuntimeException("Payment processing failed", e);
} finally {
paymentSpan.end();
}
}
private void validateOrder(String orderId) {
// Validation logic
}
private void fulfillOrder(String orderId) {
// Fulfillment logic
}
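private String getStackTraceAsString(Exception e) {
// Render the full stack trace, including causes, into a string
java.io.StringWriter writer = new java.io.StringWriter();
e.printStackTrace(new java.io.PrintWriter(writer, true));
return writer.toString();
}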
}
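The try-with-resources blocks around makeCurrent() in the service above matter because the "current" span is tracked per thread, and closing the scope restores whatever was current before. The sketch below imitates that mechanism with a ThreadLocal to show why the payment span becomes a child of the order span. CurrentSpanSketch and its methods are hypothetical stand-ins, not the OpenTelemetry API, and spans are reduced to plain names.

```java
// Hypothetical sketch of thread-local "current span" handling; not the OpenTelemetry API.
final class CurrentSpanSketch {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    /** A scope whose close() throws no checked exception, so callers need no catch block. */
    interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    /** Returns the name of the span that is current on this thread, or null if none. */
    static String current() {
        return CURRENT.get();
    }

    /** Makes spanName current and returns a scope that restores the previous span on close. */
    static Scope makeCurrent(String spanName) {
        String previous = CURRENT.get();
        CURRENT.set(spanName);
        return () -> {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        };
    }

    public static void main(String[] args) {
        try (Scope outer = makeCurrent("process-order")) {
            System.out.println(current()); // process-order
            try (Scope inner = makeCurrent("process-payment")) {
                System.out.println(current()); // process-payment
            }
            System.out.println(current()); // process-order again
        }
        System.out.println(current()); // null
    }
}
```

If a scope is never closed, the stale span stays current on that thread, and later work on a pooled thread can be attached to the wrong parent.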
Developing Custom Business Metrics for Performance Insights
Tracking business-specific metrics provides valuable insights into your application's performance:
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.DoubleHistogram;
import io.opentelemetry.api.metrics.Meter;
import org.springframework.stereotype.Component;
@Component
public class BusinessMetricsRecorder {
private final LongCounter orderCounter;
private final LongCounter paymentCounter;
// histogramBuilder(...).build() returns a DoubleHistogram, which records double values
private final DoubleHistogram orderValueHistogram;
public BusinessMetricsRecorder(OpenTelemetry openTelemetry) {
Meter meter = openTelemetry.getMeter("com.yourcompany.business.metrics");
// Counter for tracking order volume by type
orderCounter = meter.counterBuilder("orders.processed")
.setDescription("Total number of orders processed")
.setUnit("{orders}")
.build();
// Counter for payment transactions
paymentCounter = meter.counterBuilder("payments.processed")
.setDescription("Total number of payment transactions")
.setUnit("{transactions}")
.build();
// Histogram for tracking order value distribution
orderValueHistogram = meter.histogramBuilder("order.value")
.setDescription("Distribution of order values")
.setUnit("USD")
.build();
}
public void recordOrder(String orderType, String channel) {
orderCounter.add(1, Attributes.of(
AttributeKey.stringKey("order.type"), orderType,
AttributeKey.stringKey("order.channel"), channel
));
}
public void recordPayment(String method, boolean success, String currency) {
paymentCounter.add(1, Attributes.of(
AttributeKey.stringKey("payment.method"), method,
AttributeKey.booleanKey("payment.success"), success,
AttributeKey.stringKey("payment.currency"), currency
));
}
public void recordOrderValue(double value, String productCategory) {
orderValueHistogram.record(value, Attributes.of(
AttributeKey.stringKey("product.category"), productCategory
));
}
}
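A histogram such as order.value does not keep every recorded value. It counts how many values fall into each bucket. The sketch below shows that bookkeeping with explicit bucket boundaries. HistogramSketch is a hypothetical class, and the boundaries 10, 50, 100 and 500 are chosen only for illustration; they are not the SDK defaults.

```java
import java.util.Arrays;

// Hypothetical sketch of explicit-bucket histogram aggregation; not the SDK implementation.
final class HistogramSketch {
    private final double[] boundaries; // sorted, inclusive upper bounds
    private final long[] counts;       // one extra bucket for values above the last boundary

    HistogramSketch(double... boundaries) {
        this.boundaries = boundaries.clone();
        this.counts = new long[boundaries.length + 1];
    }

    /** Increments the first bucket whose upper bound is >= value, or the overflow bucket. */
    void record(double value) {
        int index = boundaries.length;
        for (int i = 0; i < boundaries.length; i++) {
            if (value <= boundaries[i]) {
                index = i;
                break;
            }
        }
        counts[index]++;
    }

    long[] counts() {
        return counts.clone();
    }

    public static void main(String[] args) {
        HistogramSketch orderValues = new HistogramSketch(10, 50, 100, 500);
        for (double value : new double[] {4.99, 9.0, 50.0, 75.5, 120.0, 999.0, 1500.0}) {
            orderValues.record(value);
        }
        System.out.println(Arrays.toString(orderValues.counts())); // [2, 1, 1, 1, 2]
    }
}
```

Bucket boundaries that match your typical order values give a more useful distribution than boundaries that put most orders into the overflow bucket.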
Enhancing Spring WebMVC Controllers with OpenTelemetry Context
Add context to your API endpoints:
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import org.springframework.web.bind.annotation.*;
@RestController
@RequestMapping("/api/orders")
public class OrderController {
private final OrderService orderService;
private final BusinessMetricsRecorder metricsRecorder;
public OrderController(OrderService orderService, BusinessMetricsRecorder metricsRecorder) {
this.orderService = orderService;
this.metricsRecorder = metricsRecorder;
}
@PostMapping
public OrderResponse createOrder(@RequestBody OrderRequest request) {
// Get current span created by Spring WebMVC instrumentation
Span span = Span.current();
// Add business context to the span
span.setAttribute("order.customer_id", request.getCustomerId());
span.setAttribute("order.total_items", request.getItems().size());
try {
// Process the order
OrderResponse response = orderService.createOrder(request);
// Record business metrics
metricsRecorder.recordOrder(request.getOrderType(), request.getChannel());
metricsRecorder.recordOrderValue(request.getTotalAmount(), request.getPrimaryCategory());
// Return successful response
return response;
} catch (Exception e) {
// Mark span as error
span.setStatus(StatusCode.ERROR, e.getMessage());
span.recordException(e);
// Re-throw the exception
throw e;
}
}
}
Setting Up and Deploying the OpenTelemetry Collector Infrastructure
For a complete observability pipeline, you'll need to configure an OpenTelemetry Collector.
Creating a Comprehensive OpenTelemetry Collector Configuration
Here's a simple collector configuration file (otel-collector-config.yaml):
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 5s
    send_batch_size: 1000
  memory_limiter:
    check_interval: 1s
    limit_mib: 4000
    spike_limit_mib: 800
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
  # Recent collector releases no longer ship the dedicated jaeger exporter.
  # Jaeger accepts OTLP directly, so traces are sent with the otlp exporter.
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  elasticsearch:
    endpoints: ["http://elasticsearch:9200"]
    index: otel-logs-%{YYYY.MM.DD}
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [elasticsearch]
Deploying a Complete Observability Stack with Docker Compose
version: '3'
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    container_name: otel-collector
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
      - "8889:8889" # Prometheus exporter
    networks:
      - monitoring
    restart: unless-stopped
  jaeger:
    image: jaegertracing/all-in-one:latest
    container_name: jaeger
    ports:
      - "16686:16686" # UI
      - "14250:14250" # Model used by collector
    networks:
      - monitoring
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    restart: unless-stopped
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    networks:
      - monitoring
    restart: unless-stopped
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.16.2
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
    networks:
      - monitoring
    restart: unless-stopped
  kibana:
    image: docker.elastic.co/kibana/kibana:7.16.2
    container_name: kibana
    ports:
      - "5601:5601"
    networks:
      - monitoring
    depends_on:
      - elasticsearch
    restart: unless-stopped
networks:
  monitoring:
    driver: bridge
Diagnosing and Resolving Common Spring OpenTelemetry Issues
Even with careful implementation, you may encounter issues. Here are solutions to common problems:
Resolving Issues When No Telemetry Data Appears in Backend Systems
Problem: Your application is running with OpenTelemetry configured, but no data appears in your visualization tools.
Troubleshooting Steps:
- Check application logs for OpenTelemetry initialization: Look for logs indicating successful SDK initialization and exporter configuration.
- Ensure correct endpoint configuration: Double-check that the OTLP endpoint in your application properties matches your collector's address.
- Verify protocol compatibility: Ensure your application and collector are using the same protocol (gRPC or HTTP).
Verify collector connectivity:
curl -v http://your-collector:4317/
# or
telnet your-collector 4317
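If curl or telnet is not available on the host, for example inside a slim container image, the same TCP-level check can be sketched in Java. CollectorProbe is a hypothetical helper, and the default host and port are assumptions matching the examples in this guide. The check only confirms that something accepts connections on the port, not that it speaks OTLP.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks TCP reachability of a collector endpoint.
final class CollectorProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 4317;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 2000));
    }
}
```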
Solution:
# Enable OpenTelemetry debug logging
logging.level.io.opentelemetry=DEBUG
# Explicitly set the protocol
otel.exporter.otlp.protocol=grpc
Fixing Incomplete Trace Context Propagation Across Services
Problem: You see disconnected traces across different services.
Troubleshooting Steps:
- Verify propagators configuration: Ensure all services use compatible context propagators.
- Inspect API gateway or proxy configuration: Some proxies might strip trace headers; ensure they're configured to pass them through.
Check HTTP headers: Use a tool like curl to verify trace context headers are being passed:
curl -v -H "traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01" http://your-service/endpoint
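The traceparent value in the command above has four dash-separated fields: a version, a 32-character trace ID, a 16-character parent span ID, and trace flags whose lowest bit marks the trace as sampled. The following sketch parses a version-00 header so you can check what a service actually received. TraceParent is a hypothetical class written for this illustration, not part of the OpenTelemetry API.

```java
// Hypothetical sketch of parsing a version-00 W3C traceparent header value.
final class TraceParent {
    final String traceId;
    final String parentSpanId;
    final boolean sampled;

    private TraceParent(String traceId, String parentSpanId, boolean sampled) {
        this.traceId = traceId;
        this.parentSpanId = parentSpanId;
        this.sampled = sampled;
    }

    /** Parses the header value or throws IllegalArgumentException if it is malformed. */
    static TraceParent parse(String header) {
        String[] parts = header.trim().split("-");
        if (parts.length != 4 || !parts[0].equals("00")) {
            throw new IllegalArgumentException("expected 00-<trace-id>-<parent-id>-<flags>");
        }
        String traceId = parts[1];
        String parentSpanId = parts[2];
        String flags = parts[3];
        // IDs must be lowercase hex of fixed length and must not be all zeros.
        if (!traceId.matches("[0-9a-f]{32}") || traceId.equals("0".repeat(32))) {
            throw new IllegalArgumentException("invalid trace-id: " + traceId);
        }
        if (!parentSpanId.matches("[0-9a-f]{16}") || parentSpanId.equals("0".repeat(16))) {
            throw new IllegalArgumentException("invalid parent-id: " + parentSpanId);
        }
        if (!flags.matches("[0-9a-f]{2}")) {
            throw new IllegalArgumentException("invalid trace-flags: " + flags);
        }
        boolean sampled = (Integer.parseInt(flags, 16) & 0x01) != 0;
        return new TraceParent(traceId, parentSpanId, sampled);
    }

    public static void main(String[] args) {
        TraceParent parsed = parse("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01");
        System.out.println(parsed.traceId + " sampled=" + parsed.sampled);
    }
}
```

If the trace ID logged by a downstream service differs from the one you sent, something between the two services is dropping or regenerating the header.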
Solution:
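// Requires io.opentelemetry.api.baggage.propagation.W3CBaggagePropagator
// and io.opentelemetry.context.propagation.TextMapPropagator
@Bean
public OpenTelemetry openTelemetry() {
    // Caution: declaring your own OpenTelemetry bean typically replaces the auto-configured one.
    // This minimal SDK only sets propagators; add a tracer provider and exporters as well,
    // or keep the auto-configuration and set otel.propagators=tracecontext,baggage on every service.
    return OpenTelemetrySdk.builder()
        .setPropagators(ContextPropagators.create(
            TextMapPropagator.composite(
                W3CTraceContextPropagator.getInstance(),
                W3CBaggagePropagator.getInstance())))
        .build();
}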
Managing Excessive Telemetry Data Generation in High-Traffic Applications
Problem: Your OpenTelemetry setup is generating too much data, causing storage issues or high costs.
Troubleshooting Steps:
- Analyze current data volume: Check storage rates in your backend systems.
- Review sampling configuration: Determine if you're collecting more data than necessary.
- Examine custom instrumentation: Look for overly verbose instrumentation in your code.
Solution:
# Implement more aggressive sampling
otel.traces.sampler=parentbased_traceidratio
# Sample only 10% of traces (comments must be on their own line in .properties files)
otel.traces.sampler.arg=0.1
# For high-traffic services, consider even lower rates, for example 1% of traces:
# otel.traces.sampler.arg=0.01
Additional filtering at the collector level:
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      - name: error-only
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: high-latency
        type: latency
        latency:
          threshold_ms: 500
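The two policies above are evaluated per trace, after the collector has waited for the trace's spans to arrive, and a trace is kept if any policy matches. The sketch below mirrors that decision for the error-only and high-latency policies. TailSamplingSketch and SpanData are hypothetical types, and measuring trace duration from the earliest span start to the latest span end is a simplifying assumption of this sketch.

```java
import java.util.List;

// Hypothetical sketch of a tail-sampling decision; not the collector's implementation.
final class TailSamplingSketch {
    static final class SpanData {
        final long startMillis;
        final long endMillis;
        final boolean error;

        SpanData(long startMillis, long endMillis, boolean error) {
            this.startMillis = startMillis;
            this.endMillis = endMillis;
            this.error = error;
        }
    }

    /** error-only policy: matches if any span in the trace has an error status. */
    static boolean hasError(List<SpanData> trace) {
        for (SpanData span : trace) {
            if (span.error) {
                return true;
            }
        }
        return false;
    }

    /** high-latency policy: matches if the whole trace took longer than the threshold. */
    static boolean exceedsLatency(List<SpanData> trace, long thresholdMillis) {
        if (trace.isEmpty()) {
            return false;
        }
        long earliestStart = Long.MAX_VALUE;
        long latestEnd = Long.MIN_VALUE;
        for (SpanData span : trace) {
            earliestStart = Math.min(earliestStart, span.startMillis);
            latestEnd = Math.max(latestEnd, span.endMillis);
        }
        return latestEnd - earliestStart > thresholdMillis;
    }

    /** A trace is kept when at least one policy matches. */
    static boolean keep(List<SpanData> trace, long thresholdMillis) {
        return hasError(trace) || exceedsLatency(trace, thresholdMillis);
    }

    public static void main(String[] args) {
        List<SpanData> slowTrace = List.of(new SpanData(0, 300, false), new SpanData(250, 700, false));
        System.out.println(keep(slowTrace, 500)); // true: 700 ms end to end
    }
}
```

Because the decision needs the complete trace, all spans of a trace must reach the same collector instance for tail sampling to work as intended.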
Preventing Memory Leaks and Performance Degradation in Instrumented Applications
Problem: Your application experiences increasing memory usage or performance degradation after adding OpenTelemetry.
Troubleshooting Steps:
- Monitor JVM metrics: Track heap usage, garbage collection patterns, and thread counts.
- Profile the application: Use tools like VisualVM or YourKit to identify memory-intensive components.
- Check batch processing: Ensure spans are being exported efficiently.
Solution:
# Configure more efficient batching
otel.bsp.schedule.delay=5000
otel.bsp.max.queue.size=2048
otel.bsp.max.export.batch.size=512
otel.bsp.export.timeout=30000
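The queue and batch sizes above bound how much memory pending spans can use: when the queue is full, new spans are dropped rather than buffered without limit, and each export call sends at most one batch. The sketch below illustrates that trade-off in a single-threaded form. BatchQueueSketch is a hypothetical class, spans are reduced to strings, and the real processor also involves a background export thread that is omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical, single-threaded sketch of bounded span batching; not the SDK's BatchSpanProcessor.
final class BatchQueueSketch {
    private final BlockingQueue<String> queue;
    private final int maxBatchSize;
    private long dropped;

    BatchQueueSketch(int maxQueueSize, int maxBatchSize) {
        this.queue = new ArrayBlockingQueue<>(maxQueueSize);
        this.maxBatchSize = maxBatchSize;
    }

    /** Called when a span ends: enqueue it, or count it as dropped if the queue is full. */
    boolean enqueue(String span) {
        boolean accepted = queue.offer(span);
        if (!accepted) {
            dropped++;
        }
        return accepted;
    }

    /** Called on each export tick: remove up to maxBatchSize spans for one export call. */
    List<String> nextBatch() {
        List<String> batch = new ArrayList<>(maxBatchSize);
        queue.drainTo(batch, maxBatchSize);
        return batch;
    }

    long dropped() {
        return dropped;
    }

    int queued() {
        return queue.size();
    }

    public static void main(String[] args) {
        BatchQueueSketch processor = new BatchQueueSketch(2048, 512);
        for (int i = 0; i < 2050; i++) {
            processor.enqueue("span-" + i);
        }
        System.out.println("dropped=" + processor.dropped());          // dropped=2
        System.out.println("batch=" + processor.nextBatch().size());   // batch=512
        System.out.println("still queued=" + processor.queued());      // still queued=1536
    }
}
```

A larger queue tolerates longer export stalls at the cost of more heap, so size it against your peak span rate rather than raising it by default.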
Optimizing your custom instrumentation:
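// Use appropriate span lifetime management
long startTime = System.currentTimeMillis(); // when the operation actually began
Span span = tracer.spanBuilder("operation")
    .setStartTimestamp(startTime, TimeUnit.MILLISECONDS) // requires java.util.concurrent.TimeUnit
    .startSpan();
try {
    // Operation logic
} finally {
    span.end(); // Always end spans to prevent leaks
}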
Evaluating Performance Impact and Resource Requirements for Spring OpenTelemetry
Understanding the performance impact of OpenTelemetry helps in production planning.
Aspect | Typical Impact | Mitigation Strategies |
---|---|---|
CPU Overhead | 3-8% increase | • Optimize sampling rates<br>• Use efficient batching<br>• Apply filtering at collector |
Memory Usage | 10-15% increase | • Configure appropriate buffer sizes<br>• Monitor and adjust GC settings<br>• Use memory limiters in collector |
Network I/O | Additional 5-10KB per request | • Compress telemetry data<br>• Implement intelligent sampling<br>• Use local collectors to batch data |
Disk I/O | Minimal for app, significant for storage | • Implement data retention policies<br>• Use time-series optimized storage<br>• Consider hot/cold storage strategies |
Latency Addition | 1-5ms per request | • Use asynchronous exporters<br>• Optimize collector performance<br>• Consider tail-based sampling |
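As a worked example of the network row, the sketch below turns the 5-10KB per request figure into a bandwidth estimate. The traffic level of 1,000 requests per second and the sampling ratio of 0.1 are hypothetical inputs, and the sketch assumes the per-request figure applies only to sampled requests. TelemetryBandwidthEstimate is a name made up for this illustration.

```java
// Hypothetical back-of-the-envelope estimate of telemetry network volume.
final class TelemetryBandwidthEstimate {
    /** Estimated telemetry volume in KB per second. */
    static double kbPerSecond(double requestsPerSecond, double samplingRatio, double kbPerSampledRequest) {
        return requestsPerSecond * samplingRatio * kbPerSampledRequest;
    }

    public static void main(String[] args) {
        double requestsPerSecond = 1000.0; // assumed traffic
        double samplingRatio = 0.1;        // assumed otel.traces.sampler.arg
        double low = kbPerSecond(requestsPerSecond, samplingRatio, 5.0);
        double high = kbPerSecond(requestsPerSecond, samplingRatio, 10.0);
        // With these inputs: between 500 and 1000 KB per second.
        System.out.printf("Estimated telemetry traffic: %.0f-%.0f KB/s%n", low, high);
    }
}
```

Running the same calculation with your own request rate shows quickly whether sampling or collector-side batching deserves attention first.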
Recommended Best Practices for Production Deployment of Spring OpenTelemetry
Based on experience with numerous production deployments, here are recommended practices:
- Start with a phased rollout:
  - Begin with non-critical services
  - Gradually increase sampling rates
  - Monitor impact before full deployment
- Implement a proper sampling strategy:
  - Use head-based sampling for high-volume services
  - Consider tail-based sampling at the collector for error detection
  - Maintain 100% sampling for critical business workflows
- Optimize for resource efficiency:
  - Configure appropriate batch sizes and intervals
  - Use memory limiters to prevent OOM conditions
  - Implement circuit breakers for telemetry pipelines
- Design for observability data governance:
  - Establish naming conventions for services, metrics, and traces
  - Define data retention policies
  - Control access to sensitive information in spans
- Create useful visualizations and alerts:
  - Build dashboards that show service health
  - Create alerts based on SLO/SLI metrics
  - Incorporate business context in your visualizations
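One way to read the circuit breaker recommendation above: when exports to the collector keep failing, stop trying for a while so that telemetry problems do not consume application resources. The sketch below is a minimal illustration of that idea, not a feature of the OpenTelemetry SDK. ExportCircuitBreaker, its threshold and its cooldown are hypothetical, and the current time is passed in explicitly to keep the behavior easy to test.

```java
// Hypothetical sketch of a circuit breaker around telemetry export; not an SDK feature.
final class ExportCircuitBreaker {
    private final int failureThreshold;
    private final long cooldownMillis;
    private int consecutiveFailures;
    private long openedAtMillis = -1; // -1 means the breaker is closed

    ExportCircuitBreaker(int failureThreshold, long cooldownMillis) {
        this.failureThreshold = failureThreshold;
        this.cooldownMillis = cooldownMillis;
    }

    /** Returns false while the breaker is open and the cooldown has not yet elapsed. */
    boolean allowExport(long nowMillis) {
        if (openedAtMillis < 0) {
            return true;
        }
        // After the cooldown, let a trial export through to test the pipeline again.
        return nowMillis - openedAtMillis >= cooldownMillis;
    }

    /** A successful export closes the breaker and resets the failure count. */
    void recordSuccess() {
        consecutiveFailures = 0;
        openedAtMillis = -1;
    }

    /** Opens (or re-opens) the breaker once enough consecutive exports have failed. */
    void recordFailure(long nowMillis) {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAtMillis = nowMillis;
        }
    }

    public static void main(String[] args) {
        ExportCircuitBreaker breaker = new ExportCircuitBreaker(3, 1000);
        breaker.recordFailure(0);
        breaker.recordFailure(1);
        breaker.recordFailure(2);
        System.out.println(breaker.allowExport(3));    // false: breaker is open
        System.out.println(breaker.allowExport(1002)); // true: cooldown elapsed
    }
}
```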
Conclusion
Spring OpenTelemetry provides powerful observability capabilities for your Spring applications. With proper implementation and configuration, you can gain deeper insights into your application behavior, improve troubleshooting efficiency, and enhance overall system reliability.