Consider an e-commerce site running a flash sale. Orders are pouring in, but some customers run into delays or failed payments. With services for authentication, inventory, and payments all working together, figuring out where things slow down isn’t simple.
Distributed tracing makes this easier. It shows how each request moves through your system so you can quickly spot bottlenecks, fix errors, and keep the checkout flow reliable.
OpenTelemetry has matured a lot in 2024–2025. Profiling has been added as a fourth signal, zero-code instrumentation is available, and production-ready tooling exists across more than 12 languages. This guide walks through the basics and the advanced setups you’ll use in production.
Why OpenTelemetry for Distributed Tracing?
OpenTelemetry has become the go-to standard for distributed tracing, and it’s not hard to see why.
- Vendor neutrality. You’re not tied to a single vendor or backend. The same instrumentation can send data to Jaeger, Zipkin, Prometheus, or commercial platforms like Last9. If your needs change later, you don’t have to redo all the instrumentation—just point the data somewhere else.
- Mature and production-ready. By 2024, the core parts of OpenTelemetry—traces, metrics, and logs—are stable across most major languages. Java, .NET, Python, and Node.js already have hundreds of supported libraries and frameworks, so you don’t spend weeks wiring things up.
- Less friction with zero-code options. Thanks to eBPF-based auto-instrumentation, you can start capturing traces without touching application code. That means faster setup for new services and fewer changes in production environments where code modifications can be risky.
- Four signals, one standard. OpenTelemetry now brings traces, metrics, logs, and the newer profiling signal together. Having all four in a single framework makes it easier to connect the dots—linking slow traces with CPU hotspots or correlating logs with errors in real-time.
What’s New in OpenTelemetry
Profiling Signal: The Fourth Pillar
In March 2024, OpenTelemetry introduced profiling as its fourth signal, alongside traces, metrics, and logs. This opens up new ways to connect system behavior with code-level performance:
- Code-level insights. Go from a CPU spike in your metrics directly to the function consuming resources.
- Trace-to-profile correlation. See not only where latency occurs, but which code paths are responsible.
- Continuous profiling. Always-on performance monitoring with low overhead, giving you a steady stream of insights into application health.
Spring Boot Starter Now Stable
By September 2024, the OpenTelemetry Spring Boot Starter reached general availability. It gives Java developers more flexibility with features like:
- Native image support. Works with Spring Boot Native applications where the Java agent cannot.
- Configuration in-app. Use application.properties or YAML files instead of depending only on agent flags.
- Lightweight setup. Lower startup overhead compared to the full Java agent.
eBPF Auto-Instrumentation
One of the biggest shifts in OpenTelemetry is eBPF-based auto-instrumentation, which allows you to capture traces without changing your code. Key benefits include:
- Zero-code setup. No modifications to the application itself.
- Broad language coverage. Works across C/C++, Go, Rust, Python, Java, Node.js, .NET, PHP, and Ruby.
- Low resource cost. Typically under 1% CPU and about 250MB memory usage.
- Kernel-level visibility. Captures activity across system libraries and kernel calls that agents can’t see.
File-Based Configuration
OpenTelemetry now supports YAML and JSON configuration files. This makes it easier to manage complex setups without relying solely on environment variables or application code.
Distributed Tracing Fundamentals
Before setting up instrumentation, it helps to get comfortable with the core concepts behind distributed tracing.
Traces and Spans: The Building Blocks
- A trace is the full journey of a request as it moves through your system.
- A span is a single unit of work in that journey, such as a service call or a database query.
Each span records metadata like start time, end time, and relationships with other spans. Together, traces and spans let you follow a request end-to-end across microservices.
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("process_order") as parent_span:
    parent_span.set_attribute("order_id", "12345")
    with tracer.start_as_current_span("validate_payment") as child_span:
        child_span.set_attribute("payment_method", "credit_card")
        # Payment validation logic
In this example, process_order is the parent span, and validate_payment is the child span nested within it.
Span Relationships: Parent-Child vs. Links
- Parent-child spans form the traditional hierarchy where one span directly calls another.
- Span links are different—they connect spans across traces without requiring a strict hierarchy.
from opentelemetry.trace import Link, SpanContext

# Create a link to relate spans across different traces
source_span_context = SpanContext(trace_id=0x1, span_id=0x2, is_remote=True)

with tracer.start_as_current_span(
    "process_batch_item",
    links=[Link(context=source_span_context)]
) as span:
    span.set_attribute("item_id", "12345")
    # Processing logic
Span links are especially useful for:
- Fan-out operations that trigger multiple downstream calls
- Batch jobs where each item produces its own trace
- Async messaging where spans need correlation across message boundaries (see the batch-consumer sketch after this list)
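To make the batch case concrete, here’s a minimal sketch of a consumer that links one batch span back to each message’s originating trace. The messages objects, their headers attribute, and the handle() helper are placeholders for your own queue client, not part of the OpenTelemetry API.

from opentelemetry import trace
from opentelemetry.propagate import extract
from opentelemetry.trace import Link

tracer = trace.get_tracer(__name__)

def process_batch(messages):
    # Build one link per message, pointing at the producer's span context
    links = []
    for msg in messages:
        producer_ctx = extract(msg.headers)  # trace context carried in message headers
        links.append(Link(trace.get_current_span(producer_ctx).get_span_context()))

    # A single batch span, linked back to every originating trace
    with tracer.start_as_current_span("process_batch", links=links) as span:
        span.set_attribute("batch.size", len(messages))
        for msg in messages:
            handle(msg)  # placeholder per-message handler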
Span Status: Operation Outcomes
Spans also capture the outcome of operations—whether they succeeded or failed.
from opentelemetry.trace.status import Status, StatusCode

with tracer.start_as_current_span("http_request") as span:
    try:
        response_code = make_request()
        if response_code == 200:
            span.set_status(Status(StatusCode.OK))
        else:
            span.set_status(Status(StatusCode.ERROR, description=f"HTTP {response_code}"))
    except Exception as e:
        span.set_status(Status(StatusCode.ERROR, description=str(e)))
This allows you to quickly spot failing spans in a trace and connect them back to errors in your system.
Zero-Code Instrumentation Options
OpenTelemetry has made big strides in reducing the effort it takes to instrument applications. You don’t always need to touch code or even restart services to start collecting traces.
Two of the most useful options are eBPF-based instrumentation and the new Spring Boot Starter for Java.
eBPF Auto-Instrumentation
eBPF runs inside the Linux kernel and observes system calls directly. That means you can capture requests, database calls, and system interactions without modifying your applications.
Why it matters:
- No restarts required — works on running processes
- Works with compiled binaries you can’t change
- Captures system-level activity, not just application calls
- Handles multiple languages side by side
Example:
# Start the OpenTelemetry eBPF profiler
sudo ./ebpf-profiler -collection-agent=127.0.0.1:11000 -disable-tls
This starts the profiler and streams collected data to a local OpenTelemetry Collector running on port 11000. Because it runs at the kernel level, it can observe activity across all applications on the host.
Best suited for:
- Legacy apps that you can’t recompile
- Third-party binaries
- Mixed-language microservices environments
- Low-touch debugging in production
Java: Agent vs. Spring Boot Starter
For Java, you have two paths: the long-standing Java agent or the Spring Boot Starter, which became stable in late 2024.
Java Agent
java -javaagent:opentelemetry-javaagent.jar \
-Dotel.service.name=my-service \
-Dotel.exporter.otlp.endpoint=http://localhost:4317 \
-jar myapp.jar
Here, you attach the OpenTelemetry agent at startup with the -javaagent flag. It automatically instruments common libraries and sends traces to your collector. This is the fastest way to get coverage, but it can add overhead during startup.
- Pros: Maximum coverage out of the box
- Cons: Heavier on startup time, less flexible to configure
Spring Boot Starter
<dependency>
    <groupId>io.opentelemetry.instrumentation</groupId>
    <artifactId>opentelemetry-spring-boot-starter</artifactId>
</dependency>
# application.yml configuration
otel:
  service:
    name: my-spring-service
  exporter:
    otlp:
      endpoint: http://localhost:4317
  instrumentation:
    logback-appender:
      enabled: true
Here, instrumentation is baked into your Spring Boot app itself. You manage it through the usual application.yml or .properties files, which gives you more flexibility and less overhead compared to the Java agent.
Best when:
- You’re building Spring Boot Native apps
- You prefer in-app config over JVM flags
- Startup performance matters
- You already rely on another agent in the same JVM
With either option, you know both what to run and why you’d choose it.
Enhanced Traces with Attributes and Events
Distributed traces become much more useful when you enrich spans with metadata and contextual details. OpenTelemetry gives you two ways to do this: attributes and events.
Attributes: Contextual Metadata
Attributes attach static key–value metadata to spans. They’re useful for describing the request, user, or environment where the span is running.
with tracer.start_as_current_span("process_payment") as span:
    # User context
    span.set_attribute("user_id", "user_67890")
    span.set_attribute("user_tier", "premium")

    # Transaction details
    span.set_attribute("order_id", "12345")
    span.set_attribute("payment_method", "credit_card")
    span.set_attribute("amount", 99.99)
    span.set_attribute("currency", "USD")

    # Environment info
    span.set_attribute("host.name", "payment-service-1")
    span.set_attribute("deployment.environment", "production")
Here, the span for process_payment is enriched with user info, order details, and environment context. Later, you can filter traces by attributes—for example, finding only failed payments from premium users in production.
Events: Dynamic Timestamped Data
While attributes describe “what” a span is about, events capture “what happened” during the span’s lifetime. They’re timestamped markers you can attach to record key moments.
with tracer.start_as_current_span("process_payment") as span:
    span.add_event("payment_initiated", {
        "timestamp": "2024-12-26T12:00:00Z",
        "gateway": "stripe"
    })

    try:
        result = process_payment()
        span.add_event("payment_completed", {
            "transaction_id": result.transaction_id,
            "processing_time_ms": result.duration
        })
    except PaymentException as e:
        span.add_event("payment_failed", {
            "error_code": e.code,
            "retry_count": e.retry_count,
            "error_message": str(e)
        })
In this example, the span records three possible events: when the payment starts, when it completes, or if it fails. This helps you later replay the lifecycle of a request, correlate failures, and understand latency spikes with much finer detail than attributes alone.
How Context Propagation Works
Distributed tracing only works if requests can be followed across service boundaries. That’s what context propagation does: it carries trace metadata (like trace IDs and span IDs) along with each request, so new spans can be tied back to the same trace.
How It Works
- Injection – The upstream service attaches trace context to request headers.
- Extraction – The downstream service reads that context from incoming headers.
- Continuation – Any new spans created downstream are linked back to the original trace.
This allows a single trace to flow across microservices, queues, or even language boundaries.
Example: Service A Sending a Request
from opentelemetry.propagate import inject
import requests
with tracer.start_as_current_span("service_a_operation") as span:
    headers = {}
    inject(headers)  # Injects trace context

    response = requests.get(
        "http://service-b.example.com/api",
        headers=headers
    )
Here, Service A creates a span and injects the trace context into HTTP headers before requesting Service B. That way, Service B knows this request belongs to the same trace.
Example: Service B Receiving a Request
from opentelemetry.propagate import extract
from flask import Flask, request

app = Flask(__name__)

@app.route("/api")
def handle_request():
    context = extract(request.headers)  # Extracts trace context
    with tracer.start_as_current_span("service_b_operation", context=context) as span:
        span.set_attribute("received_from", "service_a")
        return process_request()
Service B extracts the context from incoming headers and uses it to create a new span. That span is automatically linked to Service A’s trace, giving you continuity across the two services.
Propagation Formats
OpenTelemetry supports multiple propagation formats so it can integrate with existing systems:
B3 Propagation (used by Zipkin):
X-B3-TraceId: 4bf92f3577b34da6a3ce929d0e0e4736
X-B3-SpanId: 00f067aa0ba902b7
X-B3-Sampled: 1
W3C Trace Context (default):
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
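If you need to emit and accept both formats during a migration, you can register a composite propagator so inject() and extract() handle W3C and B3 headers together. A minimal sketch, assuming the opentelemetry-propagator-b3 package is installed:

from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.composite import CompositePropagator
from opentelemetry.propagators.b3 import B3MultiFormat
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# Every inject()/extract() call now writes and reads both traceparent and X-B3-* headers
set_global_textmap(CompositePropagator([
    TraceContextTextMapPropagator(),
    B3MultiFormat(),
]))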
With propagation in place, every hop of a request is stitched into one coherent trace, making distributed tracing possible at scale.
Step-by-Step Implementation Guide
Here’s a structured approach that takes you from instrumentation to backend configuration, with examples you can adapt to your stack.
1. Choose Your Instrumentation Approach
OpenTelemetry offers three main ways to instrument your applications:
Manual Instrumentation – Full Control, Custom Spans
Install the required packages:
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
This approach gives you maximum flexibility. You explicitly define spans, attributes, and events in your code—ideal when you want fine-grained control.
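For a first end-to-end check, here’s a minimal sketch that registers a tracer provider and prints a single span to the console; swap the console exporter for OTLP (shown in step 2) once you’re ready to ship data somewhere:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Console output keeps the first run dependency-free
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("hello_trace") as span:
    span.set_attribute("example.manual", True)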
Auto-Instrumentation – Quick Start with Libraries
Install and bootstrap:
pip install opentelemetry-distro[otlp]
opentelemetry-bootstrap -a install
Auto-instrumentation hooks into supported libraries automatically (HTTP clients, SQL drivers, etc.), so you get visibility fast without changing much code.
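If you prefer to wire the instrumentation libraries up explicitly in code rather than running your app through the opentelemetry-instrument launcher, a minimal sketch for a Flask app that also makes outgoing calls with requests looks like this (the Flask and requests instrumentation packages are installed by the bootstrap step when those libraries are present):

from flask import Flask
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

app = Flask(__name__)

# Instrument incoming Flask requests and outgoing HTTP calls made with requests
FlaskInstrumentor().instrument_app(app)
RequestsInstrumentor().instrument()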
Zero-Code Instrumentation – eBPF for Existing Applications
For environments where touching code isn’t possible, eBPF provides zero-code tracing.
# No application changes required
sudo ./ebpf-profiler -collection-agent=localhost:11000
Because eBPF operates at the kernel level, it works with compiled binaries and across multiple languages simultaneously. Perfect for legacy apps or third-party binaries.
2. Configure the SDK
Once you’ve chosen an instrumentation approach, configure the SDK to define resources, exporters, and processors.
Python Example
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Configure tracer provider with resource attributes
trace.set_tracer_provider(TracerProvider(
    resource=Resource.create({
        "service.name": "my-service",
        "service.version": "1.0.0",
        "deployment.environment": "production"
    })
))

# Configure OTLP exporter
otlp_exporter = OTLPSpanExporter(
    endpoint="http://localhost:4317",
    headers={
        "authorization": "Bearer your-token-here"
    }
)

# Add batch processor for efficient export
span_processor = BatchSpanProcessor(otlp_exporter)
trace.get_tracer_provider().add_span_processor(span_processor)

tracer = trace.get_tracer(__name__)
This setup registers your service with metadata (name, version, environment), sends spans to an OTLP endpoint, and batches them for performance.
File-Based Configuration (New in 2024)
You can now define OpenTelemetry configuration in YAML or JSON, making it easier to manage in containerized or multi-service environments.
# otel-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024

exporters:
  otlp:
    endpoint: "http://your-backend:4317"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
This Collector-style configuration creates a trace pipeline: receive OTLP data, batch it, and send it to your chosen backend.
3. Add Manual Instrumentation Where Needed
Even if you use auto-instrumentation, there are times when custom spans add valuable context.
HTTP Service Instrumentation Example
@app.route("/users/<user_id>")
def get_user(user_id):
    with tracer.start_as_current_span("get_user") as span:
        # Add context
        span.set_attribute("user.id", user_id)
        span.set_attribute("http.method", request.method)
        span.set_attribute("http.url", request.url)

        # Add event for request start
        span.add_event("request_started")

        try:
            # Database operation
            with tracer.start_as_current_span("db_query") as db_span:
                db_span.set_attribute("db.statement", f"SELECT * FROM users WHERE id = {user_id}")
                db_span.set_attribute("db.name", "userdb")
                user = db.get_user(user_id)

            if user:
                span.set_attribute("user.found", True)
                span.add_event("user_retrieved", {"user_id": user.id})
                return jsonify(user.to_dict())
            else:
                span.set_status(Status(StatusCode.ERROR, "User not found"))
                return jsonify({"error": "User not found"}), 404

        except Exception as e:
            span.record_exception(e)
            span.set_status(Status(StatusCode.ERROR, str(e)))
            return jsonify({"error": "Internal server error"}), 500
Here the get_user span captures request attributes, while a nested db_query span records the database statement. Events log key milestones like when the request starts or when a user is retrieved.
4. Configure Your Backend
Instrumentation data is only useful if it’s stored and visualized. You can export traces to open-source backends like Jaeger or to commercial platforms like Last9.
Using Jaeger v2 (with Built-in OTLP)
docker run -d --name jaeger \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
jaegertracing/jaeger:2.5.0
This runs Jaeger with OTLP gRPC and HTTP receivers enabled, so your OpenTelemetry SDKs can export directly.
OpenTelemetry Collector with Multi-Backend Export
# collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  # Filter sensitive data
  attributes:
    actions:
      - key: user.email
        action: delete
      - key: user.ssn
        action: delete

exporters:
  # Jaeger v2 accepts OTLP directly (the dedicated jaeger exporter has been removed from recent Collector releases)
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  # Export to Last9
  otlp/last9:
    endpoint: "https://otlp.last9.io"
    headers:
      authorization: "Bearer YOUR_LAST9_TOKEN"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, attributes]
      exporters: [otlp/jaeger, otlp/last9]
This collector setup does three things:
- Receives spans over OTLP (gRPC and HTTP).
- Processes data with batching and attribute filtering (to drop sensitive fields).
- Exports traces both to Jaeger for visualization and to Last9 for high-cardinality, long-term storage and analysis.
Correlate Traces with Profiles
With profiling now part of OpenTelemetry, you can directly connect spans in traces with code-level performance data. This makes it easier to move from “a service is slow” to “this function is the bottleneck.”
Metrics to Profiles Correlation
When metrics show a CPU spike, profiling data helps you pinpoint the exact code path responsible.
# When you see a CPU spike in metrics, jump to profiling data
with tracer.start_as_current_span("cpu_intensive_operation") as span:
    span.set_attribute("operation.type", "data_processing")

    # This span can now be correlated with profiling data
    result = heavy_computation()
    span.set_attribute("records.processed", len(result))
Here the cpu_intensive_operation span is tagged with attributes that can be tied to profiling samples, showing which part of the code consumed CPU.
Traces to Profiles Correlation
Profiling also helps when investigating latency in traces.
import time

# Slow trace spans can be correlated with profiling data
with tracer.start_as_current_span("slow_operation") as span:
    start_time = time.time()
    result = complex_algorithm()
    duration = time.time() - start_time

    span.set_attribute("operation.duration_ms", duration * 1000)

    if duration > 1.0:  # Slow operation
        span.add_event("performance_investigation_needed", {
            "duration_threshold_exceeded": True,
            "profile_correlation_available": True
        })
In this example, a slow span triggers an event that signals you can jump into profiling data to understand which functions caused the slowdown.
Best Practices for Production
Adding traces and profiles together is powerful, but to keep it efficient and secure in production, a few practices help.
1. Smart Sampling Strategies
Sampling ensures you capture the right traces without overwhelming your backend.
Head-based sampling (SDK level):
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased
# Sample 10% of traces, but always sample if parent was sampled
sampler = ParentBased(TraceIdRatioBased(0.1))
trace.set_tracer_provider(TracerProvider(sampler=sampler))
This samples a percentage of traces at the point of creation, with parent traces always preserved.
Tail-based sampling (Collector level):
processors:
  tail_sampling:
    policies:
      # Always sample errors
      - name: error-sampling
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Sample slow requests
      - name: latency-sampling
        type: latency
        latency:
          threshold_ms: 1000
      # Sample 1% of normal traffic
      - name: probabilistic-sampling
        type: probabilistic
        probabilistic:
          sampling_percentage: 1
This approach makes sampling decisions after spans are collected—ideal for always keeping errors and slow requests.
2. Security and Data Handling
Sensitive attributes must never leak into telemetry. You can sanitize them in code or enforce redaction at the collector.
In-code sanitization:
# Mask sensitive attributes
def sanitize_span_attributes(span, **attributes):
    for key, value in attributes.items():
        if key in ["password", "ssn", "credit_card", "token"]:
            span.set_attribute(key, "***REDACTED***")
        elif "email" in key.lower():
            span.set_attribute(key, mask_email(value))  # mask_email is your own masking helper
        else:
            span.set_attribute(key, value)

# Usage
with tracer.start_as_current_span("user_operation") as span:
    sanitize_span_attributes(span,
        user_id="12345",
        user_email="user@example.com",  # Will be masked
        password="secret123"            # Will be redacted
    )
Collector-level redaction:
processors:
  redaction:
    allow_all_keys: false
    blocked_fields:
      - "user.password"
      - "user.ssn"
      - "credit_card.number"
    summary: "debug"
This ensures sensitive fields are removed before leaving your environment.
3. Performance Optimization
Keep tracing overhead low with selective instrumentation and batch tuning.
Conditional span creation:
import contextlib
from opentelemetry import trace

# Use conditional instrumentation for high-frequency operations
class OptimizedTracer:
    def __init__(self, tracer):
        self.tracer = tracer

    def start_span_if_sampled(self, name, **kwargs):
        if self.should_sample():
            return self.tracer.start_as_current_span(name, **kwargs)
        return contextlib.nullcontext()

    def should_sample(self):
        current_span = trace.get_current_span()
        return current_span.get_span_context().trace_flags.sampled
This avoids unnecessary spans when sampling isn’t enabled.
Batch processor tuning:
# Optimize batch processor for your throughput
span_processor = BatchSpanProcessor(
    otlp_exporter,
    max_queue_size=2048,           # Increase for high-throughput
    export_timeout_millis=5000,    # Timeout for export operations
    schedule_delay_millis=1000,    # Batch export frequency
    max_export_batch_size=512      # Balance between latency and efficiency
)
Adjust these values to balance throughput and latency for your workload.
4. Monitoring OpenTelemetry Itself
Instrumentation should also be observable. Starting in 2024, the SDKs expose self-metrics that help you track how the instrumentation itself is performing.
# Monitor your instrumentation performance
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader

reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
meter_provider = MeterProvider(metric_readers=[reader])
meter = meter_provider.get_meter("otel-sdk-metrics")

# Track span creation rate
span_counter = meter.create_counter(
    "otel_spans_created_total",
    description="Total number of spans created"
)

# Track export failures
export_error_counter = meter.create_counter(
    "otel_export_errors_total",
    description="Total export errors"
)
These metrics let you see if instrumentation itself is adding overhead or if exports are failing.
Troubleshoot Common Issues
OpenTelemetry setups sometimes don’t behave as expected. Most problems fall into a handful of patterns—propagation gaps, misconfigured exporters, or resource bottlenecks. Let’s look at a few of the usual suspects and how to fix them.
Traces Don’t Connect Across Services
You see spans in your backend, but they aren’t stitched together into a full trace. This usually points to a propagation issue: the upstream service isn’t injecting headers, or the downstream service isn’t extracting them. Sometimes it’s as simple as one team using W3C Trace Context while another is still on B3.
- Verify that every service both injects outgoing headers and extracts incoming ones.
- Standardize on one propagation format; W3C Trace Context is the OpenTelemetry default and works well across most stacks. The snippet after this list shows how to check what a downstream service actually extracted.
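One quick way to check the downstream side is to look at what extraction actually produced. A minimal sketch for the Flask service from the propagation example (the route name and returned fields are illustrative):

from flask import request
from opentelemetry import trace
from opentelemetry.propagate import extract

# Temporary route added to the existing Flask app from the propagation example
@app.route("/debug-propagation")
def debug_propagation():
    ctx = extract(request.headers)
    span_ctx = trace.get_current_span(ctx).get_span_context()
    return {
        "traceparent_header": request.headers.get("traceparent"),  # None: upstream never injected
        "extracted_trace_id": format(span_ctx.trace_id, "032x"),   # all zeros: extraction failed
        "context_is_valid": span_ctx.is_valid,
    }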
Performance Overhead After Enabling Tracing
It’s not uncommon to notice slower startup times or higher CPU usage once tracing is turned on. The usual culprits are too many spans being created for high-frequency operations, or a batch processor that isn’t tuned for your workload.
- Look at hot paths like loops or high-volume endpoints—are you creating a span on every iteration?
- Use sampling to cut down on unnecessary spans, and adjust batch settings like max_queue_size or schedule_delay_millis to better match your throughput. A sketch of trimming a hot loop follows this list.
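For instance, instead of creating a span on every iteration of a hot loop, wrap the loop in a single span and record counts. A minimal sketch (queue_items and handle() are placeholders for your own code):

# One span around the whole batch keeps the hot path cheap
with tracer.start_as_current_span("process_queue_batch") as span:
    processed = 0
    for item in queue_items:
        handle(item)  # no per-iteration span here
        processed += 1
    span.set_attribute("items.processed", processed)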
Data Not Showing Up in the Backend
Sometimes everything looks fine locally, but nothing lands in Jaeger, Last9, or whichever backend you’re using. In most cases, the exporter is misconfigured. Either the endpoint is wrong, TLS is blocking the connection, or an auth header is missing.
- Try sending a request directly to the OTLP endpoint with curl to confirm it’s reachable.
- Double-check exporter URLs, ports, and tokens. Small typos, like http:// instead of https://, are a surprisingly common cause. If the endpoint and credentials check out, see the flush snippet after this list.
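Another frequent culprit, especially in short-lived scripts and batch jobs, is the process exiting before the batch processor has exported anything. A minimal sketch, assuming the SDK TracerProvider from the setup section is registered globally:

from opentelemetry import trace

# Flush queued spans before exit; exporter errors (bad endpoint, TLS, auth) surface in the logs here
provider = trace.get_tracer_provider()
provider.force_flush()
provider.shutdown()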
Sampling Feels Off
You either end up with far too much data or not nearly enough. This usually happens when head-based sampling ratios are set too aggressively or tail-sampling rules are too strict.
- Review your policies—are you only sampling errors and nothing else?
- Start simple: always keep errors and slow traces, then add probabilistic rules (say 1–5% of normal traffic). Tune gradually instead of big swings.
Sensitive Data Slips Into Spans
Tracing is powerful, but you don’t want emails, tokens, or card numbers showing up in attributes. This happens if spans set attributes directly without sanitization, or if your collector isn’t filtering.
- Scan a few traces—do you see PII in attributes?
- Add sanitization in code for common fields, and back it up with collector processors that drop or mask sensitive attributes before export.
Collector Struggles Under Load
When traffic ramps up, the collector itself can become a bottleneck. You might notice dropped spans, backpressure, or even crashes.
- Check Collector logs and self-metrics (otelcol_exporter_queue_size is a good one).
- Increase queue sizes, raise batch limits, or shard workloads across multiple collectors. For heavy setups, running collectors close to services (sidecars or node agents) can help distribute load.
Most of these problems trace back to either config mismatches or scale tuning. Start small: confirm propagation, check exporter connectivity, and then dig into batch and sampling settings. The good news is that once you sort out these basics, OpenTelemetry tends to be stable and predictable in production.
Final Thoughts
We’ve walked through how tracing works end-to-end: spans and attributes to capture context, events to record milestones, propagation to connect services, and profiles to tie performance issues back to code. Together, these signals give you a detailed picture of how your system behaves in production.
The real question is what happens next. Once you start collecting rich telemetry, you need a place to explore it without losing fidelity or fighting query slowdowns. That’s where platforms like Last9 help.
- Every attribute you set stays searchable, even during cardinality spikes.
- Queries stay fast, whether you’re debugging a single payment failure or correlating across billions of spans.
- Profiles, traces, logs, and metrics all live in one place, so you can move seamlessly from “this request was slow” to “this function burned CPU.”
- With event-based pricing, you pay for the telemetry you send—not for hosts or user seats.
If you’ve been experimenting with OpenTelemetry, sending some of that data into Last9 alongside your existing setup is a straightforward next step, and one that shows the real value of keeping all that context intact.
Start for free today!
FAQs
What is OpenTelemetry distributed tracing?
OpenTelemetry distributed tracing is a method of tracking and visualizing the journey of a request as it moves through various services in a distributed system. It provides insights into how different components interact and helps identify performance bottlenecks or errors.
What's the difference between OpenTelemetry and Jaeger?
OpenTelemetry is a framework for collecting and exporting telemetry data, while Jaeger is a backend for storing and visualizing traces. OpenTelemetry sends data to Jaeger (among other backends).
Should I use eBPF or manual instrumentation?
- eBPF: Best for getting started quickly, legacy applications, or when you can't modify code
- Manual: Best for fine-grained control, custom business logic, and maximum observability depth
How does the new profiling signal work?
Profiling adds continuous performance monitoring that can be correlated with traces, metrics, and logs. You can jump from a slow trace span directly to the profiling data showing which code is consuming resources.
Is the Spring Boot Starter production-ready?
Yes, as of September 2024, the OpenTelemetry Spring Boot Starter is stable and production-ready, offering a lightweight alternative to the Java agent.
What's the performance impact of distributed tracing?
- eBPF auto-instrumentation: <1% CPU, ~250MB memory
- Manual instrumentation: 2-5% CPU overhead
- Java agent: 3-8% CPU, can be higher during startup
Can I use OpenTelemetry with my existing monitoring tools?
Yes, OpenTelemetry is vendor-neutral and can export to virtually any observability backend, including Prometheus, Grafana, Datadog, New Relic, and more.
How do I handle sensitive data in traces?
Use attribute processors to redact sensitive data, implement span-level sanitization, and configure your collector to filter or mask sensitive information before export.
What languages have the best OpenTelemetry support?
Java, .NET, Python, and Node.js have the most mature support with hundreds of auto-instrumentation libraries. Go and Rust support is rapidly improving with eBPF-based solutions.