OpenTelemetry Context Propagation for Better Tracing

In distributed systems, context propagation is key to tracking requests across services. With microservices, it’s important to maintain visibility into how requests flow through the system.

OpenTelemetry offers a framework to help trace these requests, ensuring you can follow their journey.

In this blog, we’ll look at how OpenTelemetry context propagation works, why it matters, and how to implement it in your system.

The Role of Context Propagation in Observability

Context propagation is the backbone of tracing in distributed systems. It enables the passing of relevant information across service boundaries, ensuring that telemetry data (like traces, spans, and logs) maintains its connection throughout the entire request lifecycle.

This allows engineers to track a single request across multiple services, providing visibility into how each service impacts the overall system performance.

When a request enters your system, OpenTelemetry attaches a context to it. This context contains key information, such as trace IDs and span IDs, which help correlate logs and traces between services.

As the request flows from one service to another, OpenTelemetry ensures that the context travels with it, thus enabling you to trace every step of the request’s journey.
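
To make "trace IDs and span IDs" concrete: with the W3C Trace Context format, the context travels as a single traceparent header made of four dash-separated hex fields. The sketch below parses one using only the standard library. parse_traceparent and TraceParent are hypothetical names for illustration, not part of the OpenTelemetry API, and the parser only accepts the version 00 layout.

```python
import re
from typing import NamedTuple, Optional

# W3C Trace Context, version 00 layout: version-traceid-parentid-flags (lowercase hex)
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    version: str
    trace_id: str   # 16 bytes, shared by every span in the trace
    parent_id: str  # 8 bytes, the span that made the outgoing call
    sampled: bool   # lowest bit of the trace-flags field


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed fields, or None if the header is malformed."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


example = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
```

Here example.trace_id is the value every downstream service reuses, while example.parent_id changes at each hop.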

How OpenTelemetry Context Propagation Works

To understand how context propagation works, let’s break it down step by step:

1. Context Initialization

When a request first enters your system, OpenTelemetry checks for an incoming trace context. If none exists, it starts a new trace with a freshly generated trace ID, which identifies the request for its entire lifecycle.

2. Context Injection

As the request moves from one service to another, the instrumentation library for your HTTP or RPC client injects this trace context into the outgoing request headers. This injection is key to maintaining continuity in tracing.

3. Context Extraction

Upon receiving the request, the downstream service extracts the trace context from the headers. This allows it to continue the trace from where the previous service left off.

4. Trace Continuity

Each service processes the request and creates a new span, which is tied to the trace context. This span represents the work done by that service. The context propagation ensures that spans are linked correctly, providing a clear trace from start to finish.

5. End-to-End Visibility

As each span ends, the service that created it exports it to your observability platform, which groups the spans by trace ID and visualizes them. You can then see the complete journey of the request, including any latencies or bottlenecks along the way.
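
The five steps above can be sketched as a toy simulation. This is not the OpenTelemetry SDK: it is a standard-library model in which plain dictionaries stand in for spans and a list stands in for the observability backend. The names start_span, outgoing_headers, and collected_spans, along with the service names, are made up for the example.

```python
import secrets

collected_spans = []  # stands in for the observability backend


def start_span(name, headers):
    """Steps 1, 3 and 4: reuse the incoming trace ID if present, else start a new trace."""
    incoming = headers.get("traceparent")
    if incoming:
        _version, trace_id, parent_id, _flags = incoming.split("-")
    else:
        trace_id, parent_id = secrets.token_hex(16), None
    span = {
        "name": name,
        "trace_id": trace_id,
        "span_id": secrets.token_hex(8),
        "parent_id": parent_id,
    }
    collected_spans.append(span)  # step 5: export the span
    return span


def outgoing_headers(span):
    """Step 2: inject the current span's identity into the next request."""
    return {"traceparent": f"00-{span['trace_id']}-{span['span_id']}-01"}


# A request enters the "frontend" service, which then calls "checkout"
frontend = start_span("frontend", headers={})
checkout = start_span("checkout", headers=outgoing_headers(frontend))
```

Both spans end up with the same trace_id, and the checkout span's parent_id points at the frontend span. That parent link is what lets a backend draw the request as a tree.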

The Challenges of Context Propagation

While context propagation is powerful, it’s not without its challenges. One of the main hurdles is ensuring consistency and reliability across different services, especially in a microservices environment. Here are some common issues:

1. Service Boundaries

Context must be propagated across service boundaries, which can involve different programming languages, frameworks, or even transport protocols. Managing context across these boundaries can sometimes be tricky and error-prone.

2. Context Overhead

Passing context along with every request introduces a small overhead: a W3C traceparent header value is 55 characters, and any tracestate or baggage entries add to that. For high-volume systems, this can add up, affecting performance. However, the tradeoff is usually worth it for the visibility and traceability context propagation provides.

3. Correlating Logs and Traces

Sometimes, the logs generated by different services may not have the correct trace context attached, making it difficult to correlate them with the appropriate traces. Ensuring that context is passed along with all logs is essential for full observability.
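
For the log correlation problem, one common approach is to stamp the active trace and span IDs onto every log record. The sketch below does this with the standard logging module. The current_trace variable is a hypothetical stand-in for whatever holds the active context in your service (in a real setup, the tracing SDK owns that state), and the logger name and IDs are sample values.

```python
import contextvars
import io
import logging

# Hypothetical holder for the active (trace_id, span_id) pair
current_trace = contextvars.ContextVar("current_trace", default=("-", "-"))


class TraceContextFilter(logging.Filter):
    """Copy the active trace and span IDs onto every log record."""

    def filter(self, record):
        record.trace_id, record.span_id = current_trace.get()
        return True


stream = io.StringIO()  # captured here so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s")
)
handler.addFilter(TraceContextFilter())

logger = logging.getLogger("orders")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

# Pretend a request with this trace context is being handled
current_trace.set(("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7"))
logger.info("payment authorized")
```

With the IDs in every log line, you can search your log store for a trace ID copied from a slow trace and see exactly what that request logged.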

Manual Context Propagation: When You Need More Control

While OpenTelemetry offers automatic context propagation for most common use cases, there are situations where you may need to manually control how context is passed through your system.

In manual context propagation, you explicitly manage the context (such as trace and span IDs) within your application. This might involve setting or extracting the context by hand, particularly in situations where automatic context propagation isn’t sufficient or feasible.

OpenTelemetry provides the necessary APIs to make this manual handling possible, allowing you to take full control over what’s included in the context and how it’s passed between services.

When to Use Manual Context Propagation

There are several scenarios where you might need to use manual context propagation:

Custom Protocols or Message Formats: If you’re using a custom protocol or message format that OpenTelemetry’s automatic propagation doesn’t support out of the box, you’ll need to manually handle context propagation.
Non-HTTP Communication: For services communicating over protocols that no instrumentation library covers (e.g., custom TCP sockets or a proprietary message bus), nothing injects and extracts the context for you, so you’ll have to do it yourself. Common protocols such as gRPC and AMQP often have instrumentation available.
Cross-Service Tracing with Specific Needs: In complex systems where you need to ensure that specific context, like additional metadata, is passed between services, manual control allows you to add these details to the context explicitly.
Legacy Systems: If you’re working with legacy systems that don’t yet support OpenTelemetry’s automatic propagation, you may need to manually manage the context in the codebase.

Let’s look at how you can manually propagate context using OpenTelemetry APIs, focusing on a typical microservices scenario where a service sends a message to a queue and another service reads it.

Scenario: Passing Context Between Services Using Custom Headers

Suppose you have a service that sends a request to another service through a message broker like Kafka. Kafka itself knows nothing about OpenTelemetry context, so unless an instrumentation library for your Kafka client handles it, you’ll need to inject the context into the message headers yourself.

Here’s an example in Python using OpenTelemetry’s API to manually propagate context:

Sending the Message with Injected Context

from opentelemetry.propagate import inject

# Assume a span is already active in the current context

# Creating a custom message to send to another service
message = {
    "headers": {},
    "payload": "Request data"
}

# Injecting the current context into the message headers.
# With the default propagators this adds a "traceparent" entry.
inject(message['headers'])

# Now send the message (using Kafka, for example).
# send_to_kafka is a placeholder for your own producer code.
send_to_kafka(message)

Receiving the Message and Extracting Context

On the receiving service side, we need to extract the context from the message headers:

from opentelemetry import context
from opentelemetry.propagate import extract

# Assuming we received a message from Kafka
# (receive_from_kafka is a placeholder for your own consumer code)
received_message = receive_from_kafka()

# Extract context from the message headers
extracted_context = extract(received_message['headers'])

# Set the extracted context as the current context
token = context.attach(extracted_context)
try:
    # Continue processing the request; spans started here join the sender's trace
    process_request(received_message['payload'])
finally:
    # Restore the previous context once the message is handled
    context.detach(token)

In this example, we manually inject the trace context into the message headers before sending it. On the receiving end, we extract the context from the headers and continue the trace with the relevant context.
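
One practical detail the example glosses over: the carrier used for injection is a dictionary of strings, while many Kafka clients (kafka-python, for instance) represent message headers as a list of (key, bytes) pairs. A pair of small adapters bridges the two. The function names here are hypothetical, and you should check the header type your own client expects.

```python
from typing import Dict, List, Optional, Tuple

KafkaHeaders = List[Tuple[str, Optional[bytes]]]


def to_kafka_headers(carrier: Dict[str, str]) -> KafkaHeaders:
    """Encode a text carrier as (key, bytes) pairs for a producer."""
    return [(key, value.encode("utf-8")) for key, value in carrier.items()]


def from_kafka_headers(headers: Optional[KafkaHeaders]) -> Dict[str, str]:
    """Decode consumer headers into a text carrier, lowercasing keys and skipping empty values."""
    return {
        key.lower(): value.decode("utf-8")
        for key, value in (headers or [])
        if value is not None
    }


carrier = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
wire_headers = to_kafka_headers(carrier)
restored = from_kafka_headers(wire_headers)
```

On the producer side you would inject into a dictionary and pass it through to_kafka_headers. On the consumer side, from_kafka_headers gives you a dictionary you can hand to the extract step.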

Differences Between Automatic and Manual Context Propagation

Here are the key differences between the two:

Aspect	Automatic Context Propagation	Manual Context Propagation
Ease of Use	Simple and requires minimal configuration. OpenTelemetry handles context injection and extraction automatically.	Requires more effort, as you need to manually inject and extract context.
Flexibility	Limited to supported protocols and communication patterns.	High flexibility, allows custom protocols, headers, and messaging systems.
Control	Less control over what context is included. Context is automatically managed based on trace configuration.	Full control over what context is passed and how it’s injected, allowing for tailored implementations.
Error Handling	OpenTelemetry handles most edge cases automatically.	You are responsible for handling edge cases, like missing or corrupted context.
Use Case	Works well for common protocols like HTTP(S), gRPC, etc.	Needed when dealing with non-standard communication patterns or legacy systems.

Why Choose Manual Context Propagation?

While automatic context propagation works for most use cases, there are scenarios where manual control becomes necessary. These include:

1. Non-HTTP Communication

When your system uses non-HTTP protocols, such as custom TCP protocols or WebSocket messages sent after the initial handshake, OpenTelemetry may not handle context propagation by default. In these cases, manual context propagation ensures that traces are still accurately captured.

2. Complex or Custom Workflows

If you need to pass additional custom metadata along with the trace context, such as user IDs or tenant information, you have to add it explicitly in code. OpenTelemetry’s Baggage API is designed for this: it carries key-value pairs alongside the trace context.

3. Compatibility with Legacy Systems

Legacy systems that don’t natively support OpenTelemetry may require manual context propagation to integrate with your observability platform.
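
For the custom metadata case, it helps to see what travels on the wire. Under the W3C Baggage format, entries are sent in a baggage header as comma-separated key=value pairs with percent-encoded values. The sketch below encodes and decodes that format with the standard library only. encode_baggage and decode_baggage are hypothetical helpers, the tenant and user values are sample data, and the decoder simply discards any per-entry properties after a semicolon.

```python
from urllib.parse import quote, unquote


def encode_baggage(entries):
    """Render key/value pairs as a W3C baggage header value."""
    return ",".join(
        f"{key}={quote(str(value), safe='')}" for key, value in entries.items()
    )


def decode_baggage(header):
    """Parse a baggage header value, ignoring per-entry properties after ';'."""
    entries = {}
    for member in header.split(","):
        key, separator, value = member.partition("=")
        if separator and key.strip():
            entries[key.strip()] = unquote(value.split(";", 1)[0].strip())
    return entries


headers = {"baggage": encode_baggage({"tenant.id": "acme corp", "user.id": 42})}
received = decode_baggage(headers["baggage"])
```

In the Python API, the opentelemetry.baggage module's set_baggage function adds entries to a context. Keep in mind that baggage is forwarded to every downstream service, so it is not a place for secrets.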

How Context Propagation Enhances Distributed Tracing

Context propagation isn’t just for OpenTelemetry. Many other distributed tracing systems also rely on similar mechanisms to propagate trace contexts across services.

However, OpenTelemetry stands out by providing a standardized and extensible approach that can be easily integrated with existing observability tools.

OpenTelemetry's ability to propagate context enables you to track requests across various platforms and systems, giving you comprehensive visibility into the performance of your application.

With context in place, you can quickly detect anomalies, track down performance bottlenecks, and improve overall system reliability.

How to Set Up and Manage Propagators in OpenTelemetry

In OpenTelemetry, propagators are crucial for passing trace context (such as trace IDs and span IDs) across service boundaries. They define how context is injected into and extracted from requests as they move through your system.

What Are Propagators?

Propagators are responsible for encoding and decoding trace context when it's passed between services, often through HTTP headers, message queues, or other communication protocols.

OpenTelemetry supports different types of propagators depending on your use case, including W3C Trace Context and B3.

These formats are commonly used to represent trace context, but OpenTelemetry also allows for custom propagators to meet your specific needs.

Types of Propagators in OpenTelemetry

1. W3C Trace Context Propagator

The W3C Trace Context is a standard format used to propagate trace context across distributed systems. OpenTelemetry natively supports this format, commonly used in HTTP-based communications. This propagator works with headers like traceparent and tracestate.

2. B3 Propagator

B3 is a popular trace context format used by systems like Zipkin. OpenTelemetry supports the B3 propagator, which uses headers like X-B3-TraceId, X-B3-SpanId, and X-B3-Sampled for context propagation.

3. Custom Propagators

If your system uses a non-standard or proprietary communication protocol, OpenTelemetry allows you to create and configure custom propagators. This flexibility helps accommodate unique needs and use cases.
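
To see how the first two formats differ, here is the same span identity rendered both ways. This is a standard-library sketch of the header layouts only. The function names are made up for the example, and real propagators handle more cases (tracestate, the single-header B3 variant, debug flags).

```python
def w3c_headers(trace_id: str, span_id: str, sampled: bool) -> dict:
    """W3C Trace Context packs everything into one traceparent header."""
    return {"traceparent": f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"}


def b3_multi_headers(trace_id: str, span_id: str, sampled: bool) -> dict:
    """The B3 multi-header format uses one header per field."""
    return {
        "X-B3-TraceId": trace_id,
        "X-B3-SpanId": span_id,
        "X-B3-Sampled": "1" if sampled else "0",
    }


trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
span_id = "00f067aa0ba902b7"
w3c = w3c_headers(trace_id, span_id, sampled=True)
b3 = b3_multi_headers(trace_id, span_id, sampled=True)
```

The IDs are identical in both cases. Only the header names and layout change, which is why a service that reads one format cannot see context sent in the other.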

How to Configure Propagators in OpenTelemetry

In Python, the global propagator is configured with the set_global_textmap() function from the opentelemetry.propagate module, or without code changes through the OTEL_PROPAGATORS environment variable. Below, we'll go through how to configure different propagators for your application.

1. Setting W3C Trace Context Propagator

By default, OpenTelemetry propagates W3C Trace Context together with W3C Baggage. You can also set the W3C Trace Context propagator explicitly:

from opentelemetry.propagate import set_global_textmap
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# Set the W3C Trace Context propagator
set_global_textmap(TraceContextTextMapPropagator())

With this configuration, trace context is injected and extracted in the W3C format. Because this replaces the default propagator set, baggage is no longer propagated unless you add a baggage propagator as well.

2. Setting B3 Propagator

If your system or a downstream service relies on B3 headers (commonly used with systems like Zipkin), you can configure OpenTelemetry to use the B3 propagator:

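# Requires the opentelemetry-propagator-b3 package
from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.b3 import B3MultiFormat

# Set the B3 propagator (multi-header format: X-B3-TraceId, X-B3-SpanId, ...)
set_global_textmap(B3MultiFormat())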

3. Using Multiple Propagators

In some cases, you may need to use multiple propagators. This is common when different parts of your system need different formats. OpenTelemetry allows you to combine multiple propagators into a composite one:

from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.b3 import B3MultiFormat
from opentelemetry.propagators.composite import CompositePropagator
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# Combine W3C and B3 propagators (CompositePropagator takes a list)
composite_propagator = CompositePropagator([
    TraceContextTextMapPropagator(),
    B3MultiFormat(),
])

# Set the composite propagator globally
set_global_textmap(composite_propagator)

In this setup, OpenTelemetry writes both W3C and B3 headers on outgoing requests and accepts either format on incoming ones. This is useful when integrating with different tracing backends or when migrating services from one format to the other.
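
The extraction side of a composite can be pictured as a loop over extractors, each of which either recognizes its headers and updates the context or passes the context through untouched. The following is a simplified standard-library model of that idea, not the library's implementation. All function names are hypothetical, and the header parsing skips validation for brevity.

```python
def extract_w3c(headers, context):
    """Read a traceparent header if one is present."""
    parts = headers.get("traceparent", "").split("-")
    if len(parts) == 4:
        return {**context, "trace_id": parts[1], "parent_id": parts[2]}
    return context


def extract_b3(headers, context):
    """Read B3 multi-format headers if both IDs are present."""
    trace_id, span_id = headers.get("x-b3-traceid"), headers.get("x-b3-spanid")
    if trace_id and span_id:
        return {**context, "trace_id": trace_id, "parent_id": span_id}
    return context


def composite_extract(headers, extractors):
    """Run every extractor in order; each may add to or replace the context."""
    normalized = {key.lower(): value for key, value in headers.items()}
    context = {}
    for extractor in extractors:
        context = extractor(normalized, context)
    return context


extractors = [extract_w3c, extract_b3]
from_w3c = composite_extract(
    {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}, extractors
)
from_b3 = composite_extract(
    {"X-B3-TraceId": "463ac35c9f6413ad48485a3953bb6124", "X-B3-SpanId": "a2fb4a1d1a96d312"},
    extractors,
)
```

In this model, when a request carries both formats, the extractor listed last wins, so the order of the list matters.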

4. Creating Custom Propagators

For specialized use cases, you can create a custom propagator. Custom propagators are helpful when you're using a non-standard header format or need special behavior for context injection or extraction.

Here's how to create a basic custom propagator in Python:

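from opentelemetry import context as otel_context, trace
from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.textmap import TextMapPropagator, default_getter, default_setter

class CustomPropagator(TextMapPropagator):
    TRACE_ID_KEY = 'X-Custom-Trace-Id'
    SPAN_ID_KEY = 'X-Custom-Span-Id'

    def inject(self, carrier, context=None, setter=default_setter):
        # Custom logic to inject context: write the active span's IDs as hex strings
        span_context = trace.get_current_span(context).get_span_context()
        if span_context.is_valid:
            setter.set(carrier, self.TRACE_ID_KEY, format(span_context.trace_id, '032x'))
            setter.set(carrier, self.SPAN_ID_KEY, format(span_context.span_id, '016x'))

    def extract(self, carrier, context=None, getter=default_getter):
        # Custom logic to extract context: rebuild the remote parent span from the carrier
        trace_id = getter.get(carrier, self.TRACE_ID_KEY)
        span_id = getter.get(carrier, self.SPAN_ID_KEY)
        try:
            parent = trace.SpanContext(int(trace_id[0], 16), int(span_id[0], 16), is_remote=True,
                                       trace_flags=trace.TraceFlags(trace.TraceFlags.SAMPLED))
        except (TypeError, IndexError, ValueError):
            # Missing or malformed headers: leave the context unchanged
            return otel_context.get_current() if context is None else context
        return trace.set_span_in_context(trace.NonRecordingSpan(parent), context)

    @property
    def fields(self):
        return {self.TRACE_ID_KEY, self.SPAN_ID_KEY}

# Set the custom propagator
set_global_textmap(CustomPropagator())

In this example, the CustomPropagator class implements the inject and extract methods, plus the fields property that TextMapPropagator requires. The inject method writes the active span's trace and span IDs to custom headers, while the extract method rebuilds a remote parent from them so that new spans join the same trace. For brevity, the sketch assumes every incoming trace was sampled; a production propagator should carry the sampling flag as well.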

When to Use Custom Propagators

Creating custom propagators is especially useful when:

You have a unique communication protocol not covered by standard propagators.
You need to add additional metadata to the trace context, such as user IDs, geographic information, or application-specific tags.
You are integrating with legacy systems that don’t use standard tracing formats.

Advanced Considerations for OpenTelemetry Context Propagation

For advanced users, here are some additional considerations when working with OpenTelemetry context propagation:

1. Multi-tenant Architectures

If you’re operating in a multi-tenant environment, you’ll need to ensure that trace data from different tenants can be told apart. Trace IDs are already unique per request, so this usually means adding tenant-specific metadata to the context, for example as a baggage entry that each service copies onto its spans.

2. Distributed Context Propagation

In some architectures, especially those with complex communication patterns, it may be necessary to propagate context through multiple layers of abstraction. This might include not just microservices but also message queues, serverless functions, or external APIs.

3. Context Propagation with Non-HTTP Protocols

While HTTP(S) is the most common protocol for context propagation, OpenTelemetry supports other communication protocols such as gRPC, Kafka, and more. You’ll need to ensure that context propagation is handled correctly for each protocol your services use.

4. Sampling and Context

OpenTelemetry allows you to apply sampling rules to trace data. The sampling decision travels with the context as a flag, so downstream services can honor the decision made at the start of the trace instead of each deciding independently, which would leave traces with missing spans. In some cases, you might want to sample traces in a way that ensures critical traces are always captured.
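
On the sampling point: in the W3C format, the sampling decision is the lowest bit of the trace-flags field at the end of traceparent. The sketch below reads that bit and builds the header for the next hop while keeping the flags unchanged, which is the behavior a parent-based sampler relies on. It uses only the standard library, assumes a well-formed header, and both function names are hypothetical.

```python
import secrets


def is_sampled(traceparent: str) -> bool:
    """True when the sampled bit (0x01) is set in the trace-flags field."""
    flags = traceparent.rsplit("-", 1)[-1]
    return bool(int(flags, 16) & 0x01)


def child_traceparent(parent: str) -> str:
    """Build the header for the next hop: same trace ID and flags, new span ID."""
    version, trace_id, _parent_id, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


sampled_header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
unsampled_header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00"
next_hop = child_traceparent(unsampled_header)
```

Because next_hop keeps the 00 flags, a downstream service that respects the parent's decision also skips recording, and the trace is dropped as a whole rather than in fragments. In the Python SDK, the ParentBased sampler in opentelemetry.sdk.trace.sampling provides this behavior.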

Propagator Configuration in Distributed Systems

In distributed systems, ensuring that your propagators are configured consistently across all services is essential to maintaining trace integrity.

For example, if one service uses the W3C Trace Context propagator, all other services should follow suit to avoid losing trace data.

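Additionally, in systems where services expect different header formats (for example, some emit B3 while others use W3C Trace Context), you may need to configure multiple propagators. This can be done either by using a composite propagator or by applying different ones depending on the service. Note that a propagator defines the header format, not the transport, so the same propagator can serve HTTP, gRPC, and Kafka.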

Regardless of the setup, trace context must be correctly passed across service boundaries for complete observability.
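
A consistency rule like this can be checked mechanically. The sketch below takes a list of service-to-service calls and each service's configured propagator names, then reports caller/callee pairs that share no format. The topology, the service names, and find_propagation_gaps itself are hypothetical. The format names follow the values accepted by OTEL_PROPAGATORS.

```python
def find_propagation_gaps(calls, formats_by_service):
    """Return caller/callee pairs that share no trace propagation format."""
    gaps = []
    for caller, callee in calls:
        shared = set(formats_by_service[caller]) & set(formats_by_service[callee])
        shared.discard("baggage")  # baggage alone does not carry trace identity
        if not shared:
            gaps.append((caller, callee))
    return gaps


# Hypothetical topology and per-service propagator settings
calls = [("gateway", "orders"), ("orders", "billing")]
formats_by_service = {
    "gateway": ["tracecontext", "baggage"],
    "orders": ["tracecontext", "baggage"],
    "billing": ["b3multi"],
}

gaps = find_propagation_gaps(calls, formats_by_service)

# Adding B3 to the caller closes the gap
fixed = dict(formats_by_service, orders=["tracecontext", "baggage", "b3multi"])
gaps_after_fix = find_propagation_gaps(calls, fixed)
```

A check like this could run in CI against your deployment configuration so that a mismatch is caught before traces start breaking at that hop.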

Best Practices for OpenTelemetry Context Propagation

To ensure that you’re implementing OpenTelemetry context propagation effectively, follow these best practices:

1. Consistent Context Propagation Across All Services

Ensure that all services in your architecture, regardless of language or framework, propagate context consistently. OpenTelemetry supports a wide range of languages, so make sure you’re using the appropriate SDK for each service.

2. Ensure Proper Header Injection

When injecting the trace context into outgoing requests, make sure that the headers are formatted correctly and that the context is preserved throughout the entire request lifecycle.

3. Use OpenTelemetry SDKs and Libraries

OpenTelemetry provides several libraries and SDKs to make context propagation easier. Use these to simplify the implementation process and reduce the risk of errors.

4. Monitor Performance Impact

While context propagation is critical, it does introduce some overhead. Monitor the performance of your system to ensure that the added latency is within acceptable limits, especially in high-traffic environments.

5. Handle Missing or Corrupted Context Gracefully

Ensure that your services can handle situations where the trace context is missing or corrupted. This might occur if a request bypasses the normal flow or if a service fails to propagate the context correctly. You can log such errors and trigger alerts to identify and fix issues quickly.

6. End-to-End Testing

Implement end-to-end testing to ensure that context propagation works as expected across all services. Automated tests can help verify that the context is being passed correctly and that traces are complete.
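
Practices 5 and 6 can be combined in a small test harness. The sketch below shows a receiver that never fails on bad input (it logs the problem and starts a fresh trace) and a helper that pushes a request through several simulated hops so a test can assert that the trace ID survives. It uses only the standard library, and continue_or_restart and run_chain are hypothetical names. In a real test you would call your actual services or their handlers.

```python
import logging
import re
import secrets

logger = logging.getLogger("propagation")
_VALID = re.compile(r"00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}")


def continue_or_restart(headers):
    """Return (trace_id, restarted). Never raise on bad input; start a fresh trace instead."""
    header = headers.get("traceparent")
    if header and _VALID.fullmatch(header):
        return header.split("-")[1], False
    if header is not None:
        logger.warning("discarding malformed traceparent: %r", header)
    return secrets.token_hex(16), True


def run_chain(hops, first_headers):
    """Pass a request through `hops` simulated services; return the trace ID each one saw."""
    seen, headers = [], first_headers
    for _ in range(hops):
        trace_id, _restarted = continue_or_restart(headers)
        seen.append(trace_id)
        headers = {"traceparent": f"00-{trace_id}-{secrets.token_hex(8)}-01"}
    return seen


incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
seen_trace_ids = run_chain(3, incoming)
```

If every entry in seen_trace_ids matches the incoming trace ID, propagation held across all hops. A restarted trace in the middle of the chain points to the hop that dropped or mangled the header.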

Conclusion

Context propagation is a core component of distributed tracing and observability in modern applications. As systems continue to grow in complexity, mastering context propagation will be key to maintaining visibility and quickly identifying and resolving issues.

FAQs

What is context propagation in OpenTelemetry?
Context propagation in OpenTelemetry refers to the process of passing trace context (like trace and span IDs) across service boundaries, enabling the tracking of requests as they move through a distributed system. It ensures that the trace remains intact and connected, providing full observability.

Why is context propagation important in distributed tracing?
Context propagation is crucial because it allows you to track and correlate requests across various microservices in a distributed architecture. Without proper propagation, trace data may be lost or disconnected, making it difficult to understand the journey of a request and identify performance bottlenecks or errors.

How does OpenTelemetry handle context propagation?
OpenTelemetry provides automatic context propagation through built-in propagators for common formats like W3C Trace Context and B3. It can also be configured to handle custom formats, ensuring that trace context is correctly injected and extracted from requests across different communication protocols like HTTP, gRPC, or message queues.

What are the different types of propagators in OpenTelemetry?
The main propagators in OpenTelemetry are the W3C Trace Context propagator, which is used for HTTP-based communications, and the B3 propagator, which is commonly used with Zipkin. You can also create custom propagators for non-standard protocols or additional metadata.

How do I configure propagators in OpenTelemetry?
You can configure propagators in OpenTelemetry by using the set_global_textmap() function to specify the type of propagator you want to use. Whether it’s W3C Trace Context, B3, or a custom propagator, this configuration ensures that trace context is consistently handled across your services.

Can OpenTelemetry support multiple propagators?
Yes, OpenTelemetry allows you to use multiple propagators simultaneously by using a composite propagator. This is useful if your system requires support for different trace context formats depending on the communication protocol or service integration.

When should I use custom propagators in OpenTelemetry?
Custom propagators are needed when you’re dealing with unique or proprietary communication protocols that aren’t supported by the standard propagators. You can also use them if you need to inject additional metadata into the trace context or integrate with legacy systems that don't support standard trace formats.

How does context propagation work with message queues like Kafka?
Kafka itself does not propagate trace context. Unless an instrumentation library for your Kafka client handles it, you need to inject the trace context into message headers before sending, and then extract it on the receiving end to maintain trace continuity across services.