When working with OpenTelemetry, environment variables play a crucial role in configuring and customizing your setup. These variables provide a flexible and convenient way to adjust settings without needing to change code, allowing you to fine-tune your OpenTelemetry installation across different environments.
In this blog, we’ll look into OpenTelemetry environment variables, offering you insights on their use, best practices, and some lesser-known variables that can help you optimize your monitoring setup.
What Are OpenTelemetry Environment Variables?
OpenTelemetry provides a set of tools for collecting, processing, and exporting telemetry data (like traces, metrics, and logs) from your application.
Environment variables are a convenient method for passing configuration details to OpenTelemetry’s various components—such as the OpenTelemetry SDK or the OpenTelemetry Collector—without modifying your application’s source code.
These variables help manage aspects like exporter configuration, sampling, and service name, making it easier to adapt your observability setup to different environments (e.g., development, staging, production) without the need for constant code changes.
Key OpenTelemetry Environment Variables
1. OTEL_RESOURCE_ATTRIBUTES
This variable is used to define additional resource attributes that you want to associate with the telemetry data. Resource attributes give context to the telemetry data by providing metadata like the name of the service or the version of the application.
For example, to specify the service name and version:
export OTEL_RESOURCE_ATTRIBUTES=service.name=my-service,service.version=1.0.0
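The attribute list is a comma-separated string of key=value pairs, so it can be assembled from other variables at deploy time. The sketch below assumes two hypothetical shell variables, SERVICE_VERSION and DEPLOY_ENV, and uses the separate OTEL_SERVICE_NAME variable, which takes precedence over any service.name set in OTEL_RESOURCE_ATTRIBUTES. The deployment.environment.name key follows recent semantic conventions; older versions use deployment.environment.

```shell
# Hypothetical values for illustration; substitute your own.
SERVICE_VERSION="1.0.0"
DEPLOY_ENV="staging"

# OTEL_SERVICE_NAME wins over service.name in OTEL_RESOURCE_ATTRIBUTES.
export OTEL_SERVICE_NAME="my-service"

# Remaining attributes go in as comma-separated key=value pairs.
export OTEL_RESOURCE_ATTRIBUTES="service.version=${SERVICE_VERSION},deployment.environment.name=${DEPLOY_ENV}"

echo "$OTEL_RESOURCE_ATTRIBUTES"
```

Keeping the service name in its own variable makes it harder to lose when the attribute list is rebuilt per environment.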
2. OTEL_EXPORTER_ZIPKIN_ENDPOINT
If you’re using Zipkin as your trace exporter, this variable is used to define the endpoint where the traces should be sent. It's a critical variable for ensuring that your OpenTelemetry data reaches the correct destination.
Example usage:
export OTEL_EXPORTER_ZIPKIN_ENDPOINT=http://zipkin-server:9411/api/v2/spans
3. OTEL_EXPORTER_OTLP_ENDPOINT
The OTEL_EXPORTER_OTLP_ENDPOINT variable configures the OpenTelemetry protocol (OTLP) exporter. OTLP is one of the most commonly used protocols for sending telemetry data, and this variable specifies where the data should be sent.
Example usage:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
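The port in the endpoint usually depends on the OTLP transport: Collectors conventionally listen on 4317 for gRPC and 4318 for HTTP, and the transport itself is chosen with OTEL_EXPORTER_OTLP_PROTOCOL. As a rough sketch, a small helper can keep the two in step. The function name otlp_endpoint_for and the otel-collector hostname are assumptions for illustration.

```shell
# otlp_endpoint_for is a hypothetical helper that maps an OTLP protocol
# to the conventional Collector port on an assumed "otel-collector" host.
otlp_endpoint_for() {
  case "$1" in
    grpc)                    echo "http://otel-collector:4317" ;;
    http/protobuf|http/json) echo "http://otel-collector:4318" ;;
    *) echo "unsupported protocol: $1" >&2; return 1 ;;
  esac
}

export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_EXPORTER_OTLP_ENDPOINT="$(otlp_endpoint_for "$OTEL_EXPORTER_OTLP_PROTOCOL")"

echo "$OTEL_EXPORTER_OTLP_ENDPOINT"
```

With an HTTP transport, SDKs append the signal path (for example /v1/traces) to this base URL. A signal-specific variable such as OTEL_EXPORTER_OTLP_TRACES_ENDPOINT can be used when traces need a different destination.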
4. OTEL_TRACES_SAMPLER
OpenTelemetry allows you to control sampling behavior, and the OTEL_TRACES_SAMPLER environment variable selects the sampling strategy. For example, setting the value to parentbased_always_on (the default) samples every root span and otherwise follows the sampling decision of the parent span.
Example usage:
export OTEL_TRACES_SAMPLER=parentbased_always_on
5. OTEL_EXPORTER_JAEGER_AGENT_HOST
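For those using Jaeger for tracing, this variable specifies the Jaeger agent’s hostname for SDKs that still ship a Jaeger exporter. The Jaeger exporters have since been deprecated in the OpenTelemetry specification because Jaeger accepts OTLP directly, so new setups should generally send traces with OTEL_EXPORTER_OTLP_ENDPOINT instead.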
Example usage:
export OTEL_EXPORTER_JAEGER_AGENT_HOST=jaeger-agent
6. OTEL_METRICS_EXPORTER
This variable selects the exporter for metrics data. The values defined by the specification are otlp (the default), prometheus, console, and none; individual SDKs may support additional exporters.
Example usage:
export OTEL_METRICS_EXPORTER=prometheus
Advanced OpenTelemetry Environment Variables
While many OpenTelemetry users are familiar with the common environment variables mentioned above, some lesser-known variables can add more granularity and control to your setup. These often fly under the radar but can make a big difference in optimizing your environment.
1. OTEL_TRACES_SAMPLER_ARG
This environment variable passes an argument to the sampler selected with OTEL_TRACES_SAMPLER. For the traceidratio and parentbased_traceidratio samplers, the value is the sampling probability, a number between 0.0 and 1.0. Other samplers interpret the value in their own way, and it is ignored when the selected sampler takes no argument.
Example usage:
export OTEL_TRACES_SAMPLER_ARG=0.5
2. OTEL_EXPORTER_OTLP_INSECURE
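For local testing or on trusted networks, this variable disables TLS for the OTLP exporter's gRPC connection. It applies only to OTLP over gRPC when the endpoint is given without an http or https scheme; OTLP over HTTP always follows the scheme in the endpoint URL. Be cautious with this in production environments, as it sends telemetry data unencrypted.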
Example usage:
export OTEL_EXPORTER_OTLP_INSECURE=true
3. OTEL_PROPAGATORS
This variable controls which context propagation formats OpenTelemetry should use when transmitting traces and context between services. It takes a comma-separated list, and the default is tracecontext,baggage. Other values defined by the specification include b3, b3multi, jaeger, xray, ottrace, and none.
Example usage:
export OTEL_PROPAGATORS=tracecontext,baggage
4. OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
For Python users, this variable enables or disables the logging integration in OpenTelemetry's Python auto-instrumentation. When it is set to true, records from Python's standard logging module are captured and exported through the OpenTelemetry logs pipeline, making it easier to correlate logs with traces.
Example usage:
export OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true
5. OTEL_EXPERIMENTAL_SAMPLER_ARG
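This is a more advanced experimental setting for passing custom arguments to your sampler, which is useful for developers implementing highly tailored sampling logic. The name does not appear in the OpenTelemetry specification's list of SDK environment variables, so confirm that your SDK or distribution reads it before relying on it; the standard way to pass sampler arguments is OTEL_TRACES_SAMPLER_ARG.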
Example usage:
export OTEL_EXPERIMENTAL_SAMPLER_ARG=custom-sampler-args
How to Manage Attribute and Span Limits
Managing how much data each span carries is crucial for maintaining OpenTelemetry's performance and ensuring efficient resource usage.
OpenTelemetry provides configuration options to limit the number of attributes, events, and links recorded on each span, as well as the length of attribute values, helping prevent unnecessary overhead in high-traffic environments.
Attribute Limits
Spans carry attributes, key-value pairs that provide additional context. However, adding too many can increase memory usage and slow down performance.
To address this, OpenTelemetry allows you to configure limits on the number of attributes each span can have and on the length of attribute values. Attributes beyond the limit are dropped, which helps balance valuable context against system performance.
- Set OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT (or the global OTEL_ATTRIBUTE_COUNT_LIMIT) to cap the number of attributes per span; the specification's default is 128.
- Set OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT to truncate long attribute values; by default there is no length limit.
Span Limits
Similar to attributes, the events and links attached to each span also consume memory. The SDK's span limits cap these per-span collections; they do not limit how many spans are active at once.
OpenTelemetry lets you configure span limits to help keep resource usage in check.
- OTEL_SPAN_EVENT_COUNT_LIMIT and OTEL_SPAN_LINK_COUNT_LIMIT cap the number of events and links per span (128 each by default).
- The number of finished spans held in memory before export is bounded separately by the batch span processor's queue size (OTEL_BSP_MAX_QUEUE_SIZE), which matters most in high-volume environments.
Configuring these limits appropriately ensures that your OpenTelemetry setup remains efficient without compromising on the data you need for observability.
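As an illustration, the limits described above map to the following variables from the OpenTelemetry specification. The numbers are arbitrary examples rather than recommendations, and support for each variable can vary by SDK.

```shell
# Illustrative values only; tune them against your own workload.
export OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT=64      # max attributes per span
export OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT=1024  # max length of an attribute value
export OTEL_SPAN_EVENT_COUNT_LIMIT=32          # max events per span
export OTEL_SPAN_LINK_COUNT_LIMIT=32           # max links per span
```

Lowering the limits below the defaults trades context for memory, so it is worth checking which attributes your dashboards and alerts actually depend on first.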
Optimizing Batch Processing in OpenTelemetry
Batch processing controls how spans and log records are aggregated and sent to exporters. Configuring it well lets you balance the efficiency of data transmission against resource usage, especially in high-throughput environments. The OTEL_BSP_* variables below configure the batch span processor; the batch log record processor has a matching set of OTEL_BLRP_* variables.
Key Configuration Options
Schedule Delay (OTEL_BSP_SCHEDULE_DELAY)
Controls the delay, in milliseconds, between two consecutive exports, and therefore the longest a span waits in a batch before being sent (default 5000). Setting a longer delay reduces the frequency of export calls, while a shorter one makes data available sooner.
Example:
export OTEL_BSP_SCHEDULE_DELAY=5000
Max Batch Size (OTEL_BSP_MAX_EXPORT_BATCH_SIZE)
Determines the maximum number of spans that can be sent in a single batch (default 512). It must be less than or equal to OTEL_BSP_MAX_QUEUE_SIZE. This helps manage the amount of data being sent at once, preventing network overload and keeping resource usage in check.
Example:
export OTEL_BSP_MAX_EXPORT_BATCH_SIZE=100
Queue Size (OTEL_BSP_MAX_QUEUE_SIZE)
Configures the maximum number of spans that can be queued while waiting for export (default 2048). Once the queue is full, new spans are dropped, so an appropriately sized queue reduces data loss during high-traffic periods at the cost of more memory.
Example:
export OTEL_BSP_MAX_QUEUE_SIZE=200
Export Timeout (OTEL_BSP_EXPORT_TIMEOUT)
Specifies the maximum time, in milliseconds, that a single export is allowed to run before it is cancelled (default 30000). How often exports happen is controlled by OTEL_BSP_SCHEDULE_DELAY; a lower delay can improve data timeliness but may increase resource consumption, while a higher delay reduces export frequency.
Example:
export OTEL_BSP_EXPORT_TIMEOUT=10000
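Putting the batch settings together, here is a minimal sketch using the variable names the specification defines for the batch span processor. Durations are in milliseconds, and the values shown match the specification's defaults. The check_batch_config function is a hypothetical helper, not part of any SDK; it guards against the one hard constraint, that the batch size must not exceed the queue size.

```shell
# Durations are milliseconds; sizes are span counts.
export OTEL_BSP_SCHEDULE_DELAY=5000
export OTEL_BSP_EXPORT_TIMEOUT=30000
export OTEL_BSP_MAX_QUEUE_SIZE=2048
export OTEL_BSP_MAX_EXPORT_BATCH_SIZE=512

# check_batch_config is a hypothetical helper: it fails when the batch
# size is larger than the queue that feeds it.
check_batch_config() {
  if [ "$OTEL_BSP_MAX_EXPORT_BATCH_SIZE" -le "$OTEL_BSP_MAX_QUEUE_SIZE" ]; then
    echo "ok"
  else
    echo "batch size exceeds queue size"
    return 1
  fi
}

check_batch_config
```

Running a check like this in a container entrypoint or CI step catches a mismatched pair before the application starts.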
Benefits of Batch Processing
- Resource Efficiency: By controlling the size of batches and the frequency of exports, batch processing ensures that telemetry data is handled efficiently without overloading the system.
- Reduced Network Load: Instead of sending individual spans or logs, batching groups them together, reducing the number of network calls required.
- Improved Performance: Aggregating spans and logs into batches before export can minimize the impact on application performance by reducing the overhead of frequent data transmission.
Setting Up the OpenTelemetry Metrics SDK for Better Performance
OpenTelemetry provides flexible configuration options that allow you to adjust how metrics are gathered and periodically exported, helping you optimize performance and resource usage in your observability setup.
Key Configuration Options
Metric Export Interval (OTEL_METRIC_EXPORT_INTERVAL)
This setting controls how often, in milliseconds, the periodic metric reader exports metrics to the configured exporter (default 60000). Shorter intervals allow for more frequent updates but increase the load on the system, while longer intervals help reduce overhead by exporting data less often.
Example:
export OTEL_METRIC_EXPORT_INTERVAL=30000
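The specification names this variable OTEL_METRIC_EXPORT_INTERVAL (singular METRIC) and expects a value in milliseconds, with a companion OTEL_METRIC_EXPORT_TIMEOUT for the maximum duration of one export. If you prefer to think in seconds, a small conversion helper avoids unit mistakes. The to_millis function below is a hypothetical helper written for this sketch.

```shell
# to_millis is a hypothetical helper: whole seconds in, milliseconds out.
to_millis() {
  echo $(( $1 * 1000 ))
}

export OTEL_METRIC_EXPORT_INTERVAL="$(to_millis 30)"  # export every 30 seconds
export OTEL_METRIC_EXPORT_TIMEOUT="$(to_millis 10)"   # give each export up to 10 seconds

echo "interval=${OTEL_METRIC_EXPORT_INTERVAL}ms timeout=${OTEL_METRIC_EXPORT_TIMEOUT}ms"
```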
Max Queue Size (OTEL_METRICS_QUEUE_SIZE)
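Defines the maximum number of metrics that can be queued before they are exported. Setting a larger queue size allows more metrics to accumulate before export, which can be helpful in high-traffic environments but may consume more memory. This name does not appear in the OpenTelemetry specification's list of SDK environment variables, so confirm that your SDK or distribution supports it before relying on it.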
Example:
export OTEL_METRICS_QUEUE_SIZE=100
Batch Size (OTEL_METRICS_MAX_EXPORT_BATCH_SIZE)
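Specifies the maximum number of metrics that should be sent in a single batch, which lets you balance the size of individual batches against how often they are sent. This name is likewise not among the SDK environment variables listed in the OpenTelemetry specification, so check your SDK's documentation for the equivalent setting.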
Example:
export OTEL_METRICS_MAX_EXPORT_BATCH_SIZE=50
Metric Exporter Type (OTEL_METRICS_EXPORTER)
Allows you to choose which exporter OpenTelemetry should use for metrics. Options may include Prometheus, OTLP, or other custom exporters, depending on your system’s requirements.
Example:
export OTEL_METRICS_EXPORTER=prometheus
Metrics Collection Period (OTEL_METRICS_COLLECTION_INTERVAL)
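Sets the frequency at which metrics are collected from your application or service, balancing the need for up-to-date information with performance considerations. In the standard SDK, the periodic metric reader collects and exports on the same schedule, set by OTEL_METRIC_EXPORT_INTERVAL, and with a pull-based exporter such as Prometheus the scraper's own interval decides how often data is read. A separate collection-interval variable is not part of the specification, so verify support in your SDK before using it.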
Example:
export OTEL_METRICS_COLLECTION_INTERVAL=15s
Benefits of Metrics SDK Configuration
- Improved Performance: Adjusting export intervals, batch sizes, and queue sizes helps optimize resource usage by minimizing the overhead of metric collection and exporting.
- Timeliness and Accuracy: By configuring the export interval and collection period, you can balance the need for real-time metrics with system efficiency, ensuring you have timely yet resource-efficient telemetry data.
- Customizability: With flexible configuration options, you can tailor the metrics collection and exporting behavior to meet the specific needs of your application or infrastructure.
How Can You Fine-Tune Your Samplers and Propagators for Better Tracing?
Sampler Configuration
The sampler determines whether a trace should be recorded or dropped. By configuring samplers, you control how much tracing data is collected, optimizing resource usage and performance.
Sampler Type (OTEL_TRACES_SAMPLER)
Defines the sampling strategy used to determine which traces are recorded. Common types include:
- always_on: Samples all traces (useful for debugging or high-priority use cases).
- always_off: No traces are sampled (used in environments where tracing is not needed).
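- traceidratio: Samples a fixed fraction of traces based on the trace ID (e.g., 50% of traces).
- parentbased_always_on (the default), parentbased_always_off, parentbased_traceidratio: Follow the parent span's sampling decision and apply the named sampler only to root spans.
Example:
export OTEL_TRACES_SAMPLER=traceidratio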
Sampling Rate (OTEL_TRACES_SAMPLER_ARG)
Specifies the sampling ratio for the traceidratio and parentbased_traceidratio sampler types. This value is a float between 0.0 (no traces sampled) and 1.0 (all traces sampled).
Example:
export OTEL_TRACES_SAMPLER_ARG=0.1
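In the specification, the ratio-based sampler is named traceidratio, with a parentbased_traceidratio variant that respects the caller's sampling decision. A malformed ratio is easy to miss, so one option is to validate it before exporting. The valid_ratio function below is a hypothetical helper for this sketch, and it assumes awk is available.

```shell
# valid_ratio is a hypothetical helper: succeeds only if $1 is a plain
# decimal number between 0 and 1 inclusive.
valid_ratio() {
  awk -v r="$1" 'BEGIN { exit !(r ~ /^[0-9]*\.?[0-9]+$/ && r >= 0 && r <= 1) }'
}

RATIO="0.1"  # illustrative: keep roughly 10% of new traces

if valid_ratio "$RATIO"; then
  export OTEL_TRACES_SAMPLER="parentbased_traceidratio"
  export OTEL_TRACES_SAMPLER_ARG="$RATIO"
else
  echo "invalid sampling ratio: $RATIO" >&2
fi
```

The parent-based variant is often a reasonable choice for services in the middle of a call chain, because it keeps traces complete once an upstream service has decided to sample them.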
Custom Sampler
OpenTelemetry also supports creating custom samplers, allowing you to implement complex sampling strategies based on your specific needs. You can integrate these samplers into your application to fine-tune trace collection.
Propagator Configuration
Propagators manage the propagation of context across distributed systems, ensuring trace context remains intact as requests pass through different services.
Propagation Format (OTEL_PROPAGATORS)
Specifies the formats in which trace context is propagated, as a comma-separated list. Common formats include:
- tracecontext: A standardized format based on W3C Trace Context, widely supported across various systems.
- baggage: Used to propagate additional context, such as metadata about the request, along with trace data.
Example:
export OTEL_PROPAGATORS=tracecontext,baggage
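The variable the specification defines for this setting is OTEL_PROPAGATORS, and it accepts several formats at once, which helps when some services still expect Zipkin's B3 headers. When the list is assembled from several sources, duplicates can creep in. SDKs are expected to register each propagator only once, so the hypothetical dedupe_csv helper below is purely about keeping the configuration readable.

```shell
# dedupe_csv is a hypothetical helper: removes repeated and empty entries
# from a comma-separated list while preserving order.
dedupe_csv() {
  echo "$1" | tr ',' '\n' | awk 'NF && !seen[$0]++' | paste -sd, -
}

# b3 is added here for services that still read Zipkin B3 headers.
export OTEL_PROPAGATORS="$(dedupe_csv "tracecontext,baggage,b3,tracecontext")"

echo "$OTEL_PROPAGATORS"
```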
Custom Propagator
Like samplers, propagators can be customized. If your system requires a specific propagation format or method, you can implement a custom propagator that suits your needs.
Benefits of Proper Sampler and Propagator Configuration
- Resource Optimization: By controlling trace sampling, you reduce the volume of trace data collected, saving storage space and processing power.
- Maintaining Context: Proper propagator configuration ensures that trace context is consistently passed between services, maintaining the integrity of distributed traces.
- Customization: With customizable samplers and propagators, you can tailor tracing behavior to your application's specific needs, optimizing both performance and observability.
Best Practices for Using OpenTelemetry Environment Variables
Environment-Specific Variables
Set environment variables specific to each environment (development, staging, production) to ensure consistency and avoid misconfiguration.
For example, you might use different exporters or sampling rates in production versus development.
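One way to express this is a single function that sets sampling per environment. The sketch below assumes a hypothetical DEPLOY_ENV variable provided by your deployment tooling, and the ratios are placeholders rather than recommendations.

```shell
# configure_otel_for is a hypothetical helper that picks sampler settings
# for a named environment.
configure_otel_for() {
  case "$1" in
    production)
      export OTEL_TRACES_SAMPLER="parentbased_traceidratio"
      export OTEL_TRACES_SAMPLER_ARG="0.1"
      ;;
    staging)
      export OTEL_TRACES_SAMPLER="parentbased_traceidratio"
      export OTEL_TRACES_SAMPLER_ARG="0.5"
      ;;
    *)
      # Development and anything unrecognized: sample everything.
      export OTEL_TRACES_SAMPLER="parentbased_always_on"
      unset OTEL_TRACES_SAMPLER_ARG
      ;;
  esac
}

# DEPLOY_ENV is an assumed variable; default to development when unset.
configure_otel_for "${DEPLOY_ENV:-development}"
```

Keeping the per-environment differences in one place makes it easier to review what changes between development and production.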
Use a Configuration Management Tool
If you’re managing multiple environments or have numerous OpenTelemetry instances, consider using a configuration management tool (e.g., Ansible, Chef, or Terraform) to set and manage environment variables across different systems.
Security Considerations
Be mindful when using sensitive information (like API keys or tokens) in environment variables. Follow security best practices, such as using secret management tools or limiting the scope of environment variables.
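For instance, credentials for an OTLP backend are typically passed through OTEL_EXPORTER_OTLP_HEADERS as comma-separated key=value pairs. Rather than hard-coding the token in a script, a rough sketch is to read it from a file that your secret manager mounts at runtime. The load_otlp_headers function, the api-key header name, and the file path in the usage comment are all assumptions for illustration; your backend's documentation will name the header it expects.

```shell
# load_otlp_headers is a hypothetical helper: reads a token from a file
# and exports it as an OTLP header without echoing it.
load_otlp_headers() {
  token_file="$1"
  if [ ! -r "$token_file" ]; then
    echo "cannot read token file: $token_file" >&2
    return 1
  fi
  # Drop the trailing newline that most editors and tools add.
  token="$(tr -d '\n' < "$token_file")"
  # "api-key" is an assumed header name; use the one your backend requires.
  export OTEL_EXPORTER_OTLP_HEADERS="api-key=${token}"
  unset token
}

# Usage (the path is hypothetical):
#   load_otlp_headers /run/secrets/otel_api_key
```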
Logging and Monitoring
OpenTelemetry can provide rich telemetry data, but don’t forget to monitor the health of its components. Use environment variables to configure logging levels for debugging purposes (e.g., OTEL_LOG_LEVEL=debug).
Documentation
Document the environment variables used in your OpenTelemetry setup. Keeping track of this information helps troubleshoot issues, onboard new team members, and maintain consistency across projects.
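A quick way to capture what is actually set on a host or in a container is to list every OTEL_ variable in the environment. The list_otel_env function below is a hypothetical helper for this sketch; it redacts any variable whose name ends in HEADERS, since those often carry credentials.

```shell
# list_otel_env is a hypothetical helper: prints all OTEL_* variables in
# sorted order, hiding the values of header variables.
list_otel_env() {
  env | grep '^OTEL_' | sort | sed -E 's/^(OTEL_[A-Z_]*HEADERS)=.*/\1=<redacted>/'
}

list_otel_env
```

The output can be pasted into a runbook or compared between environments to spot configuration drift.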
Conclusion
OpenTelemetry environment variables offer flexibility in configuring observability, allowing fine-grained control over data collection and system monitoring.
Careful management of these variables—such as setting environment-specific configurations, using configuration tools, and ensuring security—helps optimize performance and maintain consistency across environments.
When properly configured, OpenTelemetry enhances resource efficiency and improves system monitoring.