Nov 14th, 2024 · 7 min read

Getting Started with OpenTelemetry in Rust

Learn how to implement OpenTelemetry in Rust for effective observability, including tracing, metrics, and debugging in your applications.


So, you’ve got a Rust app doing some seriously cool stuff. But what happens when it starts acting up? How do you figure out what's going on inside? That’s where OpenTelemetry comes in: it helps you keep an eye on your app and understand its behavior.

In this post, we’ll walk through setting up OpenTelemetry with Rust, covering everything from basic traces to useful metrics.

What is OpenTelemetry?

Before we get into the code, let's take a moment to define OpenTelemetry.

OpenTelemetry is an observability framework designed for collecting, processing, and exporting telemetry data—such as traces, metrics, and logs—from applications.

It enables developers to gain visibility into application performance and detect potential issues early. OpenTelemetry is compatible with a variety of backends, including Jaeger, Zipkin, and AWS X-Ray, making it a versatile choice for integrating observability across diverse environments.


Getting Started

To use OpenTelemetry in Rust, you’ll need to set up a few dependencies. Add the following lines to your Cargo.toml file:

[dependencies]
opentelemetry = { version = "0.20", features = ["rt-tokio"] }
opentelemetry_sdk = { version = "0.20", features = ["rt-tokio"] }
opentelemetry-otlp = "0.13"
tokio = { version = "1.0", features = ["full"] }
tracing-opentelemetry = "0.21"
tonic = "0.9"  # used below to attach gRPC metadata (API keys) to the OTLP exporter

These crates provide the OpenTelemetry API and SDK, an OTLP exporter, the Tokio async runtime, and a bridge to the tracing ecosystem, allowing you to integrate tracing and export telemetry data effectively.

Basic Implementation

Below is an example of how to set up OpenTelemetry with tracing in a Rust application:

use opentelemetry::global;
use opentelemetry::trace::TracerProvider as _; // brings `.tracer()` into scope
use opentelemetry::KeyValue;
use opentelemetry_otlp::WithExportConfig;
use opentelemetry_sdk::trace as sdktrace;
use opentelemetry_sdk::Resource;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let tracer = init_tracer().await?;

    // Your application code here

    // Flush any buffered spans before exiting.
    opentelemetry::global::shutdown_tracer_provider();
    Ok(())
}

async fn init_tracer() -> Result<sdktrace::Tracer, Box<dyn std::error::Error + Send + Sync>> {
    // Optional headers for hosted backends (e.g., Honeycomb); replace with your own key.
    let mut metadata = tonic::metadata::MetadataMap::new();
    metadata.insert("x-honeycomb-team", "your-api-key".parse()?);

    let exporter = opentelemetry_otlp::new_exporter()
        .tonic()
        .with_endpoint("http://localhost:4317")
        .with_metadata(metadata)
        .build_span_exporter()?;

    // Batch spans on the Tokio runtime and label them with resource attributes.
    let provider = sdktrace::TracerProvider::builder()
        .with_batch_exporter(exporter, opentelemetry::runtime::Tokio)
        .with_config(sdktrace::config().with_resource(Resource::new(vec![
            KeyValue::new("service.name", "my-rust-service"),
        ])))
        .build();

    // Register the provider globally and hand back a named tracer.
    let tracer = provider.tracer("my-rust-service");
    global::set_tracer_provider(provider);

    Ok(tracer)
}

This code sets up a basic tracing pipeline with OpenTelemetry. The init_tracer function configures an OTLP exporter, which sends trace data to a specified endpoint (in this case, http://localhost:4317).

A MetadataMap lets you attach API keys or other headers for hosted services like Honeycomb. We then build a TracerProvider with resource attributes such as service.name to label our traces, making it easier to identify services in a distributed environment, register the provider globally, and return a named tracer for the application to use.


Details Behind This Code

This code showcases some key OpenTelemetry concepts in Rust:

  • opentelemetry::global::shutdown_tracer_provider(): Ensures that any unsent telemetry data is flushed before the application exits, preventing data loss during shutdown.
  • init_tracer: Builds the OTLP exporter and tracer provider, registers the provider globally, and returns a tracer your code uses to create spans recording timings, events, and attributes. This data is then exported to a backend of your choice (e.g., Honeycomb or Jaeger) for analysis.
  • Tokio Runtime: OpenTelemetry uses asynchronous tasks to handle background work, such as batching and sending telemetry data. The tokio runtime is leveraged here to efficiently manage these async operations.
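
With the tracer returned by init_tracer in hand, recording spans is straightforward. Here's a minimal sketch using the opentelemetry API directly; the span names and the user.id attribute are just placeholders:

use opentelemetry::trace::{Span, Tracer};
use opentelemetry::KeyValue;

// Inside main, after `let tracer = init_tracer().await?;`
tracer.in_span("handle_request", |_cx| {
    // Anything done here is recorded under the "handle_request" span.
    let mut child = tracer.start("load_user");
    child.set_attribute(KeyValue::new("user.id", "42"));
    // ... do the actual work ...
    child.end();
});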

Advanced Configuration

Exporters

OpenTelemetry supports a variety of exporters to route telemetry data to different observability backends. Here are two commonly used exporters:

OTLP Exporter (Recommended):
The OTLP (OpenTelemetry Protocol) exporter is a versatile and widely adopted option that works with most observability platforms.

let otlp_exporter = opentelemetry_otlp::new_exporter()
    .tonic()
    .with_endpoint("http://localhost:4317")
    .build_span_exporter()?;

Jaeger Exporter (for Distributed Tracing):
Jaeger is a popular choice for distributed tracing, especially in environments that need visibility across multiple services. This exporter lives in the separate opentelemetry-jaeger crate (recent Jaeger releases also accept OTLP directly, so the OTLP exporter above works with Jaeger too).

let jaeger_exporter = opentelemetry_jaeger::new_agent_pipeline()
    .with_service_name("my-rust-service")
    // The agent pipeline speaks UDP to the Jaeger agent (default port 6831).
    .with_endpoint("localhost:6831")
    .install_batch(opentelemetry::runtime::Tokio)?;

Resource Attribution

Adding resource attributes can help make your telemetry data more descriptive and useful. You can attach metadata like the environment, service name, or hostname to provide more context:

// Assumes `use std::env;` and the `hostname` crate in Cargo.toml.
let resource = Resource::new(vec![
    KeyValue::new("service.name", env::var("SERVICE_NAME").unwrap_or_else(|_| "unknown".into())),
    KeyValue::new("deployment.environment", "production"),
    KeyValue::new("host.name", hostname::get()?.to_string_lossy().into_owned()),
]);

Why Resource Attribution Matters

Resource attribution provides critical context, helping you pinpoint exactly where your telemetry data originates within a complex environment.

For instance, tagging data with an environment label (like "staging" or "production") can make troubleshooting faster and more targeted. It allows you to isolate issues to specific environments, services, or hosts, making it much easier to manage and understand what’s happening across your infrastructure.
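
A small sketch of that idea: read the environment from a variable instead of hardcoding it (DEPLOY_ENV is an illustrative name, not an OpenTelemetry convention):

use std::env;
use opentelemetry::KeyValue;
use opentelemetry_sdk::Resource;

// Falls back to "development" when DEPLOY_ENV isn't set,
// so local runs are never mistaken for production traffic.
let environment = env::var("DEPLOY_ENV").unwrap_or_else(|_| "development".into());

let resource = Resource::new(vec![
    KeyValue::new("service.name", "my-rust-service"),
    KeyValue::new("deployment.environment", environment),
]);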

Common Pitfalls and Solutions

Async Runtime Issues

init_tracer is async, so forgetting to .await it means the tracer is never actually initialized (the compiler will warn about an unused future). Ensure that async functions are awaited, like this:

let tracer = init_tracer().await?;

Missing Shutdown

Forgetting to shut down the tracer provider can result in unsent telemetry data. Always call:

opentelemetry::global::shutdown_tracer_provider();
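
In a long-running service, a common pattern (sketched below with Tokio's Ctrl-C handler) is to wait for a shutdown signal and flush the tracer provider just before the process exits:

// Inside an async main: wait for Ctrl-C, then flush buffered spans.
tokio::signal::ctrl_c().await?;
opentelemetry::global::shutdown_tracer_provider();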

Resource Leaks

Using .install_batch registers a batch span processor on the runtime you pass in, so spans are buffered and exported in the background and flushed when the provider shuts down, rather than leaking unexported data:

.install_batch(opentelemetry::runtime::Tokio)?

Integration with Cloud Providers

AWS

For services running on AWS, it’s beneficial to add cloud-specific metadata to your telemetry data:

let aws_resource = Resource::new(vec![
    KeyValue::new("cloud.provider", "aws"),
    KeyValue::new("cloud.region", env::var("AWS_REGION")?),
    KeyValue::new("cloud.availability_zone", env::var("AWS_AZ")?),
]);

Why Integrate with Cloud Providers?

Cloud provider metadata helps you analyze how your services are performing across different regions and availability zones.

This insight can be crucial for scaling or debugging specific areas of your cloud infrastructure, ensuring that you understand how each component is affected by its environment.
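
One caveat with the snippet above: env::var("AWS_REGION")? makes startup fail when those variables are missing (for example, when running locally). A more forgiving sketch only adds the cloud attributes that are actually present:

use std::env;
use opentelemetry::KeyValue;
use opentelemetry_sdk::Resource;

// Collect cloud attributes only when the corresponding variables exist,
// so local runs don't fail at startup.
let mut cloud_attrs = vec![KeyValue::new("cloud.provider", "aws")];
if let Ok(region) = env::var("AWS_REGION") {
    cloud_attrs.push(KeyValue::new("cloud.region", region));
}
if let Ok(az) = env::var("AWS_AZ") {
    cloud_attrs.push(KeyValue::new("cloud.availability_zone", az));
}
let aws_resource = Resource::new(cloud_attrs);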

Production-Ready Setup

Here’s a typical setup for a production-grade OpenTelemetry tracer:

// Assumes: use std::{env, time::Duration};
// use opentelemetry::{global, trace::TracerProvider as _};
// use opentelemetry_otlp::WithExportConfig;
// use opentelemetry_sdk::trace::{self as sdktrace, Sampler};
fn init_production_tracer() -> Result<sdktrace::Tracer, Box<dyn std::error::Error + Send + Sync>> {
    let exporter = opentelemetry_otlp::new_exporter()
        .tonic()
        .with_endpoint(env::var("OTEL_ENDPOINT")?)
        .with_timeout(Duration::from_secs(5))
        .with_metadata(get_auth_metadata()?)
        .build_span_exporter()?;

    // Sample a fraction of traces unless the configured ratio is 1.0.
    let ratio: f64 = env::var("OTEL_SAMPLING_RATIO")?.parse()?;
    let sampler = if ratio < 1.0 {
        Sampler::ParentBased(Box::new(Sampler::TraceIdRatioBased(ratio)))
    } else {
        Sampler::AlwaysOn
    };

    let provider = sdktrace::TracerProvider::builder()
        .with_batch_exporter(exporter, opentelemetry::runtime::Tokio)
        .with_config(
            sdktrace::config()
                .with_resource(get_resource()?)
                .with_sampler(sampler),
        )
        .build();

    // Register globally so libraries using the global tracer pick it up too.
    let tracer = provider.tracer("my-service");
    global::set_tracer_provider(provider);
    Ok(tracer)
}

Why Production-Specific Setup?

In a production setup, fine-tuning parameters like sampling rates, timeouts, and authentication tokens is essential.

Sampling controls the volume of data sent to your telemetry backend, preventing overloads, while timeouts ensure export requests don't hang. By configuring authentication and endpoint metadata, you also secure and customize your telemetry setup for a live environment.
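
The production example leans on two helpers, get_auth_metadata and get_resource, that aren't shown. Here's one plausible sketch of them; the OTEL_API_KEY, SERVICE_NAME, and DEPLOY_ENV variables and the x-api-key header name are illustrative assumptions, not standards:

use std::env;
use opentelemetry::KeyValue;
use opentelemetry_sdk::Resource;
use tonic::metadata::MetadataMap;

fn get_auth_metadata() -> Result<MetadataMap, Box<dyn std::error::Error + Send + Sync>> {
    // Attach the backend's API key as a gRPC header on every export request.
    let mut metadata = MetadataMap::new();
    metadata.insert("x-api-key", env::var("OTEL_API_KEY")?.parse()?);
    Ok(metadata)
}

fn get_resource() -> Result<Resource, Box<dyn std::error::Error + Send + Sync>> {
    // Describe this process so traces can be attributed to the right service.
    Ok(Resource::new(vec![
        KeyValue::new("service.name", env::var("SERVICE_NAME").unwrap_or_else(|_| "unknown".into())),
        KeyValue::new("service.version", env!("CARGO_PKG_VERSION")),
        KeyValue::new("deployment.environment", env::var("DEPLOY_ENV").unwrap_or_else(|_| "production".into())),
    ]))
}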


Conclusion

OpenTelemetry in Rust is a powerful approach to observability, enabling you to monitor, trace, and debug your applications more effectively. Start with simple instrumentation, and expand as your needs grow.

For the latest updates and examples, visit the OpenTelemetry Rust GitHub repository.

And if you still feel like discussing more on this topic, chat with us on Discord! We have a dedicated channel where you can connect with other developers and discuss your use case.

FAQs

What is OpenTelemetry?

OpenTelemetry is an open-source framework providing APIs and SDKs to collect, process, and export telemetry data—like traces, logs, and metrics—from your applications. It’s designed to improve observability, helping you understand and troubleshoot your app’s performance.

Why is Rust Easier than Go for OpenTelemetry?

Rust's type system and explicit error handling catch many configuration mistakes at compile time, before the application ever runs. In Go, similar errors may only surface at runtime, which can be harder to track down in large applications.

How Are You Using OpenTelemetry in Your Rust Applications?

Personally, I use OpenTelemetry in Rust for:

  • Distributed tracing across services
  • Performance monitoring of critical code paths
  • Context-rich error tracking for easier debugging

What Cloud Endpoint Should I Use?

Here are a few options that work well in production:

  • AWS: AWS X-Ray with OTLP endpoint
  • GCP: Cloud Trace
  • Generic: Jaeger or Zipkin, both widely used and highly compatible


Authors

Prathamesh Sonpatki

Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.
