The OpenTelemetry Operator has become a go-to solution for developers and DevOps teams striving to improve observability in Kubernetes clusters.
It simplifies deploying and managing telemetry pipelines, allowing you to focus on analyzing metrics, logs, and traces without drowning in configuration details.
In this guide, we’ll walk through everything you need to know to use the OpenTelemetry Operator effectively. Let’s start with the basics and move toward hands-on configuration.
What is the OpenTelemetry Operator?
The OpenTelemetry Operator is a Kubernetes-native tool that simplifies observability pipelines by managing telemetry collectors.
Instead of manually configuring metrics and traces across workloads, you can use this operator to automate tasks like scaling collectors, exporting telemetry data, and instrumenting applications.
It works by using Kubernetes Custom Resource Definitions (CRDs) to configure the OpenTelemetry Collector. The collector is a vendor-neutral telemetry agent that receives, processes, and exports telemetry data to backends like Prometheus, Jaeger, and Elasticsearch.
Key Features:
- Deployment Modes: Supports various deployment configurations, including sidecars, DaemonSets, and standalone deployments.
- Custom Pipelines: Create tailored telemetry pipelines with processors, exporters, and receivers.
- Scalability: Add or remove collector instances with ease using Kubernetes APIs.
- Auto-Instrumentation: Automates instrumentation for applications running in the cluster.
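As a quick illustration of the deployment-mode feature, switching the same resource to sidecar mode is a one-line change. A minimal sketch (the name is illustrative, and the full collector configuration is covered later in this guide):
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: sidecar-otel              # illustrative name
  namespace: observability
spec:
  mode: sidecar                   # injected into pods annotated with sidecar.opentelemetry.io/inject: "true"
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
    exporters:
      logging:
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [logging]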
Why Use the OpenTelemetry Operator?
Observability is no longer a luxury—it's a necessity for modern cloud-native applications.
With the rise of microservices and distributed systems, tracking performance metrics, understanding system health, and debugging failures require powerful tools.
The OpenTelemetry Operator simplifies:
- Instrumentation: Eliminates manual setup of telemetry libraries in application code.
- Pipeline Management: Lets you define telemetry pipelines declaratively using CRDs.
- Resource Optimization: Ensures your collectors can scale up or down based on workload demands.
- Backend Integration: Works easily with various telemetry backends, including Prometheus, Last9, and Jaeger.
Installing the OpenTelemetry Operator
Before we talk about configurations, let’s get the operator installed in your Kubernetes cluster. You can install it using Helm or by directly applying Kubernetes manifests.
Option 1: Installing with Helm
Helm makes installation simple and repeatable.
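One prerequisite worth noting: by default, the operator’s admission webhooks expect cert-manager to issue their TLS certificates, so install it first if your cluster doesn’t already run it (or configure the chart to generate its own certificates instead):
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml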
Add the OpenTelemetry Helm Chart:
helm repo add opentelemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
Install the Operator:
helm install opentelemetry-operator opentelemetry/opentelemetry-operator \
--namespace observability --create-namespace
Verify Installation:
kubectl get pods -n observability
You should see a running pod with a name starting with opentelemetry-operator.
Option 2: Installing with Kubernetes Manifests
If Helm isn’t your style, you can deploy the operator using YAML manifests:
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
This method gives you more control over configurations but requires manual updates for upgrades.
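For reproducible installs, consider pinning a specific release instead of latest; replace the version placeholder below with a tag from the operator’s releases page:
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/download/<version>/opentelemetry-operator.yaml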
Setting Up Your First OpenTelemetry Collector
With the operator installed, the next step is deploying a telemetry collector. The OpenTelemetryCollector CRD makes this process declarative and flexible.
Let’s configure a collector in deployment mode.
Example YAML for a Basic Collector
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: basic-otel-collector
  namespace: observability
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: "0.0.0.0:4317"
    processors:
      batch:
    exporters:
      logging:
        loglevel: debug
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [logging]
Breakdown of the Configuration:
- Receivers: Configures the collector to receive data using the OTLP protocol.
- Processors: Adds batching to improve telemetry efficiency.
- Exporters: Writes the telemetry data to the collector’s own logs for debugging.
- Service Pipelines: Defines a pipeline for processing traces, connecting receivers to exporters.
Apply this configuration:
kubectl apply -f collector-config.yaml
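To confirm the pipeline is wired up, tail the collector’s logs. The operator typically names the generated Deployment and Service <name>-collector, so adjust the command if your naming differs:
kubectl logs deployment/basic-otel-collector-collector -n observability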
Scaling Your Collector
Once your collector is running, it’s time to make sure it can handle your workloads. In Kubernetes, scaling collectors is as simple as updating the replicas field in the CRD.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: scalable-otel
  namespace: observability
spec:
  mode: deployment
  replicas: 5
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
    processors:
      batch:
    exporters:
      logging:
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [logging]
This configuration deploys 5 collector instances. Use the following command to confirm scaling:
kubectl get pods -n observability
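A fixed replica count works, but recent operator versions also expose an autoscaler block on the CRD that creates and manages a Horizontal Pod Autoscaler for you. Field availability varies by operator version, so verify with kubectl explain opentelemetrycollector.spec.autoscaler; here is a sketch of the relevant fields, merged into the spec above (drop the fixed replicas field when using it):
spec:
  mode: deployment
  autoscaler:
    minReplicas: 2            # lower bound for the operator-managed HPA
    maxReplicas: 10           # upper bound
    targetCPUUtilization: 70  # scale out when average CPU utilization exceeds 70%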
Auto-Instrumentation in Kubernetes
One of the standout features of the OpenTelemetry Operator is auto-instrumentation, which lets you instrument applications without modifying their code. It has two parts: an Instrumentation custom resource that tells the operator which language agent to inject and where to send telemetry, and an annotation on the workload’s pod template that opts its pods in.
Here’s how that looks for a Java application, starting with the Instrumentation resource:
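This is a minimal sketch; the exporter endpoint assumes the basic-otel-collector Service created earlier (the operator names it <name>-collector), so adjust it to match your setup:
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: java-instrumentation
  namespace: observability
spec:
  exporter:
    endpoint: http://basic-otel-collector-collector.observability:4317   # assumed collector Service
  propagators:
    - tracecontext
    - baggage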
Example YAML for the Instrumented Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-app
spec:
  selector:
    matchLabels:
      app: java-app
  template:
    metadata:
      labels:
        app: java-app
      annotations:
        instrumentation.opentelemetry.io/inject-java: "true"
    spec:
      containers:
        - name: app
          image: java-app:latest
When this Deployment runs, the operator injects the OpenTelemetry Java agent via an init container and sets the environment variables the agent needs (such as JAVA_TOOL_OPTIONS). The application then starts sending telemetry to the endpoint defined in the Instrumentation resource automatically.
Advanced Configuration of OpenTelemetry Collectors
As your workloads grow, your telemetry pipelines may need fine-tuning. This section covers advanced configurations for custom telemetry pipelines.
Adding Multiple Exporters
Sometimes, you may want to send telemetry data to multiple destinations, such as Prometheus, Jaeger, or AWS X-Ray. Here’s an example:
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: advanced-otel-collector
  namespace: observability
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
    processors:
      batch:
    exporters:
      prometheus:
        endpoint: "0.0.0.0:9090"
      jaeger:
        endpoint: "jaeger-service:14250"
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [jaeger]
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [prometheus]
Configuration Highlights:
- Separate Pipelines: Metrics are exported to Prometheus and traces to Jaeger, each through its own pipeline, since an exporter can only be used for the signals it supports.
- Receiver Protocols: Configures OTLP to receive data via gRPC.
- Batch Processor: Optimizes telemetry data transmission.
Apply the configuration:
kubectl apply -f advanced-collector.yaml
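If your backend is an OTLP-compatible vendor endpoint rather than a self-hosted service, you can typically add an otlphttp exporter alongside the others; the endpoint and header below are placeholders for whatever your backend’s documentation specifies:
exporters:
  otlphttp:
    endpoint: "https://<your-backend-otlp-endpoint>"   # placeholder, from your backend's docs
    headers:
      Authorization: "Bearer <token>"                  # placeholder credential
Then add otlphttp to the exporters list of whichever pipelines should ship data there.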
Managing Namespace Isolation
For multi-tenant environments or when working with namespace-specific telemetry pipelines, you can scope collectors to specific namespaces.
Example Namespace-Scoped Collector
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: namespace-otel
  namespace: team-a
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
    exporters:
      logging:
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [logging]
This configuration ensures telemetry from team-a is processed independently.
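To make multi-tenant data easier to separate downstream, you can also stamp a tenant attribute onto everything this collector handles; here is a sketch using the resource processor (the attribute key is just an example):
processors:
  resource:
    attributes:
      - key: tenant
        value: team-a
        action: upsert
Remember to add resource to the processors list of the pipeline that should carry the label.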
Debugging Common Issues
Setting up observability pipelines can come with its share of hiccups. Let’s troubleshoot some common issues.
1. Collector Pods CrashLoopBackOff
- Cause: Invalid configuration in the collector YAML.
- Solution: Check logs for errors:
kubectl logs <collector-pod-name> -n observability
2. No Telemetry Data Received
- Cause: Misconfigured receivers or application instrumentation.
- Solution: Ensure the application points at the correct collector endpoint. You can port-forward the collector Service (the operator names it <name>-collector) and send test traffic locally, as shown after this list:
kubectl port-forward svc/basic-otel-collector-collector 4317:4317 -n observability
3. Data Not Reaching Backend
- Cause: Exporter misconfiguration.
- Solution: Verify the backend endpoint and exporter settings in your collector configuration.
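One convenient way to generate that test traffic is the telemetrygen utility from the opentelemetry-collector-contrib repository; exact flags can vary between versions, so treat this as a sketch:
go install github.com/open-telemetry/opentelemetry-collector-contrib/cmd/telemetrygen@latest
telemetrygen traces --otlp-insecure --otlp-endpoint localhost:4317 --traces 10
With the port-forward from above still running, the traces should show up in the collector’s logging exporter output.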
Observability Best Practices
To make the most out of the OpenTelemetry Operator, follow these best practices:
- Start Small: Begin with a single telemetry pipeline and scale as needed.
- Utilize CRDs: Use Kubernetes CRDs to manage configurations declaratively.
- Monitor Collector Health: Use tools like kubectl top to check resource usage for collectors.
- Namespace Strategies: Isolate telemetry pipelines by namespaces for better multi-tenancy management.
- Use Metrics for Scaling: Use Kubernetes Horizontal Pod Autoscaler (HPA) to adjust collector replicas based on CPU or memory usage.
Conclusion
The OpenTelemetry Operator transforms the complexity of observability into a manageable and efficient process. From auto-instrumentation to scalable collectors, it empowers teams to monitor applications in Kubernetes with ease.
FAQs
What Kubernetes versions are supported?
The operator tracks recent Kubernetes releases; check the compatibility matrix in the operator’s README for the exact versions supported by the release you install.
Can I use OpenTelemetry for custom metrics?
Yes, the OpenTelemetry Collector can export custom metrics, provided your application generates them in supported formats like OTLP or Prometheus.
How do I update the operator?
If you installed via Helm:
helm upgrade opentelemetry-operator opentelemetry/opentelemetry-operator
For manifest-based installations, reapply the latest YAML:
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
Is auto-instrumentation supported for all languages?
Auto-instrumentation support is available for languages like Java, Python, Node.js, and .NET. Check the OpenTelemetry documentation for the latest list of supported languages.
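Each language is opted in with its own annotation on the pod template, for example (verify the exact set against your operator version):
instrumentation.opentelemetry.io/inject-java: "true"
instrumentation.opentelemetry.io/inject-python: "true"
instrumentation.opentelemetry.io/inject-nodejs: "true"
instrumentation.opentelemetry.io/inject-dotnet: "true"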
What backends are compatible with the OpenTelemetry Operator?
Popular backends include Last9, Prometheus, Jaeger, Elasticsearch, AWS X-Ray, and Google Cloud Trace. The collector’s modular exporter design makes it compatible with most observability tools.