Kubernetes autoscaling is great—until it isn't. Have you ever deployed an application, set up autoscaling, and then found yourself wondering why it isn’t working as expected? Maybe your pods aren’t scaling fast enough, or your cluster is over-provisioned. This is where OpenTelemetry Kubernetes autoscaling metrics come in.
In this guide, we’ll break down how you can use OpenTelemetry to collect, analyze, and optimize Kubernetes autoscaling metrics to ensure your applications scale efficiently.
Why Autoscaling in Kubernetes Matters
Kubernetes provides multiple autoscaling mechanisms:
- Horizontal Pod Autoscaler (HPA): Adjusts the number of pod replicas based on CPU, memory, or custom metrics. This is useful for applications that experience fluctuating traffic.
- Vertical Pod Autoscaler (VPA): Automatically adjusts resource requests and limits for individual pods. This is beneficial for workloads that require dynamic resource allocation.
- Cluster Autoscaler: Scales the number of worker nodes in the cluster based on pending workloads. This helps optimize cost by reducing unnecessary node allocations.
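For context, the HPA's default, CPU-driven behavior can be set up in a single command (my-app is a placeholder deployment name):
kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10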
However, default metrics (like CPU and memory) are often insufficient for fine-tuning autoscaling. If you’re relying on them alone, you might be missing out on more accurate scaling decisions. That’s where OpenTelemetry can help by providing additional insights.
What Is OpenTelemetry?
OpenTelemetry is an open-source observability framework for collecting telemetry data—metrics, logs, and traces—from applications and infrastructure. It’s vendor-agnostic and works with Prometheus, Grafana, and other monitoring tools.
Example Use Case:
If your application processes HTTP requests, OpenTelemetry can help track metrics like request latency and error rates. Instead of scaling pods solely based on CPU usage, you can scale based on actual application demand, leading to more efficient resource allocation.
Why Use OpenTelemetry for Kubernetes Autoscaling Metrics?
OpenTelemetry gives you deeper visibility into custom metrics beyond CPU and memory, such as:
- Request latency: Ensures scaling decisions are based on real-time user experience.
- Queue length: Helps detect backlogs in processing and adjust capacity accordingly (see the sketch after this list).
- Application-specific metrics (e.g., number of active users, event processing rate): Provides a more business-driven approach to scaling.
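To make the last two concrete, here is a minimal Python sketch of a queue-length metric using an observable gauge. The in-process job_queue is a hypothetical stand-in for whatever backlog your application actually tracks, and the sketch assumes a configured MeterProvider (set up in step 3 below):
from opentelemetry import metrics
# Hypothetical in-process work queue; substitute your real queue or broker client
job_queue = []
def observe_queue_length(options):
    # Called by the SDK on each collection cycle; reports the current backlog size
    yield metrics.Observation(len(job_queue), {"queue": "jobs"})
meter = metrics.get_meter("my-application")
queue_length_gauge = meter.create_observable_gauge(
    "queue_length",
    callbacks=[observe_queue_length],
    description="Jobs waiting to be processed",
)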
How to Set Up OpenTelemetry in Kubernetes for Autoscaling Metrics
1. Deploying the OpenTelemetry Collector for Data Collection
The OpenTelemetry Collector acts as an agent to collect and export metrics. Deploy it in your Kubernetes cluster using Helm:
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm install my-otel open-telemetry/opentelemetry-collector --namespace monitoring --create-namespace --set mode=daemonset
This command installs the OpenTelemetry Collector in the monitoring namespace, allowing it to gather telemetry data from your applications. The chart requires a deployment mode to be set; daemonset runs one collector instance per node, which suits the node-level kubeletstats receiver used below.
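Before moving on, confirm the collector pods are running:
kubectl get pods -n monitoring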
2. Configuring the OpenTelemetry Collector for Metrics Collection
Modify the collector configuration (otel-collector-config.yaml) to collect Kubernetes metrics:
receivers:
  kubeletstats:
    collection_interval: 10s
    auth_type: serviceAccount
exporters:
  prometheus:
    endpoint: ":9090"
extensions:
  health_check: {}
service:
  extensions: [health_check]
  pipelines:
    metrics:
      receivers: [kubeletstats]
      exporters: [prometheus]
Apply the configuration. Note that otel-collector-config.yaml is the collector's own config, not a Kubernetes manifest, so kubectl can't apply it directly; with the Helm chart, nest its contents under the config key in a values file (for example, otel-values.yaml) and upgrade the release:
helm upgrade my-otel open-telemetry/opentelemetry-collector --namespace monitoring --set mode=daemonset -f otel-values.yaml
This setup enables the OpenTelemetry Collector to collect pod-level statistics from the kubelet and expose them in Prometheus format on port 9090 for Prometheus to scrape. (Alternatively, the chart's presets.kubeletMetrics option can wire up this receiver, along with the RBAC it needs, for you.)
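One assumption worth making explicit: Prometheus still has to scrape the collector's exporter endpoint for these metrics to land anywhere. A minimal scrape job might look like this, where the target is the collector Service the Helm chart creates (the Service name may differ in your release):
scrape_configs:
  - job_name: "otel-collector"
    static_configs:
      - targets: ["my-otel-opentelemetry-collector.monitoring:9090"]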
3. Instrumenting Your Applications to Capture Custom Metrics
If your application is written in Python, Go, Java, or Node.js, you can integrate OpenTelemetry SDKs to emit custom metrics.
Example: Exposing Custom Metrics in a Python Application
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader
# Wire a reader into the provider so recorded metrics are actually exported
reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
meter_provider = MeterProvider(metric_readers=[reader])
metrics.set_meter_provider(meter_provider)
meter = metrics.get_meter("my-application")
request_count = meter.create_counter("http_requests_total")
# Increment the counter on each request to the /login endpoint
request_count.add(1, {"endpoint": "/login"})
In this example, every time a request is made to the /login endpoint, the counter http_requests_total is incremented, allowing you to track request traffic.
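The console exporter only prints locally; to get these metrics into the pipeline above, export them to the collector instead. Here is a sketch using the OTLP exporter, assuming the collector's otlp receiver is enabled and added to its metrics pipeline, and that the Service name below matches your Helm release:
from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
# Assumed in-cluster address of the collector's OTLP gRPC receiver (default port 4317)
exporter = OTLPMetricExporter(
    endpoint="my-otel-opentelemetry-collector.monitoring:4317", insecure=True
)
reader = PeriodicExportingMetricReader(exporter)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))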
4. Integrate OpenTelemetry Metrics with Kubernetes HPA
Once OpenTelemetry is collecting metrics, you can use them in Kubernetes autoscaling. For example, to scale on request rate, expose the metric through the OpenTelemetry Collector and Prometheus, then register it with the Kubernetes metrics APIs:
Register OpenTelemetry Metrics with Prometheus Adapter
Install the Prometheus Adapter, which serves Prometheus queries through the Kubernetes custom and external metrics APIs:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-adapter prometheus-community/prometheus-adapter --namespace monitoring
Modify the Prometheus Adapter configuration so the metric is served through the external metrics API (via externalRules), which is what the External metric type in the HPA below reads:
externalRules:
  - seriesQuery: 'http_requests_total{namespace="default"}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
    name:
      matches: "http_requests_total"
      as: "http_requests_per_second"
    metricsQuery: 'sum(rate(http_requests_total{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)'
The adapter reads its rules from a ConfigMap rather than a standalone file, so kubectl apply on the raw config won't take effect. With the Helm chart, the equivalent is to put the rule under the chart's rules.external value in a values file (the filename below is an example) and upgrade the release:
helm upgrade prometheus-adapter prometheus-community/prometheus-adapter --namespace monitoring -f prometheus-adapter-values.yaml
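Before creating the HPA, it's worth confirming the adapter is serving the metric through the external metrics API (the namespace here matches the rule above):
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/http_requests_per_second"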
5. Create an HPA Using OpenTelemetry Metrics
Now that OpenTelemetry is exposing the metrics, create an HPA that scales based on request rate. Note the API version: autoscaling/v2 replaced the deprecated v2beta2 and is the supported API on current Kubernetes releases:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
Apply it:
kubectl apply -f my-app-hpa.yaml
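You can then watch the autoscaler's decisions and confirm it is reading the metric:
kubectl describe hpa my-app-hpa
kubectl get hpa my-app-hpa --watch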
Now, Kubernetes will autoscale your application based on real application performance rather than just CPU or memory usage.
Best Practices for OpenTelemetry Kubernetes Autoscaling
- Use multiple metrics: Don't rely on a single metric. Combine request rate, latency, and CPU for better scaling decisions (see the sketch after this list).
- Tune your scaling thresholds: Default thresholds may not be ideal for your workload. Test and adjust based on historical data.
- Monitor autoscaling behavior: Use Grafana to visualize how your HPA is reacting to OpenTelemetry metrics in real time.
- Optimize Collector performance: The OpenTelemetry Collector can become a bottleneck if not properly tuned. Limit the number of collected metrics to what's necessary.
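On the first point above: an HPA evaluates every entry in its metrics list and scales to the highest replica count any single metric demands, so combining signals is mostly a matter of listing them. A minimal sketch pairing CPU utilization with the request-rate metric from earlier (both thresholds are illustrative, not recommendations):
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: External
    external:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"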
Conclusion
Kubernetes autoscaling works better when you have meaningful metrics. OpenTelemetry lets you collect custom metrics that reflect your application's actual performance, improving autoscaling accuracy.
Setting up OpenTelemetry in your cluster, instrumenting applications, and configuring Kubernetes HPA with custom metrics enables more intelligent and cost-effective scaling.
Have you tried using OpenTelemetry for Kubernetes autoscaling?
FAQs
1. How does OpenTelemetry help with Kubernetes autoscaling?
OpenTelemetry collects real-time metrics from applications and infrastructure, which can be used by Kubernetes' Horizontal Pod Autoscaler (HPA) to make scaling decisions based on actual workload demand.
2. What types of metrics can be used for Kubernetes autoscaling?
Metrics such as CPU usage, memory consumption, request latency, queue depth, and custom business metrics (e.g., active users or transactions per second) can be collected and used for autoscaling.
3. How do I configure OpenTelemetry to send metrics to Kubernetes HPA?
You need to set up the OpenTelemetry Collector to gather metrics from instrumented applications and export them to a monitoring backend (e.g., Prometheus, Last9), which HPA can then query for autoscaling decisions.
4. Can OpenTelemetry replace Kubernetes' built-in metrics server?
No, OpenTelemetry does not replace the Kubernetes metrics server but enhances it by providing custom application-level metrics that allow for more intelligent scaling beyond CPU and memory usage.
5. What backend should I use with OpenTelemetry for autoscaling?
Common backends include Prometheus, Last9, and AWS CloudWatch, with Grafana typically layered on top for visualization; Prometheus in particular integrates well with Kubernetes HPA via the Prometheus Adapter, which lets the autoscaler query OpenTelemetry-collected metrics.
6. Is OpenTelemetry better than Prometheus for Kubernetes autoscaling?
OpenTelemetry and Prometheus work together rather than compete—OpenTelemetry provides vendor-neutral instrumentation, while Prometheus is commonly used as the storage and query layer for autoscaling metrics.
7. What are the challenges of using OpenTelemetry for autoscaling?
Some challenges include proper instrumentation of applications, configuring the OpenTelemetry Collector correctly, and ensuring efficient data processing to avoid performance overhead in high-traffic environments.