
Sep 20th, ‘24 / 8 min read

Adding Cluster Labels to Kubernetes Metrics

A definitive guide to adding cluster labels to all your Kubernetes metrics

As a DevOps engineer who's spent countless hours wrangling Kubernetes clusters and their metrics, I've learned the hard way that proper labeling is crucial, especially in multi-cluster environments.

Today, I'm going to share my experience adding cluster labels to metrics collected from Kubernetes clusters, focusing on setups using Prometheus Operator or the Kube Helm stack.

What is Prometheus?

Prometheus is an open-source monitoring system designed for cloud-native environments. It collects metrics using a pull-based model, storing timestamped data and offering robust alerting capabilities. Its deep integration with Kubernetes makes it an ideal solution for cluster monitoring.

Key Features of Prometheus:

  • Metric collection and querying
  • Timestamped data storage
  • Alerting mechanisms
  • Auto-discovery of Kubernetes targets

Prometheus scrapes metrics from Kubernetes components, allowing operators to gain detailed insights into cluster health.

💡
For a deeper understanding of how different Prometheus metric types work and how they impact monitoring efficiency, check out this detailed guide on Prometheus Metrics Types.

Why Cluster Labels Matter

Before we dive into the how, let's quickly touch on the why. If you're managing multiple Kubernetes clusters, you've probably run into situations where you couldn't immediately tell which metric came from which cluster.

This can be a real headache when you're trying to diagnose issues or compare performance across environments.

Adding a cluster label to your metrics solves this problem. It allows you to:

  1. Easily filter and group metrics by cluster
  2. Create more meaningful dashboards and alerts
  3. Simplify the process of comparing metrics across different clusters
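
For example, once every metric carries a cluster label, a single PromQL query can compare usage across environments (this assumes the standard cAdvisor metric container_cpu_usage_seconds_total is being scraped):

sum by (cluster) (rate(container_cpu_usage_seconds_total[5m]))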

Now, let's get into the nitty-gritty of how to implement this for both Prometheus Operator and Kube Helm stack users.

How to Expose Kubernetes Metrics to Prometheus

To expose Kubernetes metrics to Prometheus, follow these steps:

1. Configure Metrics Endpoints

  • Use kubectl and YAML configurations to define Services and endpoints that expose metrics (see the sketch after this list).
  • Ensure that the Kubernetes controller manager is properly configured to expose control-plane metrics.
  • If running Prometheus in a container runtime like Docker, verify that the necessary ports are exposed for metric scraping.
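
As a rough sketch, a Service that exposes a hypothetical application's metrics port might look like this (the app name, namespace, and port are placeholders; Prometheus then scrapes it via a scrape config or ServiceMonitor):

apiVersion: v1
kind: Service
metadata:
  name: my-app-metrics        # hypothetical application
  namespace: default
  labels:
    app: my-app
spec:
  selector:
    app: my-app
  ports:
    - name: metrics           # named port that a scrape config or ServiceMonitor can reference
      port: 8080
      targetPort: 8080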

2. Use kube-state-metrics and Exporters

  • kube-state-metrics provides detailed cluster-level metrics, including resource utilization and object states (an install sketch follows this list).
  • Use exporters for additional system components, such as etcd for Kubernetes storage metrics and DNS-related exporters for service resolution monitoring.
  • If you're monitoring cronjob executions, ensure that scheduled jobs expose execution metrics.
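
kube-state-metrics ships with the kube-prometheus-stack chart, but if you need it standalone, an install along these lines should work (release name and namespace are placeholders; the chart comes from the prometheus-community Helm repository):

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-state-metrics prometheus-community/kube-state-metrics -n monitoring --create-namespace
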
💡
To learn more about how kube-state-metrics enhances Kubernetes observability and provides valuable cluster-level insights, check out this in-depth guide on kube-state-metrics.

3. Utilize Namespaces and Labels

  • Properly categorizing metrics with labels ensures structured and efficient monitoring.
  • Avoid using deprecated labels or annotations that may be phased out in future Kubernetes versions.
  • Define labels that align with computing resources to track CPU and memory utilization per namespace.
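
For instance, labeling a namespace (names here are hypothetical) makes it easy to roll up CPU and memory usage per team later on; kube-state-metrics can expose such labels via kube_namespace_labels, depending on its label allowlist configuration:

kubectl label namespace payments team=payments tier=backend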

4. Enable Secure Access and Authentication

  • Configure Prometheus to authenticate with the Kubernetes API server securely, typically with its service account token and the cluster CA certificate (see the sketch after this list).
  • If storing metric configurations on GitHub, follow best practices to manage access securely and avoid exposing sensitive data.
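
A common pattern, sketched here without reference to any particular chart, is to authenticate scrapes with the pod's service account token and the cluster CA certificate:

scrape_configs:
  - job_name: kubernetes-apiservers
    kubernetes_sd_configs:
      - role: endpoints
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      # keep only the default/kubernetes https endpoint, i.e. the API server itself
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https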

Adding Cluster Labels with Prometheus Operator

If you're using Prometheus Operator, you're in luck. It provides a straightforward way to add custom labels to all metrics scraped from your cluster.

Step 1: Update the Prometheus resource

First, you'll need to update your Prometheus custom resource. Here's an example of how to add a cluster label:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  ruleSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false
  externalLabels:
    cluster: production-cluster-1  # Add this line

The externalLabels field is where the magic happens. Any labels defined here will be added to all metrics scraped by this Prometheus instance.

Step 2: Apply the changes

After updating the Prometheus resource, apply it to your cluster:

kubectl apply -f prometheus.yaml

Step 3: Verify the changes

To verify that the new label is being applied, you can query Prometheus directly. Here's an example query:

sum(kube_pod_info) by (cluster)

This should return results grouped by your new cluster label.

💡
If you're interested in leveraging OpenTelemetry for Kubernetes autoscaling and optimizing metric collection, check out this detailed guide on OpenTelemetry and Kubernetes Autoscaling Metrics.

Adding Cluster Labels with Kube Helm Stack

If you're using the Kube Helm stack (i.e., the kube-prometheus-stack chart, which bundles Prometheus), there are a couple of ways to add cluster labels. Let's explore both methods.

Method 1: Using prometheusSpec.externalLabels

Step 1: Update your values.yaml

When installing or upgrading your Helm release, you'll need to modify the values.yaml file. Here's an example of how to add a cluster label:

prometheus:
  prometheusSpec:
    externalLabels:
      cluster: production-cluster-1

Step 2: Apply the changes

If you're installing for the first time:

helm install monitoring prometheus-community/kube-prometheus-stack -f values.yaml

If you're updating an existing installation:

helm upgrade monitoring prometheus-community/kube-prometheus-stack -f values.yaml

Step 3: Verify the changes

Just like with Prometheus Operator, you can verify the changes by querying Prometheus:

sum(kube_pod_info) by (cluster)

Method 2: Using commonLabels

An alternative approach is to use the commonLabels field in your Helm values. This method has some advantages, as it applies the labels more broadly within your Helm release.

Step 1: Update your values.yaml

Add the commonLabels field to your values.yaml:

commonLabels: 
  cluster: otlp-aps1.last9.io

Step 2: Apply the changes

Use the same Helm install or upgrade commands as in Method 1.

Step 3: Verify the changes

You can verify the changes by checking the labels on the Prometheus pods:

kubectl get pods -n monitoring -l app=prometheus -o yaml | grep -i cluster

You should see your cluster label in the output.

Comparing the Two Methods

  1. Scope:
    • prometheusSpec.externalLabels adds labels specifically to metrics collected by Prometheus.
    • commonLabels adds labels to all resources created by the Helm chart, including the Prometheus pods themselves.
  2. Flexibility:
    • prometheusSpec.externalLabels gives you more control over which metrics get labeled.
    • commonLabels is a broader approach that ensures consistency across all resources.
  3. Use Case:
    • Use prometheusSpec.externalLabels if you only want to label the metrics.
    • Use commonLabels if you want to label both the metrics and the Kubernetes resources created by the Helm chart.

In many cases, using commonLabels can be a more comprehensive solution, as it ensures that both your metrics and your Kubernetes resources are consistently labeled. This can be particularly useful for resource management and troubleshooting.

However, if you need fine-grained control over metric labeling without affecting other resources, stick with prometheusSpec.externalLabels.

Remember, you can always combine both approaches if needed, but be careful to avoid conflicts or redundancy in your label definitions.
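
If you do combine them, a values.yaml along these lines keeps the two definitions in agreement (the cluster name is just an example value):

commonLabels:
  cluster: production-cluster-1       # applied to every resource the chart creates

prometheus:
  prometheusSpec:
    externalLabels:
      cluster: production-cluster-1   # applied to every metric this Prometheus scrapes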

💡
For insights on monitoring resource usage in Kubernetes using kubectl top, check out this comprehensive guide on kubectl top.

Advanced Techniques: Using Relabeling

In some cases, you may need finer control over how cluster labels are applied to Kubernetes metrics. Prometheus provides a powerful relabeling feature that allows you to dynamically modify labels before they are stored.

This is particularly useful when you want to ensure consistency, reduce high cardinality, or enforce specific naming conventions.

Example: Adding a Cluster Label to Every Scrape Target

You can use Prometheus relabeling to attach a cluster label to every target a scrape job discovers, or to derive it dynamically from Kubernetes metadata.

Configuration (shown as a relabel_configs block inside a scrape job; with the Prometheus Operator, the same rules go under relabelings in a ServiceMonitor, using camelCase field names such as sourceLabels and targetLabel):

relabel_configs:
  - source_labels: [__address__]
    regex: '(.*)'
    target_label: cluster
    replacement: 'production-cluster-1'

How This Works:

  • The source_labels field selects the __address__ label, which contains the target address of the scraped metric source.
  • The regex pattern (.*) captures any value, allowing the rule to apply to all targets.
  • The target_label specifies that a new label named cluster will be added.
  • The replacement field assigns the fixed value "production-cluster-1" to the cluster label.

With this configuration, every target scraped by a job (or ServiceMonitor) carrying the rule gets a cluster="production-cluster-1" label on its metrics, ensuring a unified labeling scheme across your monitoring setup.

Additional Use Cases for Relabeling

Extracting Labels from Kubernetes Metadata

- source_labels: [__meta_kubernetes_pod_label_environment]
  target_label: environment

This assigns the environment label dynamically based on pod metadata.

Dropping Unnecessary Labels to Reduce Storage Overhead

- regex: "kubernetes_io.*"
  action: labeldrop

This prevents excessive Kubernetes-generated labels from bloating your Prometheus storage.

💡
To optimize Prometheus for large-scale environments and improve performance, check out this guide on Scaling Prometheus: Tips, Tricks, and Proven Strategies.

3 Common Pitfalls and How to Avoid Them

When adding cluster labels to Kubernetes metrics, some common pitfalls can impact monitoring efficiency, storage, and query performance. Here’s how to avoid them:

1. Label Conflicts

Issue: If you choose a label name that’s already in use (e.g., environment, region), it can lead to confusion, data inconsistencies, or even overwrite existing labels in your monitoring setup.

How to Avoid:

  • Before adding a new label, check which label values are already in use. In Grafana templating you can use label_values(kube_pod_info, cluster); in PromQL, a query like the following shows the cluster values currently present (see the sketch after this list for checking via the HTTP API):
count by (cluster) (kube_pod_info)
  • Use unique and descriptive label names that do not conflict with built-in Kubernetes labels or with labels derived from annotations.
  • Establish label naming conventions within your team to prevent accidental duplication.
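
You can also ask Prometheus directly which values a label already has via its HTTP API (prometheus-operated is the headless service the Operator creates; adjust the namespace and service name to your setup):

kubectl -n monitoring port-forward svc/prometheus-operated 9090 &
curl -s http://localhost:9090/api/v1/label/cluster/values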

2. Performance Impact

Issue: Labels create unique metric series in Prometheus. If too many labels are added or if labels have high cardinality (e.g., pod_id or request_id), it can increase storage costs and slow down queries.

How to Avoid:

  • Only add labels that are necessary for filtering or aggregating metrics. Avoid adding dynamic values like timestamps, unique pod IDs, or request traces as labels.
  • Use Prometheus relabeling to refine collected metrics:
relabel_configs:
  - source_labels: [pod]
    target_label: pod_name
    action: replace
  • DaemonSet workloads and kubelet metrics often generate a large number of label combinations. Consider filtering unnecessary labels to optimize performance.
  • Monitor Prometheus storage usage and set retention limits to manage data growth, for example with the Prometheus flag (see the values sketch after this section for the Helm equivalent):
--storage.tsdb.retention.time=15d

If you run Prometheus in a cloud provider environment, factor retention and persistent volume sizing into your cost planning.
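
With kube-prometheus-stack, retention is usually set in the values file rather than as a raw flag; a minimal sketch:

prometheus:
  prometheusSpec:
    retention: 15d
    retentionSize: 50GB   # optional size-based cap; assumes your chart version supports it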

3. Consistency Across Clusters

Issue: If multiple Kubernetes clusters have inconsistent label names (cluster=prod-1 in one and cluster_name=production in another), it makes cross-cluster queries complex and error-prone.

How to Avoid:

  • Define a standardized label schema across all clusters. For example, use the same external label keys everywhere:
externalLabels:
  cluster: "production"
  region: "us-east-1"
  • In multi-cluster deployments, label shared components such as ingress controllers consistently so their metrics can be compared across clusters.
  • Use kube-prometheus-stack or Prometheus Operator to enforce consistent label usage.
  • Test your cross-cluster queries in PromQL to ensure they work uniformly:
sum by (cluster) (kube_pod_status_phase)
  • When debugging from the command line, verify that the same label names are present in every cluster.
💡
For best practices on managing alerts efficiently in Prometheus, check out this in-depth guide on Prometheus Alertmanager.

Conclusion

Adding cluster labels to your Kubernetes metrics might seem like a small change, but it can significantly improve your observability and make your life easier when managing multiple clusters. Whether you're using Prometheus Operator or the Kube Helm stack, the process is relatively straightforward, and the benefits are well worth the effort.

Remember, good observability is about more than just collecting metrics – it's about making those metrics meaningful and actionable. Proper labeling is a key step in that direction.

Happy monitoring!

FAQs

What Is Kubernetes Monitoring?

Kubernetes monitoring is the process of collecting, analyzing, and visualizing data about a Kubernetes cluster’s performance, resource usage, and overall health. It helps ensure applications run smoothly, troubleshoot issues, and optimize workloads effectively. Monitoring typically includes tracking CPU, memory, network usage, pod status, and more.

What is Prometheus?

Prometheus is an open-source monitoring and alerting toolkit designed for cloud-native environments. It collects and stores metrics in a time-series format, supports powerful querying with PromQL, and provides robust alerting mechanisms. Prometheus is widely used in Kubernetes environments for real-time monitoring.

How can we expose metrics to Prometheus?

To expose metrics to Prometheus in Kubernetes, follow these steps:

  1. Use kube-state-metrics & Exporters: These components provide detailed cluster-level metrics.
  2. Expose Metrics Endpoints: Ensure your applications and services expose /metrics endpoints.
  3. Configure Prometheus Scrape Targets: Modify prometheus.yml to define scraping jobs.
  4. Deploy Service Monitors: If using Prometheus Operator, create ServiceMonitor resources to auto-discover metrics.
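
For step 4, a minimal ServiceMonitor might look like this (the selector, namespace, and port name refer to a hypothetical application and are placeholders; the team label matches the serviceMonitorSelector from the Prometheus resource shown earlier):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    team: frontend            # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-app
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: metrics           # named port on the application's Service
      interval: 30s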

How do you monitor resource usage in a Kubernetes cluster?

To effectively monitor resource usage, you should:

  • Use kubectl top nodes/pods to check real-time resource consumption.
  • Deploy Prometheus and Grafana for in-depth visualization.
  • Set up alerts using Prometheus Alertmanager.
  • Track critical metrics like CPU (container_cpu_usage_seconds_total) and memory (container_memory_usage_bytes).

What Kubernetes Metrics Should You Measure?

Some essential Kubernetes metrics to track include:

  • Pod & Node Health: kube_pod_status_phase, kube_node_status_condition
  • Resource Usage: container_cpu_usage_seconds_total, container_memory_usage_bytes
  • Networking & Storage: kube_pod_container_status_restarts_total, kube_persistentvolume_capacity_bytes
  • Scaling & Autoscaling: kube_horizontalpodautoscaler_status_current_replicas, kube_horizontalpodautoscaler_spec_max_replicas

What is kube-prometheus-stack?

kube-prometheus-stack is a Helm chart that bundles Prometheus, Grafana, Alertmanager, and other monitoring components into a single deployment for Kubernetes. It simplifies cluster monitoring by offering pre-configured dashboards, alerting rules, and integrations.

How do I use labels to filter metrics in a Kubernetes cluster?

Labels in Kubernetes help categorize and filter metrics effectively. To use labels in Prometheus:

  • Apply labels to Kubernetes objects (metadata.labels).
  • Filter and aggregate by label in PromQL, for example:
sum by (cluster) (kube_pod_info{namespace="production"})
  • Configure relabeling rules in Prometheus scrape configs to append meaningful labels to metrics.


Authors
Prathamesh Sonpatki

Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.
