As a DevOps engineer who's spent countless hours wrangling Kubernetes clusters and their metrics, I've learned the hard way that proper labeling is crucial, especially in multi-cluster environments.
Today, I'm going to share my experience adding cluster labels to metrics collected from Kubernetes clusters, focusing on setups using Prometheus Operator or the Kube Helm stack.
What is Prometheus?
Prometheus is an open-source monitoring system designed for cloud-native environments. It collects metrics using a pull-based model, storing timestamped data and offering robust alerting capabilities. Its deep integration with Kubernetes makes it an ideal solution for cluster monitoring.
Key Features of Prometheus:
- Metric collection and querying
- Timestamped data storage
- Alerting mechanisms
- Auto-discovery of Kubernetes targets
Prometheus scrapes metrics from Kubernetes components, allowing operators to gain detailed insights into cluster health.
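To make the pull model concrete, here is a rough, hand-written scrape job that auto-discovers cluster nodes. This is a minimal sketch assuming Prometheus runs in-cluster with a service account; tools like kube-prometheus-stack generate equivalent configuration for you:
scrape_configs:
  - job_name: kubernetes-nodes
    scheme: https
    kubernetes_sd_configs:
      - role: node   # auto-discover every node as a scrape target
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token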
Why Cluster Labels Matter
Before we dive into the how, let's quickly touch on the why. If you're managing multiple Kubernetes clusters, you've probably run into situations where you couldn't immediately tell which metric came from which cluster.
This can be a real headache when you're trying to diagnose issues or compare performance across environments.
Adding a cluster label to your metrics solves this problem. It allows you to:
- Easily filter and group metrics by cluster
- Create more meaningful dashboards and alerts
- Simplify the process of comparing metrics across different clusters
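For example, in a setup where all clusters report into a central Prometheus-compatible store, a single query can compare CPU usage per cluster (assuming the standard cAdvisor metric container_cpu_usage_seconds_total is collected):
sum by (cluster) (rate(container_cpu_usage_seconds_total[5m]))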
Now, let's get into the nitty-gritty of how to implement this for both Prometheus Operator and Kube Helm stack users.
How to Expose Kubernetes Metrics to Prometheus
To expose Kubernetes metrics to Prometheus, follow these steps:
1. Configure Metrics Endpoints
- Use kubectl and YAML configurations to define endpoints that expose metrics (a minimal ServiceMonitor sketch follows these steps).
- Ensure that the Kubernetes controller manager is properly configured to expose control-plane metrics.
- If running Prometheus in a container runtime like Docker, verify that the necessary ports are exposed for metric scraping.
2. Use kube-state-metrics and Exporters
- kube-state-metrics provides detailed cluster-level metrics, including resource utilization and object states.
- Use exporters for additional system components, such as etcd for Kubernetes storage metrics and DNS-related exporters for service resolution monitoring.
- If you're monitoring cronjob executions, ensure that scheduled jobs expose execution metrics.
3. Utilize Namespaces and Labels
- Properly categorizing metrics with labels ensures structured and efficient monitoring.
- Avoid using deprecated labels or annotations that may be phased out in future Kubernetes versions.
- Define labels that align with computing resources to track CPU and memory utilization per namespace.
4. Enable Secure Access and Authentication
- Configure Prometheus to authenticate with the Kubernetes API server securely.
- If storing metric configurations on GitHub, follow best practices to manage access securely and avoid exposing sensitive data.
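To make steps 1 and 2 concrete, here is a minimal ServiceMonitor sketch that points Prometheus Operator at an application's /metrics endpoint; the names my-app, the monitoring namespace, and the team: frontend label are placeholders to adapt to your own setup:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    team: frontend              # must match your Prometheus serviceMonitorSelector
spec:
  namespaceSelector:
    matchNames:
      - default                 # namespace where the target Service lives
  selector:
    matchLabels:
      app: my-app               # selects the Service that fronts the pods
  endpoints:
    - port: metrics             # named Service port that exposes /metrics
      path: /metrics
      interval: 30s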
Adding Cluster Labels with Prometheus Operator
If you're using Prometheus Operator, you're in luck. It provides a straightforward way to add custom labels to all metrics scraped from your cluster.
Step 1: Update the Prometheus resource
First, you'll need to update your Prometheus custom resource. Here's an example of how to add a cluster label:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  ruleSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false
  externalLabels:
    cluster: production-cluster-1 # Add this line
The externalLabels field is where the magic happens. Any labels defined here will be added to all metrics scraped by this Prometheus instance.
Step 2: Apply the changes
After updating the Prometheus resource, apply it to your cluster:
kubectl apply -f prometheus.yaml
Step 3: Verify the changes
To verify that the new label is being applied, you can query Prometheus directly. Here's an example query:
sum(kube_pod_info) by (cluster)
This should return results grouped by your new cluster label.
Adding Cluster Labels with Kube Helm Stack
If you're using the Kube Helm stack (which typically includes Prometheus), there are actually a couple of ways to add cluster labels. Let's explore both methods.
Method 1: Using prometheusSpec.externalLabels
Step 1: Update your values.yaml
When installing or upgrading your Helm release, you'll need to modify the values.yaml file. Here's an example of how to add a cluster label:
prometheus:
  prometheusSpec:
    externalLabels:
      cluster: production-cluster-1
Step 2: Apply the changes
If you're installing for the first time:
helm install monitoring prometheus-community/kube-prometheus-stack -f values.yaml
If you're updating an existing installation:
helm upgrade monitoring prometheus-community/kube-prometheus-stack -f values.yaml
Step 3: Verify the changes
Just like with Prometheus Operator, you can verify the changes by querying Prometheus:
sum(kube_pod_info) by (cluster)
Method 2: Using commonLabels
An alternative approach is to use the commonLabels field in your Helm values. This method has some advantages, as it applies the labels more broadly within your Helm release.
Step 1: Update your values.yaml
Add the commonLabels field to your values.yaml:
commonLabels:
  cluster: otlp-aps1.last9.io
Step 2: Apply the changes
Use the same Helm install or upgrade commands as in Method 1.
Step 3: Verify the changes
You can verify the changes by checking the labels on the Prometheus pods:
kubectl get pods -n monitoring -l app=prometheus -o yaml | grep -i cluster
You should see your cluster label in the output.
Comparing the Two Methods
- Scope: prometheusSpec.externalLabels adds labels specifically to metrics collected by Prometheus. commonLabels adds labels to all resources created by the Helm chart, including the Prometheus pods themselves.
- Flexibility: prometheusSpec.externalLabels gives you more control over which metrics get labeled. commonLabels is a broader approach that ensures consistency across all resources.
- Use Case:
  - Use prometheusSpec.externalLabels if you only want to label the metrics.
  - Use commonLabels if you want to label both the metrics and the Kubernetes resources created by the Helm chart.
In many cases, using commonLabels can be the more comprehensive solution, as it ensures that both your metrics and your Kubernetes resources are consistently labeled. This can be particularly useful for resource management and troubleshooting.
However, if you need fine-grained control over metric labeling without affecting other resources, stick with prometheusSpec.externalLabels.
Remember, you can always combine both approaches if needed (see the sketch below), but be careful to avoid conflicts or redundancy in your label definitions.
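If you do combine them, a values.yaml along these lines keeps both label sources consistent (a minimal sketch; the cluster name is a placeholder):
commonLabels:
  cluster: production-cluster-1        # applied to every resource the chart creates
prometheus:
  prometheusSpec:
    externalLabels:
      cluster: production-cluster-1    # applied to every scraped metric; keep the value identical to avoid conflicts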
Advanced Techniques: Using Relabeling
In some cases, you may need finer control over how cluster labels are applied to Kubernetes metrics. Prometheus provides a powerful relabeling feature that allows you to dynamically modify labels before they are stored.
This is particularly useful when you want to ensure consistency, reduce high cardinality, or enforce specific naming conventions.
Example: Adding a Cluster Label Based on Node Name
You can use Prometheus relabeling to add a cluster label dynamically by extracting values from Kubernetes node metadata.
Configuration:
prometheus:
  prometheusSpec:
    relabelConfigs:
      - source_labels: [__address__]
        regex: '(.*)'
        target_label: cluster
        replacement: 'production-cluster-1'
How This Works:
- The source_labels field selects the __address__ label, which contains the target address of the scraped metric source.
- The regex pattern (.*) captures any value, allowing the rule to apply to all targets.
- The target_label field specifies that a new label named cluster will be added.
- The replacement field assigns the fixed value "production-cluster-1" to the cluster label.
With this configuration, all scraped metrics will automatically include a cluster="production-cluster-1" label, ensuring a unified labeling scheme across your monitoring setup.
Additional Use Cases for Relabeling
Extracting Labels from Kubernetes Metadata
- source_labels: [__meta_kubernetes_pod_label_environment]
  target_label: environment
This assigns the environment label dynamically based on pod metadata.
Dropping Unnecessary Labels to Reduce Storage Overhead
- regex: "kubernetes_io.*"
  action: labeldrop
This prevents excessive Kubernetes-generated labels from bloating your Prometheus storage.
3 Common Pitfalls and How to Avoid Them
When adding cluster labels to Kubernetes metrics, some common pitfalls can impact monitoring efficiency, storage, and query performance. Here’s how to avoid them:
1. Label Conflicts
Issue: If you choose a label name that’s already in use (e.g., environment, region), it can lead to confusion, data inconsistencies, or even overwrite existing labels in your monitoring setup.
How to Avoid:
- Before adding a new label, check which values it already carries, for example with Grafana's templating function (an HTTP API alternative is sketched after this list):
label_values(kube_pod_info, cluster)
- Use unique and descriptive label names that do not conflict with built-in Kubernetes labels. Labels derived from annotations or authentication-related fields should be reviewed to prevent unintended conflicts.
- Establish label naming conventions within your team to prevent accidental duplication.
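If you prefer the command line over Grafana templating, the Prometheus HTTP API returns the same label values. The service name below assumes a kube-prometheus-stack release called monitoring; adjust it to whatever kubectl get svc -n monitoring shows:
# Forward the Prometheus service locally, then list every value of the cluster label
kubectl -n monitoring port-forward svc/monitoring-kube-prometheus-prometheus 9090 &
curl -s http://localhost:9090/api/v1/label/cluster/values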
2. Performance Impact
Issue: Labels create unique metric series in Prometheus. If too many labels are added or if labels have high cardinality (e.g., pod_id or request_id), it can increase storage costs and slow down queries.
How to Avoid:
- Only add labels that are necessary for filtering or aggregating metrics. Avoid adding dynamic values like timestamps, unique pod IDs, or request traces as labels.
- Use Prometheus relabeling to refine collected metrics:
relabel_configs:
  - source_labels: [pod]
    target_label: pod_name
    action: replace
- Kubernetes daemonset logs and kubelet metrics often generate a high number of labels. Consider filtering unnecessary labels to optimize performance.
- Monitor Prometheus storage usage and set retention limits to manage data growth, for example by starting Prometheus with --storage.tsdb.retention.time=15d (the kube-prometheus-stack equivalent is sketched after this list).
- If running Prometheus in a cloud provider environment, ensure your storage strategy aligns with cost management.
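With kube-prometheus-stack, the same limit is usually set through the operator rather than raw flags; a minimal values.yaml sketch:
prometheus:
  prometheusSpec:
    retention: 15d          # keep 15 days of metric data
    retentionSize: 50GB     # optional cap on on-disk TSDB size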
3. Consistency Across Clusters
Issue: If multiple Kubernetes clusters have inconsistent label names (cluster=prod-1 in one and cluster_name=production in another), it makes cross-cluster queries complex and error-prone.
How to Avoid:
- Define a standardized label schema across all clusters. Example:
metadata:
  labels:
    cluster: "production"
    region: "us-east-1"
- In multi-cluster deployments, ensure that ingress and allocation strategies are consistently labeled to avoid discrepancies in monitoring.
- Use kube-prometheus-stack or Prometheus Operator to enforce consistent label usage.
- Test your cross-cluster queries in PromQL to ensure they work uniformly:
sum by (cluster) (kube_pod_status_phase)
- If using a CLI tool for debugging, verify that label consistency is maintained across different clusters.
Conclusion
Adding cluster labels to your Kubernetes metrics might seem like a small change, but it can significantly improve your observability and make your life easier when managing multiple clusters. Whether you're using Prometheus Operator or the Kube Helm stack, the process is relatively straightforward, and the benefits are well worth the effort.
Remember, good observability is about more than just collecting metrics – it's about making those metrics meaningful and actionable. Proper labeling is a key step in that direction.
Happy monitoring!
FAQs
What Is Kubernetes Monitoring?
Kubernetes monitoring is the process of collecting, analyzing, and visualizing data about a Kubernetes cluster’s performance, resource usage, and overall health. It helps ensure applications run smoothly, troubleshoot issues, and optimize workloads effectively. Monitoring typically includes tracking CPU, memory, network usage, pod status, and more.
What is Prometheus?
Prometheus is an open-source monitoring and alerting toolkit designed for cloud-native environments. It collects and stores metrics in a time-series format, supports powerful querying with PromQL, and provides robust alerting mechanisms. Prometheus is widely used in Kubernetes environments for real-time monitoring.
How can we expose metrics to Prometheus?
To expose metrics to Prometheus in Kubernetes, follow these steps:
- Use kube-state-metrics & Exporters: These components provide detailed cluster-level metrics.
- Expose Metrics Endpoints: Ensure your applications and services expose /metrics endpoints.
- Configure Prometheus Scrape Targets: Modify prometheus.yml to define scraping jobs.
- Deploy Service Monitors: If using Prometheus Operator, create ServiceMonitor resources to auto-discover metrics.
Are you managing a Kubernetes cluster and wondering how to monitor its resource usage?
Yes! To effectively monitor resource usage, you should:
- Use kubectl top nodes/pods to check real-time resource consumption.
- Deploy Prometheus and Grafana for in-depth visualization.
- Set up alerts using Prometheus Alertmanager.
- Track critical metrics like CPU (container_cpu_usage_seconds_total) and memory (container_memory_usage_bytes).
What Kubernetes Metrics Should You Measure?
Some essential Kubernetes metrics to track include:
- Pod & Node Health: kube_pod_status_phase, node_name
- Resource Usage: container_cpu_usage_seconds_total, container_memory_usage_bytes
- Networking & Storage: kube_pod_container_status_restarts_total, kube_persistentvolume_capacity_bytes
- Scaling & Autoscaling: horizontalpodautoscaler_current_replicas, horizontalpodautoscaler_target_cpu_utilization_percentage
What is kube-prometheus-stack?
kube-prometheus-stack is a Helm chart that bundles Prometheus, Grafana, Alertmanager, and other monitoring components into a single deployment for Kubernetes. It simplifies cluster monitoring by offering pre-configured dashboards, alerting rules, and integrations.
How do I use labels to filter metrics in a Kubernetes cluster?
Labels in Kubernetes help categorize and filter metrics effectively. To use labels in Prometheus:
- Apply labels to Kubernetes objects (metadata.labels).
- Filter and aggregate by those labels in PromQL, for example:
sum by (cluster) (kube_pod_info{namespace="production"})
- Configure relabeling rules in Prometheus scrape configs to append meaningful labels to metrics.