Sep 5th, ‘24/9 min read

kube-state-metrics: Your Complete Guide to Simplifying Kubernetes Observability

This guide provides an in-depth look at setting up and using kube-state-metrics, helping you monitor and manage your Kubernetes clusters more efficiently.

As a DevOps engineer with years of experience optimizing Kubernetes clusters, I've come to appreciate the critical role that robust monitoring solutions play in maintaining healthy, efficient systems. 

Among the myriad tools available, kube-state-metrics stands out as an indispensable component of any comprehensive Kubernetes observability stack. In this in-depth guide, I'll share my experiences and insights on leveraging this powerful exporter to enhance your Kubernetes monitoring capabilities.

Understanding kube-state-metrics

What is kube-state-metrics?

kube-state-metrics is an open-source add-on for Kubernetes that listens to the Kubernetes API server and generates metrics about the state of various Kubernetes objects. Unlike metrics-server, which provides resource usage data (CPU, memory), kube-state-metrics focuses on the health and status of Kubernetes resources.

Key Features

  1. Object-focused metrics: Provides metrics for a wide range of Kubernetes objects, including Pods, Deployments, StatefulSets, and more.
  2. Read-only access: Only requires read-only access to the Kubernetes API, enhancing security.
  3. Prometheus compatibility: Exposes metrics in Prometheus format, making it easy to integrate with existing monitoring stacks.
  4. Custom resource support: Can be extended to monitor custom resources (CRDs).
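
To make the Prometheus format concrete, here is a minimal Python sketch that parses a few exposition lines of the kind kube-state-metrics serves on its metrics endpoint. The sample lines, pod names, and the `parse_sample` and `parse_text` helpers are illustrative assumptions, and the parser handles only the simple label syntax shown (no escaped quotes inside label values).

```python
import re

# Hypothetical scrape output; the pod and namespace names are made up.
SAMPLE = """\
# TYPE kube_pod_status_phase gauge
kube_pod_status_phase{namespace="default",pod="web-1",phase="Running"} 1
kube_pod_status_phase{namespace="default",pod="web-1",phase="Pending"} 0
kube_pod_status_phase{namespace="batch",pod="job-7",phase="Failed"} 1
"""

LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Parse one exposition line into (name, labels, value); None for comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def parse_text(text):
    """Parse every sample line in an exposition payload."""
    return [s for s in map(parse_sample, text.splitlines()) if s is not None]


# Each object state is a series whose value is 1 (true) or 0 (false).
for name, labels, value in parse_text(SAMPLE):
    if value == 1:
        print(labels["namespace"], labels["pod"], labels["phase"])
```

The point of the sketch is the shape of the data: one series per pod per phase, with the value acting as a boolean, which is why the queries later in this guide wrap these metrics in `sum`.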

Why kube-state-metrics Matters

In my years working with Kubernetes clusters, I've found that while tools like kubelet and metrics-server provide crucial CPU and memory usage data, they don't give the full picture.

kube-state-metrics fills this gap by offering insights into the state of deployments, pods, nodes, and other Kubernetes objects.

For example, while metrics-server might tell you that a node is using 80% of its CPU, kube-state-metrics can tell you how many pods are running on that node, how many are in a failed state, or whether there are any pending pods due to resource constraints.

Setting Up kube-state-metrics

Let's dive into the process of setting up kube-state-metrics in a Kubernetes cluster:

Installation Options

There are several ways to install kube-state-metrics:

  1. Using Helm (recommended)
  2. Manual installation using YAML manifests
  3. Building from source

I prefer using Helm, as it simplifies the process and manages updates efficiently.

Installation using Helm

First, add the Prometheus community repo:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

helm repo update

Then, install kube-state-metrics:

helm install kube-state-metrics prometheus-community/kube-state-metrics

For more customization options, you can create a values.yaml file:

replicas: 2
autosharding:
  enabled: true
# PodSecurityPolicy was removed in Kubernetes 1.25; enable this only on older clusters
podSecurityPolicy:
  enabled: true

Then install with:

helm install kube-state-metrics prometheus-community/kube-state-metrics -f values.yaml

Manual Installation

If you prefer manual installation or need more control over the deployment, you can find the YAML manifests in the official kube-state-metrics GitHub repo.

Clone the repository:

git clone https://github.com/kubernetes/kube-state-metrics.git
cd kube-state-metrics

Apply the manifests:

kubectl apply -f examples/standard

Configuration and RBAC

The installation creates a ServiceAccount, ClusterRole, and ClusterRoleBinding so that kube-state-metrics has the permissions it needs to read from the Kubernetes API server.

Here's a snippet of the ClusterRole YAML:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs: ["list", "watch"]
# ... more rules ...

This RBAC configuration ensures that kube-state-metrics has read-only access to the necessary Kubernetes resources.

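To verify the installation, run:

kubectl get pods --all-namespaces | grep kube-state-metrics

You should see the kube-state-metrics pod(s) running. The standard manifests deploy into kube-system, while a Helm install lands in the namespace you targeted (default if you did not pass -n).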

Integrating with Prometheus

Now that we have kube-state-metrics running, let's integrate it with Prometheus to start collecting and visualizing our metrics.

Configuring Prometheus

Add the following job under scrape_configs in your Prometheus config:

- job_name: 'kube-state-metrics'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name]
    regex: kube-state-metrics
    action: keep
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name

Restart Prometheus to apply the changes.
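
The `keep` rule is what limits this job to the kube-state-metrics Service. As a rough illustration of how Prometheus evaluates it, the Python sketch below joins the source label values with `;` and requires the regex to match the whole string, which is the documented relabeling behavior. The discovered targets are hypothetical, `apply_keep` is a made-up helper, and Python's `re` stands in for the RE2 engine Prometheus uses.

```python
import re


def apply_keep(targets, source_labels, regex, separator=";"):
    """Return the targets whose joined source label values fully match regex."""
    pattern = re.compile(regex)
    kept = []
    for labels in targets:
        joined = separator.join(labels.get(name, "") for name in source_labels)
        # Relabeling regexes are anchored at both ends, hence fullmatch.
        if pattern.fullmatch(joined):
            kept.append(labels)
    return kept


# Hypothetical endpoints found by service discovery.
DISCOVERED = [
    {"__meta_kubernetes_service_name": "kube-state-metrics",
     "__meta_kubernetes_namespace": "kube-system"},
    {"__meta_kubernetes_service_name": "kube-state-metrics-canary",
     "__meta_kubernetes_namespace": "staging"},
    {"__meta_kubernetes_service_name": "kubernetes",
     "__meta_kubernetes_namespace": "default"},
]

kept = apply_keep(DISCOVERED, ["__meta_kubernetes_service_name"], "kube-state-metrics")
print([t["__meta_kubernetes_namespace"] for t in kept])  # prints ['kube-system']
```

Because the match is anchored, a Service named kube-state-metrics-canary is dropped. If your Service has a different name (for example, one derived from a Helm release name), the regex needs to change to match it.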

Querying kube-state-metrics in Prometheus

Once Prometheus is scraping kube-state-metrics, you can start querying the metrics. Here are some useful queries:

  1. Number of pods by namespace:
sum(kube_pod_info) by (namespace)
  2. Pods not in Running state:
sum(kube_pod_status_phase{phase!="Running"}) by (namespace, phase)
  3. Deployments not at the desired number of replicas:
kube_deployment_spec_replicas != kube_deployment_status_replicas_available
  4. Persistent volumes by phase:
sum(kube_persistentvolume_status_phase) by (phase)
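
These queries can also be run outside the Prometheus UI through its HTTP API. The sketch below uses only the Python standard library against the `/api/v1/query` endpoint; the helper names are hypothetical, and the base URL you pass in is an assumption about your own setup (for example a local port-forward to Prometheus).

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def parse_vector(payload):
    """Turn an instant-vector response into (labels, float value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each result carries [timestamp, "value-as-string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def run_query(base_url, promql, timeout=10):
    """Execute the query against a live Prometheus server (needs network access)."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

With Prometheus reachable at, say, http://localhost:9090, calling `run_query("http://localhost:9090", "sum(kube_pod_info) by (namespace)")` would return one `(labels, value)` pair per namespace.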

Real-World Use Cases

Let's explore some practical use cases where I've found kube-state-metrics to be invaluable:

1. Monitoring Pod Health

One of the most common issues I've encountered is pods stuck in a crash loop. kube-state-metrics makes it easy to track pod restarts:

sum(kube_pod_container_status_restarts_total) by (pod)

This query helps identify problematic pods quickly, saving precious debugging time. I once used this to identify a memory leak in a Java application that was causing frequent restarts.
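
Because the restart total only ever grows, a pod that crashed heavily last month can look as bad as one crashing right now. What usually matters is how much the counter rose in a recent window, which is roughly what PromQL's `increase()` estimates. The Python sketch below shows that idea on made-up samples; it is a simplification (no extrapolation, unlike PromQL), and the pod names and threshold are assumptions.

```python
def counter_increase(samples):
    """Sum the rises between consecutive samples, treating a drop as a counter reset."""
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the new value is the rise.
        total += (current - previous) if current >= previous else current
    return total


def crash_looping(restart_history, threshold=3):
    """Return pods whose restart counter rose by at least `threshold` in the window."""
    return sorted(
        pod for pod, samples in restart_history.items()
        if counter_increase(samples) >= threshold
    )


# Hypothetical restart totals sampled over one hour, oldest first.
HISTORY = {
    "checkout-7d9f": [12, 12, 12, 12],  # many old restarts, none recently
    "billing-5c4b": [0, 2, 4, 7],       # restarting right now
    "search-66aa": [5, 6, 0, 1],        # counter reset mid-window
}

print(crash_looping(HISTORY))  # prints ['billing-5c4b']
```

Only the pod whose counter is actively climbing is flagged, even though another pod has the highest lifetime total.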

2. Resource Quota Management

In multi-tenant clusters, managing resource quotas is crucial. kube-state-metrics provides metrics like:

kube_resourcequota

This allows us to track resource usage against quotas, helping prevent resource starvation issues. I've seen resource starvation bring down entire namespaces when quotas were not properly monitored.

Example query to check CPU usage against quota:

sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace) /
sum(kube_resourcequota{resourcequota!="", resource="requests.cpu", type="hard"}) by (namespace)
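
The query is a division between two per-namespace sums: total CPU requested by containers over the CPU quota for the namespace. The Python sketch below works through that arithmetic on made-up numbers so the result is easy to sanity-check; the namespaces, pods, and helper names are all hypothetical.

```python
from collections import defaultdict


def sum_by_namespace(series):
    """Add up series values per namespace, like `sum(...) by (namespace)`."""
    totals = defaultdict(float)
    for labels, value in series:
        totals[labels["namespace"]] += value
    return dict(totals)


def quota_utilization(requests, quotas):
    """Divide requested CPU by the quota for each namespace present on both sides."""
    requested = sum_by_namespace(requests)
    hard = sum_by_namespace(quotas)
    # Namespaces without a quota drop out, as unmatched series do in PromQL.
    # Zero quotas are skipped here; PromQL would return +Inf for them.
    return {
        ns: requested[ns] / hard[ns]
        for ns in requested
        if ns in hard and hard[ns] > 0
    }


# Hypothetical CPU requests (in cores) per container and quotas per namespace.
REQUESTS = [
    ({"namespace": "team-a", "pod": "api-1"}, 0.5),
    ({"namespace": "team-a", "pod": "api-2"}, 0.25),
    ({"namespace": "team-b", "pod": "worker-1"}, 1.5),
    ({"namespace": "team-c", "pod": "cron-1"}, 0.1),
]
QUOTAS = [
    ({"namespace": "team-a"}, 1.0),
    ({"namespace": "team-b"}, 2.0),
]

print(quota_utilization(REQUESTS, QUOTAS))  # prints {'team-a': 0.75, 'team-b': 0.75}
```

A value close to 1.0 means the namespace has nearly exhausted its quota, and new pods there may be rejected.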

3. Persistent Volume Monitoring

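I once dealt with a critical outage caused by running out of persistent volume space. kube-state-metrics covers the lifecycle side of this problem with metrics like:

kube_persistentvolume_status_phase

This metric tracks which phase each persistent volume is in (for example Available, Bound, Released, or Failed), so you can alert on volumes that fail or are never reclaimed. It does not report how full a volume is; for capacity alerts, pair it with the kubelet's volume statistics, such as kubelet_volume_stats_available_bytes.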

4. Deployment Tracking

Tracking the success of rolling updates is crucial. kube-state-metrics provides detailed deployment metrics:

kube_deployment_status_replicas_available
kube_deployment_status_replicas_unavailable

These metrics have helped me quickly identify failed deployments and roll back when necessary.

For example, you can set up an alert for when available replicas don't match the desired state:

kube_deployment_status_replicas_available != kube_deployment_spec_replicas
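
The comparison works because both metrics carry the same identifying labels, so PromQL pairs the series one-to-one and returns only the pairs whose values differ. The Python sketch below emulates that filtering on hypothetical data. For brevity it matches on the namespace and deployment labels only, whereas PromQL matches on all labels by default.

```python
def match_key(labels, on=("namespace", "deployment")):
    """Build the tuple of label values used to pair series from both sides."""
    return tuple(labels.get(name, "") for name in on)


def not_equal(left, right, on=("namespace", "deployment")):
    """Emulate `left != right`: keep left-hand series whose matched value differs."""
    right_by_key = {match_key(labels, on): value for labels, value in right}
    result = []
    for labels, value in left:
        key = match_key(labels, on)
        # Series with no partner on the right are dropped, as in PromQL.
        if key in right_by_key and value != right_by_key[key]:
            result.append((labels, value))
    return result


# Hypothetical deployments: cart is healthy, checkout is short by three replicas.
AVAILABLE = [
    ({"namespace": "shop", "deployment": "cart"}, 3.0),
    ({"namespace": "shop", "deployment": "checkout"}, 1.0),
]
DESIRED = [
    ({"namespace": "shop", "deployment": "cart"}, 3.0),
    ({"namespace": "shop", "deployment": "checkout"}, 4.0),
]

for labels, value in not_equal(AVAILABLE, DESIRED):
    print(labels["namespace"], labels["deployment"], value)  # prints: shop checkout 1.0
```

An empty result means every deployment is at its desired replica count, which is why the expression is convenient as an alert condition.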

5. Node Capacity Planning

kube-state-metrics provides valuable data for capacity planning:

sum(kube_node_status_capacity{resource="cpu"}) - sum(kube_node_status_allocatable{resource="cpu"})

This query shows the difference between total CPU capacity and allocatable CPU, helping identify overhead and plan for cluster scaling.

Advanced Topics and Best Practices

As you become more comfortable with kube-state-metrics, consider these advanced topics and best practices:

1. Custom Resources

kube-state-metrics supports custom resources. I've used this to monitor application-specific CRDs, providing valuable business metrics.

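To enable custom resource metrics, you pass kube-state-metrics a configuration file that describes which fields of the custom resource to expose. Recent releases read it through the --custom-resource-state-config-file flag, so a stock image with the file mounted is usually enough, and the ClusterRole also needs list and watch permissions on the custom resource. If you build your own image from source, here's an example Dockerfile:

FROM golang:1.17 AS builder
WORKDIR /go/src/k8s.io/kube-state-metrics
COPY . .
RUN make build-local

FROM gcr.io/distroless/static:nonroot
COPY --from=builder /go/src/k8s.io/kube-state-metrics/kube-state-metrics .
USER nonroot:nonroot
ENTRYPOINT ["./kube-state-metrics", "--custom-resource-state-config-file=/path/to/cr-config.yaml"]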

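The cr-config.yaml file would look something like this (the group, kind, and metric name are placeholders). Because the phase is a string rather than a number, it is exposed as a label on an Info metric:

kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: "myapp.com"
        kind: "MyCustomResource"
        version: "v1"
      metrics:
        - name: "mycustomresource_status_phase"
          help: "The current phase of the custom resource"
          each:
            type: Info
            info:
              labelsFromPath:
                phase: [status, phase]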

2. High Availability

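For critical environments, consider running multiple replicas of kube-state-metrics using a StatefulSet for high availability. Keep in mind that, unless sharding is configured, every replica exports the full set of metrics, so queries that sum across instances will double count. Here's an example configuration: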

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kube-state-metrics
spec:
  serviceName: "kube-state-metrics"
  replicas: 2
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      containers:
      - name: kube-state-metrics
        image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.1.0
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5

3. Resource Limits

Be sure to set appropriate CPU and memory limits for kube-state-metrics. I've seen it consume significant resources in large clusters. Here's an example of setting resource limits:

resources:
  limits:
    cpu: 100m
    memory: 150Mi
  requests:
    cpu: 100m
    memory: 150Mi

4. Metric Relabeling

Use Prometheus' relabeling features to control which series you store and which labels they carry. Note that __meta_* labels exist only during target relabeling, so a labelmap over them belongs under relabel_configs; metric_relabel_configs runs after the scrape and sees only the scraped labels. Here's an example that keeps a single metric family and drops everything else from the job:

metric_relabel_configs:
- source_labels: [__name__]
  regex: 'kube_pod_container_status_running'
  action: keep

5. Grafana Dashboards

Create comprehensive Grafana dashboards using kube-state-metrics. The kube-state-metrics GitHub repo has some great examples to get you started.

Here's a simple Grafana dashboard query to show the number of pods per namespace:

sum(kube_pod_info) by (namespace)

For more complex dashboards, you can combine kube-state-metrics with other data sources. For example, to show CPU usage vs. requests:

sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace) /
sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace)

6. Alerting

Set up alerting rules in Prometheus to proactively notify you of potential issues. Here's an example alert for pods in a non-running state:

groups:
- name: kubernetes-apps
  rules:
  - alert: KubePodNotReady
    expr: sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown"}) > 0
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} is not ready"
      description: "Pod {{ $labels.pod }} has been in a non-ready state for more than 15 minutes."
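
The `for: 15m` clause means the alert does not fire the moment the expression returns a result. It first becomes pending, and fires only once the condition has held continuously for the full duration. The Python sketch below replays a sequence of rule evaluations to show those transitions; the five-minute evaluation interval and the sample sequence are assumptions for illustration.

```python
from datetime import timedelta


def alert_states(condition_by_eval, interval, hold_for):
    """Replay rule evaluations and return the alert state after each one."""
    states = []
    active_for = None  # how long the condition has been continuously true
    for is_true in condition_by_eval:
        if not is_true:
            # Any evaluation where the condition clears resets the timer.
            active_for = None
            states.append("inactive")
            continue
        active_for = timedelta(0) if active_for is None else active_for + interval
        states.append("firing" if active_for >= hold_for else "pending")
    return states


# Hypothetical pod: stuck in Pending for four evaluations, recovers, then regresses.
EVALUATIONS = [True, True, True, True, False, True]

print(alert_states(EVALUATIONS, timedelta(minutes=5), timedelta(minutes=15)))
# prints ['pending', 'pending', 'pending', 'firing', 'inactive', 'pending']
```

A brief recovery resets the timer, so short-lived blips during rollouts do not page anyone.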

7. Performance Tuning

For large clusters, you might need to tune kube-state-metrics for better performance.

Some options include:

  1. Limiting which resource types are watched:
--resources=pods,deployments,nodes
  2. Restricting collection to specific namespaces (the names here are placeholders):
--namespaces=production,staging
  3. Using metric allow-lists to reduce the number of metrics exposed:
--metric-allowlist=kube_pod_status_phase,kube_deployment_status_replicas

These optimizations can significantly improve the runtime performance of kube-state-metrics, especially in large Kubernetes environments.

8. Integration with Cloud Providers

When running Kubernetes on cloud providers like AWS, you can use kube-state-metrics to keep an eye on objects that provision cloud resources. For example, each Service of type LoadBalancer typically results in a cloud load balancer, and you can count them per namespace:

sum(kube_service_spec_type{type="LoadBalancer"}) by (namespace)

This query helps you keep track of your AWS resources and associated costs.

9. Container Image Management

kube-state-metrics can help you track the usage of container images across your cluster:

sum(kube_pod_container_info) by (image)

This is particularly useful for ensuring that all pods are running the expected container images and versions.
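
One way to act on this data is to look for version drift, meaning the same image repository running under more than one tag. The Python sketch below does that over a list of image strings such as the values of the image label; the image names are hypothetical, and the parsing is deliberately simple (digests are ignored and a missing tag is treated as latest).

```python
from collections import defaultdict


def split_image(image):
    """Split an image reference into (repository, tag), ignoring any digest."""
    name = image.partition("@")[0]
    # Only a colon in the last path segment is a tag; earlier ones are registry ports.
    if ":" in name.rsplit("/", 1)[-1]:
        repository, tag = name.rsplit(":", 1)
        return repository, tag
    return name, "latest"


def find_version_drift(images):
    """Return repositories that are running under more than one tag."""
    tags = defaultdict(set)
    for image in images:
        repository, tag = split_image(image)
        tags[repository].add(tag)
    return {repo: sorted(found) for repo, found in tags.items() if len(found) > 1}


# Hypothetical image label values, one per running container.
IMAGES = [
    "registry.example.com:5000/shop/cart:1.4.2",
    "registry.example.com:5000/shop/cart:1.4.2",
    "registry.example.com:5000/shop/cart:1.3.9",
    "nginx",
]

print(find_version_drift(IMAGES))
# prints {'registry.example.com:5000/shop/cart': ['1.3.9', '1.4.2']}
```

Drift is expected briefly during a rolling update, so a check like this is most useful when it persists.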

Troubleshooting Common Issues

Even with careful setup, you might encounter some issues. Here are some common problems and their solutions:

  1. High CPU Usage: If kube-state-metrics is consuming too much CPU, consider using metric allow-lists or narrowing the watched resource types with --resources.
  2. Memory Leaks: Ensure you're using the latest version of kube-state-metrics. Earlier versions had memory leak issues that have since been resolved.
  3. Missing Metrics: Check the RBAC permissions. kube-state-metrics might not have access to all namespaces or resources.
  4. Inconsistent Metrics: This can happen in large clusters due to the eventually consistent nature of the Kubernetes API. Expect a short lag between a change and the metric reflecting it, and give alerts a "for" duration so brief inconsistencies do not trigger them.
  5. Ingress Metrics Missing: If you're not seeing metrics for Ingress resources, make sure you're using a compatible Ingress controller and that kube-state-metrics has permissions to access Ingress resources.
  6. Authentication Issues: If you're experiencing authentication problems, ensure that the ServiceAccount associated with kube-state-metrics has the correct RBAC permissions. You may need to adjust the ClusterRole or Role bindings.

Conclusion

kube-state-metrics has become an essential tool in my Kubernetes observability stack. Its ability to provide deep insights into the state of Kubernetes objects, combined with the power of Prometheus and Grafana, has significantly improved my ability to maintain and troubleshoot Kubernetes clusters.

Remember, observability is not just about collecting metrics—it's about gaining actionable insights. kube-state-metrics, when used effectively, can provide those insights and help you maintain a healthy, efficient Kubernetes environment.

As you implement kube-state-metrics in your own clusters, keep in mind that every environment is unique.

Don't be afraid to experiment with different configurations and metrics to find what works best for your specific use cases.

Happy monitoring!

What is kube-state-metrics?
kube-state-metrics is a service that listens to the Kubernetes API server and generates metrics about the state of Kubernetes objects. It provides a metrics endpoint for Prometheus to scrape, offering insights into the health and status of various Kubernetes resources.

What is the difference between node exporter and kube-state-metrics?
Node exporter focuses on collecting system-level metrics from nodes (CPU, memory, disk usage, etc.), while kube-state-metrics generates metrics about Kubernetes objects (pods, deployments, services, etc.). They complement each other to provide a complete view of your cluster's health.

How do you deploy kube-state-metrics on Kubernetes?
You can deploy kube-state-metrics using Helm charts or by applying YAML manifests directly. Here's a simple Helm command to install kube-state-metrics:

helm install kube-state-metrics prometheus-community/kube-state-metrics

How can we expose metrics to Prometheus?
kube-state-metrics exposes a metrics endpoint that Prometheus can scrape. Add a job to your Prometheus configuration like this:

- job_name: 'kube-state-metrics'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name]
    regex: kube-state-metrics
    action: keep

What metrics are provided by kube-state-metrics in Kubernetes?
kube-state-metrics provides a wide range of metrics, including but not limited to:

  • Pod status and resource requests and limits
  • Deployment status and replica counts
  • Node capacity and allocatable resources
  • PersistentVolume and PersistentVolumeClaim status
  • Service and Ingress details

How do I monitor pod status using kube-state-metrics in Kubernetes?
You can use queries like these in Prometheus:

sum(kube_pod_status_phase{phase="Running"})  # Number of running pods
kube_pod_container_status_restarts_total  # Container restart count

Can you monitor Kubernetes cluster status using kube-state-metrics?
Yes, kube-state-metrics provides various metrics that give insights into cluster status, such as node conditions, node capacity, and the state of workloads. It does not measure live resource usage or control plane component health; those come from other exporters.

For example:

sum(kube_node_status_condition{condition="Ready", status="true"})  # Number of ready nodes

How many load balancer services do we have and what are their IPs?
You can use queries like these to get the count and IPs of LoadBalancer services:

count(kube_service_spec_type{type="LoadBalancer"})  # Number of LoadBalancer services
kube_service_status_load_balancer_ingress  # One series per assigned IP or hostname

The first query returns the number of LoadBalancer services; the second exposes each assigned address in its ip and hostname labels.

Authors

Prathamesh Sonpatki

Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.

Anjali Udasi

Helping to make the tech a little less intimidating. I love breaking down complex concepts into easy-to-understand terms.