kube-state-metrics: Your Guide to Kubernetes Observability

As a DevOps engineer with years of experience optimizing Kubernetes clusters, I've come to appreciate the critical role that robust monitoring solutions play in maintaining healthy, efficient systems.

Among the myriad tools available, kube-state-metrics stands out as an indispensable component of any comprehensive Kubernetes observability stack. In this in-depth guide, I'll share my experiences and insights on leveraging this powerful exporter to enhance your Kubernetes monitoring capabilities.

Understanding kube-state-metrics

What is kube-state-metrics?

kube-state-metrics is an open-source add-on for Kubernetes that listens to the Kubernetes API server and generates metrics about the state of various Kubernetes objects.

Unlike metrics-server, which provides resource usage data (CPU, memory), kube-state-metrics focuses on the health and status of Kubernetes resources.

Key Features of kube-state-metrics

Object-focused metrics: Provides metrics for a wide range of Kubernetes objects, including Pods, Deployments, StatefulSets, and more.
Read-only access: Only requires read-only access to the Kubernetes API, enhancing security.
Prometheus compatibility: Exposes metrics in Prometheus format, making it easy to integrate with existing monitoring stacks.
Custom resource support: Can be extended to monitor custom resources (CRDs).
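
To make the "Prometheus format" point concrete, here is a minimal Python sketch that parses a few lines of text exposition output into structured samples. The sample lines and pod names are made up for illustration, and the parser is deliberately simplified (it ignores optional timestamps), so treat it as a reading aid rather than a full implementation of the format.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

def parse_samples(text):
    """Parse text exposition lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples

# Hypothetical scrape output; namespaces and pod names are invented.
SAMPLE_TEXT = '''
# HELP kube_pod_status_phase The pods current phase.
# TYPE kube_pod_status_phase gauge
kube_pod_status_phase{namespace="default",pod="web-0",phase="Running"} 1
kube_pod_status_phase{namespace="default",pod="web-0",phase="Pending"} 0
kube_pod_status_phase{namespace="batch",pod="job-1",phase="Failed"} 1
'''

samples = parse_samples(SAMPLE_TEXT)
# A value of 1 marks the phase the pod is currently in.
active = [(labels["pod"], labels["phase"]) for _, labels, value in samples if value == 1]
print(active)
```

Each object state becomes a labelled series with a value of 0 or 1, which is why most of the queries later in this guide are sums and comparisons over labels.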

Why kube-state-metrics Matters

In my years working with Kubernetes clusters, I've found that while tools like kubelet and metrics-server provide crucial CPU and memory usage data, they don't give the full picture.

kube-state-metrics fills this gap by offering insights into the state of deployments, pods, nodes, and other Kubernetes objects.

For example, while metrics-server might tell you that a node is using 80% of its CPU, kube-state-metrics can tell you how many pods are running on that node, how many are in a failed state, or whether there are any pending pods due to resource constraints.

Setting Up kube-state-metrics

There are two common ways to set up kube-state-metrics in a Kubernetes cluster: the Helm chart and the raw YAML manifests.

I prefer Helm, as it simplifies installation and makes upgrades easier to manage.

Using Helm (Recommended)

First, add the Prometheus community repo:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

helm repo update

Then, install kube-state-metrics:

helm install kube-state-metrics prometheus-community/kube-state-metrics

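To customize the deployment, create a values.yaml file:

replicas: 2
# With autosharding enabled, the chart deploys a StatefulSet and splits objects across the replicas
autosharding:
  enabled: true
# PodSecurityPolicy was removed in Kubernetes 1.25, so only enable this on older clusters
podSecurityPolicy:
  enabled: false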

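Then apply it. If the release already exists from the previous step, a second helm install will fail, so use helm upgrade --install:

helm upgrade --install kube-state-metrics prometheus-community/kube-state-metrics -f values.yaml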

Manual Installation using YAML Manifests

If you prefer manual installation or need more control over the deployment, you can find the YAML manifests in the official kube-state-metrics GitHub repo.

Clone the repository:

git clone https://github.com/kubernetes/kube-state-metrics.git

cd kube-state-metrics

Apply the manifests:

kubectl apply -f examples/standard

Configuration and RBAC

The Helm chart and the standard manifests both create a ServiceAccount, ClusterRole, and ClusterRoleBinding so that kube-state-metrics has the permissions it needs to read from the Kubernetes API server.

Here's a snippet of the ClusterRole YAML:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs: ["list", "watch"]
# ... more rules ...

This RBAC configuration ensures that kube-state-metrics has read-only access to the necessary Kubernetes resources.
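
If you maintain your own copy of the ClusterRole, a quick check that it stays read-only can catch accidental privilege creep. The sketch below assumes the rules have already been loaded into plain Python dicts (for example from parsed YAML); the helper name and the "risky" rule are hypothetical.

```python
READ_ONLY_VERBS = {"get", "list", "watch"}

def write_capable_rules(rules):
    """Return the rules that grant any verb outside get/list/watch."""
    return [rule for rule in rules if not set(rule["verbs"]) <= READ_ONLY_VERBS]

# Two rules in the shape of the ClusterRole snippet, abbreviated.
rules = [
    {"apiGroups": [""], "resources": ["pods", "nodes", "secrets"], "verbs": ["list", "watch"]},
    {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["list", "watch"]},
]
print(write_capable_rules(rules))

# A hypothetical rule that adds a write verb is flagged.
risky = rules + [{"apiGroups": [""], "resources": ["pods"], "verbs": ["list", "delete"]}]
print(len(write_capable_rules(risky)))
```

A wildcard verb ("*") is flagged as well, since it is not a subset of the read-only set.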

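To verify the installation, run:

kubectl get pods --all-namespaces | grep kube-state-metrics

The standard manifests deploy into kube-system, while helm install uses your current namespace unless you pass --namespace, so searching all namespaces covers both cases. You should see the kube-state-metrics pod(s) in the Running state.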

Building from Source

Building kube-state-metrics from source lets you customize the application or use features that haven't yet shipped in an official release. Follow these steps:

Prerequisites

Before you begin, ensure you have the following installed:

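Go: Make sure you have Go installed. Version 1.18 or later is a reasonable baseline, but newer releases need newer toolchains, so check the go directive in the repository's go.mod. You can download Go from the official Go website.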
Git: You’ll need Git to clone the kube-state-metrics repository. Install Git from the official site if you don't have it already.

Steps to Build kube-state-metrics

Clone the Repository
Open a terminal and run the following command to clone the kube-state-metrics repository:

git clone https://github.com/kubernetes/kube-state-metrics.git

Change into the cloned directory:

cd kube-state-metrics

Checkout the Desired Version
If you want to build a specific version, use the following command to check out that version:

git checkout <version-tag>

Replace <version-tag> with the desired version (e.g., v2.0.0).

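Build the Project
Use the following command to build the kube-state-metrics binary:

make build-local

This compiles the code with your local Go toolchain and writes the kube-state-metrics binary to the repository root. (make build produces the same binary but runs the build inside a Docker container.)

Run kube-state-metrics
When running outside a cluster, point the binary at a kubeconfig so it can reach the API server:

./kube-state-metrics --kubeconfig=$HOME/.kube/config --port=8080 --telemetry-port=8081

By default, it serves metrics on port 8080 and its own telemetry on port 8081. You can configure additional options as needed.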

Verify the Installation
Open your web browser or use curl to access the kube-state-metrics metrics endpoint:

curl http://localhost:8080/metrics
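
The raw output is long, so a small summary helps confirm that the expected metric families are present. This Python sketch counts samples per metric name; the fetch helper assumes the local endpoint shown above, and the offline example uses made-up lines so it runs without a cluster.

```python
from collections import Counter
from urllib.request import urlopen

def family_counts(text):
    """Count samples per metric name in text exposition output."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (no labels).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts

def fetch_metrics(url="http://localhost:8080/metrics"):
    """Fetch the raw metrics page; the URL assumes the local run above."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")

# Offline example with invented sample lines.
example = '''
kube_pod_info{namespace="default",pod="web-0"} 1
kube_pod_info{namespace="default",pod="web-1"} 1
kube_node_info{node="node-a"} 1
'''
print(family_counts(example).most_common())
```

Against a running instance, family_counts(fetch_metrics()).most_common(10) lists the ten largest metric families, which is also a useful starting point when deciding what to allow-list.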

Integrating with Prometheus

Now that we have kube-state-metrics running, let's integrate it with Prometheus to start collecting and visualizing our metrics.

Configuring Prometheus

Add the following job to your Prometheus config:

- job_name: 'kube-state-metrics'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name]
    regex: kube-state-metrics
    action: keep
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name

Restart Prometheus to apply the changes.

Querying kube-state-metrics in Prometheus

Once Prometheus is scraping kube-state-metrics, you can start querying the metrics. Here are some useful queries:

Number of pods by namespace:

sum(kube_pod_info) by (namespace)

Pods not in Running state:

sum(kube_pod_status_phase{phase!="Running"}) by (namespace, phase)

Deployments not at the desired number of replicas:

kube_deployment_spec_replicas != kube_deployment_status_replicas_available

Persistent volumes by phase:

sum(kube_persistentvolume_status_phase) by (phase)
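
The same queries can be run from scripts through the Prometheus HTTP API. The sketch below builds an instant-query URL for /api/v1/query and flattens the JSON response into rows; the base URL is an assumption about where your Prometheus listens, and the response shown is hand-written with made-up numbers so the example runs offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})

def vector_to_rows(payload):
    """Flatten an instant-vector response into (labels, value) rows."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error"))
    return [(item["metric"], float(item["value"][1])) for item in payload["data"]["result"]]

def run_query(base_url, promql):
    """Run the query against a live server (not exercised in this sketch)."""
    with urlopen(query_url(base_url, promql), timeout=10) as response:
        return vector_to_rows(json.load(response))

# Hand-written response in the API's vector shape; values are invented.
fake = {"status": "success", "data": {"resultType": "vector", "result": [
    {"metric": {"namespace": "default"}, "value": [1700000000, "12"]},
    {"metric": {"namespace": "kube-system"}, "value": [1700000000, "9"]},
]}}
print(vector_to_rows(fake))
print(query_url("http://localhost:9090", "sum(kube_pod_info) by (namespace)"))
```

Sample values arrive as strings in the API response, which is why the helper converts them to floats.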

kube-state-metrics Use Cases

Let's understand some practical use cases where I've found kube-state-metrics to be invaluable:

1. Monitoring Pod Health

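One of the most common issues I've encountered is pods stuck in a crash loop. kube-state-metrics makes it easy to track container restarts:

sum(kube_pod_container_status_restarts_total) by (namespace, pod)

Because this metric is a cumulative counter, wrap it in increase(), for example increase(kube_pod_container_status_restarts_total[1h]), to surface pods that are restarting right now rather than pods that restarted long ago. This helps identify problematic pods quickly, saving precious debugging time. I once used it to track down a memory leak in a Java application that was causing frequent restarts.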
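
Since the restart metric is a cumulative counter, what usually matters is how much it grew recently. The Python sketch below illustrates that idea on made-up scrape values: it sums the growth between consecutive samples and treats a drop as a counter reset. It mirrors the intent of PromQL's increase() without its extrapolation, so the numbers will not match Prometheus exactly.

```python
def counter_increase(samples):
    """Total increase across ordered counter samples, tolerating resets.

    A drop between two samples is treated as a reset to zero, so the new
    value counts in full.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        total += current - previous if current >= previous else current
    return total

# Hypothetical restart counts scraped over an hour, keyed by (namespace, pod).
scrapes = {
    ("default", "api-7d9f"): [3, 3, 4, 7, 9],
    ("default", "web-5c2a"): [12, 12, 12, 12, 12],   # many old restarts, none recent
    ("batch", "job-x1"): [5, 6, 0, 1, 2],            # counter reset mid-window
}
recent = {key: counter_increase(values) for key, values in scrapes.items()}
noisy = sorted(key for key, growth in recent.items() if growth >= 3)
print(recent)
print(noisy)
```

Note that web-5c2a has the highest lifetime total but no recent growth, so it is not flagged, while the two pods that are actively restarting are.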

2. Resource Quota Management

In multi-tenant clusters, managing resource quotas is crucial. kube-state-metrics exposes each ResourceQuota through:

kube_resourcequota

Its type label distinguishes the configured limit (type="hard") from current consumption (type="used"), so you can track usage against quotas and help prevent resource starvation. I've seen quota exhaustion bring down entire namespaces when not properly monitored.

Example query to check requested CPU against the quota:

sum by (namespace) (kube_resourcequota{resource="requests.cpu", type="used"})
/
sum by (namespace) (kube_resourcequota{resource="requests.cpu", type="hard"})
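
The kube_resourcequota metric reports both the configured limit and the current consumption, distinguished by its type label (hard or used). As a sketch of the arithmetic behind a utilisation ratio, the Python below divides used by hard for one resource; the namespaces, quota names, and values are invented for illustration.

```python
def quota_utilisation(samples, resource):
    """Return used/hard per (namespace, resourcequota) for one resource."""
    used, hard = {}, {}
    for labels, value in samples:
        if labels["resource"] != resource:
            continue
        key = (labels["namespace"], labels["resourcequota"])
        (used if labels["type"] == "used" else hard)[key] = value
    # Only keys present on both sides with a non-zero limit yield a ratio.
    return {key: used[key] / hard[key] for key in used if hard.get(key)}

# Hypothetical kube_resourcequota samples as (labels, value) pairs.
samples = [
    ({"namespace": "team-a", "resourcequota": "compute", "resource": "requests.cpu", "type": "hard"}, 4.0),
    ({"namespace": "team-a", "resourcequota": "compute", "resource": "requests.cpu", "type": "used"}, 3.0),
    ({"namespace": "team-b", "resourcequota": "compute", "resource": "requests.cpu", "type": "hard"}, 2.0),
    ({"namespace": "team-b", "resourcequota": "compute", "resource": "requests.cpu", "type": "used"}, 1.0),
    ({"namespace": "team-b", "resourcequota": "compute", "resource": "pods", "type": "used"}, 7.0),
]
ratios = quota_utilisation(samples, "requests.cpu")
near_limit = [key for key, ratio in ratios.items() if ratio > 0.7]
print(ratios)
print(near_limit)
```

The 0.7 threshold is arbitrary; pick one that leaves enough headroom for a rollout in your environment.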

3. Persistent Volume Monitoring

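I once dealt with a critical outage caused by running out of persistent volume space. kube-state-metrics helps with the state side of this problem through metrics like:

kube_persistentvolume_status_phase

This metric tracks the phase of each persistent volume (for example Available, Bound, Released, or Failed), so you can alert on volumes that are Failed or stuck in Released. It does not report how full a volume is; for capacity alerts, pair it with the kubelet's kubelet_volume_stats_used_bytes and kubelet_volume_stats_capacity_bytes metrics.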

4. Deployment Tracking

Tracking the success of rolling updates is crucial. kube-state-metrics provides detailed deployment metrics:

kube_deployment_status_replicas_available
kube_deployment_status_replicas_unavailable

These metrics have helped me quickly identify failed deployments and roll back when necessary.

For example, you can set up an alert for when available replicas don't match the desired state:

kube_deployment_status_replicas_available != kube_deployment_spec_replicas
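
It helps to know how this comparison behaves: PromQL matches series from both sides that have identical label sets and returns only the pairs where the values differ. The Python sketch below imitates that one-to-one matching with dictionaries keyed by (namespace, deployment); the deployment names and replica counts are made up.

```python
def mismatched(spec, available):
    """Imitate `spec != available` with one-to-one matching on identical keys."""
    result = {}
    for key, desired in spec.items():
        # Series without a partner on the other side are dropped, as in PromQL.
        if key in available and available[key] != desired:
            result[key] = (desired, available[key])
    return result

# Hypothetical values keyed by (namespace, deployment).
spec_replicas = {("shop", "cart"): 3, ("shop", "web"): 2, ("shop", "search"): 1}
available_replicas = {("shop", "cart"): 3, ("shop", "web"): 1}

print(mismatched(spec_replicas, available_replicas))
```

Only shop/web is reported. The search deployment has no matching series on the right-hand side, so it silently drops out, which is worth remembering when an alert stays quiet because one of the two metrics is missing.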

5. Node Capacity Planning

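kube-state-metrics provides valuable data for capacity planning:

sum(kube_node_status_capacity{resource="cpu"}) - sum(kube_node_status_allocatable{resource="cpu"})

This query shows the difference between total CPU capacity and allocatable CPU, which is the share reserved for the system and Kubernetes components. It helps identify overhead and plan for cluster scaling. (Releases before v2.0 exposed these as kube_node_status_capacity_cpu_cores and kube_node_status_allocatable_cpu_cores.)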

Advanced Topics and Best Practices

As you become more comfortable with kube-state-metrics, consider these advanced topics and best practices:

1. Custom Resources

kube-state-metrics supports custom resources. I've used this to monitor application-specific CRDs, providing valuable business metrics.

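Recent releases can generate custom resource metrics from a configuration file, with no custom build required: pass the file with --custom-resource-state-config-file (the similarly named --custom-resource-state-config flag takes the YAML inline instead), and grant the ServiceAccount list and watch permissions on the custom resource. If you do build your own image from source, here's an example Dockerfile:

# Use the Go version required by the release you are building (see go.mod)
FROM golang:1.18 as builder
WORKDIR /go/src/k8s.io/kube-state-metrics
COPY . .
RUN make build-local

FROM gcr.io/distroless/static:nonroot
COPY --from=builder /go/src/k8s.io/kube-state-metrics/kube-state-metrics .
USER nonroot:nonroot
ENTRYPOINT ["./kube-state-metrics", "--custom-resource-state-config-file=/path/to/cr-config.yaml"]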

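The cr-config.yaml file would look something like this (the group, kind, and phase values are placeholders for your own CRD):

kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: "myapp.com"
        kind: "MyCustomResource"
        version: "v1"
      metrics:
        - name: "mycustomresource_status_phase"
          help: "The current phase of the custom resource"
          each:
            # A string field such as a phase maps to a StateSet: one series per listed state
            type: StateSet
            stateSet:
              labelName: phase
              path: [status, phase]
              list: [Pending, Running, Failed]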
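
A string field such as a phase cannot be a metric value directly, so state-style metrics are usually exported as one series per possible state, with 1 for the current state and 0 for the rest (the same shape as kube_pod_status_phase). The Python sketch below shows that mapping for a single object. It is an illustration of the output shape only, not kube-state-metrics' implementation, and the object, state list, and helper name are hypothetical.

```python
def state_set_series(metric_name, obj, path, states, label_name="phase"):
    """Emit one (name, labels, value) sample per known state for one object."""
    current = obj
    for key in path:  # walk e.g. ["status", "phase"]
        current = current.get(key, {}) if isinstance(current, dict) else {}
    base = {"namespace": obj["metadata"]["namespace"], "name": obj["metadata"]["name"]}
    return [
        (metric_name, dict(base, **{label_name: state}), 1.0 if current == state else 0.0)
        for state in states
    ]

# Hypothetical custom resource instance.
resource = {
    "metadata": {"namespace": "shop", "name": "orders"},
    "status": {"phase": "Running"},
}
series = state_set_series(
    "mycustomresource_status_phase", resource, ["status", "phase"],
    ["Pending", "Running", "Failed"],
)
for name, labels, value in series:
    print(name, labels, value)
```

With this shape, a query such as sum by (phase) over the metric counts resources in each phase, exactly as with the built-in pod phase metric.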

2. High Availability

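For critical environments, consider running multiple replicas of kube-state-metrics using a StatefulSet. The stable pod identities a StatefulSet provides are what automatic sharding relies on (enabled with the --pod and --pod-namespace flags), so each replica serves only its share of objects. Without sharding, every replica exposes the full metric set, which gives you redundancy but also duplicated series that your queries need to account for. Here's an example configuration: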

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kube-state-metrics
spec:
  serviceName: "kube-state-metrics"
  replicas: 2
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      containers:
      - name: kube-state-metrics
        image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.1.0
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
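
When replicas are sharded, each one handles only the objects that hash to its shard index. The Python sketch below illustrates that partitioning idea by hashing an object UID and taking it modulo the number of shards. The exact hash used by kube-state-metrics may differ between releases, so treat this as a model of the property (every object lands on exactly one shard), not a way to predict real shard assignments. The UIDs are made up.

```python
import hashlib

def shard_for(uid, total_shards):
    """Map an object UID to a shard index in [0, total_shards)."""
    digest = hashlib.md5(uid.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % total_shards

def objects_for_shard(uids, shard, total_shards):
    """Return the UIDs that a given shard would be responsible for."""
    return [uid for uid in uids if shard_for(uid, total_shards) == shard]

# Hypothetical object UIDs split across two replicas.
uids = ["pod-uid-%d" % n for n in range(10)]
shards = [objects_for_shard(uids, shard, 2) for shard in range(2)]
print(shards)
```

Because the shards are disjoint, Prometheus must scrape all of them to see the whole cluster, and losing one replica means losing visibility into its share of objects until it returns.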

3. Resource Limits

Be sure to set appropriate CPU and memory limits for kube-state-metrics. I've seen it consume significant resources in large clusters. Here's an example of setting resource limits:

resources:
  limits:
    cpu: 100m
    memory: 150Mi
  requests:
    cpu: 100m
    memory: 150Mi

4. Metric Relabeling

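Use Prometheus' relabeling features to control which series you store and which labels they carry, making them cheaper to keep and easier to query. Target labels from service discovery (__meta_*) are only available in relabel_configs; metric_relabel_configs runs on the scraped samples afterwards. Here's an example relabeling configuration:

relabel_configs:
# Copy the scrape target's pod labels onto every series from this job
- action: labelmap
  regex: __meta_kubernetes_pod_label_(.+)
metric_relabel_configs:
# Keep only this metric family and drop everything else scraped by the job
- source_labels: [__name__]
  regex: 'kube_pod_container_status_running'
  action: keep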
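
The keep action is easy to misread, so here is a small Python model of it: the values of the source labels are joined with a separator, the regex must match the whole joined string, and series that do not match are dropped. Python's re module stands in for Prometheus' RE2 syntax, which is close enough for simple patterns; the series list is made up.

```python
import re

def apply_keep(series, source_labels, regex, separator=";"):
    """Keep series whose joined source label values fully match the regex."""
    pattern = re.compile(regex)
    kept = []
    for labels in series:
        joined = separator.join(labels.get(name, "") for name in source_labels)
        if pattern.fullmatch(joined):  # relabel regexes are anchored at both ends
            kept.append(labels)
    return kept

# Hypothetical scraped series, identified by the special __name__ label.
scraped = [
    {"__name__": "kube_pod_container_status_running", "pod": "web-0"},
    {"__name__": "kube_pod_container_status_ready", "pod": "web-0"},
    {"__name__": "kube_pod_info", "pod": "web-0"},
]
kept = apply_keep(scraped, ["__name__"], "kube_pod_container_status_running")
print([labels["__name__"] for labels in kept])
```

A keep rule on a single metric name therefore discards every other metric from the job, so use it deliberately, or use a drop rule when you only want to remove a few families.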

5. Grafana Dashboards

Create comprehensive Grafana dashboards using kube-state-metrics. The kubernetes-mixin project on GitHub has some great dashboards built on these metrics to get you started.

Here's a simple Grafana dashboard query to show the number of pods per namespace:

sum(kube_pod_info) by (namespace)

For more complex dashboards, you can combine kube-state-metrics with other data sources. For example, to show CPU usage (reported by cAdvisor through the kubelet) against CPU requests:

sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace) /
sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace)

6. Alerting

Set up alerting rules in Prometheus to proactively notify you of potential issues. Here's an example alert for pods in a non-running state:

groups:
- name: kubernetes-apps
  rules:
  - alert: KubePodNotReady
    expr: sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown"}) > 0
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} is not ready"
      description: "Pod {{ $labels.pod }} has been in a non-ready state for more than 15 minutes."
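
The for: 15m clause means the expression must stay true across every evaluation for 15 minutes before the alert fires; until then it is only pending. The Python sketch below models that behaviour for a series of evaluation results. The 5-minute evaluation step and the sequence of results are assumptions chosen to keep the example short.

```python
def alert_states(condition_by_step, step_seconds, for_seconds):
    """Return 'inactive', 'pending' or 'firing' for each evaluation step."""
    states, active_since = [], None
    for index, is_true in enumerate(condition_by_step):
        now = index * step_seconds
        if not is_true:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_seconds else "pending")
    return states

# Evaluations every 5 minutes; the pod briefly recovers at the third step.
results = [True, True, False, True, True, True, True]
states = alert_states(results, step_seconds=300, for_seconds=900)
print(states)
```

The brief recovery resets the timer, so the alert only fires at the last step. This is what keeps short-lived Pending phases during normal scheduling from paging anyone.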

7. Performance Tuning

For large clusters, you might need to tune kube-state-metrics for better performance.

Some options include:

Limiting the resource types it watches:

--resources=pods,deployments,nodes

Restricting it to the namespaces you care about (the names here are examples):

--namespaces=production,staging

Using metric allow-lists to reduce the number of metrics collected:

--metric-allowlist=kube_pod_status_phase,kube_deployment_status_replicas

You can also drop individual expensive metrics with --metric-denylist, or split the load across replicas with sharding. These optimizations can significantly reduce the memory and CPU footprint of kube-state-metrics, especially in large Kubernetes environments.

8. Integration with Cloud Providers

When running Kubernetes on cloud providers like AWS, you can use kube-state-metrics to keep an eye on cloud resources that Kubernetes provisions on your behalf. For example, each Service of type LoadBalancer typically results in a cloud load balancer, so you can count them per namespace:

sum(kube_service_spec_type{type="LoadBalancer"}) by (namespace)

This query helps you keep track of your AWS load balancers and their associated costs.

9. Container Image Management

kube-state-metrics can help you track the usage of container images across your cluster:

sum(kube_pod_container_info) by (image)

This is particularly useful for ensuring that all pods are running the expected container images and versions.
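
One practical use of the image label is spotting version drift, where the same repository runs with more than one tag. The Python sketch below groups image references by repository and reports those with several tags in use. The image names are invented, and the tag parsing is simplified: it treats untagged references as latest and does not handle digest references.

```python
from collections import defaultdict

def split_image(image):
    """Split 'repo:tag' into (repo, tag); untagged references get 'latest'."""
    last_segment = image.rsplit("/", 1)[-1]
    if ":" in last_segment:  # a colon before the last "/" is a registry port
        repo, tag = image.rsplit(":", 1)
        return repo, tag
    return image, "latest"

def version_drift(samples):
    """Return repositories that are running with more than one tag."""
    tags = defaultdict(set)
    for labels in samples:
        repo, tag = split_image(labels["image"])
        tags[repo].add(tag)
    return {repo: sorted(found) for repo, found in tags.items() if len(found) > 1}

# Hypothetical kube_pod_container_info label sets.
containers = [
    {"pod": "web-0", "image": "registry.example.com/shop/web:1.4.2"},
    {"pod": "web-1", "image": "registry.example.com/shop/web:1.4.1"},
    {"pod": "cache-0", "image": "redis:7"},
    {"pod": "tools-0", "image": "registry.example.com:5000/tools"},
]
print(version_drift(containers))
```

During a rolling update two tags are expected for a short while, so this kind of check is most useful when the drift persists.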

Troubleshooting Common Issues

Even with careful setup, you might encounter some issues. Here are some common problems and their solutions:

High CPU Usage: If kube-state-metrics is consuming too much CPU, consider using metric allow-lists, limiting the watched resources or namespaces, or sharding across replicas.
Memory Leaks: Ensure you're using the latest version of kube-state-metrics, since memory issues in earlier versions have been resolved. Keep in mind that memory use also grows with the number of objects in the cluster, so steady growth is not always a leak.
Missing Metrics: Check the RBAC permissions. kube-state-metrics might not have access to all namespaces or resources.
Inconsistent Metrics: This can happen in large clusters due to the eventually consistent nature of the Kubernetes API. If you run several unsharded replicas, also make sure your queries don't double count the duplicated series.
Ingress Metrics Missing: If you're not seeing metrics for Ingress resources, make sure your kube-state-metrics release supports the Ingress API version your cluster serves and that it has permission to list and watch Ingress resources.
Authentication Issues: If you're experiencing authentication problems, ensure that the ServiceAccount associated with kube-state-metrics has the correct RBAC permissions. You may need to adjust the ClusterRole or Role bindings.

Conclusion

kube-state-metrics has become an essential tool in my Kubernetes observability stack. Its ability to provide deep insights into the state of Kubernetes objects, combined with the power of Prometheus and Grafana, has significantly improved my ability to maintain and troubleshoot Kubernetes clusters.

As you implement kube-state-metrics in your own clusters, keep in mind that every environment is unique.

Happy monitoring!

FAQs

What is kube-state-metrics?
kube-state-metrics listens to the Kubernetes API server and generates metrics about Kubernetes object states. It provides a metrics endpoint for Prometheus, offering insights into the health and status of various resources.

What is the difference between node exporter and kube-state-metrics?
Node exporter collects system-level metrics from nodes (CPU, memory, disk usage), while kube-state-metrics generates metrics about Kubernetes objects (pods, deployments, services). They work together to give a complete view of cluster health.

How do you deploy kube-state-metrics on Kubernetes?
You can deploy kube-state-metrics using Helm charts or YAML manifests. For Helm, use: helm install kube-state-metrics prometheus-community/kube-state-metrics

How can we expose metrics to Prometheus?
kube-state-metrics exposes a metrics endpoint for Prometheus to scrape. Add this job to your Prometheus config:

- job_name: 'kube-state-metrics'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name]
    regex: kube-state-metrics
    action: keep

What metrics are provided by kube-state-metrics in Kubernetes?
kube-state-metrics provides metrics such as pod status, deployment status, node capacity, PersistentVolume and PersistentVolumeClaim status, and service details.

How do I monitor pod status using kube-state-metrics in Kubernetes?
Use Prometheus queries like kube_pod_status_phase{phase="Running"} for running pods and kube_pod_container_status_restarts_total for container restart counts.

Can you monitor Kubernetes cluster status using kube-state-metrics?
Yes, kube-state-metrics provides metrics on cluster status, such as node conditions and node capacity (actual resource usage comes from metrics-server or the kubelet). For example, use kube_node_status_condition{condition="Ready", status="true"} to find ready nodes.

How many load balancer services do we have and what are their IPs?
Use count(kube_service_spec_type{type="LoadBalancer"}) to get the number of LoadBalancer services, and kube_service_status_load_balancer_ingress to list their IPs or hostnames.