
The Complete Guide to Monitoring Container CPU Usage

Find out how to track container CPU usage, catch performance issues early, and keep your workloads running efficiently.

Have you ever opened your Kubernetes dashboard and wondered why your app seems to slow down?

As containers multiply rapidly, keeping track of CPU usage becomes a must. Let’s break it down by focusing on one key metric: container_cpu_usage_seconds_total.

The Anatomy of container_cpu_usage_seconds_total

container_cpu_usage_seconds_total is the Prometheus metric that tracks cumulative CPU time consumed by each container. Think of it as your container's timecard – it punches in every CPU second used across all cores.

Unlike other metrics that give you snapshots, this one's a running total that keeps climbing as long as your container's alive. It's like counting miles on a road trip – the number only goes up.

# HELP container_cpu_usage_seconds_total Cumulative cpu time consumed by the container in seconds
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{container="nginx",namespace="default",pod="web-server-5d4b8b7f-2jlq9"} 156.321

What's happening behind the scenes?

This metric is derived from cgroup accounting data in the Linux kernel. Depending on your cgroup version, the metrics collector (cAdvisor in Kubernetes) reads it from files such as cpuacct.usage (cgroup v1) or the usage_usec field of cpu.stat (cgroup v2), and exposes the cumulative CPU time the container's processes have spent on the CPU, converted to seconds.

The metric is multi-dimensional, with labels that identify exactly which workload each series belongs to:

  • container - The container name
  • pod - The pod it's running in
  • namespace - The Kubernetes namespace
  • id - The container ID
  • image - The container image
  • name - The cgroup name

The counter increments each time the container uses CPU resources, regardless of whether it's handling requests or running background tasks.
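
For example, those labels let you drill straight into a single workload. This selector (a minimal sketch reusing the sample values from the scrape output above) returns the raw counter for one nginx container:

container_cpu_usage_seconds_total{namespace="default", pod="web-server-5d4b8b7f-2jlq9", container="nginx"}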

Why container_cpu_usage_seconds_total Is Critical for Performance Optimization and Cost Management

You're managing infrastructure, not collecting random numbers. So why does this particular metric deserve your attention?

  • Resource Planning – When you know what your containers actually consume (not what they request), you can right-size them and stop wasting cash on idle resources
  • Performance Troubleshooting – Spot the CPU-hungry containers bringing your cluster to its knees
  • Billing Insights – Connect real CPU usage to actual cloud costs and explain to your boss where the money's going
  • Scheduler Optimization – Understand how the Kubernetes scheduler is distributing workloads
  • Capacity Planning – Make data-driven decisions about cluster scaling

A DevOps engineer without CPU metrics is like a chef cooking blindfolded – you might get lucky, but chances are you're making a mess.

In the real world, this metric has helped teams:

  • Reduce cloud costs by 30-40% through proper resource allocation
  • Identify poorly performing microservices that were bottlenecking entire systems
  • Predict capacity needs weeks in advance, preventing outages

Advanced Techniques for Calculating CPU Usage Percentage and Resource Efficiency

The raw counter is nice, but what you really want is usage percentage. Here's how to transform container_cpu_usage_seconds_total into something actually useful:

rate(container_cpu_usage_seconds_total{namespace="production"}[5m]) * 100

This Prometheus query:

  1. Takes the rate of change over 5 minutes
  2. Multiplies by 100 to get percentage
  3. Filters to just your production namespace

Let me explain exactly what's happening here:

  • rate() calculates how fast the counter is increasing over the specified time window
  • The [5m] window provides enough data points to smooth out short spikes
  • Multiplying by 100 converts the decimal to a percentage

For multi-core calculations, you'll want to divide by the number of cores:

sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m])) by (pod) * 100 / scalar(count(node_cpu_seconds_total{mode="idle"}))

This gives you the percentage of your entire CPU capacity being used – crucial for capacity planning.

For a more sophisticated analysis, compare requested CPU to actual usage:

sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m])) by (pod) / 
sum(kube_pod_container_resource_requests{resource="cpu", namespace="production"}) by (pod)

This ratio helps identify over-provisioned containers – if the value is consistently low (< 0.3), you're likely wasting resources.
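
To surface those pods directly, you can turn the same ratio into a filter. A sketch, assuming the same production namespace and kube-state-metrics labels as above – the comparison operator drops every series at or above the 0.3 threshold:

sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m])) by (pod)
  / sum(kube_pod_container_resource_requests{resource="cpu", namespace="production"}) by (pod)
  < 0.3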

Create Production-Grade Alert Rules Based on Usage Patterns and Thresholds

Alert fatigue is real. Your phone buzzing at midnight because CPU spiked for 10 seconds during a batch job? Not cool.

Here's a Prometheus alert rule that's reasonable:

- alert: ContainerHighCPUUsage
  expr: rate(container_cpu_usage_seconds_total{namespace="production"}[5m]) * 100 > 80
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "Container {{ $labels.container }} high CPU usage"
    description: "Container {{ $labels.container }} in pod {{ $labels.pod }} has been using over 80% CPU for 15 minutes."

This alert triggers only when a container burns above 80% CPU for a solid 15 minutes – filtering out those harmless spikes while catching real problems.

But let's go deeper with some more sophisticated alert rules:

- alert: ContainerCPUThrottling
  expr: rate(container_cpu_cfs_throttled_seconds_total{namespace="production"}[5m]) / rate(container_cpu_usage_seconds_total{namespace="production"}[5m]) > 0.25
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "Container {{ $labels.container }} is being CPU throttled"
    description: "Container {{ $labels.container }} in pod {{ $labels.pod }} is being throttled. Over 25% of its CPU time is being throttled for 15 minutes, which may impact performance."

This alert detects when Kubernetes is actively throttling your container – a sign you need to adjust your resource limits.
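
The usual fix is to raise (or remove) the container's CPU limit in the workload spec. A minimal sketch with illustrative values – the right numbers depend on the usage and throttling data you've collected:

resources:
  requests:
    cpu: "500m"   # what the scheduler reserves for the container
  limits:
    cpu: "2"      # raise this if throttling persists at the current cap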

For detecting efficiency issues:

- alert: ContainerCPUEfficiencyLow
  expr: sum(rate(container_cpu_usage_seconds_total{namespace="production"}[1d])) by (pod) / sum(kube_pod_container_resource_requests{resource="cpu", namespace="production"}) by (pod) < 0.2
  for: 3d
  labels:
    severity: info
  annotations:
    summary: "Container efficiency is low"
    description: "Pod {{ $labels.pod }} is using less than 20% of its requested CPU for 3 days. Consider right-sizing the resource requests."

This catches containers that are consistently underutilizing their requested resources, helping you optimize costs.

Data Visualization Strategies for CPU Analysis and Trend Detection

Raw numbers are for robots. Humans need visuals. In Grafana, try this dashboard query to see your top CPU-hungry containers:

topk(5, sum(rate(container_cpu_usage_seconds_total[5m])) by (container, pod) * 100)

But don't stop at basic line charts. Create heat maps to spot usage patterns by time of day, or use gauges to show real-time usage against thresholds.

For a comprehensive CPU dashboard, include:

  1. Time-series panels showing:
    • Overall cluster CPU usage
    • Usage by namespace
    • Usage by deployment
    • Usage by individual pod
  2. Heat maps displaying:
    • CPU usage patterns by hour/day
    • Throttling events
  3. Efficiency metrics:
    • Requested vs actual usage
    • Cost per request based on CPU consumption
    • Resource utilization efficiency
  4. Alert panels showing:
    • Recent CPU-related incidents
    • Throttling events
    • Pods approaching resource limits

Pro tip: Add variable selectors for namespace, deployment, and time range to make your dashboard interactive and useful for quick troubleshooting.
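
As an example, a Grafana variable for the namespace selector can be populated straight from the metric's own labels using the Prometheus data source's label_values function:

label_values(container_cpu_usage_seconds_total, namespace)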

How to Interpret Common Usage Patterns and Solve Underlying Issues

CPU metrics tell stories if you know how to read them:

| Pattern | What It Means | What To Do | Advanced Diagnosis |
| --- | --- | --- | --- |
| Consistent high usage | Container is CPU-bound | Increase CPU limits or optimize code | Profile the application to find hotspots; check for inefficient algorithms |
| Periodic spikes | Scheduled jobs running | Consider spreading the job schedule | Review cron patterns; implement rate limiting |
| Gradual increases | Memory leak or data accumulation | Investigate application state | Check heap dumps; monitor garbage collection metrics alongside CPU |
| Sudden drops | Container restarts or OOM kills | Check for crashes and stability issues | Examine OOM killer logs; analyze restart patterns |
| Sawtooth pattern | Autoscaling in action | Verify autoscaler settings | Review HPA metrics and thresholds; consider adjusting scale sensitivity |
| Plateau at limit | CPU throttling | Increase limits or optimize | Check throttling metrics; review QoS class |

Understanding the relationship between CPU patterns and application behavior is critical. For example, if you see CPU usage spike just before memory usage increases, you might be dealing with a compute-intensive operation that builds large in-memory data structures.
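
To check for that pattern, plot the CPU rate next to the memory working set for the same pod on one panel. A sketch – the pod label value here is a placeholder:

rate(container_cpu_usage_seconds_total{namespace="production", pod="my-pod"}[5m])
container_memory_working_set_bytes{namespace="production", pod="my-pod"}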

Advanced CPU Monitoring for Full System Performance Analysis

container_cpu_usage_seconds_total is just the beginning. Level up your monitoring with:

CPU steal time (crucial for cloud environments):

avg by (instance) (rate(node_cpu_seconds_total{mode="steal"}[5m]) * 100)

This reveals when your cloud provider is oversubscribing physical CPUs.

Context switches:

rate(node_context_switches_total[1m])

High values can indicate thrashing and inefficient processing.

Pod scheduling delays (from the kube-scheduler):

sum(rate(scheduler_e2e_scheduling_duration_seconds_sum[5m])) / sum(rate(scheduler_e2e_scheduling_duration_seconds_count[5m]))

This average scheduling latency reveals when pods are waiting to be placed on a node – often a sign the cluster is short on schedulable CPU.

CPU throttling metrics:

rate(container_cpu_cfs_throttled_periods_total[5m]) / rate(container_cpu_cfs_periods_total[5m])

This shows the fraction of recent CFS scheduling periods in which your container wanted more CPU than it was allowed to use.

For truly comprehensive monitoring, correlate CPU metrics with:

  • Network I/O rates
  • Disk I/O operations
  • Memory pressure
  • Garbage collection frequency (for JVM applications)
  • Request latency

This holistic approach helps identify whether CPU is truly your bottleneck or if other factors are contributing to performance issues.

How to Configure HPA and VPA for Optimal Resource Utilization

Turn your CPU monitoring insights into automated actions with Kubernetes Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA).

Here's a well-tuned HPA configuration based on container_cpu_usage_seconds_total:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
    scaleUp:
      stabilizationWindowSeconds: 60

This configuration:

  • Targets 70% CPU utilization (a good balance between efficiency and headroom)
  • Scales up quickly (60-second window) but down more cautiously (5-minute window)
  • Maintains at least 3 replicas for high availability

For VPA, which automatically adjusts CPU requests:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 100m
      maxAllowed:
        cpu: 1
      controlledResources: ["cpu"]

This VPA configuration will automatically adjust CPU requests between 100m and 1 core based on actual usage trends.

Pro tip: For production workloads, start with VPA in recommendation-only mode (updateMode: "Off") alongside an active HPA. This gives you automatic horizontal scaling while VPA collects data on optimal vertical resource settings without evicting pods.
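
Concretely, recommendation-only mode is just a change to the update policy in the manifest above; VPA keeps publishing recommendations in its status without restarting pods:

  updatePolicy:
    updateMode: "Off"   # compute recommendations only, never evict pods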

Conclusion

The humble container_cpu_usage_seconds_total metric might seem basic, but it's the foundation of effective container resource management.

The most sophisticated DevOps teams don't just monitor CPU – they build a culture around resource awareness:

  1. Include CPU efficiency targets in service level objectives (SLOs)
  2. Incorporate resource usage reviews into sprint retrospectives
  3. Implement "resource budgets" for each microservice team
  4. Run game days focused on resource optimization
  5. Create automated reports that track CPU efficiency trends
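
For that last point, a Prometheus recording rule keeps the efficiency ratio cheap to query and chart over long ranges. A sketch, with a hypothetical rule name – it reuses the usage-vs-requests ratio from earlier, aggregated per namespace:

groups:
  - name: cpu-efficiency
    rules:
      - record: namespace:cpu_request_utilization:ratio
        expr: |
          sum(rate(container_cpu_usage_seconds_total[1h])) by (namespace)
          /
          sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace)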

FAQs

How is container_cpu_usage_seconds_total different from node_cpu_seconds_total?

container_cpu_usage_seconds_total measures CPU time consumed by specific containers, while node_cpu_seconds_total tracks CPU time at the node level across different modes (user, system, idle, etc.). The container metric is ideal for application-level monitoring, while the node metric helps with infrastructure capacity planning.

Why does my container show more than 100% CPU usage?

If your container shows >100% CPU usage, it's utilizing multiple CPU cores. For example, 250% means it's using the equivalent of 2.5 CPU cores. This is normal for multi-threaded applications designed to run across multiple cores.

How can I calculate CPU usage across multiple replicas of the same service?

Use this Prometheus query to aggregate CPU usage across replicas:

sum(rate(container_cpu_usage_seconds_total{app="your-service-label"}[5m])) * 100

This gives you the total CPU percentage used by all containers with the matching label.

What's the difference between CPU requests and limits in Kubernetes?

CPU requests are what the container is guaranteed to get, while limits are the maximum amount it can use. container_cpu_usage_seconds_total tracks actual usage, which can be anywhere between these values. If usage hits the limit consistently, you'll see throttling.

How often is container_cpu_usage_seconds_total updated?

Typically, the metric is collected every 15-30 seconds, depending on your scrape interval configuration in Prometheus. For more real-time monitoring, you can decrease this interval, but be aware of the increased storage requirements.

Are there any drawbacks to tracking this metric?

The main drawback is that, as a counter, container_cpu_usage_seconds_total doesn't handle container restarts gracefully. When a container restarts, the counter resets to zero and the metric often reappears as a new time series, which can skew rate() calculations over short windows. Use recording rules or longer time windows to mitigate this.
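
For instance, widening the window and aggregating by pod smooths over restart artifacts at the cost of responsiveness; a sketch:

sum(rate(container_cpu_usage_seconds_total{namespace="production"}[1h])) by (pod)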

How can I correlate CPU usage with application performance?

The gold standard is to track both CPU metrics and application-level metrics (like request latency or throughput) on the same timeline. Many teams create ratio metrics like "CPU seconds per request" to understand efficiency:

sum(rate(container_cpu_usage_seconds_total{app="web"}[5m])) / sum(rate(http_requests_total{app="web"}[5m]))

What's a healthy CPU utilization target for containers?

Most teams aim for 50-70% average utilization. This provides good resource efficiency while leaving headroom for traffic spikes. Mission-critical services might target lower utilization (30-50%) for more headroom, while batch jobs can safely run at higher utilization (70-90%).

How do I differentiate between "good" high CPU usage and "bad" high CPU usage?

"Good" high CPU usage correlates with high throughput or expected workloads. "Bad" high usage happens during low traffic periods or causes latency increases. Compare your CPU metrics with business metrics like requests per second to spot the difference.

Can container_cpu_usage_seconds_total help detect application memory leaks?

Indirectly, yes. Memory leaks often cause increased garbage collection activity, which manifests as higher CPU usage over time without corresponding traffic increases. Track the relationship between memory growth and CPU usage patterns to spot potential memory issues.

Authors
Anjali Udasi

Helping to make tech a little less intimidating. I love breaking down complex concepts into easy-to-understand terms.