If you’ve worked with Kubernetes, you know it’s designed to manage containers seamlessly. But running containers isn’t just about deployment—it’s about ensuring each workload gets the right resources at the right time.
This is where Quality of Service (QoS) plays a crucial role. Kubernetes QoS governs how resources are allocated and maintains system stability, ensuring critical applications stay responsive even when resources are tight.
Let’s break down how Kubernetes handles QoS, what it means for your workloads, and why you should care about setting resource requests and limits properly.
The Role of QoS in Kubernetes Clusters
Kubernetes ensures applications run smoothly by managing resources effectively within a namespace. Without proper resource allocation, one Kubernetes pod could consume excessive CPU or memory, affecting the performance of other workloads.
Kubernetes prevents this issue by using QoS (Quality of Service) classes, which prioritize workloads and prevent resource contention. Each Kubernetes pod runs in a namespace, and its resource behavior is influenced by settings such as memory request and memory limit. Understanding these settings is crucial for ensuring fair distribution and maintaining performance.
How Resource Requests and Limits Work in a Kubernetes Pod
Every Kubernetes pod defines resource configurations at the container level. Two critical settings dictate how the pod interacts with cluster resources:
- Requests: The minimum amount of CPU and memory guaranteed to a container; the scheduler uses these values to decide where the pod can run.
- Limits: The maximum amount of CPU and memory a container is allowed to use.
For example, if you deploy an nginx container within a Kubernetes pod, you might specify:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  namespace: default
spec:
  containers:
    - name: nginx-container
      image: nginx
      resources:
        requests:
          memory: "256Mi"
          cpu: "250m"
        limits:
          memory: "512Mi"
          cpu: "500m"
- The metadata section provides identifying details, such as the pod’s name and namespace.
- The apiVersion specifies which Kubernetes API to use.
- The requests define the guaranteed minimum of memory (256Mi) and CPU (250m).
- The limits cap resource usage with a memory limit of 512Mi and a CPU limit of 500m.
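If you save this manifest as nginx-pod.yaml (the filename is just an example), you can apply it and check which QoS class Kubernetes assigned. Because the requests here are lower than the limits, this pod lands in the Burstable class (explained in the next section):
kubectl apply -f nginx-pod.yaml
kubectl describe pod nginx-pod -n default | grep -i "qos class"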
How QoS Classes Affect Scheduling and Performance
Kubernetes assigns one of three QoS classes based on how requests and limits are set:
- Guaranteed: If every container in the pod sets CPU and memory requests equal to its limits, the pod is classified as Guaranteed. Ideal for critical workloads like financial transactions.
- Burstable: If at least one container sets a request or limit but the pod does not meet the Guaranteed criteria (for example, a memory request lower than its memory limit), the pod is Burstable. It can use spare capacity when available but may be throttled or evicted under heavy load.
- BestEffort: If no requests or limits are set on any container, the pod runs only if spare resources exist, making it the most susceptible to eviction.
For workloads running on AWS, proper QoS settings ensure stable performance, preventing expensive compute nodes from being overutilized.
What Are QoS Classes and Why Do They Matter?
Kubernetes uses Quality of Service (QoS) classes to manage resource allocation and prioritize pods based on their CPU and memory settings. These classifications help Kubernetes determine which pods should receive guaranteed resources and which ones can be deprioritized or evicted when the cluster is under heavy load.
The three QoS classes in Kubernetes are:
- Guaranteed
- Burstable
- BestEffort
Each class influences how Kubernetes schedules workloads and handles resource shortages, ensuring critical applications remain operational.
Comparing Guaranteed, Burstable, and BestEffort QoS Classes
Here’s a quick comparison table summarizing the differences between Kubernetes QoS classes:
| QoS Class | Resource Requests Set? | Resource Limits Set? | Priority Level | Eviction Risk | Ideal Use Case |
|---|---|---|---|---|---|
| Guaranteed | Yes (equal to limits) | Yes | Highest | Least likely | Critical applications that must always run |
| Burstable | Yes (at least one container) | Optional (higher than requests when set) | Medium | Can be evicted under pressure | Applications that need a guaranteed minimum but can scale up |
| BestEffort | No | No | Lowest | First to be evicted | Non-essential or background workloads |
Guaranteed: Highest Stability and Priority
Pods assigned to the Guaranteed QoS class have the strongest resource guarantees. To qualify, a pod must have identical requests and limits for both CPU and memory across all its containers. These pods are the most stable and are only evicted as a last resort.
Burstable: Flexible but Not Fully Guaranteed
Burstable pods define resource requests but allow for higher limits. This means they get a reserved amount of CPU and memory but can consume extra resources when available. However, during resource contention, they may be throttled or evicted in favor of Guaranteed pods.
BestEffort: Lowest Priority and Most Likely to Be Evicted
Pods that do not specify any CPU or memory requests and limits fall into the BestEffort category. These pods receive resources only if there is excess capacity and are the first to be evicted when resources become constrained.
How QoS Classes Affect Resource Allocation and Pod Performance
The assigned QoS class directly impacts how Kubernetes schedules pods and distributes CPU and memory:
- Guaranteed pods always receive their requested resources and remain stable even under high cluster load.
- Burstable pods can use additional resources if available but may experience throttling or eviction when the system is stressed.
- BestEffort pods rely entirely on leftover resources and face the highest eviction risk.
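To see which class each pod has been assigned, you can read the qosClass field from the pod status, for example:
kubectl get pods -o custom-columns=NAME:.metadata.name,QOS:.status.qosClass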
How Burstable and BestEffort Pods Handle Resource Allocation in Kubernetes
Burstable Pods: Balancing Stability with Resource Flexibility
Burstable pods offer a middle ground between strict resource guarantees and efficient utilization. When a container is given a CPU request of 100m (0.1 CPU) but a limit of 500m, it is guaranteed at least 100m and can temporarily “burst” up to 500m if extra resources are available.
However, in a high-demand scenario where the cluster is running low on resources, Kubernetes may take corrective actions:
- If the pod is exceeding its requested resources, it may be throttled to avoid overuse.
- If resource pressure increases, it might be evicted in favor of Guaranteed pods, which have higher priority.
This makes Burstable pods ideal for applications that need a baseline level of resources but can take advantage of extra capacity when available.
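As a sketch of that 100m/500m scenario, a Burstable container spec might look like this (the memory values are illustrative):
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"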
BestEffort Pods: The Lowest-Priority Workloads in Kubernetes
BestEffort pods do not define any CPU or memory requests or limits, giving them maximum scheduling flexibility but zero resource guarantees. Kubernetes places them wherever resources are available, but they are the first to be removed when the cluster runs out of capacity.
Key characteristics of BestEffort pods:
- They run on leftover resources, making them useful for background tasks or non-essential workloads.
- Since they have no reserved CPU or memory, they can be evicted at any time if the cluster becomes constrained.
- They are not suitable for critical applications that require stable performance.
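For illustration, a BestEffort pod is simply one whose containers omit the resources block entirely (the pod and image names below are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: besteffort-pod
spec:
  containers:
    - name: worker
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
Because no requests or limits are defined, Kubernetes assigns this pod the BestEffort QoS class.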
How Kubernetes Handles Resource Contention and Pod Evictions
When a Kubernetes cluster is under resource pressure—such as running out of memory—pods are evicted in priority order:
- BestEffort pods are removed first because they have no guaranteed CPU or memory allocation.
- Burstable pods may be throttled or evicted if they are consuming more than their requested resources.
- Guaranteed pods are protected and will only be evicted as a last resort.
This priority system ensures that essential applications continue running while lower-priority workloads are evicted when necessary.
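One way to observe this in practice is to look for evicted pods and the related events (the field selectors below assume the standard Evicted event reason):
kubectl get pods --field-selector=status.phase=Failed
kubectl get events --field-selector=reason=Evicted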
How Kubernetes Uses DNS for Service Name Resolution
Kubernetes has a built-in DNS service that automatically assigns domain names to services, resolving them to their corresponding IP addresses. This enables seamless communication between pods using service names rather than static IPs, which can change frequently.
For example, if a service named my-app exists in the default namespace, any pod in the cluster can reach it using:
ping my-app.default.svc.cluster.local
Kubernetes DNS ensures that applications can dynamically discover and communicate with each other, reducing the need for manual configuration.
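If you want to verify name resolution from inside the cluster, one option is a throwaway pod running nslookup (my-app is the example service name from above):
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup my-app.default.svc.cluster.local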
How Kubernetes Services Enable Internal Pod Communication
Each Kubernetes service receives a DNS entry by default, making it easier for pods to locate and connect. This is particularly useful for dynamic workloads where pod IPs change regularly.
Key advantages of service-based communication:
- Services abstract pod IP changes, ensuring stability in communication.
- Pods can reference services by name (e.g., database-service) instead of relying on changing IP addresses.
- Load balancing can be applied at the service level, distributing traffic among multiple pods.
This system simplifies inter-pod networking, especially for applications running as microservices.
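As a minimal sketch, a Service like the database-service mentioned above could be defined as follows (the name, selector, and port values are examples):
apiVersion: v1
kind: Service
metadata:
  name: database-service
  namespace: default
spec:
  selector:
    app: database
  ports:
    - port: 5432
      targetPort: 5432
Pods in the cluster can then reach it at database-service.default.svc.cluster.local, no matter which pod IPs currently back it.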
Can You Ping a Kubernetes Service?
While it is possible to ping a Kubernetes service, it’s not always a reliable method for checking availability. Many services use load balancers or proxies that do not respond to ICMP (ping) requests.
Better alternatives for service connectivity testing:
- curl – Useful for testing HTTP-based services:
curl http://my-service.default.svc.cluster.local:8080
- nc (netcat) – Tests if a service is listening on a port:
nc -zv my-service.default.svc.cluster.local 8080
- kubectl port-forward – Allows local testing of a service:
kubectl port-forward svc/my-service 8080:80
curl http://localhost:8080
These methods provide more accurate insights into whether a service is reachable and functioning as expected.
How Priority Classes Influence Pod Scheduling and Resource Allocation
Kubernetes uses PriorityClasses to determine which pods should be scheduled first and which should be evicted when resources are constrained. Higher-priority pods take precedence over lower-priority ones, ensuring that critical workloads receive the necessary CPU and memory.
Key aspects of PriorityClasses:
- Higher priority values mean better scheduling preference and lower eviction risk.
- If a cluster runs out of resources, Kubernetes will preempt (evict) lower-priority pods to make room for high-priority workloads.
- PriorityClasses help prevent resource starvation for essential applications in multi-tenant environments.
For example, a database pod might have a higher priority than a batch processing pod, ensuring that essential data services remain operational during resource pressure.
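To attach a priority to a workload, reference a PriorityClass by name in the pod spec via priorityClassName. In this sketch, high-priority refers to the example class defined in the next section, and the pod and image names are placeholders:
apiVersion: v1
kind: Pod
metadata:
  name: database-pod
spec:
  priorityClassName: high-priority
  containers:
    - name: db
      image: postgres
      resources:
        requests:
          memory: "512Mi"
          cpu: "500m"
        limits:
          memory: "512Mi"
          cpu: "500m"
Setting requests equal to limits here also gives the pod Guaranteed QoS, combining both protections.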
Best Practices for Managing Critical Workloads with Guaranteed QoS
Mission-critical workloads require a stable resource allocation to prevent disruptions. To ensure these workloads run smoothly, consider the following strategies:
- Use Guaranteed QoS
- Set equal CPU and memory requests and limits for the pod.
- Example configuration:
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"
- Assign a High PriorityClass
- Define a PriorityClass with a high value:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 100000
preemptionPolicy: PreemptLowerPriority
- Enforce Resource Quotas
- Use ResourceQuotas to limit non-essential workloads from consuming excessive CPU and memory:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
spec:
  hard:
    cpu: "10"
    memory: "20Gi"
This ensures that high-priority applications always have reserved resources available.
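Once the quota is created, you can check how much of it is currently consumed in the namespace (assuming the compute-quota object above):
kubectl describe resourcequota compute-quota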
Techniques for Monitoring and Managing Resource Pressure in Kubernetes
When dealing with resource constraints, it's important to actively monitor and adjust cluster behavior. Here are key techniques using kubectl and kubelet:
1. Identify Resource Usage Trends
Check overall node resource consumption:
kubectl describe node <node-name>
List pods consuming the most resources:
kubectl top pods --sort-by=cpu
kubectl top pods --sort-by=memory
2. Adjust Kubelet Eviction Policies
If a node is under heavy load, kubelet may evict pods to free up resources. You can modify eviction policies to control this behavior:
Set soft eviction thresholds in the kubelet configuration file (KubeletConfiguration) to allow some buffer before evictions occur:
evictionSoft:
  memory.available: "500Mi"
evictionSoftGracePeriod:
  memory.available: "1m"
Use hard eviction thresholds to enforce immediate evictions when critical limits are reached:
evictionHard:
  memory.available: "100Mi"
These settings allow better control over when and how Kubernetes decides to evict pods.
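If your nodes configure the kubelet with command-line flags instead of a configuration file, the same thresholds can be expressed as flags (a sketch; exact flag support depends on your kubelet version and setup):
--eviction-soft=memory.available<500Mi
--eviction-soft-grace-period=memory.available=1m
--eviction-hard=memory.available<100Mi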
Conclusion
Effective Kubernetes QoS and resource management prevent resource contention issues and ensure smooth operation under pressure. By implementing PriorityClasses, Guaranteed QoS, and proactive monitoring, you can:
- Ensure mission-critical applications remain operational
- Prevent unnecessary pod evictions and resource starvation
- Optimize resource utilization across different workloads
FAQs
1. What is Kubernetes QoS (Quality of Service) and why is it important?
Kubernetes QoS is a classification system that determines how pods are prioritized for resource allocation under contention. It ensures that critical workloads receive the necessary CPU and memory resources while preventing less important workloads from consuming excessive resources.
2. What are the different QoS classes in Kubernetes?
Kubernetes has three QoS classes:
- Guaranteed: Assigned when both CPU and memory requests equal limits for all containers in a pod. Ensures high priority in resource allocation.
- Burstable: Assigned when at least one container in a pod has requests lower than limits. It gets prioritized over BestEffort but is not as protected as Guaranteed.
- BestEffort: Assigned when no requests or limits are specified. It has the lowest priority and is the first to be evicted during resource contention.
3. How does Kubernetes assign QoS classes to pods?
Kubernetes automatically assigns a QoS class based on the resource requests and limits specified for each container within a pod. A pod is Guaranteed only if every container has matching requests and limits; if any container falls short of that, the pod drops to Burstable, and if no container sets requests or limits at all, it is BestEffort.
4. How does QoS affect pod scheduling and eviction?
- Guaranteed pods are the least likely to be evicted and receive stable resource allocations.
- Burstable pods may be evicted if resources are scarce, but only after BestEffort pods.
- BestEffort pods are the most likely to be evicted first when the node experiences resource pressure.
5. Can a pod's QoS class change dynamically after deployment?
No, once a pod is assigned a QoS class at creation, it remains unchanged throughout its lifecycle, even if resource limits or requests are modified in a running container. To change the QoS class, you must delete and recreate the pod with new resource settings.
6. How can I check the QoS class of a running pod?
You can check a pod's QoS class by running:
kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'
7. How do I ensure a pod gets a Guaranteed QoS class?
To achieve Guaranteed QoS, all containers in a pod must have equal and explicit CPU and memory requests and limits. Example YAML:
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod
spec:
  containers:
    - name: app-container
      image: my-app
      resources:
        requests:
          memory: "512Mi"
          cpu: "250m"
        limits:
          memory: "512Mi"
          cpu: "250m"
This ensures the pod gets the highest priority in resource allocation and eviction protection.