
How Replicas Work in Kubernetes

Understand how Kubernetes uses replicas to ensure your application stays available, handles traffic spikes, and recovers from pod failures automatically.

Jul 8th, ’25

Replicas in Kubernetes control how many copies of your pods run simultaneously. They're the foundation of scaling, availability, and recovery in your cluster. Whether you're running a stateless API or a background worker, understanding how replicas work directly affects your application's reliability and performance.

This blog walks through replica management, from basic concepts to production monitoring patterns that help you maintain healthy, scalable applications.

Kubernetes and Replicas

Kubernetes operates on a control plane that manages worker nodes running your applications. The control plane includes components like the API server, scheduler, and various controllers that maintain your cluster's desired state. Worker nodes host pods, the smallest deployable units in Kubernetes.

Controllers continuously monitor your cluster and take action when the actual state differs from what you've specified. This reconciliation loop is what makes Kubernetes self-healing and reliable.

"Replica" in Kubernetes Context

A replica is simply a copy of a pod. When you specify 3 replicas, Kubernetes ensures 3 identical pods run simultaneously. Each replica shares the same configuration, container images, and resource requirements, but they're separate instances distributed across your cluster.

Replicas can't maintain themselves; they need controllers to monitor and manage them. If you specify 3 replicas but only 2 are running, the controller creates a new pod. If an extra pod appears, it removes one to maintain the exact count.
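This compare-and-correct logic is the heart of the reconciliation loop. A minimal sketch in Python (illustrative only; pod names and the action strings are made up, not the actual controller code):

```python
def reconcile(desired: int, running: list[str]) -> list[str]:
    """One reconciliation step: return the actions that bring the
    actual pod count in line with the desired replica count."""
    actions = []
    if len(running) < desired:
        # Too few pods: create replacements until the count matches.
        for i in range(desired - len(running)):
            actions.append(f"create pod web-{len(running) + i}")
    elif len(running) > desired:
        # Too many pods: remove the surplus.
        for pod in running[desired:]:
            actions.append(f"delete pod {pod}")
    return actions

print(reconcile(3, ["web-0", "web-1"]))                    # one pod missing
print(reconcile(3, ["web-0", "web-1", "web-2", "web-3"]))  # one pod extra
```

The real controller runs this loop continuously against the API server's view of the cluster, so drift in either direction is corrected within seconds.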

Importance of Replication for High Availability

Replication provides fault tolerance and load distribution. If one pod crashes, your application continues running on the remaining replicas while the controller creates a replacement. This self-healing behavior means you don't need to manually monitor and replace failed pods.

Multiple replicas also distribute traffic load across instances, preventing any single pod from becoming a bottleneck. Combined with proper resource allocation, replication helps maintain consistent performance under varying loads.
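Conceptually, spreading traffic over replicas is a rotation across instances. A toy round-robin sketch (the actual distribution is handled by the Service and kube-proxy; pod names here are made up):

```python
from itertools import cycle

replicas = ["pod-a", "pod-b", "pod-c"]
rr = cycle(replicas)

# Six incoming requests spread evenly: each replica handles two.
assignments = [next(rr) for _ in range(6)]
print(assignments)  # ['pod-a', 'pod-b', 'pod-c', 'pod-a', 'pod-b', 'pod-c']
```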

💡
If you're new to Kubernetes, it helps to first understand what a pod is and how it works.

Pods vs. Replicas: Key Differences

A pod represents a single instance of your application. It contains one or more containers that share storage, network, and lifecycle. Pods are ephemeral—they can be created, destroyed, and recreated based on your cluster's needs.

Here's a basic pod definition:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    ports:
    - containerPort: 80

This pod runs a single nginx container. If the pod is deleted or its node fails, nothing recreates it; you have to intervene manually.

The Concept of Replicas in a Cluster

Replicas extend the pod concept to multiple instances. Instead of managing individual pods, you specify how many copies you want, and Kubernetes handles the rest. This abstraction makes scaling and recovery automatic.

When you create replicas, Kubernetes:

  • Creates the specified number of pods
  • Monitors their health continuously
  • Replaces failed pods automatically
  • Maintains the desired count across cluster changes

Pod vs. Replica

Aspect | Single Pod | Replicas
------ | ---------- | --------
Failure Recovery | Manual intervention required | Automatic replacement
Scaling | Manual pod creation/deletion | Declarative scaling
Load Distribution | Single point of failure | Traffic spread across instances
Management Overhead | High | Low

Replicas transform manual pod management into declarative configuration. You specify the desired state, and Kubernetes maintains it.

💡
If you're unsure how pods relate to the nodes they run on, this guide on Kubernetes Pods vs Nodes breaks it down with simple examples.

What is a Kubernetes ReplicaSet?

A ReplicaSet ensures a specified number of pod replicas are running and provides self-healing capabilities. It's the controller responsible for maintaining the replica count and replacing failed instances.

ReplicaSets use label selectors to identify which pods they manage. This flexible approach allows you to target specific pods while ignoring others that might share similar characteristics.
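The matchLabels semantics are simple: every key/value pair in the selector must be present on the pod, and extra pod labels are ignored. A small sketch (label values are illustrative):

```python
def matches(selector: dict, pod_labels: dict) -> bool:
    """matchLabels semantics: every selector key/value must appear
    on the pod; additional pod labels don't matter."""
    return all(pod_labels.get(k) == v for k, v in selector.items())

selector = {"app": "nginx", "tier": "frontend"}

print(matches(selector, {"app": "nginx", "tier": "frontend", "env": "prod"}))  # True
print(matches(selector, {"app": "nginx", "tier": "backend"}))                  # False
```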

Specifying Number of Replicas with YAML

Here’s a complete example of a ReplicaSet configuration that ensures your application runs with exactly 3 instances:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-replicaset
  labels:
    app: nginx
    tier: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
      tier: frontend
  template:
    metadata:
      labels:
        app: nginx
        tier: frontend
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi

Let’s break down the key parts:

  • replicas: 3
    This tells Kubernetes to maintain exactly 3 running pods at all times. If one goes down, the ReplicaSet will spin up another to replace it automatically.
  • selector
    The selector defines which pods this ReplicaSet is responsible for. It matches the pods with the labels app: nginx and tier: frontend.
  • template
    This section defines what a new pod should look like. It includes the metadata and spec for the pod, such as the container image (nginx:1.21), exposed port (80), and resource limits.

This setup ensures your frontend service remains highly available and has controlled resource usage across its replicas.

Interact with ReplicaSets Using kubectl

Once you’ve defined your ReplicaSet in a YAML file (e.g., nginx-replicaset.yaml), here’s how you can manage and interact with it:

1. Create the ReplicaSet

kubectl apply -f nginx-replicaset.yaml

This applies your YAML file and tells Kubernetes to create the ReplicaSet and its associated pods.

2. Check the status

kubectl get replicaset
kubectl describe replicaset nginx-replicaset

Use these commands to confirm the ReplicaSet was created successfully and see details like the number of replicas, labels, and events.

3. View the pods managed by the ReplicaSet

kubectl get pods -l app=nginx

This lists all pods with the label app=nginx, which are managed by the ReplicaSet you just created.

4. Scale the ReplicaSet

kubectl scale replicaset nginx-replicaset --replicas=5

This updates the desired replica count from 3 to 5. The ReplicaSet will create 2 additional pods to match the new count.

5. Test self-healing by deleting a pod

kubectl delete pod <pod-name>
kubectl get pods -l app=nginx

When you delete a pod, the ReplicaSet detects the change and automatically creates a new one to maintain the desired number of replicas. This is how Kubernetes ensures high availability.

These commands make it easy to manage ReplicaSets and verify that Kubernetes is doing its job, keeping your app running exactly the way you defined it.

💡
Need a refresher on commands? Here’s a handy kubectl cheat sheet that covers everything from listing pods to scaling replicas.

Role of ReplicaSets in Load Balancing and High Availability

ReplicaSets enable horizontal scaling by running multiple pod instances. When combined with Kubernetes Services, traffic distributes across all healthy replicas, preventing any single instance from becoming overwhelmed.

For high availability, spread replicas across multiple nodes:

spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: "kubernetes.io/hostname"

This configuration ensures replicas run on different nodes, protecting against single-node failures.
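Required anti-affinity is a hard scheduling constraint: a node is ruled out if it already hosts a pod matching the selector within the same topology domain (here, the hostname). A simplified sketch of that filter (node names and labels are made up):

```python
def allowed_nodes(nodes, scheduled, label):
    """Nodes still eligible for a new pod under required pod
    anti-affinity keyed on hostname: exclude any node already
    running a pod that carries the matching label."""
    occupied = {node for node, labels in scheduled if label in labels}
    return [n for n in nodes if n not in occupied]

nodes = ["node-1", "node-2", "node-3"]
scheduled = [("node-1", {"nginx"}), ("node-3", {"nginx"})]

print(allowed_nodes(nodes, scheduled, "nginx"))  # ['node-2']
```

One consequence worth noting: with required anti-affinity, asking for more replicas than you have nodes leaves the extra pods unschedulable; use preferredDuringScheduling if you only want best-effort spreading.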

ReplicaSet vs. Deployment: What’s the Difference?

A ReplicaSet keeps a fixed number of pods running. A Deployment does that too, but adds features like rolling updates, rollbacks, and version tracking. You’ll rarely use a ReplicaSet directly in production because Deployments handle all that for you.

Why Deployments Exist

Back in the early days of Kubernetes, ReplicaSets were the go-to for managing pod replicas. But they were a bit low-level. If you wanted to update an app version or roll something back, you had to do it manually.

Deployments (introduced in Kubernetes 1.2) built on top of ReplicaSets, offering a higher-level abstraction that does the heavy lifting—updates, rollbacks, and orchestration—automatically.

Comparing Configurations

The YAML for ReplicaSets and Deployments looks almost identical. Both use the apps/v1 API and a pod template under spec.template. But what happens when you change something like the image version is where Deployments shine.

ReplicaSet YAML

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: api-replicaset
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: myapp:v1.0.0

Deployment YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: myapp:v1.0.0

Pretty similar on the surface. But a Deployment adds lifecycle management to that structure.

Rolling Updates and Rollbacks

Let’s say you want to update your app to a new version:

kubectl set image deployment/api-deployment api=myapp:v1.1.0

What this does behind the scenes:

  • Spins up a new ReplicaSet with the updated image
  • Gradually shifts traffic from the old pods to the new ones
  • Makes sure there’s no downtime during the transition
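The steps above can be pictured as stepping pods from the old ReplicaSet to the new one while keeping total capacity up. A simplified one-at-a-time sketch (real rollouts are governed by maxSurge/maxUnavailable and readiness probes, which this ignores):

```python
def rolling_update(old: int, new_total: int):
    """Simplified one-at-a-time rollout: surge one new-version pod,
    wait for it to become ready, then retire one old-version pod.
    Yields (old_count, new_count) after each step."""
    new = 0
    while new < new_total:
        new += 1            # surge: start a new-version pod
        yield (old, new)    # both versions serve traffic briefly
        old -= 1            # retire one old-version pod
        yield (old, new)

steps = list(rolling_update(3, 3))
print(steps)  # [(3, 1), (2, 1), (2, 2), (1, 2), (1, 3), (0, 3)]
```

Notice that at every step at least 3 pods are running, which is why the transition causes no downtime.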

If something goes wrong, rolling back is just as simple:

kubectl rollout undo deployment/api-deployment

And to see what’s happening during a rollout:

kubectl rollout status deployment/api-deployment

So, When Should You Use What?

Use Deployments if:

  • Your app will be updated over time
  • You want rolling updates without downtime
  • You need rollback support or version history
  • You’re running production workloads

Use ReplicaSets directly if:

  • You’re building a demo or learning how things work
  • Your app doesn’t change often (or ever)
  • You’re handling updates manually or using custom logic

In most cases, Deployments are the right choice. They abstract away a lot of boilerplate and let you focus on shipping and maintaining apps, not on manually managing pod lifecycles.

💡
If you want to check resource usage for your pods in real time, this guide on using kubectl top walks you through the process.

How to Work with ReplicaSets

ReplicaSets ensure a specific number of pod replicas are always running. Here’s how to create, scale, and manage them using kubectl.

1. Create a ReplicaSet

Start by defining a basic ReplicaSet in a file called web-replicaset.yaml:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: web-app
  labels:
    app: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.21
        ports:
        - containerPort: 80

Apply the config:

kubectl apply -f web-replicaset.yaml

Check that the ReplicaSet and pods were created:

kubectl get replicaset web-app
kubectl get pods -l app=web

2. Update Replica Count

Change the number of replicas in the YAML file, then reapply it:

# Change replicas: 2 → replicas: 4
kubectl apply -f web-replicaset.yaml

3. Scale with kubectl scale

You can also scale directly from the CLI:

Scale up to 6 replicas:

kubectl scale replicaset web-app --replicas=6

Watch the new pods spin up:

kubectl get pods -l app=web -w

Scale back down to 2:

kubectl scale replicaset web-app --replicas=2

4. Test Self-Healing

Delete a pod manually:

kubectl delete pod $(kubectl get pods -l app=web -o jsonpath='{.items[0].metadata.name}')

Then check:

kubectl get pods -l app=web

The ReplicaSet immediately creates a new pod to maintain the target count.

5. Monitor and Manage Pods

Check resource usage (requires metrics-server):

kubectl top pods -l app=web

View pod logs:

kubectl logs -l app=web --tail=50

List pods it manages:

kubectl get pods -l app=web -o wide

Describe a specific ReplicaSet:

kubectl describe replicaset web-app

List all ReplicaSets:

kubectl get replicaset

💡
If you're trying to debug or inspect your pods, this guide on using kubectl logs shows you how to view logs effectively.

Best Practices for Using ReplicaSets in Cloud-Native Environments

Resource Management: Always specify resource requests and limits:

spec:
  template:
    spec:
      containers:
      - name: app
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi

Health Checks: Configure readiness and liveness probes:

spec:
  template:
    spec:
      containers:
      - name: app
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10

Pod Disruption Budgets: Protect against excessive pod termination during maintenance:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: web

Monitoring and Alerting: Set up monitoring for replica health:

# Prometheus alerting rule example
groups:
- name: replica-health
  rules:
  - alert: ReplicaSetDown
    expr: kube_replicaset_status_ready_replicas < kube_replicaset_spec_replicas
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "ReplicaSet {{ $labels.replicaset }} has fewer ready replicas than desired"

💡
To set up alerting and track replica health, our Prometheus Alerting blog helps you with examples you can easily use!

Conclusion

Replicas help ensure your app stays available and can handle traffic. But in production, you need more than just a running ReplicaSet; you need to know if pods are restarting, if replicas match the desired count, and if resource limits are being hit.

Last9 helps you track this. It connects to your Kubernetes cluster, shows replica status in real time, and integrates with Prometheus to alert you when something breaks. No extra agents or custom setup required.

Get started for free today!

Additional Resources: Kubernetes Documentation and Tutorials

FAQs

What is a replica in Kubernetes?

A replica in Kubernetes is a copy of a pod that runs the same application instance. When you specify 3 replicas, Kubernetes creates and maintains 3 identical pods with the same configuration, container images, and resource requirements, but they're separate instances distributed across your cluster.

What is the difference between a pod and a replica?

A pod is a single instance of your application containing one or more containers. A replica refers to multiple copies of that pod. Pods are ephemeral and require manual intervention if they fail, while replicas are managed by controllers that automatically replace failed instances and maintain the desired count.

What is a replica in a cluster?

A replica in a cluster is an identical copy of a pod that runs across different nodes in your Kubernetes cluster. Replicas provide fault tolerance and load distribution—if one node fails, your application continues running on replicas located on other nodes while the controller creates replacements.

What is the difference between a ReplicaSet and a Deployment?

A ReplicaSet manages replica count directly and ensures a specified number of pods are running. A Deployment is a higher-level abstraction that manages ReplicaSets for you, adding rolling update capabilities, rollback functionality, and revision history. Deployments create and manage ReplicaSets automatically.

What is the difference between Deployment and ReplicaSet?

Deployments provide rolling updates, rollbacks, and revision history while managing ReplicaSets behind the scenes. ReplicaSets only maintain replica count without update capabilities. For production workloads, use Deployments; for simple scenarios that never change, ReplicaSets work fine.

What is Kubernetes ReplicaSet?

A Kubernetes ReplicaSet is a controller that ensures a specified number of pod replicas are running at any given time. It monitors pod health, automatically replaces failed instances, and maintains the desired replica count. ReplicaSets use label selectors to identify which pods they manage.

What Is Kubernetes?

Kubernetes is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. It provides features like service discovery, load balancing, storage orchestration, automated rollouts, and self-healing capabilities across clusters of machines.

What is the purpose of setting the container port field?

The container port field informs Kubernetes which port your container listens on. While it's mainly documentation, it helps with service discovery, monitoring tools, and network policies. Services use this information to route traffic correctly to your pods.

Can a deployment kind with a replica count = 1 ever result in two Pods in the 'Running' phase?

Yes, temporarily during rolling updates. When you update a Deployment's image or configuration, Kubernetes creates a new pod before terminating the old one to maintain availability. Both pods may exist briefly in the 'Running' phase until the old pod terminates.

Why Do You Need StatefulSets in Kubernetes?

StatefulSets manage stateful applications that need persistent identity, ordered deployment, and stable network identities. Unlike ReplicaSets, StatefulSets provide predictable pod names, persistent storage that survives pod restarts, and ordered scaling—essential for databases and clustered applications.

What Is Kubernetes Deployment And How To Use It?

A Kubernetes Deployment manages ReplicaSets and provides declarative updates to pods. Create a Deployment with kubectl apply -f deployment.yaml, update with kubectl set image, and manage rollouts with kubectl rollout status. Deployments handle rolling updates, rollbacks, and scaling automatically.

How do replicas ensure high availability in Kubernetes?

Replicas ensure high availability by distributing multiple pod instances across cluster nodes. If one pod or node fails, traffic continues flowing to healthy replicas while controllers automatically create replacements. This redundancy prevents single points of failure.

How does Kubernetes ensure high availability with replicas?

Kubernetes ensures high availability by continuously monitoring replica health and automatically replacing failed instances. Controllers maintain the desired replica count, spread pods across nodes using anti-affinity rules, and integrate with Services to distribute traffic only to healthy replicas.

How does Kubernetes handle scaling with replicas?

Kubernetes handles scaling by adjusting the replica count in ReplicaSets or Deployments. Manual scaling uses kubectl scale, while Horizontal Pod Autoscaler (HPA) automatically scales based on CPU, memory, or custom metrics. Controllers create or terminate pods to match the desired replica count.

How does the replica count in a Kubernetes Deployment affect application scalability?

Higher replica counts increase application capacity by distributing load across more instances. Each replica can handle requests independently, so scaling from 2 to 6 replicas theoretically triples capacity. However, you must consider resource limits, database connections, and external dependencies.

How does Kubernetes manage replicas to ensure high availability?

Kubernetes manages replicas through controllers that continuously reconcile the desired state with actual state. Controllers monitor pod health, automatically replace failed instances, distribute replicas across nodes, and coordinate with Services to route traffic only to healthy pods, maintaining availability during failures and updates.
