When a payment service runs across 12 pods — each serving different customer segments — and an authentication layer spans three namespaces, performance issues can originate in both the application code and the orchestration layer.
The challenge is linking request-level performance data with what’s happening inside the cluster: container CPU limits, pod scheduling decisions, and node-level events. If latency climbs after a deployment, the cause could be slower code, CPU throttling, or a load balancer sending traffic to containers that are still warming up.
APM in Kubernetes brings these layers together, showing how application behavior, resource constraints, and cluster activity interact — so performance issues can be understood and addressed faster.
Why Standard APM Falls Short in Container Orchestration
Most APM tools were built for environments where applications live on fixed hosts with static IPs and predictable resource allocations. Kubernetes changes those assumptions at a fundamental level.
In a typical cluster, a pod running auth-service might be deployed on node-a today, get evicted tomorrow because of a node drain, and come back on node-c with a new IP and hostname. Horizontal Pod Autoscaling might spin up three new replicas during peak traffic, each with different CPU and memory requests based on current cluster headroom. These changes happen in seconds, faster than many legacy APM tools can reconcile their topology maps.
This makes context tracking difficult. Imagine a checkout trace that flows like this:
- Ingress Controller — running on node-1, terminates TLS and routes traffic.
- Auth Service — pod auth-7f8d9 on node-3 verifies the customer session.
- Payment Processor — pod pay-3a2b1 on node-1 interacts with the payment gateway.
A standard APM might only record the service names and timestamps. Without Kubernetes-aware metadata (pod UID, namespace, node name, and container resource usage at the time), you lose the ability to connect a 400 ms spike in payment latency to the fact that the pod was CPU-throttled after being rescheduled to a node with tighter limits.
Networking adds even more moving parts:
- Service Mesh Overhead — Sidecars like Envoy can add 5–15 ms to every request hop when retries, TLS handshakes, or circuit-breaking logic kick in.
- Cluster DNS — CoreDNS performance can fluctuate if its pods are co-located on a busy node, causing unpredictable lookup times.
- Network Policies — Dropped packets between namespaces may never surface as application errors, only as timeouts in traces.
Without instrumentation that merges application traces with real-time Kubernetes telemetry — pod lifecycle events, container stats, and CNI-level metrics — these infrastructure-driven slowdowns look like random application lag.
How to Add Application Monitoring to Kubernetes Pods
You don’t have to modify your entire codebase to start tracking performance in Kubernetes. With the right tooling, you can attach monitoring to your pods automatically, capture the context you need, and send it to your observability backend without disrupting deployments.
One of the simplest approaches is to use the OpenTelemetry Operator. It watches for deployments that have specific annotations and then automatically injects instrumentation sidecars or agents. For example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-processor
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payment-processor
  template:
    metadata:
      labels:
        app: payment-processor
      annotations:
        sidecar.opentelemetry.io/inject: "true"
        instrumentation.opentelemetry.io/inject-python: "true"
    spec:
      containers:
        - name: payment
          image: payment-service:v2.1.4
          env:
            - name: OTEL_SERVICE_NAME
              value: "payment-processor"
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: "service.namespace=payments,service.version=v2.1.4"
With this in place:
- The operator detects the annotations.
- It patches the pod spec to add the OpenTelemetry agent, following the settings in an Instrumentation resource (sketched after this list).
- Exporters are configured automatically, and traces, metrics, and logs start flowing.
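The instrumentation.opentelemetry.io/inject-python annotation only takes effect if an Instrumentation resource exists for the operator to reference. A minimal sketch of such a resource might look like this; the collector endpoint and sampler settings are placeholders to replace with your own:

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: python-instrumentation
  namespace: payments
spec:
  # Where the injected SDK sends telemetry (placeholder collector address)
  exporter:
    endpoint: http://otel-collector.observability:4318
  # Propagate W3C trace context and baggage across service hops
  propagators:
    - tracecontext
    - baggage
  # Sample everything at the SDK; volume can be trimmed later with tail-based sampling
  sampler:
    type: parentbased_always_on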
You can also add Kubernetes metadata to every metric and trace to make troubleshooting easier:
env:
  - name: K8S_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
  - name: K8S_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
This extra context lets you quickly tell if a performance issue is tied to a specific pod, node, or namespace.
If you’re running a service mesh like Istio, you can take it further by combining application-level traces with Envoy’s network telemetry. This gives you hop-by-hop latency, retries, and policy-triggered failures — all without touching your application code.
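If you go that route, sidecar injection is usually enabled per namespace rather than per pod. As a rough sketch, assuming the standard Istio injection webhook is installed, labeling the namespace is enough for newly created pods to receive an Envoy sidecar:

apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    # Tells Istio's mutating webhook to inject an Envoy sidecar
    # into every pod created in this namespace
    istio-injection: enabled

Existing pods only pick up the sidecar after a restart, so plan the change alongside a normal rollout.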
Key Metrics to Know for Kubernetes APM
In Kubernetes, useful APM comes from looking at both application behavior and the cluster conditions around it. One without the other often leaves gaps in the analysis.
Application metrics
- Latency percentiles (P50, P95, P99) – Median latency reflects typical requests, while higher percentiles capture the slowest ones.
- Error rates – Tracking 4xx and 5xx responses helps identify functional or dependency issues.
- Throughput – Requests per second give a sense of demand patterns over time.
- SLO compliance – Where service-level objectives exist, measuring them directly offers a clear view of performance.
Cluster and pod metrics
- CPU usage and throttling – Shows whether workloads are hitting CPU limits.
- Memory usage and pressure – Useful for spotting spikes or patterns that align with garbage collection or container restarts.
- Pod restarts – Can indicate OOM kills, crash loops, or failing health checks.
- Network latency and errors – Particularly relevant for cross-namespace or cross-cluster communication.
- Deployment events – Helpful for seeing if a change in version aligns with a shift in performance.
When application metrics and cluster metrics are viewed together, patterns become easier to interpret; the example queries after this list show how a few of these signals are often computed.
- A rise in P99 latency with increased CPU throttling can point to resource constraints.
- Elevated error rates soon after a deployment may highlight regressions.
- Reduced throughput alongside rising memory usage could suggest a leak or inefficient resource handling.
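As a rough sketch, here is how a few of these signals might be expressed as Prometheus recording rules. The http_request_duration_seconds and http_requests_total metric names and the service label are assumptions based on common HTTP instrumentation; adjust them to match your exporters:

groups:
  - name: service-slis
    rules:
      # P95 request latency per service (assumes a standard HTTP duration histogram)
      - record: service:http_request_duration_seconds:p95
        expr: |
          histogram_quantile(0.95,
            sum by (service, le) (rate(http_request_duration_seconds_bucket[5m])))
      # Share of requests returning 5xx per service
      - record: service:http_requests_errors:ratio_rate5m
        expr: |
          sum by (service) (rate(http_requests_total{code=~"5.."}[5m]))
            / sum by (service) (rate(http_requests_total[5m]))
      # CPU throttling per pod, from cAdvisor metrics
      - record: pod:container_cpu_cfs_throttled:rate5m
        expr: rate(container_cpu_cfs_throttled_seconds_total[5m])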
In Last9, Kubernetes metadata — pod name, namespace, node, and deployment version — is attached to every signal. This makes it straightforward to filter, group, and compare metrics without additional tagging work.
Enable APM in Dynamic Kubernetes Environments
Because Kubernetes workloads constantly shift, monitoring has to adapt without manual intervention. Pod IPs change, new services appear, deployments roll out multiple times a day — instrumentation can’t rely on static configs.
One of the most reliable ways to handle this is by integrating directly with the Kubernetes API. By watching for new deployments and updates, monitoring agents can automatically attach themselves to workloads as they appear.
The OpenTelemetry Operator makes this possible with the same annotation-driven injection shown in the payment-processor example earlier; no per-workload configuration is required.
Here’s what happens:
- The operator detects the annotations on the Deployment.
- It patches the pod spec to include the OpenTelemetry sidecar or agent.
- It configures exporters and starts streaming traces, metrics, and logs.
No changes are needed to the application code or container image — instrumentation is injected at runtime.
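The sidecar annotation also needs a collector definition to inject. A minimal sketch of a sidecar-mode OpenTelemetryCollector resource is shown below; the apiVersion and endpoint are assumptions that depend on your operator release and backend:

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: sidecar
  namespace: payments
spec:
  # "sidecar" mode tells the operator to inject this collector into pods
  # annotated with sidecar.opentelemetry.io/inject
  mode: sidecar
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
    exporters:
      otlp:
        endpoint: "https://otlp.last9.io"
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp]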
For custom business metrics, application-level instrumentation is still required, but the same Downward API environment variables shown earlier (K8S_POD_NAME, K8S_NODE_NAME, K8S_NAMESPACE) add Kubernetes metadata automatically for richer context.
These attributes become part of your metric labels, making it easy to filter performance data by namespace, pod, or node. For example, if payment processing latency spikes, you can instantly see whether it’s limited to pods on a single node or affecting the whole service.
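With the Downward API variables above exported as resource attributes, one quick check is to break the latency percentile down by node. The label names below depend on how your pipeline maps resource attributes to metric labels, so treat them as placeholders:

# P95 payment latency, split by the node each pod runs on
histogram_quantile(0.95,
  sum by (k8s_node_name, le) (
    rate(http_request_duration_seconds_bucket{service="payment-processor"}[5m])))

If one node dominates the slow percentiles, the next step is usually that node's resource pressure or recent scheduling events rather than the application code.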
You can pair this with service mesh instrumentation for network-level visibility without touching application code. With Istio, the Envoy sidecars automatically capture request/response timings, retries, and circuit breaker activations.
This mesh data complements application traces, helping you spot issues — like network delays or failed retries — that your app logs might never surface.
Link Application Traces to Kubernetes Context
When a request travels through multiple services in different pods, distributed tracing becomes the only way to see the full performance story. The difficulty lies in keeping the trace context intact as it passes through ingress controllers, service meshes, load balancers, and application code.
A real example might look like this:
nginx-ingress (pod-ingress-abc123, node-1) →
auth-service (pod-auth-def456, node-2) →
payment-service (pod-payment-ghi789, node-3) →
database-proxy (pod-db-xyz012, node-2)
Every hop must propagate the trace context while adding its own span data — and that span data should carry Kubernetes-specific metadata. This is what makes it possible to connect a slow payment request to a specific pod that was CPU-throttled or running on an overloaded node.
In most production setups, OpenTelemetry handles this instrumentation and exports traces to an observability backend like Last9, which supports high-cardinality telemetry at scale and correlates traces with metrics and logs. This means a single trace can show both the application delay and the underlying node conditions that caused it.
A common collector configuration for adding Kubernetes metadata looks like this:
processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.namespace.name
        - k8s.node.name
    passthrough: false

exporters:
  otlp:
    endpoint: "https://otlp.last9.io"
    headers:
      "last9-cluster": "production-us-west"
Service meshes extend tracing to the network layer. In Istio, Envoy proxies automatically record retries, circuit breaker triggers, and hop-by-hop latency — details your application code doesn’t surface. This makes it possible to distinguish between a payment failure caused by application logic, network connectivity, or mesh policy rules.
Network-level traces often expose problems before they appear in application metrics:
- Connection pooling issues — showing up as spikes in connection setup time.
- DNS resolution delays — visible as latency between service discovery and the first request.
- Mesh configuration errors — detected when policy enforcement blocks traffic unexpectedly.
Application-level spans combined with Kubernetes and mesh metadata give APM tools a complete view of the request path, showing exactly where — and why — performance breaks down.
Correlate Resource Usage with Application Behavior
Kubernetes resource limits can shape application behavior in ways that aren’t obvious from the code. A service may show rising response times without any clear change at the application level — the actual cause could be CPU throttling or memory pressure forcing garbage collection cycles.
The kubelet exposes detailed per-container metrics through the /metrics/cadvisor endpoint. For example:
container_cpu_cfs_throttled_seconds_total{pod="payment-service-ghi789"} 45.7
container_memory_working_set_bytes{pod="payment-service-ghi789"} 1073741824
container_spec_cpu_quota{pod="payment-service-ghi789"} 200000
container_spec_memory_limit_bytes{pod="payment-service-ghi789"} 2147483648
When these are correlated with application telemetry, patterns often emerge:
- CPU throttling typically aligns with an increase in tail latency.
- Memory pressure in JVM services often leads to more frequent garbage collection, which shows up as periodic latency spikes.
A practical approach is to alert on the combination of application latency and resource pressure. Example Prometheus rule:
groups:
  - name: performance-resources
    rules:
      - alert: ServiceSlowWithCPUThrottling
        expr: |
          (
            histogram_quantile(0.95,
              rate(http_request_duration_seconds_bucket[5m])
            ) > 0.5
          ) and (
            rate(container_cpu_cfs_throttled_seconds_total[5m]) > 0.1
          )
        labels:
          severity: warning
        annotations:
          description: "{{ $labels.pod }} is experiencing slow responses with CPU throttling"
This makes it possible to separate performance issues caused by infrastructure limits from those caused by application logic. If response times climb alongside CPU throttling, the fix may be adjusting resource requests and limits rather than tuning code paths.
Memory metrics often need a more layered view. Working set memory shows active usage, but page fault counts and memory pressure events add important context. In containerized Java workloads, combining heap usage trends with garbage collection pause times often explains latency variation more accurately than looking at memory utilization alone.
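As a hedged sketch of that layered view, assuming Micrometer-style JVM metrics such as jvm_gc_pause_seconds and jvm_memory_used_bytes are exported (names vary by instrumentation library), recording rules can keep the GC and heap context next to your latency data:

groups:
  - name: jvm-memory-context
    rules:
      # Average GC pause duration over the last 5 minutes
      - record: jvm:gc_pause_seconds:avg5m
        expr: |
          rate(jvm_gc_pause_seconds_sum[5m])
            / rate(jvm_gc_pause_seconds_count[5m])
      # Heap usage as a fraction of the configured maximum
      - record: jvm:heap_utilization:ratio
        expr: |
          sum by (pod) (jvm_memory_used_bytes{area="heap"})
            / sum by (pod) (jvm_memory_max_bytes{area="heap"})

Charting these next to P99 latency makes it much easier to tell whether a periodic spike lines up with GC pauses or with container-level memory pressure.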
Multi-Cluster Visibility and Cross-Region Correlation
As your Kubernetes footprint grows, you may run multiple clusters — split by environment, region, or business unit. For APM to stay effective at this scale, you need a single view across all clusters, while still knowing exactly where each workload is running.
Start by adding cluster-specific identifiers to your telemetry stream:
processors:
  resource:
    attributes:
      - key: k8s.cluster.name
        value: "production-us-west"
        action: upsert
      - key: k8s.cluster.region
        value: "us-west-2"
        action: upsert

exporters:
  otlp:
    endpoint: "https://otlp.last9.io"
    headers:
      "last9-cluster": "production-us-west"
With these attributes, you can:
- Filter and group telemetry by cluster and region.
- Quickly identify whether a slowdown is isolated to one cluster or impacting multiple environments.
Cross-cluster tracing gets more complex when your services communicate between regions. For example:
- A request from your US-West authentication service to your EU-Central user profile service will have a different latency profile than an intra-cluster call.
- You’ll face WAN latency, cross-continent TLS handshakes, and the variability of internet routing.
To keep this traceable, maintain consistent propagation headers across clusters and use an observability backend that understands multi-cluster topologies.
A centralized observability platform like Last9 helps you by:
- Aggregating data from all clusters in one place.
- Preserving original cluster and region metadata.
- Allowing you to drill down into incidents without losing context.
Your metrics and alerts should also account for regional differences:
- Inter-region calls can vary in latency depending on routing paths — for example, 60 ms in one direction and 85 ms in the other.
- Set separate thresholds for:
- Intra-cluster communication
- Inter-cluster (same region)
- Inter-region calls
This approach reduces false positives and helps you respond faster when latency truly deviates from the norm.
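One hedged way to encode those tiers, assuming your request metrics carry a label that distinguishes the call scope (the call_scope label and the threshold values below are illustrative, not prescriptive):

groups:
  - name: latency-by-scope
    rules:
      # Tighter threshold for calls that stay inside one cluster
      - alert: IntraClusterLatencyHigh
        expr: |
          histogram_quantile(0.95,
            sum by (service, le) (
              rate(http_request_duration_seconds_bucket{call_scope="intra-cluster"}[5m]))) > 0.25
        labels:
          severity: warning
      # Looser threshold that tolerates WAN round trips between regions
      - alert: InterRegionLatencyHigh
        expr: |
          histogram_quantile(0.95,
            sum by (service, le) (
              rate(http_request_duration_seconds_bucket{call_scope="inter-region"}[5m]))) > 0.6
        labels:
          severity: warning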
Handle High-Cardinality Data and Scale Challenges
In large Kubernetes environments, APM produces massive volumes of high-cardinality telemetry. A single cluster with hundreds of pods can generate millions of unique metric series and thousands of traces per minute. Every pod name, container ID, and node assignment adds new label combinations.
High-cardinality data turns “payment processing is slow” into “payment processing is slow for premium customers using stored payment methods on mobile devices in the EU region.” That level of detail is valuable, but it also places a heavy load on your monitoring infrastructure.
To keep trace data manageable without losing important details, many teams use tail-based sampling:
processors:
  tail_sampling:
    policies:
      - name: errors_and_slow
        type: and
        and:
          and_sub_policy:
            - name: errors
              type: status_code
              status_code: {status_codes: [ERROR]}
            - name: slow_requests
              type: latency
              latency: {threshold_ms: 1000}
      - name: random_sample
        type: probabilistic
        probabilistic: {sampling_percentage: 1}
This allows you to:
- Capture complete traces for errors and slow requests.
- Aggressively sample routine requests to control storage and processing costs.
Reduce Metric Cardinality with Aggregation
Kubernetes labels can easily create millions of unique time series if left unchecked. Pod names and container IDs are especially problematic because they change for every instance.
A common solution is to pre-aggregate high-cardinality metrics using recording rules:
groups:
  - name: k8s_aggregation
    interval: 30s
    rules:
      - record: k8s:container_cpu_usage:rate5m
        expr: |
          sum(rate(container_cpu_usage_seconds_total[5m]))
            by (namespace, deployment, container)
This approach:
- Produces lower-cardinality metrics for dashboards and alerting.
- Preserves raw high-cardinality data for detailed investigations.
Match Data Detail to the Use Case
Different scenarios require different levels of granularity:
- Dashboards and alerts → pre-aggregated metrics for speed and efficiency.
- Incident investigations → raw high-cardinality data to pinpoint exact issues.
Last9 supports both without forcing you to choose between rich data and system performance.
Correlate Performance with Deployment Events
Kubernetes deployments can happen often — sometimes multiple times a day. To identify issues quickly, your APM system should connect performance changes with deployment events so you can confirm whether new code is introducing problems.
You can do this by including deployment metadata in your telemetry data:
labels:
  app.kubernetes.io/version: "v2.1.4"
  deployment.kubernetes.io/revision: "7"
With this information in place, you can:
- See if performance degradation lines up with a recent deployment.
- Add deployment markers to dashboards for a visual correlation.
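For those labels to show up next to latency and error metrics, they have to be copied onto the telemetry itself. One hedged option is the OpenTelemetry Collector's k8sattributes processor, which can lift pod labels into resource attributes (the service.version attribute name follows OpenTelemetry conventions; adjust it to your own schema):

processors:
  k8sattributes:
    extract:
      labels:
        # Copy the pod's version label onto every span and metric
        # as the resource attribute "service.version"
        - tag_name: service.version
          key: app.kubernetes.io/version
          from: pod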
Handle Mixed-Version States
Rolling updates introduce a period where both old and new versions handle traffic. During this time, metrics need to be version-aware so you can see whether problems affect all traffic or just the new release.
Example Prometheus rule for alerting on errors in a new version:
groups:
  - name: deployment_monitoring
    rules:
      - alert: NewVersionErrors
        expr: |
          sum(rate(http_requests_total{code=~"5.."}[5m]))
            by (app_version) > 0.01
        for: 2m
        labels:
          severity: critical
        annotations:
          description: "Version {{ $labels.app_version }} showing elevated error rates"
Canary and Rollback Tracking
Some teams use canary deployments to route a small portion of traffic to the new version while monitoring its performance separately from stable traffic. This requires more advanced routing and metric labeling but allows you to catch deployment issues before they affect all users.
Rollback tracking is just as important. When you revert to a previous version, you should confirm that key performance metrics return to baseline. Automated rollback systems can use APM data as a trigger, rolling back automatically if performance does not recover within defined thresholds.
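Teams that automate this often lean on a progressive delivery controller rather than wiring the trigger by hand. As a hedged sketch, an Argo Rollouts AnalysisTemplate (one possible tool, not something this setup requires) can query your metrics backend and fail the rollout, triggering a rollback, when the error rate stays above a threshold:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check
spec:
  args:
    - name: service
  metrics:
    - name: error-rate
      interval: 1m
      failureLimit: 3
      # Fail the analysis (and the rollout) if more than 1% of requests error
      successCondition: result[0] < 0.01
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service}}", code=~"5.."}[5m]))
              / sum(rate(http_requests_total{service="{{args.service}}"}[5m]))

The same query doubles as the post-rollback check: once traffic is back on the previous version, the error rate should drop under the threshold within a few evaluation intervals.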
How Last9 Fits Into Your Kubernetes APM
Kubernetes changes the way APM works. You’re not just looking at function calls and API latency — you’re dealing with pods that move between nodes, workloads that scale up or down in minutes, and deployments that ship several times a day.
The techniques we covered need a backend that’s built for this reality. Last9 gives you that foundation.
- Kubernetes dashboard – See every deployment, namespace, and pod alongside CPU, memory, network, and storage metrics. If a trace shows payment requests slowing down, you can open the exact pod to check for CPU throttling or recent rescheduling.
- Kubernetes metadata in every signal – Pod name, namespace, and node are attached to traces, metrics, and logs automatically, so you can connect application performance with cluster activity without extra queries.
- Deployment-aware views – Compare performance before and after a rollout, or isolate errors to only the new version of a service during a mixed-version state.
- Multi-cluster and cross-region context – Telemetry carries cluster and region attributes, so you can filter to “production-us-west” or correlate a slowdown in “eu-central” to inter-region network latency.
- High-cardinality data at scale – Keep the pod, node, and namespace labels you need for debugging without hitting performance or storage limits.
- Streaming aggregation and filtering – Pre-aggregate for dashboards and alerts while retaining raw data for investigations — all without redeploying workloads.
- MCP agent – Pull real-time production traces and metrics into your IDE for faster debugging with full context.
With these capabilities, you can keep the depth Kubernetes demands while preserving the speed and context you need to fix problems before users notice.
Book some time with us or start for free today!
FAQs
Q: What is APM in Kubernetes?
A: APM in Kubernetes monitors application performance in containerized environments where services run across multiple pods, nodes, and namespaces. Unlike traditional APM, it correlates application metrics like response times and error rates with container resource usage, pod scheduling decisions, and cluster state changes to provide complete visibility into distributed systems.
Q: What is the best tool to monitor a Kubernetes cluster?
A: Last9 provides managed observability that handles high-cardinality telemetry at scale and integrates with OpenTelemetry and Prometheus for unified metrics, logs, and traces. The best approach typically combines multiple tools: Prometheus for metrics collection, OpenTelemetry for distributed tracing, and a platform like Last9 for correlation and analysis across the entire stack.
Q: How to monitor a healthy application in Kubernetes?
A: Focus on service-level indicators: latency percentiles (P50, P95, P99), error rates, request throughput, and resource utilization. Use Kubernetes readiness and liveness probes for pod health, implement distributed tracing for request flows across services, and set up alerts based on service-level objectives rather than individual container metrics.
Q: How will you do monitoring in Kubernetes?
A: Start with cluster-level infrastructure monitoring using metrics-server and kubelet endpoints. Add application instrumentation with OpenTelemetry operators for automatic service discovery and trace collection. Configure metric exporters to send data to your observability platform, and implement recording rules to manage high-cardinality data from pod and container labels.
Q: What is Kubernetes Monitoring?
A: Kubernetes monitoring covers both infrastructure health (cluster state, node resources, pod status) and application performance (service latency, error rates, business metrics). It includes monitoring the control plane components (API server, etcd, scheduler), worker node resources, container resource usage, and distributed application performance across the cluster.
Q: How can you alert on the metrics?
A: Set up multi-dimensional alerting rules that combine application performance with infrastructure state. Use recording rules for complex calculations, implement alert routing based on severity levels, and create alerts that account for Kubernetes dynamics like rolling deployments and pod restarts. Avoid alerting on individual pod metrics; focus on service-level aggregations.
Q: When to use API Gateway, Load Balancer, or Reverse Proxy?
A: API Gateways handle cross-cutting concerns like authentication, rate limiting, request transformation, and API versioning across multiple backend services. Load balancers distribute traffic and provide health checking and failover. Reverse proxies cache responses, terminate SSL, and handle compression. In Kubernetes, ingress controllers often combine these functions, with service meshes adding policy enforcement and observability.
Q: How does APM for Kubernetes help in identifying performance bottlenecks?
A: APM correlates application slowness with infrastructure constraints by tracking distributed traces across pod boundaries while monitoring container resource usage, CPU throttling events, memory pressure, and network latency. When response times spike, you can quickly determine whether the issue stems from application logic, resource limits, inter-service communication, or cluster scheduling decisions.
Q: How does APM improve performance in Kubernetes environments?
A: APM provides visibility into request flows across microservices, helping identify slow dependencies, optimize resource allocation, and spot scaling issues before they impact users. It enables data-driven decisions about horizontal pod autoscaling configurations, resource limit tuning, and service mesh policy adjustments based on actual performance patterns rather than assumptions.
Q: How does APM for Kubernetes help improve application performance?
A: By tracking requests across pod boundaries and correlating with container resource metrics, APM helps identify services that need different CPU or memory allocations, reveals communication patterns that could benefit from optimization, and guides autoscaling policy tuning. It shows how Kubernetes scheduling decisions affect performance, enabling better resource planning.
Q: How does APM help in optimizing Kubernetes performance?
A: APM data guides resource allocation decisions, identifies services that benefit from horizontal scaling, reveals inefficient inter-service communication patterns, and shows the impact of deployment strategies on performance. It helps right-size containers, optimize JVM garbage collection settings, tune database connection pools, and configure load balancing policies based on real production performance data.