Cluster Monitoring

Monitor your Kubernetes cluster with Last9 using the kube-prometheus-stack Helm chart. In this setup, Prometheus runs in agent mode, scrapes cluster, node, and container metrics, and forwards them to Last9 over remote write.

Prerequisites

Before setting up Kubernetes cluster monitoring, ensure you have:

- Kubernetes Cluster: A running Kubernetes cluster (v1.19+)
- kubectl: Configured and connected to your cluster
- Helm: Installed (v3.9 or higher)
- Cluster Admin Access: Required for creating cluster-wide resources
- Last9 Account: With Prometheus remote write credentials
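
The prerequisites above can be checked from a terminal before you start. The following is a minimal sketch, not an official Last9 tool: `version_ge` and `preflight` are hypothetical helper names, and the wildcard `kubectl auth can-i` check only approximates cluster admin access.

```shell
# version_ge A B: succeeds when version A is greater than or equal to B.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# preflight: hypothetical helper that checks Helm, cluster access, and permissions.
preflight() {
  local helm_version
  # `helm version --template` prints a string such as v3.14.2; strip the "v".
  helm_version="$(helm version --template '{{.Version}}' | sed 's/^v//')"
  if ! version_ge "$helm_version" "3.9.0"; then
    echo "Helm 3.9 or higher is required, found $helm_version" >&2
    return 1
  fi
  if ! kubectl cluster-info >/dev/null; then
    echo "kubectl cannot reach the cluster" >&2
    return 1
  fi
  # A cluster admin may perform any verb on any resource in any namespace.
  if ! kubectl auth can-i '*' '*' --all-namespaces >/dev/null; then
    echo "current user does not appear to have cluster admin access" >&2
    return 1
  fi
  echo "preflight OK"
}
```

Source the snippet and run `preflight`; it stops at the first failed check and prints the reason to stderr.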

Create Monitoring Namespace

Create a dedicated namespace for Last9 monitoring components:
```
kubectl create namespace last9
```

Add Prometheus Community Helm Repository

Add and update the Prometheus community Helm repository:

```
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
```

Set Up Remote Write Credentials

Create a Kubernetes secret containing your Last9 Prometheus remote write credentials:
```
kubectl create secret generic last9-remote-write-secret \
  -n last9 \
  --from-literal=username="{{ .Metrics.Username }}" \
  --from-literal=password="{{ .Metrics.WriteToken }}"
```
Replace the placeholder values with your actual Last9 credentials from the Last9 Integrations page.

Create Monitoring Configuration

Create a file named k8s-monitoring-values.yaml with the following Helm chart values configuration:

```
# Disable default deployments
alertmanager:
  enabled: false

grafana:
  enabled: false

prometheus:
  enabled: true
  agentMode: true
  prometheusSpec:
    # Select all ServiceMonitors and PodMonitors, not only those created by this Helm release
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false

    # Global external labels to add to all metrics
    externalLabels:
      cluster: my-cluster-name # Replace with your cluster name

    # Configure remote write
    remoteWrite:
      - url: "{{ .Metrics.WriteURL }}"
        remoteTimeout: 60s
        queueConfig:
          capacity: 10000
          maxSamplesPerSend: 3000
          batchSendDeadline: 20s
          minShards: 4
          maxShards: 200
          minBackoff: 100ms
          maxBackoff: 10s
        basicAuth:
          username:
            name: last9-remote-write-secret
            key: username
          password:
            name: last9-remote-write-secret
            key: password
        writeRelabelConfigs:
          - sourceLabels: [__name__]
            regex: "up|kube_.*|container_.*|node_.*" # Forward only these metric families
            action: keep

# Keep kube-state-metrics
kubeStateMetrics:
  enabled: true

# Keep node-exporter
nodeExporter:
  enabled: true

# Scrape the kubelet, including cAdvisor container metrics
kubelet:
  enabled: true
  serviceMonitor:
    resource: true # Scrape the kubelet /metrics/resource endpoint
    cAdvisor: true # Scrape cAdvisor container metrics

# Disable unnecessary components
prometheusOperator:
  admissionWebhooks:
    enabled: false
  tls:
    enabled: false

# Keep API server scraping
kubeApiServer:
  enabled: true

# Disable other control plane exporters
kubeControllerManager:
  enabled: false

kubeDns:
  enabled: false

kubeEtcd:
  enabled: false

kubeProxy:
  enabled: false

kubeScheduler:
  enabled: false
```

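Configuration Explanation:

- Agent Mode: Prometheus runs in agent mode, which scrapes and forwards metrics over remote write without keeping a queryable local database
- Metric Filtering: The `writeRelabelConfigs` keep rule forwards only metrics named `up` or starting with `kube_`, `container_`, or `node_`, which reduces the data volume sent to Last9
- External Labels: Adds a `cluster` label to all metrics so each cluster can be identified
- Remote Write: Queue, batch, and backoff settings chosen for reliable delivery; adjust them to match your metric volume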
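
To see which metric names the keep rule forwards, you can test names against the same pattern locally. This is a rough sketch: it uses `grep -E -x` to imitate the fully anchored matching Prometheus applies to relabel regexes, which is equivalent for this simple pattern but is not the RE2 engine Prometheus uses. `is_forwarded` is a hypothetical helper name.

```shell
# Same pattern as the writeRelabelConfigs keep rule in k8s-monitoring-values.yaml.
KEEP_REGEX='up|kube_.*|container_.*|node_.*'

# is_forwarded NAME: succeeds when NAME matches the whole pattern.
# Prometheus anchors relabel regexes at both ends; `grep -x` mimics that.
is_forwarded() {
  printf '%s\n' "$1" | grep -Eqx "$KEEP_REGEX"
}

for metric in up kube_pod_info node_cpu_seconds_total \
              container_memory_usage_bytes apiserver_request_total uptime_seconds; do
  if is_forwarded "$metric"; then
    echo "keep  $metric"
  else
    echo "drop  $metric"
  fi
done
```

The loop reports `apiserver_request_total` and `uptime_seconds` as dropped, because neither matches the pattern in full. To forward another metric family, extend the regex in the values file, for example with `apiserver_.*`.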

Install Monitoring Stack

Deploy the Kubernetes monitoring stack using Helm:
```
helm upgrade --install last9-k8s-monitoring prometheus-community/kube-prometheus-stack \
  -n last9 \
  -f k8s-monitoring-values.yaml \
  --version 75.15.1 \
  --create-namespace
```
This command installs:
- Prometheus Operator: Manages Prometheus instances and configurations
- Prometheus: In agent mode for metric collection and remote write
- kube-state-metrics: Exposes Kubernetes object state as metrics
- node-exporter: Collects hardware and OS metrics from cluster nodes

Verify Installation

Check that all monitoring components are running correctly:

```
kubectl get pods -n last9
```

You should see pods similar to:

```
NAME                                                READY   STATUS    RESTARTS   AGE
last9-k8s-monitoring-kube-state-metrics-xxx-xxx     1/1     Running   0          2m
last9-k8s-monitoring-operator-xxx-xxx               1/1     Running   0          2m
last9-k8s-monitoring-prometheus-node-exporter-xxx   1/1     Running   0          2m
prometheus-last9-k8s-monitoring-prometheus-0        2/2     Running   0          2m
```
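
Instead of polling the pod list by hand, you can block until every pod reports Ready. The sketch below wraps `kubectl wait`; `wait_for_stack` is a hypothetical helper name, and the 300-second default timeout is an assumption to adjust for your cluster. It assumes the namespace contains only long-running pods, since completed Job pods never become Ready.

```shell
# wait_for_stack [NAMESPACE] [TIMEOUT]: block until all pods are Ready.
wait_for_stack() {
  local ns="${1:-last9}" timeout="${2:-300s}"
  if kubectl wait --for=condition=Ready pods --all -n "$ns" --timeout="$timeout"; then
    echo "all pods in $ns are Ready"
  else
    echo "some pods in $ns are not Ready; inspect them with: kubectl describe pods -n $ns" >&2
    return 1
  fi
}
```

Run `wait_for_stack` after the Helm command returns; a non-zero exit status means at least one pod did not become Ready within the timeout.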

Understanding the Setup

Prometheus Agent Mode

The setup runs Prometheus in agent mode, which provides:

- Optimized for Remote Write: No queryable local database; only a write-ahead log is kept on disk to buffer samples before they are forwarded
- Reduced Resource Usage: Lower memory and storage requirements than a full Prometheus server
- Reliable Data Transfer: Built-in queue management and retry logic
- Automatic Discovery: Discovers services and pods automatically through ServiceMonitor and PodMonitor resources

Metrics Collected

The monitoring stack automatically collects:

Cluster-Level Metrics

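- kube-state-metrics: Kubernetes object states (deployments, pods, services, etc.)
- API Server Metrics: Kubernetes API server availability through the `up` metric; API server performance metrics (`apiserver_*`) are scraped but forwarded only if you add them to the keep regex
- Cluster Resource Usage: CPU, memory, and storage across the cluster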

Node-Level Metrics

- node-exporter: Hardware and OS metrics from each node
- kubelet: Container runtime metrics via cAdvisor
- Node Resources: CPU, memory, disk, and network utilization

Container Metrics

- Container Resources: CPU and memory usage per container
- Pod Metrics: Lifecycle, restart counts, and resource requests/limits
- Network Metrics: Network I/O per pod and container

Verification and Monitoring

Check Prometheus Remote Write

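Verify that Prometheus is successfully sending data to Last9. If your Prometheus pod has a different name, substitute the one shown by `kubectl get pods -n last9`. Recurring "failed to send" or authentication errors in the output indicate a remote write problem:

```
kubectl logs -n last9 prometheus-last9-k8s-monitoring-prometheus-0 -c prometheus | grep -i "remote"
```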
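
Logs show errors only after they happen, so it can also help to read the remote write counters that Prometheus exposes on its own `/metrics` endpoint. The sketch below sums one counter across all remote write queues. `sum_metric` is a hypothetical helper name; the pod name and port 9090 are assumptions taken from this guide's defaults, and `prometheus_remote_storage_samples_failed_total` is the counter name used by recent Prometheus releases.

```shell
# sum_metric NAME: read Prometheus text exposition format on stdin and
# print the sum of all samples of metric NAME across label sets.
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }                      # skip HELP and TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)         # strip the label set, keep the bare name
      if (metric == name) total += $NF # the sample value is the last field
    }
    END { printf "%d\n", total }
  '
}

# Usage (assumes the pod name from this guide and the default port 9090):
#   kubectl port-forward -n last9 prometheus-last9-k8s-monitoring-prometheus-0 9090:9090 &
#   curl -s http://localhost:9090/metrics | sum_metric prometheus_remote_storage_samples_failed_total
```

A result that stays at 0 across several checks suggests samples are being delivered; a growing number points to rejected or failed writes.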

Validate Secret Access

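Confirm that the secret exists in the `last9` namespace and contains the `username` and `password` keys. The values in the output are base64-encoded, not encrypted, so avoid sharing it: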
```
kubectl get secret last9-remote-write-secret -n last9 -o yaml
```
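
If you prefer not to print the credentials at all, you can check only that both keys are present. This is a minimal sketch; `check_secret_keys` is a hypothetical helper name, and the defaults match the namespace and secret name used in this guide.

```shell
# check_secret_keys [NAMESPACE] [SECRET]: verify that the username and
# password keys exist and are non-empty, without printing their values.
check_secret_keys() {
  local ns="${1:-last9}" name="${2:-last9-remote-write-secret}" key value
  for key in username password; do
    value="$(kubectl get secret "$name" -n "$ns" -o "jsonpath={.data.$key}")" || return 1
    if [ -z "$value" ]; then
      echo "missing or empty key: $key" >&2
      return 1
    fi
    echo "key present: $key"
  done
}
```

Running `check_secret_keys` with no arguments checks `last9-remote-write-secret` in the `last9` namespace.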
Monitor Resource Usage

Check resource consumption of monitoring components:
```
kubectl top pods -n last9
```
Verify Metrics in Last9

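Log in to your Last9 account and confirm in Grafana that Kubernetes metrics are arriving. The Grafana bundled with the Helm chart is disabled in this setup, so use the one available through your Last9 account.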

Look for metrics like:
- up{job="kube-state-metrics"}
- kube_pod_info
- node_cpu_seconds_total
- container_memory_usage_bytes

Configuration Customization

Cluster Identification

Update the external labels to identify your cluster:

```
prometheus:
  prometheusSpec:
    externalLabels:
      cluster: production-us-east-1
      environment: production
      team: platform
```

Metric Filtering

Customize the write relabel configs under the `remoteWrite` entry to include or exclude specific metrics. Rules are applied in order:

```
writeRelabelConfigs:
  - sourceLabels: [__name__]
    regex: "up|kube_.*|container_.*|node_.*|prometheus_.*"
    action: keep
  - sourceLabels: [__name__]
    regex: "kube_pod_container_status_.*"
    action: drop # Example drop rule; note it also removes kube_pod_container_status_restarts_total
```

Resource Limits

Configure resource limits for monitoring components:

```
prometheus:
  prometheusSpec:
    resources:
      limits:
        cpu: 2000m
        memory: 4Gi
      requests:
        cpu: 1000m
        memory: 2Gi

# node-exporter is a subchart, so its resources are set under the subchart key
prometheus-node-exporter:
  resources:
    limits:
      cpu: 200m
      memory: 200Mi
    requests:
      cpu: 100m
      memory: 100Mi
```

Uninstallation

To remove the monitoring stack:

```
helm uninstall last9-k8s-monitoring -n last9
kubectl delete namespace last9
```
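
Helm does not delete CustomResourceDefinitions that a chart installed, so the Prometheus Operator CRDs remain after the commands above. The sketch below lists them; `list_operator_crds` is a hypothetical helper name. Delete them only if no other Prometheus Operator installation uses them, because removing a CRD also removes every resource of that kind in the cluster.

```shell
# list_operator_crds: print the CRDs in the Prometheus Operator API group.
list_operator_crds() {
  # `-o name` prints entries such as
  # customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com
  kubectl get crd -o name | grep 'monitoring\.coreos\.com$' || true
}

# Usage: review the list first, then delete only if nothing else depends on it:
#   list_operator_crds
#   list_operator_crds | xargs kubectl delete
```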

Troubleshooting

Prometheus Not Starting

Check Prometheus logs for configuration issues:

```
kubectl logs -n last9 prometheus-last9-k8s-monitoring-prometheus-0 -c prometheus
```

Remote Write Failures

Verify credentials and network connectivity:

```
kubectl describe secret last9-remote-write-secret -n last9
kubectl logs -n last9 prometheus-last9-k8s-monitoring-prometheus-0 -c prometheus | grep -i error
```

High Resource Usage

Monitor resource consumption and adjust limits:

```
kubectl top pods -n last9
kubectl describe pod -n last9 prometheus-last9-k8s-monitoring-prometheus-0
```

Missing Metrics

Check service monitor selection and pod discovery:

```
kubectl get servicemonitors -n last9
kubectl get podmonitors -n last9
```

Best Practices

- Cluster Naming: Use consistent cluster naming across environments
- Resource Limits: Set appropriate CPU and memory limits for your cluster size
- Metric Filtering: Filter metrics to reduce costs and improve query performance
- Monitoring: Set up alerts for monitoring stack health and remote write failures
- Updates: Regularly update the Helm chart to get the latest features and security fixes

Need Help?

If you encounter any issues or have questions:

- Join our Discord community for real-time support
- Contact our support team at support@last9.io