
Argo Rollouts

Monitor Argo Rollouts canary deployments with Last9 by scraping Prometheus metrics via the OpenTelemetry Collector. Track rollout phase, canary replica fractions, and optionally drive automated canary promotion and rollback using Last9’s Prometheus-compatible read endpoint as an AnalysisTemplate metric provider.

Prerequisites

Verify Argo Rollouts is running and the metrics service is available:

kubectl get deploy -n argo-rollouts
kubectl get svc argo-rollouts-metrics -n argo-rollouts
  1. Deploy the OTel Collector

    Download collector-deployment.yaml from opentelemetry-examples. Edit the placeholders:

    | Placeholder | Value |
    | --- | --- |
    | YOUR_LAST9_OTLP_ENDPOINT | From Last9 Integrations page |
    | YOUR_LAST9_AUTH_HEADER | From Last9 Integrations page |
    | YOUR_CLUSTER_NAME | Your cluster identifier |

    Apply it:

    kubectl apply -f collector-deployment.yaml

    This creates the monitoring namespace, a ConfigMap with the OTel config, the Collector Deployment, and RBAC resources for Prometheus scraping.

  2. Verify the metrics endpoint

    kubectl port-forward svc/argo-rollouts-metrics -n argo-rollouts 8090:8090 &
    curl -s http://localhost:8090/metrics | grep -E "^rollout_info|^rollout_phase"
  3. Verify the collector is exporting

    kubectl get pods -n monitoring -l app=otel-collector
    kubectl logs -n monitoring -l app=otel-collector --tail=50
  4. Confirm data in Last9

    Open Metrics Explorer and search for rollout_info. You should see metrics with labels like name, namespace, and phase.
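The same check can be run from a terminal against Last9's Prometheus-compatible read endpoint. This is a sketch: the URL and credentials are placeholders from your Integrations page, and the /api/v1/query path is the standard Prometheus instant-query API.

```shell
# Substitute the <...> placeholders with values from the Last9 Integrations page.
LAST9_URL='https://<your-last9-read-endpoint>/api/v1/query'

# Build the Basic auth header; printf (unlike echo) adds no trailing newline.
AUTH="Basic $(printf '%s:%s' '<username>' '<password>' | base64 | tr -d '\n')"

case "$LAST9_URL" in
  *'<'*) echo 'edit LAST9_URL and AUTH before running the query' ;;
  *) curl -s -H "Authorization: $AUTH" --data-urlencode 'query=rollout_info' "$LAST9_URL" ;;
esac
```

A successful response is JSON with one result per rollout, carrying the same name, namespace, and phase labels you see in Metrics Explorer.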

Metrics Reference

| Metric | Description | Key Labels |
| --- | --- | --- |
| rollout_info | Rollout presence and current phase | name, namespace, phase, strategy |
| rollout_phase | Phase gauge — one series per phase, value 0 or 1 | name, namespace, phase, strategy |
| rollout_info_replicas_available | Available replica count | name, namespace |
| rollout_info_replicas_updated | Updated (canary) replica count | name, namespace |
| rollout_reconcile | Reconcile duration histogram | name, namespace |
| rollout_reconcile_error | Reconcile error count | name, namespace |
| rollout_events_total | Rollout lifecycle events | name, namespace, reason, type |
| analysis_run_info | Analysis run status | name, namespace, phase |
| experiment_info | Experiment status | name, namespace |
| argo_rollouts_controller_workqueue_depth | Controller queue backlog | name |

Dashboard Panels

| Panel | Query |
| --- | --- |
| Canary replica fraction | rollout_info_replicas_updated / rollout_info_replicas_desired |
| Rollout phase status | rollout_phase grouped by phase, name |
| Reconcile error rate | rate(rollout_reconcile_error[5m]) |
| Analysis run results | analysis_run_info grouped by phase, name |

Automated Canary Gating

Last9’s Prometheus-compatible read endpoint can drive automated canary promotion and rollback via Argo Rollouts AnalysisTemplates. At each rollout step pause, Argo Rollouts queries Last9 and promotes or rolls back based on your metric thresholds.

  1. Create the auth Secret

    The Argo Rollouts Prometheus provider does not support basic_auth natively — pass credentials as a pre-encoded Authorization header:

    kubectl create secret generic last9-prometheus-auth \
    --namespace <your-app-namespace> \
    --from-literal=authorization="Basic $(printf '<username>:<password>' | base64 | tr -d '\n')"

    Get your username and password from Last9 Integrations (Prometheus section).

  2. Apply the AnalysisTemplate

    apiVersion: argoproj.io/v1alpha1
    kind: AnalysisTemplate
    metadata:
      name: last9-http-error-rate
    spec:
      args:
      - name: service-name
      - name: namespace
      - name: last9-auth
        valueFrom:
          secretKeyRef:
            name: last9-prometheus-auth
            key: authorization
      metrics:
      - name: http-error-rate
        interval: 2m
        successCondition: result[0] < 0.05   # promote if error rate < 5%
        failureCondition: result[0] >= 0.10  # roll back if error rate >= 10%
        failureLimit: 3
        provider:
          prometheus:
            address: <your-last9-prometheus-read-endpoint>
            headers:
            - key: Authorization
              value: "{{args.last9-auth}}"
            query: |
              sum(rate(http_requests_total{
                namespace="{{args.namespace}}",
                service="{{args.service-name}}",
                status=~"5.."
              }[5m]))
              /
              sum(rate(http_requests_total{
                namespace="{{args.namespace}}",
                service="{{args.service-name}}"
              }[5m]))
  3. Reference in your Rollout

    strategy:
      canary:
        analysis:
          templates:
          - templateName: last9-http-error-rate
          args:
          - name: service-name
            value: my-app
          - name: namespace
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        steps:
        - setWeight: 10
        - pause: { duration: 5m }   # analysis runs here
        - setWeight: 25
        - pause: { duration: 5m }
        - setWeight: 50
        - pause: { duration: 5m }
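Before the first canary runs, it is worth verifying locally that the header value stored in the Secret round-trips cleanly. A minimal sketch with placeholder credentials:

```shell
# Encode exactly as in step 1; printf adds no trailing newline, so the
# decoded value is exactly user:password.
HEADER="Basic $(printf '%s:%s' 'example-user' 'example-pass' | base64 | tr -d '\n')"
echo "$HEADER"

# Round-trip check: strip the "Basic " prefix and decode.
printf '%s' "${HEADER#Basic }" | base64 -d; echo   # prints example-user:example-pass
```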

Watch analysis runs:

kubectl argo rollouts get rollout my-app --watch
kubectl argo rollouts list analysisruns

A p99 latency template and full reference Rollout spec are available in opentelemetry-examples.
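A latency gate follows the same pattern as the error-rate template, with only the query changing. The sketch below assumes a standard Prometheus histogram named http_request_duration_seconds_bucket; your instrumentation may expose a different metric name, so treat it as illustrative rather than the exact query from the repository.

```promql
histogram_quantile(0.99,
  sum by (le) (
    rate(http_request_duration_seconds_bucket{
      namespace="{{args.namespace}}",
      service="{{args.service-name}}"
    }[5m])
  )
)
```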


Troubleshooting

  • No metrics in Last9

    • Verify the collector can reach argo-rollouts-metrics.argo-rollouts.svc.cluster.local:8090
    • Check collector logs: kubectl logs -n monitoring -l app=otel-collector | grep -i error
  • AnalysisRun invalid header field value

    Recreate the Secret using printf piped through tr -d '\n' rather than echo, which appends a trailing newline to the encoded value.

  • AnalysisRun phase Error

    kubectl describe analysisrun <name>

    The message field shows the exact Prometheus query error or auth failure.
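The invalid header field value error above almost always traces back to a stray newline in the encoded credentials. echo appends one; printf does not:

```shell
# echo appends a trailing newline, which gets base64-encoded into the header
# value and is then rejected as an invalid HTTP header.
echo   'user:pass' | base64    # dXNlcjpwYXNzCg==  (newline encoded: invalid)
printf 'user:pass' | base64    # dXNlcjpwYXNz      (correct)
```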

Please get in touch with us on Discord or Email if you have any questions.