
Argo Rollouts

Monitor Argo Rollouts canary deployments with Last9 by scraping Prometheus metrics via the OpenTelemetry Collector. Track rollout phase, canary replica fractions, and optionally drive automated canary promotion and rollback using Last9’s Prometheus-compatible read endpoint as an AnalysisTemplate metric provider.

Prerequisites

Verify Argo Rollouts is running and the metrics service is available:

kubectl get deploy -n argo-rollouts
kubectl get svc argo-rollouts-metrics -n argo-rollouts
  1. Deploy the OTel Collector

    Download collector-deployment.yaml from opentelemetry-examples. Edit the placeholders:

    | Placeholder | Value |
    | --- | --- |
    | YOUR_LAST9_OTLP_ENDPOINT | From Last9 Integrations page |
    | YOUR_LAST9_AUTH_HEADER | From Last9 Integrations page |
    | YOUR_CLUSTER_NAME | Your cluster identifier |

    Apply it:

    kubectl apply -f collector-deployment.yaml

    This creates the monitoring namespace, a ConfigMap with the OTel config, the Collector Deployment, and RBAC resources for Prometheus scraping.

  2. Verify the metrics endpoint

    kubectl port-forward svc/argo-rollouts-metrics -n argo-rollouts 8090:8090 &
    curl -s http://localhost:8090/metrics | grep -E "^rollout_info|^rollout_phase"
  3. Verify the collector is exporting

    kubectl get pods -n monitoring -l app=otel-collector
    kubectl logs -n monitoring -l app=otel-collector --tail=50
  4. Confirm data in Last9

    Open Metrics Explorer and search for rollout_info. You should see metrics with labels like name, namespace, and phase.
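The same check can be run from a terminal against Last9's Prometheus-compatible read endpoint. This is a sketch: the URL and credentials are placeholders from your Integrations page, and the /api/v1/query path is the standard Prometheus instant-query API.

```shell
# Substitute the <...> placeholders with values from the Last9 Integrations page.
LAST9_URL='https://<your-last9-read-endpoint>/api/v1/query'

# Build the Basic auth header; printf (unlike echo) adds no trailing newline.
AUTH="Basic $(printf '%s:%s' '<username>' '<password>' | base64 | tr -d '\n')"

case "$LAST9_URL" in
  *'<'*) echo 'edit LAST9_URL and AUTH before running the query' ;;
  *) curl -s -H "Authorization: $AUTH" --data-urlencode 'query=rollout_info' "$LAST9_URL" ;;
esac
```

A successful response is JSON with one result per rollout, carrying the same name, namespace, and phase labels you see in Metrics Explorer.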

Metrics Reference

| Metric | Description | Key Labels |
| --- | --- | --- |
| rollout_info | Rollout presence and current phase | name, namespace, phase, strategy |
| rollout_phase | Phase gauge — one series per phase, value 0 or 1 | name, namespace, phase, strategy |
| rollout_info_replicas_available | Available replica count | name, namespace |
| rollout_info_replicas_updated | Updated (canary) replica count | name, namespace |
| rollout_reconcile | Reconcile duration histogram | name, namespace |
| rollout_reconcile_error | Reconcile error count | name, namespace |
| rollout_events_total | Rollout lifecycle events | name, namespace, reason, type |
| analysis_run_info | Analysis run status | name, namespace, phase |
| experiment_info | Experiment status | name, namespace |
| argo_rollouts_controller_workqueue_depth | Controller queue backlog | name |

Dashboard Panels

| Panel | Query |
| --- | --- |
| Canary replica fraction | rollout_info_replicas_updated / rollout_info_replicas_desired |
| Rollout phase status | rollout_phase grouped by phase, name |
| Reconcile error rate | rate(rollout_reconcile_error[5m]) |
| Analysis run results | analysis_run_info grouped by phase, name |

Automated Canary Gating

Last9’s Prometheus-compatible read endpoint can drive automated canary promotion and rollback via Argo Rollouts AnalysisTemplates. At each rollout step pause, Argo Rollouts queries Last9 and promotes or rolls back based on your metric thresholds.

  1. Create the auth Secret

    The Argo Rollouts Prometheus provider does not support basic_auth natively — pass credentials as a pre-encoded Authorization header:

    kubectl create secret generic last9-prometheus-auth \
    --namespace <your-app-namespace> \
    --from-literal=authorization="Basic $(printf '<username>:<password>' | base64 | tr -d '\n')"

    Get your username and password from Last9 Integrations (Prometheus section).

  2. Apply the AnalysisTemplate

    apiVersion: argoproj.io/v1alpha1
    kind: AnalysisTemplate
    metadata:
      name: last9-http-error-rate
    spec:
      args:
      - name: service-name
      - name: namespace
      - name: last9-auth
        valueFrom:
          secretKeyRef:
            name: last9-prometheus-auth
            key: authorization
      metrics:
      - name: http-error-rate
        interval: 2m
        successCondition: result[0] < 0.05   # promote if error rate < 5%
        failureCondition: result[0] >= 0.10  # roll back if error rate >= 10%
        failureLimit: 3
        provider:
          prometheus:
            address: <your-last9-prometheus-read-endpoint>
            headers:
            - key: Authorization
              value: "{{args.last9-auth}}"
            query: |
              sum(rate(http_requests_total{
                namespace="{{args.namespace}}",
                service="{{args.service-name}}",
                status=~"5.."
              }[5m]))
              /
              sum(rate(http_requests_total{
                namespace="{{args.namespace}}",
                service="{{args.service-name}}"
              }[5m]))
  3. Reference in your Rollout

    strategy:
      canary:
        analysis:
          templates:
          - templateName: last9-http-error-rate
          args:
          - name: service-name
            value: my-app
          - name: namespace
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        steps:
        - setWeight: 10
        - pause: { duration: 5m }   # analysis runs here
        - setWeight: 25
        - pause: { duration: 5m }
        - setWeight: 50
        - pause: { duration: 5m }
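Before the first canary runs, it is worth verifying locally that the header value stored in the Secret round-trips cleanly. A minimal sketch with placeholder credentials:

```shell
# Encode exactly as in step 1; printf adds no trailing newline, so the
# decoded value is exactly user:password.
HEADER="Basic $(printf '%s:%s' 'example-user' 'example-pass' | base64 | tr -d '\n')"
echo "$HEADER"

# Round-trip check: strip the "Basic " prefix and decode.
printf '%s' "${HEADER#Basic }" | base64 -d; echo   # prints example-user:example-pass
```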

Watch analysis runs:

kubectl argo rollouts get rollout my-app --watch
kubectl argo rollouts list analysisruns

A p99 latency template and full reference Rollout spec are available in opentelemetry-examples.
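A latency gate follows the same pattern as the error-rate template, with only the query changing. The sketch below assumes a standard Prometheus histogram named http_request_duration_seconds_bucket; your instrumentation may expose a different metric name, so treat it as illustrative rather than the exact query from the repository.

```promql
histogram_quantile(0.99,
  sum by (le) (
    rate(http_request_duration_seconds_bucket{
      namespace="{{args.namespace}}",
      service="{{args.service-name}}"
    }[5m])
  )
)
```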


Troubleshooting

  • No metrics in Last9

    • Verify the collector can reach argo-rollouts-metrics.argo-rollouts.svc.cluster.local:8090
    • Check collector logs: kubectl logs -n monitoring -l app=otel-collector | grep -i error
  • AnalysisRun invalid header field value

    Recreate the Secret using printf piped through tr -d '\n' rather than echo, which appends a trailing newline to the encoded value.

  • AnalysisRun phase Error

    kubectl describe analysisrun <name>

    The message field shows the exact Prometheus query error or auth failure.
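The invalid header field value error above almost always traces back to a stray newline in the encoded credentials. echo appends one; printf does not:

```shell
# echo appends a trailing newline, which gets base64-encoded into the header
# value and is then rejected as an invalid HTTP header.
echo   'user:pass' | base64    # dXNlcjpwYXNzCg==  (newline encoded: invalid)
printf 'user:pass' | base64    # dXNlcjpwYXNz      (correct)
```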

Please get in touch with us on Discord or Email if you have any questions.