Argo Rollouts
Monitor Argo Rollouts canary deployments with Last9 — track rollout phase, replica fractions, and automate canary promotion via AnalysisTemplates
Monitor Argo Rollouts canary deployments with Last9 by scraping Prometheus metrics via the OpenTelemetry Collector. Track rollout phase, canary replica fractions, and optionally drive automated canary promotion and rollback using Last9’s Prometheus-compatible read endpoint as an AnalysisTemplate metric provider.
Prerequisites
- Kubernetes cluster with Argo Rollouts installed
- `kubectl` configured with cluster access
- Last9 account: OTLP endpoint and auth header from the Integrations page
Verify Argo Rollouts is running and the metrics service is available:

    kubectl get deploy -n argo-rollouts
    kubectl get svc argo-rollouts-metrics -n argo-rollouts

- Deploy the OTel Collector

  Download `collector-deployment.yaml` from opentelemetry-examples. Edit the placeholders:

  | Placeholder | Value |
  |---|---|
  | YOUR_LAST9_OTLP_ENDPOINT | From Last9 Integrations page |
  | YOUR_LAST9_AUTH_HEADER | From Last9 Integrations page |
  | YOUR_CLUSTER_NAME | Your cluster identifier |

  Apply it:

      kubectl apply -f collector-deployment.yaml

  This creates the `monitoring` namespace, a ConfigMap with the OTel config, the Collector Deployment, and RBAC resources for Prometheus scraping.

- Verify the metrics endpoint

      kubectl port-forward svc/argo-rollouts-metrics -n argo-rollouts 8090:8090 &
      curl -s http://localhost:8090/metrics | grep -E "^rollout_info|^rollout_phase"

- Verify the collector is exporting

      kubectl get pods -n monitoring -l app=otel-collector
      kubectl logs -n monitoring -l app=otel-collector --tail=50

- Confirm data in Last9

  Open Metrics Explorer and search for `rollout_info`. You should see metrics with labels like `name`, `namespace`, and `phase`.
Metrics Reference
| Metric | Description | Key Labels |
|---|---|---|
| rollout_info | Rollout presence and current phase | name, namespace, phase, strategy |
| rollout_phase | Phase gauge: one series per phase, value 0 or 1 | name, namespace, phase, strategy |
| rollout_info_replicas_available | Available replica count | name, namespace |
| rollout_info_replicas_updated | Updated (canary) replica count | name, namespace |
| rollout_reconcile | Reconcile duration histogram | name, namespace |
| rollout_reconcile_error | Reconcile error count | name, namespace |
| rollout_events_total | Rollout lifecycle events | name, namespace, reason, type |
| analysis_run_info | Analysis run status | name, namespace, phase |
| experiment_info | Experiment status | name, namespace |
| argo_rollouts_controller_workqueue_depth | Controller queue backlog | name |
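
Because `rollout_phase` emits one series per phase with a value of 0 or 1, the current phase of a rollout is the series whose value is 1. The following is a minimal sketch of that filter applied to `/metrics`-style text. The sample series, the rollout name `my-app`, and the helper names `sample_metrics` and `current_phase` are made up for illustration; real output will contain more series and labels.

```shell
#!/bin/sh
# Illustrative sample only: these series are invented to mimic the
# shape of the rollout_phase lines exposed on the /metrics endpoint.
sample_metrics() {
  cat <<'EOF'
rollout_phase{name="my-app",namespace="default",phase="Healthy",strategy="canary"} 0
rollout_phase{name="my-app",namespace="default",phase="Progressing",strategy="canary"} 1
rollout_phase{name="my-app",namespace="default",phase="Degraded",strategy="canary"} 0
EOF
}

# Print "<name> <phase>" for every rollout_phase series whose value is 1.
current_phase() {
  awk '/^rollout_phase[{]/ && $NF == 1 {
    name = $0;  sub(/.*[{,]name="/, "", name);   sub(/".*/, "", name)
    phase = $0; sub(/.*[{,]phase="/, "", phase); sub(/".*/, "", phase)
    print name, phase
  }'
}

sample_metrics | current_phase
```

Against a live cluster, the same filter could be fed from the port-forwarded endpoint, for example `curl -s http://localhost:8090/metrics | current_phase`.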
Dashboard Panels
| Panel | Query |
|---|---|
| Canary replica fraction | rollout_info_replicas_updated / rollout_info_replicas_desired |
| Rollout phase status | rollout_phase grouped by phase, name |
| Reconcile error rate | rate(rollout_reconcile_error[5m]) |
| Analysis run results | analysis_run_info grouped by phase, name |
Automated Canary Gating
Last9’s Prometheus-compatible read endpoint can drive automated canary promotion and rollback via Argo Rollouts AnalysisTemplates. At each rollout step pause, Argo Rollouts queries Last9 and promotes or rolls back based on your metric thresholds.
- Create the auth Secret

  The Argo Rollouts Prometheus provider does not support `basic_auth` natively. Pass credentials as a pre-encoded `Authorization` header:

      kubectl create secret generic last9-prometheus-auth \
        --namespace <your-app-namespace> \
        --from-literal=authorization="Basic $(printf '<username>:<password>' | base64 | tr -d '\n')"

  Get your username and password from Last9 Integrations (Prometheus section).

- Apply the AnalysisTemplate

      apiVersion: argoproj.io/v1alpha1
      kind: AnalysisTemplate
      metadata:
        name: last9-http-error-rate
      spec:
        args:
          - name: service-name
          - name: namespace
          - name: last9-auth
            valueFrom:
              secretKeyRef:
                name: last9-prometheus-auth
                key: authorization
        metrics:
          - name: http-error-rate
            interval: 2m
            successCondition: result[0] < 0.05   # promote if error rate < 5%
            failureCondition: result[0] >= 0.10  # rollback if error rate >= 10%
            failureLimit: 3
            provider:
              prometheus:
                address: <your-last9-prometheus-read-endpoint>
                headers:
                  - key: Authorization
                    value: "{{args.last9-auth}}"
                query: |
                  sum(rate(http_requests_total{
                    namespace="{{args.namespace}}",
                    service="{{args.service-name}}",
                    status=~"5.."
                  }[5m]))
                  /
                  sum(rate(http_requests_total{
                    namespace="{{args.namespace}}",
                    service="{{args.service-name}}"
                  }[5m]))

- Reference in your Rollout

      strategy:
        canary:
          analysis:
            templates:
              - templateName: last9-http-error-rate
            args:
              - name: service-name
                value: my-app
              - name: namespace
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.namespace
          steps:
            - setWeight: 10
            - pause: { duration: 5m }   # analysis runs here
            - setWeight: 25
            - pause: { duration: 5m }
            - setWeight: 50
            - pause: { duration: 5m }

Watch analysis runs:

    kubectl argo rollouts get rollout my-app --watch
    kubectl argo rollouts list analysisruns

A p99 latency template and full reference Rollout spec are available in opentelemetry-examples.
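
The two thresholds in the AnalysisTemplate leave a gap between 5% and 10%. As a rough sketch of how a single measurement maps to an outcome, the snippet below applies the same comparisons to a few sample error rates. It assumes, per the Argo Rollouts analysis documentation, that a measurement satisfying neither condition is reported as Inconclusive. The function name `classify` and the sample rates are hypothetical, and the sketch ignores `failureLimit`, which governs how many failed measurements are tolerated.

```shell
#!/bin/sh
# Classify one error-rate measurement using the template's thresholds:
#   successCondition: result[0] < 0.05
#   failureCondition: result[0] >= 0.10
# Anything in between matches neither condition.
classify() {
  awk -v rate="$1" 'BEGIN {
    if (rate < 0.05)       print "Successful"
    else if (rate >= 0.10) print "Failed"
    else                   print "Inconclusive"
  }'
}

# Hypothetical measurements: 2%, 7%, and 12% error rates.
for rate in 0.02 0.07 0.12; do
  echo "$rate $(classify "$rate")"
done
```

If the in-between band is not wanted, one option is to set both conditions around the same threshold so that every measurement is either a success or a failure.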
Troubleshooting
- No metrics in Last9
  - Verify the collector can reach `argo-rollouts-metrics.argo-rollouts.svc.cluster.local:8090`
  - Check collector logs:

        kubectl logs -n monitoring -l app=otel-collector | grep -i error

- AnalysisRun `invalid header field value`

  Recreate the secret with `printf` + `tr -d '\n'` instead of `echo`.

- AnalysisRun phase Error

      kubectl describe analysisrun <name>

  The `message` field shows the exact Prometheus query error or auth failure.

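
To see why the `printf` + `tr -d '\n'` form matters for the `invalid header field value` case, the sketch below encodes the same credentials both ways. The credentials `user:pass` are placeholders for illustration. `echo` appends a newline that ends up inside the encoded value, and `base64` may wrap long output across lines; a newline inside the final header value is a likely cause of the error.

```shell
#!/bin/sh
# Placeholder credentials, for illustration only.
creds='user:pass'

# echo appends a trailing newline, which is encoded as part of the value.
with_echo=$(echo "$creds" | base64 | tr -d '\n')

# printf '%s' writes exactly the bytes given; tr strips any line
# wrapping that base64 adds to long output.
with_printf=$(printf '%s' "$creds" | base64 | tr -d '\n')

echo "echo:   Basic $with_echo"
echo "printf: Basic $with_printf"
```

The two values differ: the `echo` variant decodes to `user:pass` followed by a newline, so it would not match the real credentials even where the header itself is accepted.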
Please get in touch with us on Discord or Email if you have any questions.