
Oct 19th, ’24 / 11 min read

Prometheus RemoteWrite Exporter: A Comprehensive Guide

A comprehensive guide to using the PrometheusRemoteWriteExporter to send metrics from OpenTelemetry to Prometheus-compatible backends.

Prometheus has become an essential part of modern observability stacks, providing powerful time-series data collection and alerting capabilities. However, as organizations scale their infrastructure, they often encounter limitations with Prometheus' single-instance storage model.

This is where Remote Write comes in: it allows Prometheus to send metrics to external storage systems while retaining its powerful collection and querying capabilities.

What is Prometheus Remote Write?

Prometheus Remote Write is a protocol that enables Prometheus to send metrics data to compatible external storage systems in real-time. This feature addresses several critical needs:

  • Long-term storage: Retain metrics beyond Prometheus' local retention limits
  • High-availability: Create redundant copies of your metrics data
  • Centralization: Collect metrics from multiple Prometheus instances in a single location
  • Specialized storage: Leverage databases optimized for specific query patterns

The observability ecosystem has widely adopted this protocol, with many solutions now offering Prometheus-compatible remote write endpoints, including Cortex, Thanos, and various cloud provider offerings.

💡
If you're using Prometheus and need a way to push metrics from short-lived jobs, Pushgateway might be just what you’re looking for.

Key Components of Remote Write Architecture

The Remote Write architecture consists of three primary components:

  1. Prometheus Server: The source of metrics data, responsible for scraping targets and forwarding metrics
  2. Remote Write Protocol: A well-defined HTTP-based protocol using snappy-compressed Protocol Buffers for efficient data serialization
  3. Remote Write Endpoint: The destination system that receives, processes, and stores the metrics

This architecture maintains Prometheus' pull-based collection model while adding a push-based capability for storage, creating a flexible and scalable observability pipeline.

Configuring Remote Write in Prometheus

Remote Write is configured in the Prometheus configuration file (usually prometheus.yml) using YAML. Here's a basic example:

remote_write:
  - url: "https://remote-write-endpoint.example.com/api/v1/write"
    basic_auth:
      username: "prometheus"
      password: "secret"
    write_relabel_configs:
      - source_labels: [__name__]
        regex: "unwanted_metric.*"
        action: drop

This configuration instructs Prometheus to:

  1. Send metrics to the specified URL
  2. Authenticate using basic authentication
  3. Apply relabeling rules to filter metrics before sending
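To build intuition for the drop rule above, here's a small Python sketch of how relabel filtering behaves; Prometheus anchors relabel regexes, which `fullmatch` mirrors (the sample dicts are invented for illustration):

```python
import re

def drop_unwanted(samples, pattern="unwanted_metric.*"):
    """Mimic a write_relabel_configs 'drop' rule: discard samples whose
    metric name (__name__) fully matches the regex (Prometheus anchors
    relabel regexes, hence fullmatch)."""
    rx = re.compile(pattern)
    return [s for s in samples if not rx.fullmatch(s["__name__"])]

samples = [
    {"__name__": "unwanted_metric_foo", "job": "node"},
    {"__name__": "http_requests_total", "job": "api"},
]
kept = drop_unwanted(samples)
print([s["__name__"] for s in kept])  # only http_requests_total survives
```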

Using External Labels for Source Identification

When aggregating metrics from multiple Prometheus instances, it's crucial to identify the source of each metric. External labels add global metadata to all metrics sent from a specific Prometheus instance:

global:
  external_labels:
    region: "us-west-1"
    environment: "production"
    cluster: "main-cluster"

These labels help distinguish metrics from different Prometheus instances when they're aggregated in a central system.
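The merge semantics are worth noting: an external label is attached only when the series doesn't already carry a label of that name. A minimal Python sketch (the label sets are illustrative):

```python
def apply_external_labels(series_labels, external_labels):
    """Attach external labels to an outgoing series; a label already
    set on the series takes precedence over the external one."""
    merged = dict(external_labels)
    merged.update(series_labels)
    return merged

ext = {"region": "us-west-1", "environment": "production"}
out = apply_external_labels({"__name__": "up", "region": "eu-1"}, ext)
print(out)  # region stays "eu-1"; environment is added
```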

Write Relabeling for Filtering and Transformation

Write relabeling allows you to modify or filter metrics before they're sent to the remote endpoint:

write_relabel_configs:
  - source_labels: [__name__, job]
    separator: ";"
    regex: "node_.*sockets;node_exporter"
    action: keep

This is powerful for:

  • Reducing data volume by dropping unnecessary metrics
  • Normalizing labels across different sources
  • Adding or modifying metadata before storage
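The keep rule above joins several label values before matching. A hedged Python sketch of that mechanic (the sample metric names are invented):

```python
import re

def keep_matching(samples, source_labels=("__name__", "job"),
                  separator=";", pattern="node_.*sockets;node_exporter"):
    """Mimic a 'keep' rule: join the source label values with the
    separator and keep the sample only if the regex fully matches."""
    rx = re.compile(pattern)
    return [s for s in samples
            if rx.fullmatch(separator.join(s[l] for l in source_labels))]

samples = [
    {"__name__": "node_tcp_sockets", "job": "node_exporter"},
    {"__name__": "node_cpu_seconds_total", "job": "node_exporter"},
]
print([s["__name__"] for s in keep_matching(samples)])  # ['node_tcp_sockets']
```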
💡
For help setting up or troubleshooting Prometheus ports, this post walks through the key configurations: Prometheus Port Configuration.

Critical Settings for Optimal Performance

Scrape Interval vs. Evaluation Interval: What's the difference?

Two important configuration parameters in Prometheus are often confused but serve distinct purposes:

Scrape Interval: Controlling Data Collection Frequency

The scrape_interval defines how frequently Prometheus collects metrics from monitored targets:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "node-exporter"
    scrape_interval: 5s  # Overrides global setting for this job

Key points about scrape interval:

  • Affects data resolution and storage requirements
  • Can be set globally and overridden per job
  • Shorter intervals provide more detail but increase resource usage
  • Should align with the dynamics of the metrics you're collecting

Evaluation Interval: Managing Rule Processing

The evaluation_interval determines how frequently Prometheus evaluates recording and alerting rules:

global:
  evaluation_interval: 30s

rule_files:
  - "rules/recording_rules.yml"
  - "rules/alerting_rules.yml"

Key differences from scrape interval:

  • Controls rule processing frequency, not data collection
  • Affects alert responsiveness and resource consumption
  • Typically longer than the scrape interval to reduce computational load
  • Should be tuned based on the urgency of your alerting needs

Balancing Intervals for Optimal Performance

Choosing appropriate intervals requires balancing several factors:

  • Lower intervals increase resolution but consume more resources
  • Scrape interval should be shorter than the shortest-lived phenomena you want to observe
  • Evaluation interval should be shorter than the acceptable delay for alerts
  • Both should be consistent with your retention and query needs

A common pattern is to use shorter scrape intervals for critical infrastructure (5-10s) and longer intervals for less dynamic systems (30-60s).
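A quick back-of-envelope calculation helps when picking intervals; the fleet numbers below are hypothetical:

```python
def samples_per_second(targets, series_per_target, scrape_interval_s):
    """Rough ingestion rate: each series on each target yields one
    sample per scrape."""
    return targets * series_per_target / scrape_interval_s

# Hypothetical fleet: 100 targets, ~1000 series each.
print(samples_per_second(100, 1000, 5))   # 5s scrape -> 20000.0 samples/s
print(samples_per_second(100, 1000, 60))  # 60s scrape -> ~1666.7 samples/s
```

A 12x difference in scrape interval translates directly into a 12x difference in samples shipped over remote write, which is why filtering and interval tuning go hand in hand.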

💡
If you're working with Prometheus Remote Write, understanding its HTTP API can give you more control over querying and managing your data: Prometheus API.

Remote Write vs. Federation: Choosing the Right Approach

When scaling Prometheus beyond a single instance, you have two primary options: Remote Write and Federation. Understanding the differences is crucial for designing an effective monitoring architecture.

Prometheus Federation: Hierarchical Metric Collection

Federation allows a Prometheus server to scrape selected time series from another Prometheus server:

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
        - 'source-prometheus:9090'

Federation is useful for:

  • Building hierarchical Prometheus deployments
  • Aggregating metrics from multiple Prometheus instances
  • Creating global views across different environments

Key Differences Between Remote Write and Federation

| Feature | Remote Write | Federation |
| --- | --- | --- |
| Data Flow | Push-based | Pull-based |
| Latency | Low (real-time) | Higher (depends on scrape interval) |
| Completeness | All metrics | Selected metrics only |
| Storage | External system | Local Prometheus storage |
| Resource Impact | Network and CPU on sender | Network and CPU on receiver |
| High Availability | Built for HA setups | Requires additional configuration |
| Scalability | Highly scalable | Limited by single-instance constraints |

Remote Write in Kubernetes Environments

Kubernetes presents specific considerations for Remote Write:

  1. Resource Management: Configure appropriate limits and requests for Prometheus pods to ensure stable operation
  2. Network Policies: Ensure outbound connectivity to remote write endpoints
  3. Authentication: Use Kubernetes secrets for secure credential management
  4. High Cardinality: Be cautious with Kubernetes labels that can cause high cardinality issues
  5. Monitoring the Monitoring: Use metrics like prometheus_remote_storage_* to monitor the health of your remote write setup

When using tools like Prometheus Operator, Remote Write can be configured through custom resources:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  remoteWrite:
    - url: "https://remote-write.example.com/api/v1/write"
      basicAuth:
        username:
          name: remote-write-auth
          key: username
        password:
          name: remote-write-auth
          key: password
💡
To make the most of Prometheus Remote Write, it's helpful to understand which functions can optimize your queries and reduce noise: Prometheus Functions.

How to Integrate with OpenTelemetry

The PrometheusRemoteWriteExporter in OpenTelemetry provides a bridge between OpenTelemetry and Prometheus ecosystems, allowing metrics collected by OpenTelemetry to be sent to any Prometheus-compatible remote write endpoint.

Setting Up the OpenTelemetry Collector

The OpenTelemetry Collector acts as a central hub for telemetry data, processing and forwarding it to various backends:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 10s
    send_batch_size: 1000

exporters:
  prometheusremotewrite:
    endpoint: "https://remote-write-endpoint.example.com/api/v1/write"
    auth:
      authenticator: basicauth/remote
    resource_to_telemetry_conversion:
      enabled: true

service:
  extensions: [basicauth/remote]
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]

extensions:
  basicauth/remote:
    client_auth:
      username: "${REMOTE_WRITE_USERNAME}"
      password: "${REMOTE_WRITE_PASSWORD}"

Advanced Configuration Options

The PrometheusRemoteWriteExporter supports several advanced configuration options:

Queue Management

Control how metrics are buffered and sent:

exporters:
  prometheusremotewrite:
    endpoint: "https://remote-write-endpoint.example.com/api/v1/write"
    remote_write_queue:
      capacity: 10000  # Maximum number of samples to buffer
      max_samples_per_send: 500  # Maximum number of samples per send
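These two numbers trade memory against burst tolerance. A rough sketch of worst-case drain time, assuming a steady (made-up) send rate:

```python
def worst_case_drain_seconds(capacity, max_samples_per_send, sends_per_second):
    """How long a full buffer takes to drain at a steady send rate."""
    batches = capacity / max_samples_per_send
    return batches / sends_per_second

# With capacity=10000 and max_samples_per_send=500 as above, and a
# hypothetical 4 successful sends per second:
print(worst_case_drain_seconds(10000, 500, 4))  # 5.0 seconds
```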

Retry Handling

Configure retry behavior for resilience:

exporters:
  prometheusremotewrite:
    endpoint: "https://remote-write-endpoint.example.com/api/v1/write"
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 5m
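The schedule this produces can be sketched as capped exponential backoff; the doubling factor here is an assumption for illustration (the exporter's exact growth and jitter policy may differ):

```python
def backoff_schedule(initial, max_interval, max_elapsed, factor=2.0):
    """Retry waits: exponential growth capped at max_interval, stopping
    once max_elapsed would be exceeded."""
    waits, elapsed, interval = [], 0.0, float(initial)
    while elapsed + interval <= max_elapsed:
        waits.append(interval)
        elapsed += interval
        interval = min(interval * factor, max_interval)
    return waits

# initial_interval=5s, max_interval=30s, max_elapsed_time=300s (5m)
schedule = backoff_schedule(5, 30, 300)
print(schedule)  # 5, 10, 20, then 30s retries until the budget runs out
```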

TLS Configuration

Secure your connection with TLS:

exporters:
  prometheusremotewrite:
    endpoint: "https://remote-write-endpoint.example.com/api/v1/write"
    tls:
      ca_file: "/path/to/ca.crt"
      cert_file: "/path/to/client.crt"
      key_file: "/path/to/client.key"
      insecure: false  # Set to true only for testing
💡
Once you’ve set up Remote Write, knowing how to write clear, efficient queries can save you a lot of debugging time — Prometheus Query Examples.

Advanced Remote Write Topics: Scaling and Optimization

Remote Write Filtering and Sampling for High-Volume Metrics

For high-volume Prometheus deployments, sending all metrics to remote storage may be impractical. Implement strategic filtering:

remote_write:
  - url: "https://critical-metrics.example.com/api/v1/write"
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'critical_.*|up|instance:.*'
        action: keep
  
  - url: "https://all-metrics.example.com/api/v1/write"
    queue_config:
      capacity: 20000

For extremely high-volume use cases, consider sampling at the series level with the hashmod relabel action (Prometheus relabeling operates on whole series, not individual samples):

remote_write:
  - url: "https://sampled-metrics.example.com/api/v1/write"
    write_relabel_configs:
      - source_labels: [instance]
        modulus: 10
        target_label: __tmp_hash
        action: hashmod
      - source_labels: [__tmp_hash]
        regex: "0"
        action: keep  # Keep roughly 10% of series
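Series-level sampling can also be illustrated in Python: hash a label value and keep the series when the hash lands in a chosen bucket, which is the idea behind Prometheus' hashmod relabel action (the instance names below are invented):

```python
import hashlib

def keep_series(instance, modulus=10, bucket=0):
    """Deterministic series-level sampling in the spirit of the
    'hashmod' relabel action: keep roughly 1/modulus of series."""
    h = int(hashlib.md5(instance.encode()).hexdigest(), 16)
    return h % modulus == bucket

instances = [f"host-{i}:9100" for i in range(100)]
kept = [i for i in instances if keep_series(i)]
print(f"{len(kept)} of {len(instances)} series kept")
```

Because the decision is a pure function of the label value, every Prometheus replica makes the same keep/drop choice, so sampling stays consistent across an HA pair.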

Multi-Endpoint Remote Write Strategy

Sending to multiple remote endpoints provides redundancy and specialized storage:

remote_write:
  - url: "https://long-term-storage.example.com/api/v1/write"
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'business_.*|sla_.*'
        action: keep
  
  - url: "https://alert-metrics.example.com/api/v1/write"
    write_relabel_configs:
      - source_labels: [__name__]
        regex: '.*_alerts|up|.*_status'
        action: keep
  
  - url: "https://all-metrics.example.com/api/v1/write"
    queue_config:
      max_samples_per_send: 1000

This approach enables:

  • Different retention policies for different metric types
  • Specialized query engines for specific use cases
  • Cost optimization by routing high-value metrics to premium storage
💡
Now, fix Prometheus Remote Write issues instantly—right from your IDE, with AI and Last9 MCP.

Troubleshooting Common Remote Write Issues and Solutions

Connection and Authentication Issues

Timeout Problems

If you encounter timeouts:

  1. Check network connectivity and firewall rules

Validate endpoint reachability with a simple HTTP request (a healthy write endpoint typically responds quickly, often with a 405 Method Not Allowed for a plain GET):

curl -v https://remote-write-endpoint.example.com/api/v1/write

Increase timeout settings:

remote_write:
  - url: "https://remote-write-endpoint.example.com/api/v1/write"
    remote_timeout: 60s  # Default is 30s

Authentication Failures

Common authentication issues:

  1. Check for URL-encoding issues in passwords
  2. Verify that credentials have appropriate permissions

Validate credentials separately using curl:

curl -u "${USERNAME}:${PASSWORD}" https://remote-write-endpoint.example.com/api/v1/write

Data Quality and Performance Issues

Label Value Problems

Prometheus has strict requirements for label values:

  1. Only ASCII characters are allowed
  2. Label values have length limits
  3. Some endpoints may have additional restrictions

Monitor for label validation errors in the Prometheus logs:

level=warn ts=... component=remote msg="Remote storage returned HTTP status 400 Bad Request; error: invalid label value..."

High Cardinality

Watch for exploding cardinality, which can overwhelm remote storage:

  1. Monitor metrics like prometheus_tsdb_head_series
  2. Be cautious with automatically generated labels in Kubernetes environments

Use relabeling to reduce cardinality:

write_relabel_configs:
  - source_labels: [instance]
    target_label: instance
    regex: '(.*):.*'
    replacement: '$1'
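The regex above can be checked in isolation; this Python snippet mirrors the rule's effect on an example instance value:

```python
import re

def strip_port(instance):
    """Mirror the relabel rule above: drop the port from an instance
    label, collapsing per-port series into per-host series."""
    return re.sub(r"(.*):.*", r"\1", instance)

print(strip_port("10.0.0.1:9100"))  # 10.0.0.1
print(strip_port("10.0.0.1"))       # no port: unchanged
```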
💡
If high-cardinality metrics have ever made your monitoring setup messy or expensive, this guide on high cardinality can help you make sense of what to watch out for.

Monitoring Remote Write Performance

Key metrics to monitor:

  • prometheus_remote_storage_samples_pending: Samples waiting to be sent
  • prometheus_remote_storage_failed_samples_total: Samples that couldn't be sent
  • prometheus_remote_storage_sent_batch_duration_seconds: Time to send batches
  • prometheus_remote_storage_succeeded_samples_total: Successfully sent samples
  • prometheus_remote_storage_retried_samples_total: Samples that required retries

Create a dedicated dashboard for these metrics to quickly identify issues.
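As a starting point for such a dashboard, a couple of example PromQL expressions over these metrics (the thresholds are illustrative, not recommendations):

```promql
# Any samples failing to ship over the last 5 minutes?
rate(prometheus_remote_storage_failed_samples_total[5m]) > 0

# Backlog building up in the send queue
prometheus_remote_storage_samples_pending > 10000
```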

Best Practices for Production Deployments

Architecture and Planning

  1. Start with Clear Requirements:
    • Define retention periods for different metric types
    • Identify query patterns and performance needs
    • Establish SLAs for monitoring availability
  2. Choose the Right Tools:
    • Select appropriate remote storage based on scale and query needs
    • Consider managed services vs. self-hosted options
    • Evaluate cost implications for different retention periods
  3. Design for High Availability:
    • Implement redundant Prometheus instances
    • Use multiple remote write endpoints for critical metrics
    • Plan for failure scenarios with appropriate retention

Configuration and Tuning

  1. Optimize Resource Usage:
    • Filter unnecessary metrics using write relabeling
    • Use appropriate scrape and evaluation intervals
    • Configure queue settings based on load testing
  2. Security Best Practices:
    • Use TLS for all remote write connections
    • Rotate authentication credentials regularly
    • Apply the principle of least privilege for remote write accounts
  3. Monitoring Your Monitoring:
    • Set up alerts for remote write failures
    • Monitor queue sizes and batch durations
    • Create dashboards for remote write performance metrics
💡
If you're already working with Prometheus and want to get more out of your queries, these PromQL tricks might just save you some time (and a few headaches).

Operational Excellence

  1. Documentation and Knowledge Sharing:
    • Document your remote write architecture
    • Create runbooks for common failure scenarios
    • Share best practices across teams
  2. Regular Audits:
    • Review what metrics are being sent and their value
    • Analyze storage usage and costs
    • Identify opportunities for optimization
  3. Continuous Improvement:
    • Stay updated with Prometheus and remote storage developments
    • Test new features in non-production environments
    • Refine your approach based on operational experience

Conclusion

Remote Write is a foundational capability for scaling Prometheus beyond a single instance, enabling enterprises to build comprehensive and resilient observability platforms.

💡
If you still need to discuss some settings, jump onto the Last9 Discord Server to discuss any specifics you need help with. We have a dedicated channel where you can discuss your specific use case with other developers.

FAQs

What does Prometheus remote write do?

Prometheus remote write allows Prometheus to send metrics data in real-time to external storage systems. It enables long-term storage beyond Prometheus' local retention limits, creates high-availability setups, centralizes metrics from multiple Prometheus instances, and integrates with specialized time-series databases optimized for specific workloads.

This capability is essential for metrics ingestion at scale, particularly when visualizing Prometheus metrics in tools like Grafana.

What is the remote write spec?

The remote write spec is a protocol definition that enables Prometheus to send metrics to compatible external systems. It uses HTTP as the transport layer and Protocol Buffers for efficient data serialization.

The spec defines how metrics, labels, and timestamps are encoded, compressed, and transmitted to ensure compatibility between Prometheus and various storage backends. While the wire format uses Protocol Buffers, you can inspect the data structure in JSON format for debugging purposes.
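For a concrete sense of the wire contract, these are the HTTP headers a compliant remote write 1.0 client sends with each request (sketched here as a Python dict):

```python
# Headers mandated by the Prometheus remote write 1.0 specification.
REMOTE_WRITE_HEADERS = {
    "Content-Encoding": "snappy",                  # body is snappy-compressed
    "Content-Type": "application/x-protobuf",      # body is a WriteRequest protobuf
    "X-Prometheus-Remote-Write-Version": "0.1.0",  # protocol version header
}
for name, value in REMOTE_WRITE_HEADERS.items():
    print(f"{name}: {value}")
```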

What is the difference between scrape_interval and evaluation_interval?

  • scrape_interval: Determines how frequently Prometheus collects metrics from monitored targets. It affects data resolution and storage requirements.
  • evaluation_interval: Controls how frequently Prometheus evaluates recording and alerting rules. It affects alert responsiveness and rule processing load.

While scrape_interval focuses on data collection, evaluation_interval deals with processing that data through rules. The official Prometheus docs on GitHub provide detailed explanations of these settings and their impact on performance.

What is the difference between Prometheus remote write and federation?

Prometheus remote write pushes metrics to external storage in real-time, while federation pulls selected metrics from other Prometheus servers.

Remote write offers lower latency, complete metrics collection, and is built for high-availability setups. Federation is pull-based, can only collect selected metrics, and is more suitable for hierarchical deployments with limited metric needs.

Many organizations use remote write to ingest Prometheus metrics into cloud platforms like AWS Managed Service for Prometheus or Azure Monitor.

How do I configure Prometheus remote write to send metrics to a remote storage system?

Add a remote_write section to your Prometheus configuration:

remote_write:
  - url: "https://remote-storage-system.example.com/api/v1/write"
    basic_auth:
      username: "prometheus"
      password: "secret"

This configuration sends all metrics to the specified endpoint with authentication. For AWS or Azure cloud environments, you'll typically need provider-specific authentication, such as AWS SigV4 request signing for Amazon Managed Service for Prometheus, as outlined in their respective docs.

How can I configure Prometheus remote write for high availability?

Configure multiple Prometheus instances to send metrics to the same remote storage:

remote_write:
  - url: "https://remote-storage.example.com/api/v1/write"
    queue_config:
      max_shards: 10  # Increase for higher throughput
      capacity: 20000  # Buffer capacity during outages

Also, add unique external labels to identify the source Prometheus:

global:
  external_labels:
    prometheus_replica: "replica1"
    datacenter: "us-east"

This lets you keep ingesting metrics even when a Prometheus instance fails, so Grafana dashboards stay uninterrupted.

How do I configure Prometheus remote write for long-term storage?

Configure remote write to send metrics to a storage system designed for long-term retention:

remote_write:
  - url: "https://long-term-storage.example.com/api/v1/write"
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'important_metric.*'
        action: keep  # Only send important metrics for long-term storage

Consider using filtering to reduce storage costs for long-term retention. AWS Timestream and Azure Data Explorer are popular cloud services for this purpose, offering tiered storage options for cost-effective long-term metrics storage.

How can I configure Prometheus remote write to send data to a specific endpoint?

Specify the exact endpoint URL in your configuration:

remote_write:
  - url: "https://specific-endpoint.example.com/api/v1/write"
    authorization:
      type: Bearer
      credentials_file: "/path/to/bearer_token"  # Prometheus does not expand environment variables in its config file
    headers:
      X-Tenant-ID: "tenant123"    # Add any required custom headers

Many endpoints support bearer token authentication as an alternative to basic auth. The GitHub repository for Prometheus contains extensive documentation on all supported authentication methods.

How can I visualize metrics sent via remote write?

Grafana is the most popular tool for visualizing Prometheus metrics stored in remote write destinations. Configure Grafana to connect to your remote write endpoint:

  1. Add a new data source in Grafana
  2. Select the appropriate data source type (Prometheus, AWS Managed Service for Prometheus, Azure Monitor, etc.)
  3. Configure the connection details, including authentication
  4. Create dashboards that query your metrics using PromQL

Grafana provides pre-built dashboards for common Prometheus metrics that you can import and customize.

Can I use remote write with cloud provider observability solutions?

Yes, major cloud providers support Prometheus remote write:

  • AWS: AWS Managed Service for Prometheus offers a fully managed Prometheus-compatible monitoring service with remote write endpoints
  • Azure: Azure Monitor supports Prometheus remote write through its metrics endpoint
  • Google Cloud: Cloud Monitoring (formerly Stackdriver) provides a Prometheus remote write adapter

Each cloud provider's docs contain specific configuration details for their remote write implementations, including authentication and endpoint formats.

How do I debug remote write protocol issues?

If you're experiencing issues with the remote write protocol:

  1. Enable debug logging in Prometheus to see what is being sent
  2. Use the /debug/pprof/heap endpoint to check for memory issues
  3. Check for decoding errors in your remote write endpoint's logs

You can also inspect Prometheus' runtime information for overall health:

curl -s http://prometheus:9090/api/v1/status/runtimeinfo | jq .

Remember that the wire format uses snappy-compressed Protocol Buffers, so payloads are not human-readable without decoding.


Authors
Prathamesh Sonpatki

Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.
