Prometheus has become an essential part of modern observability stacks, providing powerful time-series data collection and alerting capabilities. However, as organizations scale their infrastructure, they often encounter limitations with Prometheus' single-instance storage model.
This is where Remote Write comes in: it lets Prometheus stream metrics to external storage systems in near real time while retaining its collection and querying capabilities.
What is Prometheus Remote Write?
Prometheus Remote Write is a protocol that enables Prometheus to send metrics data to compatible external storage systems in real-time. This feature addresses several critical needs:
- Long-term storage: Retain metrics beyond Prometheus' local retention limits
- High-availability: Create redundant copies of your metrics data
- Centralization: Collect metrics from multiple Prometheus instances in a single location
- Specialized storage: Leverage databases optimized for specific query patterns
The observability ecosystem has widely adopted this protocol: storage engines such as Cortex and Thanos, along with various cloud provider offerings, now expose Prometheus-compatible remote write endpoints.
Key Components of Remote Write Architecture
The Remote Write architecture consists of three primary components:
- Prometheus Server: The source of metrics data, responsible for scraping targets and forwarding metrics
- Remote Write Protocol: A well-defined HTTP-based protocol that sends Snappy-compressed Protocol Buffers payloads for efficient serialization
- Remote Write Endpoint: The destination system that receives, processes, and stores the metrics
This architecture maintains Prometheus' pull-based collection model while adding a push-based capability for storage, creating a flexible and scalable observability pipeline.
Configuring Remote Write in Prometheus
Remote Write is configured in the Prometheus configuration file (usually `prometheus.yml`) using YAML. Here's a basic example:
```yaml
remote_write:
  - url: "https://remote-write-endpoint.example.com/api/v1/write"
    basic_auth:
      username: "prometheus"
      password: "secret"
    write_relabel_configs:
      - source_labels: [__name__]
        regex: "unwanted_metric.*"
        action: drop
```
This configuration instructs Prometheus to:
- Send metrics to the specified URL
- Authenticate using basic authentication
- Apply relabeling rules to filter metrics before sending
Using External Labels for Source Identification
When aggregating metrics from multiple Prometheus instances, it's crucial to identify the source of each metric. External labels add global metadata to all metrics sent from a specific Prometheus instance:
```yaml
global:
  external_labels:
    region: "us-west-1"
    environment: "production"
    cluster: "main-cluster"
```
These labels help distinguish metrics from different Prometheus instances when they're aggregated in a central system.
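Once metrics from several instances land in a central store, these labels can drive aggregation and routing. As a hedged sketch, a recording rule evaluated in the central system might group availability by the labels above (the rule group name and expression are illustrative, not from the original configuration):

```yaml
groups:
  - name: per-region-availability   # Hypothetical rule group
    rules:
      - record: region:up:avg
        # "region" and "environment" exist here only because each Prometheus
        # instance attached them via external_labels before remote writing
        expr: avg by (region, environment) (up)
```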
Write Relabeling for Filtering and Transformation
Write relabeling allows you to modify or filter metrics before they're sent to the remote endpoint:
```yaml
write_relabel_configs:
  - source_labels: [__name__, job]
    separator: ";"
    regex: "node_.*sockets;node_exporter"
    action: keep
```
This is powerful for:
- Reducing data volume by dropping unnecessary metrics
- Normalizing labels across different sources
- Adding or modifying metadata before storage
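For instance, normalizing labels might mean copying an inconsistently named label into a canonical one before sending. A hedged sketch, assuming a hypothetical source label named dc that should become datacenter:

```yaml
write_relabel_configs:
  # Copy the hypothetical "dc" label into a canonical "datacenter" label
  - source_labels: [dc]
    target_label: datacenter
    action: replace
  # Drop the old label so only the normalized form reaches remote storage
  - regex: dc
    action: labeldrop
```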
Critical Settings for Optimal Performance
Scrape Interval vs. Evaluation Interval: What's the difference?
Two important configuration parameters in Prometheus are often confused but serve distinct purposes:
Scrape Interval: Controlling Data Collection Frequency
The `scrape_interval` setting defines how frequently Prometheus collects metrics from monitored targets:
```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "node-exporter"
    scrape_interval: 5s  # Overrides the global setting for this job
```
Key points about scrape interval:
- Affects data resolution and storage requirements
- Can be set globally and overridden per job
- Shorter intervals provide more detail but increase resource usage
- Should align with the dynamics of the metrics you're collecting
Evaluation Interval: Managing Rule Processing
The `evaluation_interval` setting determines how frequently Prometheus evaluates recording and alerting rules:
```yaml
global:
  evaluation_interval: 30s

rule_files:
  - "rules/recording_rules.yml"
  - "rules/alerting_rules.yml"
```
Key differences from scrape interval:
- Controls rule processing frequency, not data collection
- Affects alert responsiveness and resource consumption
- Typically longer than the scrape interval to reduce computational load
- Should be tuned based on the urgency of your alerting needs
Balancing Intervals for Optimal Performance
Choosing appropriate intervals requires balancing several factors:
- Lower intervals increase resolution but consume more resources
- Scrape interval should be shorter than the shortest-lived phenomena you want to observe
- Evaluation interval should be shorter than the acceptable delay for alerts
- Both should be consistent with your retention and query needs
A common pattern is to use shorter scrape intervals for critical infrastructure (5-10s) and longer intervals for less dynamic systems (30-60s), as the sketch below illustrates.
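A minimal sketch of that pattern; the job names and targets are hypothetical:

```yaml
global:
  scrape_interval: 30s          # Default for less dynamic systems
  evaluation_interval: 30s

scrape_configs:
  - job_name: "payment-api"     # Hypothetical critical service
    scrape_interval: 5s         # Higher resolution where it matters
    static_configs:
      - targets: ["payment-api:8080"]
  - job_name: "batch-reports"   # Hypothetical slow-moving system
    scrape_interval: 60s
    static_configs:
      - targets: ["batch-reports:9100"]
```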
Remote Write vs. Federation: Choosing the Right Approach
When scaling Prometheus beyond a single instance, you have two primary options: Remote Write and Federation. Understanding the differences is crucial for designing an effective monitoring architecture.
Prometheus Federation: Hierarchical Metric Collection
Federation allows a Prometheus server to scrape selected time series from another Prometheus server:
```yaml
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          - 'source-prometheus:9090'
```
Federation is useful for:
- Building hierarchical Prometheus deployments
- Aggregating metrics from multiple Prometheus instances
- Creating global views across different environments
Key Differences Between Remote Write and Federation
| Feature | Remote Write | Federation |
| --- | --- | --- |
| Data Flow | Push-based | Pull-based |
| Latency | Low (real-time) | Higher (depends on scrape interval) |
| Completeness | All metrics | Selected metrics only |
| Storage | External system | Local Prometheus storage |
| Resource Impact | Network and CPU on sender | Network and CPU on receiver |
| High Availability | Built for HA setups | Requires additional configuration |
| Scalability | Highly scalable | Limited by single-instance constraints |
Remote Write in Kubernetes Environments
Kubernetes presents specific considerations for Remote Write:
- Resource Management: Configure appropriate limits and requests for Prometheus pods to ensure stable operation
- Network Policies: Ensure outbound connectivity from Prometheus to the remote write endpoint (a NetworkPolicy sketch follows this list)
- Authentication: Use Kubernetes secrets for secure credential management
- High Cardinality: Be cautious with Kubernetes labels that can cause high cardinality issues
- Monitoring the Monitoring: Use metrics like `prometheus_remote_storage_*` to monitor the health of your remote write setup
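To make the network-policy point concrete, here is a hedged sketch of an egress rule for Prometheus pods; the namespace, pod labels, and port are assumptions for illustration:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: prometheus-remote-write-egress
  namespace: monitoring                    # Assumed namespace
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: prometheus   # Assumed pod label
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: TCP
          port: 443                        # HTTPS to the remote write endpoint
```

In practice you would also need to allow DNS egress, and you may want to restrict destinations further with an ipBlock selector.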
When using tools like Prometheus Operator, Remote Write can be configured through custom resources:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  remoteWrite:
    - url: "https://remote-write.example.com/api/v1/write"
      basicAuth:
        username:
          name: remote-write-auth
          key: username
        password:
          name: remote-write-auth
          key: password
```
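The basicAuth block above references a Secret named remote-write-auth. A matching manifest might look like the following sketch; the namespace and credential values are placeholders:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: remote-write-auth
  namespace: monitoring   # Assumed; must match the Prometheus resource's namespace
type: Opaque
stringData:               # stringData avoids manual base64 encoding
  username: prometheus
  password: secret
```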
How to Integrate with OpenTelemetry
The PrometheusRemoteWriteExporter in OpenTelemetry provides a bridge between OpenTelemetry and Prometheus ecosystems, allowing metrics collected by OpenTelemetry to be sent to any Prometheus-compatible remote write endpoint.
Setting Up the OpenTelemetry Collector
The OpenTelemetry Collector acts as a central hub for telemetry data, processing and forwarding it to various backends:
```yaml
extensions:
  basicauth/remote:
    client_auth:
      username: "${REMOTE_WRITE_USERNAME}"
      password: "${REMOTE_WRITE_PASSWORD}"

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 10s
    send_batch_size: 1000

exporters:
  prometheusremotewrite:
    endpoint: "https://remote-write-endpoint.example.com/api/v1/write"
    auth:
      authenticator: basicauth/remote
    resource_to_telemetry_conversion:
      enabled: true

service:
  extensions: [basicauth/remote]  # The authenticator must be listed here to be started
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]
```
Advanced Configuration Options
The PrometheusRemoteWriteExporter supports several advanced configuration options:
Queue Management
Control how metrics are buffered and sent:
```yaml
exporters:
  prometheusremotewrite:
    endpoint: "https://remote-write-endpoint.example.com/api/v1/write"
    remote_write_queue:
      enabled: true
      queue_size: 10000  # Maximum number of metrics buffered in the queue
      num_consumers: 5   # Concurrent workers draining the queue
```
Retry Handling
Configure retry behavior for resilience:
```yaml
exporters:
  prometheusremotewrite:
    endpoint: "https://remote-write-endpoint.example.com/api/v1/write"
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 5m
```
TLS Configuration
Secure your connection with TLS:
```yaml
exporters:
  prometheusremotewrite:
    endpoint: "https://remote-write-endpoint.example.com/api/v1/write"
    tls:
      ca_file: "/path/to/ca.crt"
      cert_file: "/path/to/client.crt"
      key_file: "/path/to/client.key"
      insecure: false  # Set to true only for testing
```
Advanced Remote Write Topics: Scaling and Optimization
Remote Write Filtering and Sampling for High-Volume Metrics
For high-volume Prometheus deployments, sending all metrics to remote storage may be impractical. Implement strategic filtering:
```yaml
remote_write:
  - url: "https://critical-metrics.example.com/api/v1/write"
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'critical_.*|up|instance:.*'
        action: keep
  - url: "https://all-metrics.example.com/api/v1/write"
    queue_config:
      capacity: 20000
```
For extremely high-volume use cases, consider sampling. Prometheus relabeling cannot randomly drop individual samples, but the hashmod action can deterministically keep a fixed fraction of series:

```yaml
remote_write:
  - url: "https://sampled-metrics.example.com/api/v1/write"
    write_relabel_configs:
      # Hash each series into one of 10 buckets, then keep bucket 0 (~10% of series)
      - source_labels: [__name__, instance]
        modulus: 10
        target_label: __tmp_hash
        action: hashmod
      - source_labels: [__tmp_hash]
        regex: "0"
        action: keep
      # Drop the temporary hash label before sending
      - regex: __tmp_hash
        action: labeldrop
```
Multi-Endpoint Remote Write Strategy
Sending to multiple remote endpoints provides redundancy and specialized storage:
```yaml
remote_write:
  - url: "https://long-term-storage.example.com/api/v1/write"
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'business_.*|sla_.*'
        action: keep
  - url: "https://alert-metrics.example.com/api/v1/write"
    write_relabel_configs:
      - source_labels: [__name__]
        regex: '.*_alerts|up|.*_status'
        action: keep
  - url: "https://all-metrics.example.com/api/v1/write"
    queue_config:
      max_samples_per_send: 1000
```
This approach enables:
- Different retention policies for different metric types
- Specialized query engines for specific use cases
- Cost optimization by routing high-value metrics to premium storage
Troubleshooting Common Remote Write Issues and Solutions
Connection and Authentication Issues
Timeout Problems
If you encounter timeouts:
- Check network connectivity and firewall rules
- Validate endpoint reachability with a simple HTTP request (the endpoint only accepts POSTs, so an error response such as 405 Method Not Allowed still confirms it is reachable):

```bash
curl -v https://remote-write-endpoint.example.com/api/v1/write
```

- Increase timeout settings:

```yaml
remote_write:
  - url: "https://remote-write-endpoint.example.com/api/v1/write"
    remote_timeout: 60s  # Default is 30s
```
Authentication Failures
Common authentication issues:
- Check for URL-encoding issues in passwords
- Verify that credentials have appropriate permissions
- Validate credentials separately using curl:

```bash
curl -u "${USERNAME}:${PASSWORD}" https://remote-write-endpoint.example.com/api/v1/write
```
Data Quality and Performance Issues
Label Value Problems
Prometheus and remote write endpoints impose requirements on labels:
- Label names must match the pattern [a-zA-Z_][a-zA-Z0-9_]*; label values may be any UTF-8 string
- Many remote endpoints enforce limits on label name and value length
- Some endpoints may have additional restrictions
Monitor for label validation errors in the Prometheus logs:
```
level=warn ts=... component=remote msg="Remote storage returned HTTP status 400 Bad Request; error: invalid label value..."
```
High Cardinality
Watch for exploding cardinality, which can overwhelm remote storage:
- Monitor metrics like `prometheus_tsdb_head_series`
- Be cautious with automatically generated labels in Kubernetes environments
- Use relabeling to reduce cardinality:

```yaml
write_relabel_configs:
  # Strip the port from the instance label, e.g. "10.0.0.1:9100" becomes "10.0.0.1"
  - source_labels: [instance]
    target_label: instance
    regex: '(.*):.*'
    replacement: '$1'
```
Monitoring Remote Write Performance
Key metrics to monitor:
- `prometheus_remote_storage_samples_pending`: Samples waiting to be sent
- `prometheus_remote_storage_failed_samples_total`: Samples that couldn't be sent
- `prometheus_remote_storage_sent_batch_duration_seconds`: Time taken to send batches
- `prometheus_remote_storage_succeeded_samples_total`: Successfully sent samples
- `prometheus_remote_storage_retried_samples_total`: Samples that required retries
Create a dedicated dashboard for these metrics to quickly identify issues.
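Beyond dashboards, these same metrics can feed alerting rules. A hedged sketch; the thresholds and windows are illustrative, not recommendations:

```yaml
groups:
  - name: remote-write-health
    rules:
      - alert: RemoteWriteBacklog
        expr: prometheus_remote_storage_samples_pending > 10000   # Illustrative threshold
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Remote write queue is backing up on {{ $labels.instance }}"
      - alert: RemoteWriteFailures
        expr: rate(prometheus_remote_storage_failed_samples_total[5m]) > 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Remote write samples are failing on {{ $labels.instance }}"
```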
Best Practices for Production Deployments
Architecture and Planning
- Start with Clear Requirements:
  - Define retention periods for different metric types
  - Identify query patterns and performance needs
  - Establish SLAs for monitoring availability
- Choose the Right Tools:
  - Select appropriate remote storage based on scale and query needs
  - Consider managed services vs. self-hosted options
  - Evaluate cost implications for different retention periods
- Design for High Availability:
  - Implement redundant Prometheus instances
  - Use multiple remote write endpoints for critical metrics
  - Plan for failure scenarios with appropriate retention
Configuration and Tuning
- Optimize Resource Usage:
  - Filter unnecessary metrics using write relabeling
  - Use appropriate scrape and evaluation intervals
  - Configure queue settings based on load testing
- Security Best Practices:
  - Use TLS for all remote write connections
  - Rotate authentication credentials regularly
  - Apply the principle of least privilege for remote write accounts
- Monitoring Your Monitoring:
  - Set up alerts for remote write failures
  - Monitor queue sizes and batch durations
  - Create dashboards for remote write performance metrics
Operational Excellence
- Documentation and Knowledge Sharing:
  - Document your remote write architecture
  - Create runbooks for common failure scenarios
  - Share best practices across teams
- Regular Audits:
  - Review what metrics are being sent and their value
  - Analyze storage usage and costs
  - Identify opportunities for optimization
- Continuous Improvement:
  - Stay updated with Prometheus and remote storage developments
  - Test new features in non-production environments
  - Refine your approach based on operational experience
Conclusion
Remote Write is a foundational capability for scaling Prometheus beyond a single instance, enabling enterprises to build comprehensive and resilient observability platforms.
FAQs
What does Prometheus remote write do?
Prometheus remote write allows Prometheus to send metrics data in real-time to external storage systems. It enables long-term storage beyond Prometheus' local retention limits, creates high-availability setups, centralizes metrics from multiple Prometheus instances, and integrates with specialized time-series databases optimized for specific workloads.
This capability is essential for ingesting metrics at scale, particularly when you want to visualize Prometheus metrics in tools like Grafana.
What is the remote write spec?
The remote write spec is a protocol definition that enables Prometheus to send metrics to compatible external systems. It uses HTTP as the transport layer and Protocol Buffers for efficient data serialization.
The spec defines how metrics, labels, and timestamps are encoded, compressed, and transmitted to ensure compatibility between Prometheus and various storage backends. While the wire format uses Protocol Buffers, you can inspect the data structure in JSON format for debugging purposes.
What is the difference between scrape_interval and evaluation_interval?
- scrape_interval: Determines how frequently Prometheus collects metrics from monitored targets. It affects data resolution and storage requirements.
- evaluation_interval: Controls how frequently Prometheus evaluates recording and alerting rules. It affects alert responsiveness and rule processing load.
While scrape_interval focuses on data collection, evaluation_interval deals with processing that data through rules. The official Prometheus docs on GitHub provide detailed explanations of these settings and their impact on performance.
What is the difference between Prometheus remote write and federation?
Prometheus remote write pushes metrics to external storage in real-time, while federation pulls selected metrics from other Prometheus servers.
Remote write offers lower latency, complete metrics collection, and is built for high-availability setups. Federation is pull-based, can only collect selected metrics, and is more suitable for hierarchical deployments with limited metric needs.
Many organizations use remote write to ingest Prometheus metrics into cloud platforms like AWS Managed Service for Prometheus or Azure Monitor.
How do I configure Prometheus remote write to send metrics to a remote storage system?
Add a remote_write section to your Prometheus configuration:
```yaml
remote_write:
  - url: "https://remote-storage-system.example.com/api/v1/write"
    basic_auth:
      username: "prometheus"
      password: "secret"
```
This configuration sends all metrics to the specified endpoint with authentication. For AWS or Azure cloud environments, you'll typically need to configure specific authentication mechanisms as outlined in their respective docs.
How can I configure Prometheus remote write for high availability?
Configure multiple Prometheus instances to send metrics to the same remote storage:
```yaml
remote_write:
  - url: "https://remote-storage.example.com/api/v1/write"
    queue_config:
      max_shards: 10   # Increase for higher throughput
      capacity: 20000  # Buffer capacity during outages
```
Also, add unique external labels to identify the source Prometheus:
```yaml
global:
  external_labels:
    prometheus_replica: "replica1"
    datacenter: "us-east"
```
This lets you maintain consistent metrics ingestion even if an individual Prometheus instance fails, keeping Grafana dashboards uninterrupted.
How do I configure Prometheus remote write for long-term storage?
Configure remote write to send metrics to a storage system designed for long-term retention:
```yaml
remote_write:
  - url: "https://long-term-storage.example.com/api/v1/write"
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'important_metric.*'
        action: keep  # Only send important metrics for long-term storage
```
Consider using filtering to reduce storage costs for long-term retention. AWS Timestream and Azure Data Explorer are popular cloud services for this purpose, offering tiered storage options for cost-effective long-term metrics storage.
How can I configure Prometheus remote write to send data to a specific endpoint?
Specify the exact endpoint URL in your configuration:
```yaml
remote_write:
  - url: "https://specific-endpoint.example.com/api/v1/write"
    authorization:
      type: Bearer
      credentials: "${BEARER_TOKEN}"  # Bearer token authentication
    headers:
      X-Tenant-ID: "tenant123"  # Add any required custom headers
```
Many endpoints support bearer token authentication as an alternative to basic auth. The GitHub repository for Prometheus contains extensive documentation on all supported authentication methods.
How can I visualize metrics sent via remote write?
Grafana is the most popular tool for visualizing Prometheus metrics stored in remote write destinations. Configure Grafana to connect to your remote write endpoint:
- Add a new data source in Grafana
- Select the appropriate data source type (Prometheus, AWS Managed Service for Prometheus, Azure Monitor, etc.)
- Configure the connection details, including authentication
- Create dashboards that query your metrics using PromQL
Grafana provides pre-built dashboards for common Prometheus metrics that you can import and customize.
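If you manage Grafana declaratively, the data source can also be provisioned from a file. A sketch assuming a basic-auth, Prometheus-compatible endpoint; the URL, user, and environment variable are placeholders:

```yaml
# e.g. /etc/grafana/provisioning/datasources/remote.yaml
apiVersion: 1
datasources:
  - name: Remote Prometheus
    type: prometheus
    access: proxy
    url: https://remote-storage.example.com
    basicAuth: true
    basicAuthUser: grafana
    secureJsonData:
      basicAuthPassword: "${GRAFANA_REMOTE_PASSWORD}"  # Injected from the environment
```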
Can I use remote write with cloud provider observability solutions?
Yes, major cloud providers support Prometheus remote write:
- AWS: AWS Managed Service for Prometheus offers a fully managed Prometheus-compatible monitoring service with remote write endpoints
- Azure: Azure Monitor supports Prometheus remote write through its metrics endpoint
- Google Cloud: Cloud Monitoring (formerly Stackdriver) provides a Prometheus remote write adapter
Each cloud provider's docs contain specific configuration details for their remote write implementations, including authentication and endpoint formats.
How can I troubleshoot JSON-related issues with remote write?
If you're experiencing issues with the remote write protocol:
- Enable debug logging in Prometheus to see the data being sent
- Use the `/debug/pprof/heap` endpoint to check for memory issues
- Check for JSON parsing errors in your remote write endpoint logs
- Query the Prometheus HTTP API, which returns JSON, to check runtime status:

```bash
curl -s http://prometheus:9090/api/v1/status/runtimeinfo | jq .
```
Remember that while you can inspect the protocol data in JSON format, the actual wire format uses Protocol Buffers for efficiency.