Aug 13th, ‘24/4 min read

Redacting Sensitive Data in OpenTelemetry Collector

This guide covers types of data that can be redacted and step-by-step instructions for configuring the Attribute Processor.

In today's microservices world with so many moving parts, observability has become a crucial aspect of maintaining and optimizing complex software systems.

OpenTelemetry, an open-source observability framework, has emerged as a powerful tool for collecting, processing, and exporting telemetry data. That telemetry often carries sensitive information, which has to be handled with care. This is where redaction in the OpenTelemetry Collector plays a vital role.

Redaction is the process of removing or masking sensitive data before it's stored or transmitted. In the context of OpenTelemetry, redaction ensures that potentially sensitive information is not inadvertently exposed through logs, metrics, or traces.

This article will explore the ins and outs of implementing redaction in OpenTelemetry Collector, helping you balance the need for comprehensive observability with data privacy and security requirements.

Understanding Redaction in OpenTelemetry Collector

What is Redaction?

Redaction in OpenTelemetry Collector refers to automatically removing or masking sensitive data from telemetry signals before they are exported or stored.

This process is crucial for maintaining data privacy, complying with regulations like GDPR or CCPA, and preventing the exposure of confidential information.

Why is Redaction Important?

  1. Data Privacy: Protects personally identifiable information (PII) and other sensitive data from unauthorized access.
  2. Compliance: Helps organizations meet regulatory requirements for data protection.
  3. Security: Reduces the risk of exposing sensitive information that malicious actors could exploit.
  4. Trust: Builds confidence among users and stakeholders that their data is being handled responsibly.

Types of Data That Can Be Redacted

OpenTelemetry Collector can redact various types of data across different telemetry signals:

Logs:

    • User credentials
    • API keys
    • Session tokens
    • Personal information (e.g., email addresses, phone numbers)

Metrics:

    • Customer-specific identifiers in metric labels
    • Sensitive business metrics

Traces:

    • User IDs in span attributes
    • Sensitive payload information in HTTP requests
    • Database queries containing personal data
ℹ️
One option is Last9's Sensitive Data Scanner for OpenTelemetry data, which is designed to handle this without any setup.

Implementing Redaction in OpenTelemetry Collector

Implementing redaction in OpenTelemetry Collector involves configuring processors that modify the telemetry data as it passes through the collector pipeline. Let's walk through a step-by-step guide on how to set up redaction.

Step 1: Choose the Appropriate Processor

OpenTelemetry Collector offers several processors that can be used for redaction. All three listed here are available in the Collector Contrib distribution:

  1. Attribute Processor: Modifies or removes attributes from spans, logs, or metrics.
  2. Filter Processor: Filters out entire data points based on certain conditions.
  3. Transform Processor: Applies complex transformations to telemetry data.

For this guide, we'll focus on the Attribute Processor, commonly used for redaction tasks.
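For comparison, when a record is too sensitive to keep even in redacted form, the Filter Processor can drop it entirely. The following is a minimal sketch, assuming the Collector Contrib filter processor; the contains_pii attribute, the /internal/payment-details route, and the drop_sensitive instance name are hypothetical placeholders.

```yaml
processors:
  filter/drop_sensitive:
    # Log and skip a condition that fails to evaluate instead of failing the batch.
    error_mode: ignore
    logs:
      log_record:
        # Drop any log record that carries the (hypothetical) contains_pii flag.
        - attributes["contains_pii"] == true
    traces:
      span:
        # Drop spans for a (hypothetical) route whose payloads should never be exported.
        - attributes["http.route"] == "/internal/payment-details"
```

Each condition is an OTTL expression; a record that matches any condition in its list is dropped. This trades visibility for safety, so it suits data you never need to see rather than data you only need to mask.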

Step 2: Configure the Attribute Processor

Add the following configuration to your OpenTelemetry Collector config file:

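processors:
  attributes:
    actions:
      # Remove the Authorization header captured on HTTP spans
      - key: http.request.header.authorization
        action: delete
      # Replace the SQL text with a hash of its value
      - key: db.statement
        action: hash
      # Replace the email address with a hash of its value
      - key: email
        action: hash

This configuration does the following:

  • Deletes the authorization header from HTTP requests
  • Hashes the db.statement attribute to protect sensitive queries
  • Hashes the email attribute so the address itself is never exported

The Attribute Processor's actions are insert, update, upsert, delete, hash, extract, and convert. It has no regex-replace action, so partial masking (for example, keeping the local part of an email address and hiding the domain) is a job for the Transform Processor instead.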
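If you want to keep the local part of an email address and hide only the domain, one option is the Transform Processor's replace_pattern function. The following is a minimal sketch, assuming the Collector Contrib transform processor and an attribute literally named email; the mask_email instance name is a placeholder. If you use it, drop any Attribute Processor action on the same key so the two processors do not both rewrite it.

```yaml
processors:
  transform/mask_email:
    # Log and skip a statement that fails instead of failing the batch.
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # Replace everything after the "@" with asterisks.
          - replace_pattern(attributes["email"], "@.+", "@*****")
    log_statements:
      - context: log
        statements:
          - replace_pattern(attributes["email"], "@.+", "@*****")

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes, transform/mask_email]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [attributes, transform/mask_email]
      exporters: [otlp]
```

With this in place, a value such as user@example.com would be exported as user@*****. Records without an email attribute pass through unchanged.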

Step 3: Include the Processor in Your Pipeline

Ensure that the attribute processor is included in your pipeline configuration:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [attributes]
      exporters: [otlp]
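The pipeline snippet above refers to an otlp receiver and an otlp exporter that must also be defined in the same file. A complete minimal configuration might look like the following sketch; the listen addresses are the conventional OTLP ports, and backend.example.com:4317 is a hypothetical backend address you would replace with your own.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  attributes:
    actions:
      - key: http.request.header.authorization
        action: delete

exporters:
  otlp:
    endpoint: backend.example.com:4317  # hypothetical backend address

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [attributes]
      exporters: [otlp]
```

Processors run in the order they are listed, so placing redaction processors first means every later processor and exporter only sees redacted data.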

Step 4: Test Your Configuration

After adding the redaction rules, test the configuration before relying on it. Send sample data containing known sensitive values through your collector and verify that those values arrive at the destination deleted, hashed, or masked as intended.
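One way to inspect what leaves the collector is to temporarily route a pipeline to the debug exporter, which prints each record to the collector's own output. This is a minimal sketch, assuming an otlp receiver and an attributes processor are already defined elsewhere in the same file.

```yaml
exporters:
  debug:
    # Print full records, including every attribute, to the collector's console output.
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes]
      exporters: [debug]
```

Attributes that were deleted should be absent from the printed spans, and hashed attributes should show a hash in place of the original value. Recent Collector builds also provide a validate subcommand (for example, otelcol-contrib validate --config=config.yaml) that checks the file for configuration errors without starting the pipelines.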

Best Practices for Redaction

  1. Identify Sensitive Data: Thoroughly audit your telemetry data to identify all potential sources of sensitive information.
  2. Minimize Data Collection: Only collect and retain the minimum amount of data necessary for your observability needs.
  3. Regular Reviews: Periodically review and update your redaction rules as your system evolves.
  4. Balance Utility and Privacy: Ensure that redaction doesn't overly compromise the usefulness of your telemetry data.
  5. Consistent Approach: Apply redaction consistently across all relevant telemetry signals (logs, metrics, and traces).
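As an illustration of the last point, the same processor instance can be listed in every pipeline so that one set of rules covers all three signals. This is a sketch under assumptions: customer_id is a hypothetical attribute name, redact is a placeholder instance name, and the otlp receiver and exporter are defined elsewhere in the file.

```yaml
processors:
  attributes/redact:
    actions:
      - key: customer_id   # hypothetical attribute / metric label name
        action: hash

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes/redact]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [attributes/redact]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [attributes/redact]
      exporters: [otlp]
```

Hashing rather than deleting is a deliberate choice for the metrics pipeline here: distinct label values stay distinct after hashing, whereas deleting a label can leave several data points with identical attribute sets.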

With these steps and best practices, you can effectively implement redaction in the OpenTelemetry Collector configuration, ensuring that your observability data remains both useful and secure.

Common Redaction Use Cases Across Different Industries

E-Commerce/Online Stores:

    1. Credit card numbers in log entries
    2. Customer email addresses in trace spans
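    3. Billing addresses
processors:
  attributes:
    actions:
      - key: billing_address
        action: delete
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          # Keep only the last four digits of the card number
          - replace_pattern(attributes["credit_card"], "[0-9]{4}-[0-9]{4}-[0-9]{4}-([0-9]{4})", "****-****-****-$$1")
    trace_statements:
      - context: span
        statements:
          # Keep the first character and the domain of the email address
          - replace_pattern(attributes["email"], "^(.)[^@]*(@.*)", "$$1****$$2")

This configuration masks all but the last four digits of credit card numbers, obscures email addresses while preserving the domain, and completely removes billing addresses from telemetry data. The masking rules use the Transform Processor's replace_pattern function because the Attribute Processor has no regex-replace action, so both processors must be listed in the pipeline (for example, processors: [attributes, transform]). In a Collector config file, $$ is the escape for a literal $, so $$1 refers to the first capture group.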
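Card numbers often appear inside the log message itself rather than in a named attribute. The following is a minimal sketch for that case, assuming the Collector Contrib transform processor, log bodies that are plain strings, and card numbers written as four dash-separated groups of digits; the mask_cards instance name is a placeholder.

```yaml
processors:
  transform/mask_cards:
    # Log and skip a statement that fails instead of failing the batch.
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          # Keep only the last four digits of any card number found in the log body.
          # "$$1" uses a doubled "$" so the Collector does not treat it as an
          # environment variable reference; it resolves to capture group 1.
          - replace_pattern(body, "[0-9]{4}-[0-9]{4}-[0-9]{4}-([0-9]{4})", "****-****-****-$$1")
```

The processor only takes effect once transform/mask_cards is added to the processors list of the logs pipeline. Numbers written with spaces or without separators would need their own patterns.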

Healthcare Systems

    1. Patient names and identifiers
    2. Medical record numbers
    3. Diagnostic codes in certain contexts
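processors:
  attributes:
    actions:
      - key: patient_name
        action: delete
      - key: medical_record_number
        action: hash
  transform:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # Delete the diagnostic code only when the span is flagged as sensitive
          - delete_key(attributes, "diagnostic_code") where attributes["context.sensitive"] == true

This setup removes patient names entirely, hashes medical record numbers for traceability without exposure, and deletes diagnostic codes only when the span carries a context.sensitive attribute set to true. The conditional rule is a Transform Processor statement with a where clause, because route is not a valid Attribute Processor action; list both processors in the pipeline (for example, processors: [attributes, transform]).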

Financial Services

    1. Account numbers
    2. Transaction amounts above a certain threshold
    3. User session tokens
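processors:
  attributes:
    actions:
      - key: session_token
        action: delete
  transform:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # Keep the first and last four digits of a 16-digit account number
          - replace_pattern(attributes["account_number"], "([0-9]{4})[0-9]{8}([0-9]{4})", "$$1********$$2")
          # Replace amounts above the threshold with a generic label
          - set(attributes["transaction_amount"], "high-value-transaction") where attributes["transaction_amount"] > 10000

This configuration masks the middle digits of account numbers, replaces high-value transaction amounts with a generic label, and removes session tokens to prevent potential session hijacking. The masking and threshold rules are Transform Processor statements, because the Attribute Processor supports neither regex replacement nor a route action; list both processors in the pipeline (for example, processors: [attributes, transform]). The threshold rule assumes transaction_amount is stored as a number, and $$ is the Collector config escape for a literal $.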

Using these redaction strategies, developers can maintain detailed observability while ensuring compliance with data protection regulations and maintaining user privacy.

The flexibility of OpenTelemetry Collector's processors allows for fine-grained control over what data is redacted and how, enabling developers to strike the right balance between insight, data protection, and compliance.

