In today's microservices world with so many moving parts, observability has become a crucial aspect of maintaining and optimizing complex software systems.
OpenTelemetry, an open-source observability framework, has emerged as a powerful tool for collecting, processing, and exporting telemetry data. However, with great power comes great responsibility, especially when it comes to handling sensitive information. This is where redaction in OpenTelemetry Collector plays a vital role.
Redaction is the process of removing or masking sensitive data before it's stored or transmitted. In the context of OpenTelemetry, redaction ensures that potentially sensitive information is not inadvertently exposed through logs, metrics, or traces.
This article will explore the ins and outs of implementing redaction in OpenTelemetry Collector, helping you balance the need for comprehensive observability with data privacy and security requirements.
Understanding Redaction in OpenTelemetry Collector
What is Redaction?
Redaction in OpenTelemetry Collector refers to automatically removing or masking sensitive data from telemetry signals before they are exported or stored.
This process is crucial for maintaining data privacy, complying with regulations like GDPR or CCPA, and preventing the exposure of confidential information.
Why is Redaction Important?
- Data Privacy: Protects personally identifiable information (PII) and other sensitive data from unauthorized access.
- Compliance: Helps organizations meet regulatory requirements for data protection.
- Security: Reduces the risk of exposing sensitive information that malicious actors could exploit.
- Trust: Builds confidence among users and stakeholders that their data is being handled responsibly.
Types of Data That Can Be Redacted
OpenTelemetry Collector can redact various types of data across different telemetry signals:
Logs:
- User credentials
- API keys
- Session tokens
- Personal information (e.g., email addresses, phone numbers)
Metrics:
- Customer-specific identifiers in metric labels
- Sensitive business metrics
Traces:
- User IDs in span attributes
- Sensitive payload information in HTTP requests
- Database queries containing personal data
ℹ️ A simpler option is Last9's Sensitive Data Scanner for OpenTelemetry data, which is designed to do this without any setup.
Implementing Redaction in OpenTelemetry Collector
Implementing redaction in OpenTelemetry Collector involves configuring processors that modify the telemetry data as it passes through the collector pipeline. Let's walk through a step-by-step guide on how to set up redaction.
Step 1: Choose the Appropriate Processor
OpenTelemetry Collector offers several processors that can be used for redaction:
- Attributes Processor: Inserts, updates, deletes, or hashes attributes on spans, logs, or metrics.
- Filter Processor: Drops entire spans, log records, or metric data points that match a condition.
- Transform Processor: Applies more complex transformations to telemetry data, such as regex-based replacement, using OpenTelemetry Transformation Language (OTTL) statements.
For this guide, we'll focus mainly on the Attributes Processor, which is commonly used for redaction tasks.
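Step 2: Configure the Processors
Add the following configuration to your OpenTelemetry Collector config file. The Attributes Processor handles deleting and hashing, but it has no action for regex-based replacement, so the email mask is written as a Transform Processor statement:
processors:
  attributes:
    actions:
      # Drop the Authorization header entirely
      - key: http.request.header.authorization
        action: delete
      # Replace the query text with a hash
      - key: db.statement
        action: hash
  transform:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # Keep the local part, mask the domain
          - replace_pattern(attributes["email"], "^(.+)@(.+)", "$$1@*****")
    log_statements:
      - context: log
        statements:
          - replace_pattern(attributes["email"], "^(.+)@(.+)", "$$1@*****")
This configuration does the following:
- Deletes the authorization header from HTTP requests
- Hashes the db.statement attribute to protect sensitive queries
- Masks email addresses by replacing the domain with asterisks
Capture-group references are written as $$1 because the Collector treats $$ in its configuration as an escaped $. The trailing $ anchor is omitted from the pattern because the final greedy group already matches through the end of the value.
Step 3: Include the Processors in Your Pipeline
Ensure that both processors are included in your pipeline configuration:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes, transform]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [attributes, transform]
      exporters: [otlp]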
Step 4: Test Your Configuration
After implementing the redaction rules, test your configuration to confirm it works as expected. Send sample data through your collector and verify that sensitive information is redacted before it leaves the exporter.
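One way to make this check repeatable is to compare a sample's attributes before and after they pass through the collector. The Python sketch below is illustrative only: `check_redacted` and both sample dictionaries are hypothetical, and it assumes you have already captured the exported attributes (for example, by reading them from the collector's output) into a plain dictionary.

```python
def check_redacted(original, exported):
    """Return a list of problems found in the exported attributes."""
    problems = []
    # Rule 1: the Authorization header must be gone.
    if "http.request.header.authorization" in exported:
        problems.append("authorization header still present")
    # Rule 2: db.statement must no longer carry the raw query text.
    if "db.statement" in exported and exported["db.statement"] == original.get("db.statement"):
        problems.append("db.statement was not hashed")
    # Rule 3: if an email is present, its domain must be masked.
    email = exported.get("email")
    if email is not None and not email.endswith("@*****"):
        problems.append("email domain was not masked")
    return problems


# Hypothetical attributes as sent to the collector.
original = {
    "http.request.header.authorization": "Bearer abc123",
    "db.statement": "SELECT * FROM users WHERE email = 'jane@example.com'",
    "email": "jane@example.com",
}
# Hypothetical attributes as observed after the collector.
exported = {
    "db.statement": "<hashed value>",
    "email": "jane@*****",
}

print(check_redacted(original, exported))  # no problems expected
print(check_redacted(original, original))  # every rule should fail
```

Running the same check against the unredacted input is a useful sanity test: if it reports no problems, the check itself is too weak.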
Best Practices for Redaction
- Identify Sensitive Data: Thoroughly audit your telemetry data to identify all potential sources of sensitive information.
- Practice Data Minimization: Only collect and retain the minimum amount of data necessary for your observability needs.
- Regular Reviews: Periodically review and update your redaction rules as your system evolves.
- Balance Utility and Privacy: Ensure that redaction doesn't overly compromise the usefulness of your telemetry data.
- Consistent Approach: Apply redaction consistently across all relevant telemetry signals (logs, metrics, and traces).
With these steps and best practices, you can effectively implement redaction in the OpenTelemetry Collector configuration, ensuring that your observability data remains both useful and secure.
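The audit step in particular lends itself to a small script. The sketch below, in Python, scans a dictionary of attributes for values that look sensitive. The `PATTERNS` table, the `scan_attributes` helper, and the sample data are all hypothetical, and the patterns are deliberately simple; a real audit would need patterns tuned to your own data.

```python
import re

# Illustrative patterns only; tune these to the data your services emit.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card_number": re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b"),
    "bearer_token": re.compile(r"\bBearer\s+\S+"),
}


def scan_attributes(attributes):
    """Map each attribute key to the sorted labels of the patterns its value matches."""
    findings = {}
    for key, value in attributes.items():
        labels = sorted(
            label for label, pattern in PATTERNS.items() if pattern.search(str(value))
        )
        if labels:
            findings[key] = labels
    return findings


# Hypothetical span attributes collected from a test environment.
sample = {
    "http.url": "https://shop.example.com/checkout",
    "user.contact": "jane@example.com",
    "payment.note": "card 4111-1111-1111-1111 charged",
    "http.request.header.authorization": "Bearer abc123",
}

for key, labels in scan_attributes(sample).items():
    print(key, labels)
```

Each key the scan reports is a candidate for a redaction rule; keys it misses still deserve a manual review, since pattern matching only catches the formats you thought to look for.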
Common Redaction Use Cases Across Different Industries
E-Commerce and Online Stores
- Credit card numbers in log entries
- Customer email addresses in trace spans
- Billing addresses
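processors:
  attributes:
    actions:
      - key: billing_address
        action: delete
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          # $$1 is a capture-group reference; $$ escapes $ in Collector config
          - replace_pattern(attributes["credit_card"], "\\d{4}-\\d{4}-\\d{4}-(\\d{4})", "****-****-****-$$1")
    trace_statements:
      - context: span
        statements:
          - replace_pattern(attributes["email"], "^(.)(.*?)(@.*)", "$$1****$$3")
This configuration masks all but the last four digits of credit card numbers, obscures email addresses while preserving the domain, and completely removes billing addresses from telemetry data. The regex masks are written as Transform Processor statements because the Attributes Processor's actions, such as delete and hash, do not include regex replacement. Both processors must be listed in the pipeline for the rules to apply.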
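The credit card and email patterns can be checked offline before they are deployed. The following Python sketch applies the same two regular expressions to sample strings; `mask_card` and `mask_email` are hypothetical helper names. Note that Python's `re` module writes group references as `\1`, where the Collector configuration uses `$`-style references, and that the Collector uses Go's regular expression engine, so this only approximates its behavior for simple patterns like these.

```python
import re


def mask_card(value):
    # Keep only the last four digits of a dash-separated card number.
    return re.sub(r"\d{4}-\d{4}-\d{4}-(\d{4})", r"****-****-****-\1", value)


def mask_email(value):
    # Keep the first character and the domain; hide the rest of the local part.
    return re.sub(r"^(.)(.*?)(@.*)$", r"\1****\3", value)


print(mask_card("4111-1111-1111-1234"))   # ****-****-****-1234
print(mask_email("jane@example.com"))     # j****@example.com
print(mask_card("4111111111111234"))      # unchanged: no dashes, so no match
```

The last call shows a gap worth knowing about: the card pattern only matches dash-separated numbers, so a card number written without separators passes through untouched.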
Healthcare Systems
- Patient names and identifiers
- Medical record numbers
- Diagnostic codes in certain contexts
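processors:
  attributes:
    actions:
      - key: patient_name
        action: delete
      - key: medical_record_number
        action: hash
  transform:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # Assumes context.sensitive is a boolean attribute
          - delete_key(attributes, "diagnostic_code") where attributes["context.sensitive"] == true
This setup removes patient names entirely, hashes medical record numbers for traceability without exposure, and selectively deletes diagnostic codes when a ‘sensitive’ context flag is set. The conditional delete is written as a Transform Processor statement with a where clause, since the Attributes Processor cannot make one attribute's removal depend on another's value. Add a matching log_statements entry if the same attributes appear on logs.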
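To see why hashing preserves traceability, it helps to model the three rules in a few lines. The Python sketch below is an approximation, not the Collector's implementation: `redact_patient_attributes` is a hypothetical helper, SHA-256 is an assumption chosen for the sketch (the algorithm the Collector uses may differ by version), and the sensitive flag is assumed to be a boolean.

```python
import hashlib


def redact_patient_attributes(attributes):
    """Return a redacted copy of the attributes; the input is not modified."""
    redacted = dict(attributes)
    # Remove the patient name outright.
    redacted.pop("patient_name", None)
    # Replace the record number with a digest so equal inputs stay correlatable.
    if "medical_record_number" in redacted:
        raw = str(redacted["medical_record_number"]).encode("utf-8")
        redacted["medical_record_number"] = hashlib.sha256(raw).hexdigest()
    # Drop the diagnostic code only when the sensitive flag is set.
    if redacted.get("context.sensitive") is True:
        redacted.pop("diagnostic_code", None)
    return redacted


# Hypothetical attributes from two spans about the same patient record.
first = redact_patient_attributes({
    "patient_name": "Jane Doe",
    "medical_record_number": "MRN-001",
    "diagnostic_code": "E11.9",
    "context.sensitive": True,
})
second = redact_patient_attributes({
    "medical_record_number": "MRN-001",
    "diagnostic_code": "E11.9",
    "context.sensitive": False,
})

# The same record number hashes to the same value in both spans.
print(first["medical_record_number"] == second["medical_record_number"])
print("diagnostic_code" in first, "diagnostic_code" in second)
```

Because the digest is deterministic, spans for the same record can still be grouped. The flip side is that an unsalted hash of a short, predictable identifier can be reversed by enumerating candidates, so hashing reduces exposure rather than eliminating it.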
Financial Services
- Account numbers
- Transaction amounts above a certain threshold
- User session tokens
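processors:
  attributes:
    actions:
      - key: session_token
        action: delete
  transform:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # $$1 and $$2 are capture-group references; $$ escapes $ in Collector config
          - replace_pattern(attributes["account_number"], "(\\d{4})\\d{8}(\\d{4})", "$$1********$$2")
          # Assumes transaction_amount is recorded as a number
          - set(attributes["transaction_amount"], "high-value-transaction") where attributes["transaction_amount"] > 10000
This configuration masks the middle digits of account numbers, replaces high-value transaction amounts with a generic label, and removes session tokens to prevent potential session hijacking. The masking and threshold rules are Transform Processor statements, because the Attributes Processor supports neither regex replacement nor conditional updates.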
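The boundary behavior of these two rules is easy to get wrong, so it is worth exercising them on edge cases. This Python sketch mirrors the account mask and the threshold rule; `mask_account`, `label_amount`, and `HIGH_VALUE_THRESHOLD` are hypothetical names, and Python's `\1` group syntax stands in for the `$`-style references used in the Collector configuration.

```python
import re

HIGH_VALUE_THRESHOLD = 10000


def mask_account(value):
    # Keep the first and last four digits of a 16-digit run.
    return re.sub(r"(\d{4})\d{8}(\d{4})", r"\1********\2", value)


def label_amount(amount):
    # Replace numeric amounts strictly above the threshold with a generic label.
    if isinstance(amount, (int, float)) and amount > HIGH_VALUE_THRESHOLD:
        return "high-value-transaction"
    return amount


print(mask_account("1234567890123456"))   # 1234********3456
print(label_amount(25000))                # high-value-transaction
print(label_amount(10000))                # 10000 (not above the threshold)
```

Two details stand out. The comparison is strict, so an amount exactly at the threshold is left as-is. The account pattern is also unanchored, so in a longer digit string it masks only the first 16-digit run and leaves any trailing digits visible.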
Using these redaction strategies, developers can maintain detailed observability while ensuring compliance with data protection regulations and maintaining user privacy.
The flexibility of OpenTelemetry Collector's processors allows for fine-grained control over what data is redacted and how, enabling developers to strike the right balance between insight, data protection, and compliance.