OpenTelemetry Filelog Receiver: Collecting Kubernetes Logs

As an engineer who's wrestled with log collection in complex Kubernetes environments, I've found the OpenTelemetry filelog receiver to be a very valuable tool. In this article, I'll share my insights on leveraging this powerful component to streamline log collection in Kubernetes deployments.

OpenTelemetry and Filelog Receiver
Setting Up the Filelog Receiver in Kubernetes
Understanding the Configuration
Optimizing Performance
Troubleshooting Common Issues
Syslog and Other Log Sources

OpenTelemetry and the Filelog Receiver

OpenTelemetry (OTel) is an open-source observability framework that's gaining traction in the cloud-native world. It provides a unified approach to collecting metrics, traces, and logs, making it a go-to solution for many engineering teams.

The filelog receiver is a key component of the OpenTelemetry Collector, responsible for reading log files and converting them into the OpenTelemetry log format.

The filelog receiver is part of the opentelemetry-collector-contrib repository on GitHub, which houses various community-contributed components for the OpenTelemetry Collector.

📑

For more detailed information about the filelog receiver and its configuration options, refer to our blog on Kubernetes observability with the OpenTelemetry Operator.

Setting Up the Filelog Receiver in Kubernetes

Let's walk through setting up the filelog receiver to collect logs from a Kubernetes cluster:

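Deploy the OpenTelemetry Collector in your Kubernetes cluster using Helm. The filelog receiver ships in the contrib distribution, and the Collector has to run on every node to read pod log files, so the install command below selects the contrib image and DaemonSet mode:

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts

helm repo update

helm install my-otel-collector open-telemetry/opentelemetry-collector --set mode=daemonset --set image.repository=otel/opentelemetry-collector-contrib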

Configure the filelog receiver using a YAML configuration file:

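receivers:
  filelog:
    include: [ /var/log/pods/*/*/*.log ]
    start_at: beginning
    # Keep the full path in attributes["log.file.path"] for the metadata step below.
    include_file_path: true
    operators:
      # Split a containerd/CRI-O line into time, stream, tag, and message.
      - type: regex_parser
        regex: '^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<logtag>\w) (?P<log>.*)$'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      # Decode the message when the application logs JSON.
      - type: json_parser
        parse_from: attributes.log
      # Pod log paths look like /var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log
      - type: regex_parser
        regex: '^/var/log/pods/(?P<namespace>[^_/]+)_(?P<pod>[^_/]+)_[^/]+/[^/]+/[^/]+\.log$'
        parse_from: attributes["log.file.path"]
      # Promote the extracted values to resource attributes.
      - type: move
        from: attributes.namespace
        to: resource["k8s.namespace.name"]
      - type: move
        from: attributes.pod
        to: resource["k8s.pod.name"]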

processors:
  batch:

exporters:
  otlp:
    endpoint: "otel-collector:4317"

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch]
      exporters: [otlp]

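Apply this configuration to your OpenTelemetry Collector deployment. The command below assumes otel-collector-config.yaml is a Kubernetes manifest (for example, a ConfigMap) that wraps the configuration above:

kubectl apply -f otel-collector-config.yaml

This configuration tells the filelog receiver to read the log files of every container on the node where the Collector runs, parse each entry, and extract relevant metadata. To cover the whole cluster, run the Collector as a DaemonSet with the host's /var/log/pods directory mounted into the pod.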

📖

Check out our guide on using kubectl logs to view Kubernetes pod logs.

Understanding the Configuration

Let's break down some key elements:

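The include field specifies which log files to read, as a list of glob patterns.
start_at: beginning reads existing files from the start when the receiver first sees them; the default, end, only picks up lines written after startup.
The operators section defines how we parse and transform each log entry, in order.
The last part of the operator chain attaches Kubernetes metadata, such as the pod name, to each log's resource.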
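
To make the parsing steps concrete, here is a small Python sketch that mimics the first two operators outside the Collector. It is not how the Collector is implemented; the sample line and the parse_cri_line helper are made up for illustration, and only the regular expression is taken from the configuration.

```python
import json
import re

# Same pattern as the first regex_parser operator in the configuration.
CRI_LINE = re.compile(
    r'^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<logtag>\w) (?P<log>.*)$'
)


def parse_cri_line(line: str) -> dict:
    """Split a container runtime log line, then try to decode the message as JSON."""
    match = CRI_LINE.match(line)
    if match is None:
        raise ValueError(f"not a CRI-formatted line: {line!r}")
    fields = match.groupdict()
    try:
        # Rough equivalent of the json_parser step.
        fields["parsed"] = json.loads(fields["log"])
    except json.JSONDecodeError:
        # Plain-text messages are kept as they are.
        fields["parsed"] = None
    return fields


# Hypothetical sample line in the containerd/CRI-O format.
sample = '2024-05-01T12:34:56.789012345Z stdout F {"level":"info","msg":"started"}'
entry = parse_cri_line(sample)
print(entry["time"], entry["stream"], entry["parsed"])
```

Running a handful of real lines from one of your nodes through a sketch like this is a quick way to check that the pattern fits your container runtime before you roll out the configuration.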

Optimizing Performance

To handle high volumes of logs efficiently:

Use the batch processor to send logs in batches, which reduces the number of export requests to your backend.
Implement a memory_limiter processor, placed first in the pipeline's processors list, to prevent OOM issues:

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1000
    spike_limit_mib: 200

For very high log volumes, consider using multiple collector instances and sharding your log collection.
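
As I understand the memory_limiter documentation, the processor works with two thresholds: a hard limit equal to limit_mib, and a soft limit equal to limit_mib minus spike_limit_mib, above which it starts refusing data. The arithmetic for the values above can be sketched as follows; the memory_limiter_thresholds helper is a hypothetical name used only for this illustration.

```python
def memory_limiter_thresholds(limit_mib: int, spike_limit_mib: int) -> tuple[int, int]:
    """Return (soft_limit, hard_limit) in MiB for the given settings."""
    if spike_limit_mib >= limit_mib:
        raise ValueError("spike_limit_mib must be smaller than limit_mib")
    return limit_mib - spike_limit_mib, limit_mib


# Values from the memory_limiter configuration above.
soft, hard = memory_limiter_thresholds(limit_mib=1000, spike_limit_mib=200)
print(f"soft limit: {soft} MiB, hard limit: {hard} MiB")
```

With these settings the soft limit comes out at 800 MiB, so set the container's memory limit comfortably above limit_mib.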

Troubleshooting Common Issues

Here are some issues I've encountered and how to resolve them:

Missing logs: Check your include patterns and ensure they match your log file paths.
Parsing errors: Verify your regex patterns using online regex testers with sample log entries.
High CPU usage: Review your operators and consider simplifying complex regex patterns.
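
If you would rather test patterns locally than in an online tester, a few lines of Python are enough. The sketch below reports which sample lines a pattern fails to match; the unmatched_lines helper and the sample lines are assumptions for illustration. Keep in mind that the Collector uses Go's regexp engine, which rejects some constructs Python accepts (lookaheads and backreferences, for example), so treat this as a first check only.

```python
import re


def unmatched_lines(pattern: str, samples: list[str]) -> list[str]:
    """Return the sample lines that the pattern does not match."""
    compiled = re.compile(pattern)
    return [line for line in samples if compiled.match(line) is None]


# Pattern used for container runtime log lines earlier in this article.
PATTERN = r'^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<logtag>\w) (?P<log>.*)$'

# Hypothetical sample lines; replace them with lines copied from your own log files.
SAMPLES = [
    "2024-05-01T12:34:56.789Z stdout F service started",
    "2024-05-01T12:34:57.001Z stderr P partial line without newline",
    "plain text line with no CRI prefix",
]

for line in unmatched_lines(PATTERN, SAMPLES):
    print("no match:", line)
```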

Integrating with the OpenTelemetry Ecosystem

The filelog receiver integrates seamlessly with other OpenTelemetry components. I often use it alongside:

The OTLP exporter, to send logs to backends like Last9 Levitate or ClickHouse
Custom processors for data enrichment or filtering
Other receivers for a complete observability solution

📃

Know how to manage Kubernetes costs effectively with OpenCost and Levitate.

Syslog and Other Log Sources

While we've focused on collecting application logs from Kubernetes pods, the OpenTelemetry filelog receiver is versatile enough to handle various log sources. One common log format you might encounter is syslog.

Here is how to configure the filelog receiver to collect syslog messages:

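receivers:
  filelog/syslog:
    include: [/var/log/syslog]
    operators:
      # Matches traditional lines such as "May  1 12:34:56 host program[123]: message".
      - type: regex_parser
        regex: '^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) (?P<program>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: (?P<message>.*)$'
      # Tag every record so syslog entries are easy to filter downstream.
      - type: add
        field: attributes["log.source"]
        value: syslog

This configuration uses the `regex_parser` operator to extract the timestamp, host, program, and message components from each syslog line, then tags the record with its source. For messages that follow RFC 3164 or RFC 5424 strictly, the contrib distribution also offers a dedicated `syslog_parser` operator.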
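
To see which fields such a parser would produce, here is a minimal Python sketch that splits a traditional syslog line into timestamp, host, program, and message. The pattern, the parse_syslog_line helper, and the sample line are my own assumptions for illustration; adjust the pattern if your syslog daemon writes a different format.

```python
import re

# Assumed layout: "<Mon> <day> <HH:MM:SS> <host> <program>[<pid>]: <message>"
SYSLOG_LINE = re.compile(
    r'^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) '
    r'(?P<host>\S+) '
    r'(?P<program>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: '
    r'(?P<message>.*)$'
)


def parse_syslog_line(line: str) -> dict:
    """Return the named fields of a syslog line, or raise if it does not match."""
    match = SYSLOG_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized syslog line: {line!r}")
    return match.groupdict()


# Hypothetical sample line.
fields = parse_syslog_line("May  1 12:34:56 node-1 sshd[1234]: Accepted publickey for admin")
print(fields)
```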

Sending Logs to Last9 Levitate

I've found it very easy to send logs collected by the OpenTelemetry filelog receiver to Last9 Levitate. Here's how to configure it:

Update the exporters section in your OpenTelemetry Collector configuration:

exporters:
  otlp:
    endpoint: "otlp.last9.io:443"
    headers:
      Authorization: "Bearer <your-last9-api-key>"
    tls:
      insecure: false

Ensure your service pipeline uses this exporter:

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch]
      exporters: [otlp]

Advanced Parser Operator Techniques

The parser operator is a powerful tool in the filelog receiver's arsenal. We've already seen its use with regex parsing, but let's explore some advanced techniques:

Parsing JSON logs:

operators:
  - type: json_parser
    parse_from: body
    timestamp:
      # Parsed fields land in attributes, so the timestamp is read from there.
      parse_from: attributes.time
      layout: '%Y-%m-%dT%H:%M:%S.%fZ'

Parsing key-value pairs:

operators:
  # key_value_parser splits key=value pairs; regex_parser would need named capture groups.
  - type: key_value_parser
    parse_from: body
    delimiter: "="

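Using the `log.file.path` attribute, which the receiver sets when include_file_path is true:

operators:
  # Copy the file path onto the resource so it describes every record from that file.
  - type: add
    field: resource["log.file.path"]
    value: EXPR(attributes["log.file.path"])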

These parser operator examples demonstrate how to handle different log formats and extract valuable metadata from your application logs.
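
For key-value logs in particular, it helps to see what the extracted fields look like. The Python sketch below pulls key=value pairs, with optionally quoted values, out of a line. The parse_key_values helper and the sample line are hypothetical, and the code only illustrates the idea rather than reproducing the Collector's own parser.

```python
import re

# A key, an equals sign, then either a double-quoted value or a run of non-space characters.
KV_PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_key_values(line: str) -> dict[str, str]:
    """Extract key=value pairs from a log line, stripping surrounding quotes."""
    return {key: value.strip('"') for key, value in KV_PAIR.findall(line)}


# Hypothetical sample line.
parsed = parse_key_values('level=info msg="user logged in" user_id=42')
print(parsed)
```

Note that every value comes back as a string; converting user_id to a number would be a separate step.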

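How do I copy log.file.name from attributes to resource for filelog receiver?

Use the move operator, or the copy operator if the attribute should also stay on the record. The receiver stores the file name in attributes["log.file.name"]:

receivers:
  filelog:
    include: [/path/to/your/logs/*.log]
    operators:
      - type: move
        from: attributes["log.file.name"]
        to: resource["log.file.name"]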

Is it possible to assign different labels based on the folder I'm parsing from?

Yes, using the router operator:

receivers:
  filelog:
    include:
      - /path/to/service1/*.log
      - /path/to/service2/*.log
    # Needed so that attributes["log.file.path"] is available to the router.
    include_file_path: true
    operators:
      - type: router
        id: route_by_folder
        routes:
          - output: service1_labels
            expr: 'attributes["log.file.path"] startsWith "/path/to/service1"'
          - output: service2_labels
            expr: 'attributes["log.file.path"] startsWith "/path/to/service2"'
      - type: add
        id: service1_labels
        field: attributes["service.name"]
        value: service1
        # Skip the service2 step, which would otherwise overwrite the label.
        output: done
      - type: add
        id: service2_labels
        field: attributes["service.name"]
        value: service2
      - type: noop
        id: done
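
The routing logic itself is just a prefix check on the file path. Here is a minimal Python sketch of the same decision, assuming the two folders from the example; label_for and ROUTES are hypothetical names used only for illustration.

```python
from typing import Optional

# Folder prefix -> label, checked in order, mirroring the two routes above.
ROUTES = [
    ("/path/to/service1", "service1"),
    ("/path/to/service2", "service2"),
]


def label_for(path: str) -> Optional[str]:
    """Return the service label for a log file path, or None if no route matches."""
    for prefix, service in ROUTES:
        if path.startswith(prefix):
            return service
    return None


print(label_for("/path/to/service1/app.log"))
print(label_for("/var/log/other.log"))
```

Files that match no route come back without a label here, so it is worth deciding up front what should happen to such entries in your pipeline.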

How can I collect logs from multiple log sources?

To collect logs from multiple sources, you can configure multiple filelog receivers or use include patterns. Here's an example that collects both Kubernetes pod logs and syslog:

receivers:
  filelog/pods:
    include: [/var/log/pods/*/*/*.log]
  filelog/syslog:
    include: [/var/log/syslog]

service:
  pipelines:
    logs:
      receivers: [filelog/pods, filelog/syslog]
      processors: [batch]
      exporters: [otlp]

This configuration allows you to collect and process logs from different sources, giving you a comprehensive view of your system's behavior.

Why is OpenTelemetry's log data model needed?

The OpenTelemetry log data model provides:

Standardization across different sources and systems (or sinks)
Interoperability with metrics and traces
Vendor neutrality
Rich context through additional attributes

What are OpenTelemetry Logs?

OpenTelemetry Logs are structured log records conforming to the OpenTelemetry log data model. They typically include:

Timestamp
Severity level
Body (the actual log message)
Attributes (key-value pairs for additional context)
Resource information (details about the source of the log)
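
As a rough mental model, a log record can be pictured as a small data structure with these fields. The Python dataclass below is only an illustration of the shape; it is not the class used by any OpenTelemetry SDK, and the field names and sample values are my own choices.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class LogRecord:
    """Illustrative shape of an OpenTelemetry log record (not an SDK type)."""
    timestamp_unix_nano: int          # when the event happened
    severity_text: str                # for example "INFO"
    severity_number: int              # numeric severity; 9 is the first INFO value
    body: Any                         # the log message itself
    attributes: dict[str, Any] = field(default_factory=dict)  # per-record context
    resource: dict[str, Any] = field(default_factory=dict)    # where the log came from


# Hypothetical record for a pod log line.
record = LogRecord(
    timestamp_unix_nano=1_714_566_896_000_000_000,
    severity_text="INFO",
    severity_number=9,
    body="service started",
    attributes={"log.file.name": "0.log"},
    resource={"k8s.pod.name": "checkout-7d9f"},
)
print(record)
```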

Why implement this as an operator and not as a processor?

Operators offer:

Granularity: Working at the individual log entry level
Performance: Processing logs as they're read
Flexibility: Building complex processing pipelines within the receiver

How do I configure the filelog receiver to monitor log files from multiple services?

Use multiple filelog receivers:

receivers:
  filelog/service1:
    include: [/path/to/service1/*.log]
    operators:
      - type: add
        field: attributes["service.name"]
        value: service1
  filelog/service2:
    include: [/path/to/service2/*.log]
    operators:
      - type: add
        field: attributes["service.name"]
        value: service2

service:
  pipelines:
    logs:
      receivers: [filelog/service1, filelog/service2]
      processors: [batch]
      exporters: [otlp]

How do I configure the OpenTelemetry filelog receiver to collect logs from multiple files?

Use the include and exclude fields:

receivers:
  filelog:
    include:
      - /path/to/logs/*.log
      - /another/path/to/logs/*.log
    exclude:
      - /path/to/logs/excluded.log
    start_at: beginning
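    operators:
      - type: regex_parser
        # The time group spans the date and the time, matching the layout below.
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<severity>\S+) (?P<message>.*)$'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
        severity:
          parse_from: attributes.severity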
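
To reason about which files a given include and exclude combination picks up, I sometimes check candidate paths with a short script. The sketch below approximates the selection with Python's pathlib matching; the is_selected helper is hypothetical, and the Collector's own glob handling may differ in edge cases, so treat the result as a sanity check.

```python
from pathlib import PurePosixPath

# Patterns taken from the configuration above.
INCLUDE = ["/path/to/logs/*.log", "/another/path/to/logs/*.log"]
EXCLUDE = ["/path/to/logs/excluded.log"]


def is_selected(path: str, include: list[str], exclude: list[str]) -> bool:
    """A file is read when it matches any include pattern and no exclude pattern."""
    candidate = PurePosixPath(path)
    included = any(candidate.match(pattern) for pattern in include)
    excluded = any(candidate.match(pattern) for pattern in exclude)
    return included and not excluded


for path in ["/path/to/logs/app.log", "/path/to/logs/excluded.log", "/tmp/app.log"]:
    print(path, is_selected(path, INCLUDE, EXCLUDE))
```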

Conclusion

The OpenTelemetry filelog receiver, part of the opentelemetry-collector-contrib repository, has become an indispensable tool in my Kubernetes observability toolkit. Its flexibility in handling various log sources, powerful parser operators, and seamless integration with the broader OpenTelemetry ecosystem make it a robust solution for log collection in complex, containerized environments.

Remember, observability is a journey, not a destination. Whether you're dealing with application logs, syslog, or other log sources, keep experimenting, optimizing, and sharing your experiences with the community. The open-source nature of OpenTelemetry means we all benefit from each other's learnings and contributions.

Integrating the OpenTelemetry filelog receiver with backends such as Last9 Levitate creates a robust observability pipeline, offering valuable insights into your Kubernetes applications. Make sure to keep your API keys secure and periodically review your log collection setup to capture all the essential data.

Happy logging, and may your systems always be observable!

