Converting logs to metrics is an excellent strategy for faster querying and alerting, as metrics are far more lightweight than raw logs. Metrics also surface trends and patterns that are hard to see in raw logs, which makes them better suited for monitoring systems in real time.
This blog introduces how to use Vector to convert logs to metrics, and how to store and query the converted metrics in an external long-term metric store such as Levitate.
What is Vector?
Vector is a tool for building observability pipelines. It is versatile and can perform various ETL operations on your observability data. This blog focuses on transforming application logs into metrics and storing the metric data in Levitate.
Vector Topology
Vector configuration is declarative: we describe how the logs-to-metrics processing should work in the config file. The configuration is split into three layers (sources, transforms, and sinks) that are chained together using named references.
Application instances typically write logs to an object store or a file system, or emit them to STDOUT. In this blog, we use the file system as the source to read logs from.
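A minimal sketch of the source layer for this setup could look like the following; the source name nginx_file_source and the include path are illustrative, not taken from an actual deployment.

sources:
  nginx_file_source:
    type: file
    include:
      - /var/log/nginx/*.log # point this at wherever your ingress logs land
    read_from: beginning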
Here is what our raw logs look like; they are stored on the file system as gzipped JSON files. These logs are sample nginx ingress logs from a Kubernetes cluster.
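A representative log line in this format could look like the following (illustrative values, not actual data):

{"time": "2023-05-10T06:14:32+00:00", "host": "api.example.com", "method": "GET", "path": "/v1/orders?org_id=acme&page=2", "status": 200, "proxy_upstream_name": "default-orders-svc-80", "app_name": "orders", "request_time": 0.042}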
The transform layer allows us to process all the logs read from the file system and parse them into the desired shape. In our example, since the nginx logs are in JSON format, we will parse them and extract the critical attributes from each log line. Some of the key attributes we need as labels for the metric are path, method, app_name, response_status, org_id, etc.
Vector provides VRL, the Vector Remap Language. Using VRL, we can extract the desired attributes from the log lines and use them to construct our metric. We have written the below VRL code for our example.
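A minimal sketch of that remap transform is shown below; the input name nginx_file_source and the exact field names in the raw log are assumptions for illustration.

transforms:
  json_transform:
    type: remap
    inputs:
      - nginx_file_source
    source: |
      # Parse the raw JSON log line (the file source puts it in .message)
      parsed = parse_json!(string!(.message))

      # Start from a clean event so only the attributes we care about survive.
      # We keep Vector's ingest timestamp here; carry over parsed.time instead
      # if you want the source timestamp on the metric.
      ts = .timestamp
      . = {}
      .timestamp = ts
      .host = parsed.host
      .method = parsed.method
      .response_status = parsed.status
      .proxy_upstream_name = parsed.proxy_upstream_name
      .app_name = parsed.app_name

      # Separate the request path from its query params
      parts = split(string!(parsed.path), "?", 2)
      .path = parts[0]
      if length(parts) > 1 {
        .query_params = parse_query_string(string!(parts[1]))
        .org_id = .query_params.org_id
      }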
This is what the Vector event looks like after the JSON transformation. Observe how only the declared attributes of the log line have been kept, with the URL query params broken out as JSON attributes.
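With the sample log line above, the transformed event would look roughly like this (illustrative):

{
  "timestamp": "2023-05-10T06:14:35.128Z",
  "host": "api.example.com",
  "method": "GET",
  "path": "/v1/orders",
  "query_params": {
    "org_id": "acme",
    "page": "2"
  },
  "org_id": "acme",
  "response_status": 200,
  "proxy_upstream_name": "default-orders-svc-80",
  "app_name": "orders"
}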
We can now derive metrics from the transformed Vector event. We can create metrics in OpenMetrics format by declaring the metric type, such as counter, histogram, or gauge. The inputs key in the config indicates which layer to read data from; in this case, it is the json_transform layer. Below is the example config.
transforms:
  metric_nginx:
    type: log_to_metric
    inputs: # A list of upstream source or transform IDs.
      - json_transform
    metrics:
      - type: counter
        field: app_name # For a histogram this field must hold a numeric value; for a counter, any field present on every event will do
        kind: incremental
        name: http_requests_count
        timestamp: "{{timestamp}}" # This timestamp comes from the Vector event, not from the source. Source timestamps can be carried over in json_transform
        tags:
          status: "{{response_status}}"
          host: "{{host}}"
          org_id: "{{org_id}}"
          path: "{{path}}"
          proxy_upstream_name: "{{proxy_upstream_name}}"
          app_name: "{{app_name}}"
          method: "{{method}}"
This creates a Vector metric event named http_requests_count, of type counter, with a timestamp and the declared tags as its label set.
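Encoded as JSON (for example via a console sink while debugging), such a metric event would look roughly like this; the exact fields and values are illustrative.

{
  "name": "http_requests_count",
  "kind": "incremental",
  "counter": { "value": 1.0 },
  "timestamp": "2023-05-10T06:14:35.128Z",
  "tags": {
    "status": "200",
    "host": "api.example.com",
    "org_id": "acme",
    "path": "/v1/orders",
    "proxy_upstream_name": "default-orders-svc-80",
    "app_name": "orders",
    "method": "GET"
  }
}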
Levitate, our managed time series data warehouse, can store the metrics converted from your logs so that you can extract knowledge from them. Book a demo today to learn more.
Vector Sink Layer Config
This layer dictates the delivery destination of our transformed data. In this case, metric data in OpenMetrics format should be written to a remote TSDB, i.e., Levitate. Our preferred sink type here is Prometheus Remote Write. Below is the example config.
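A minimal sketch of such a sink could look like the following; the endpoint URL and credentials are placeholders for your Levitate remote write URL and write token, not actual values.

sinks:
  levitate:
    type: prometheus_remote_write
    inputs:
      - metric_nginx
    endpoint: <levitate_remote_write_url>
    auth:
      strategy: basic
      user: <levitate_username>
      password: <levitate_write_token>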
If a large amount of data is being converted into metrics, the Vector to remote write flow described above may run into the following challenges:
No support for buffering at the Vector level
Fire and Forget nature of requests
Data loss if remote TSDB is unavailable
The alternative is to let Vector expose the metrics on a /metrics endpoint and use an external scraper such as Prometheus Agent or vmagent to scrape them and remote write to the TSDB.
With the sink configuration below, Vector exposes metrics on a given port. A scraper can then scrape Vector as a target and remote write the data to a TSDB; a sample scrape config follows the sink config.
sinks:
  levitate:
    type: prometheus_exporter
    inputs:
      - metric_nginx
    address: 0.0.0.0:9598 # Port to scrape
    flush_period_secs: 60
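A matching scraper configuration, here a minimal Prometheus-style sketch with placeholder target, endpoint, and credentials, can then remote write the scraped samples to Levitate:

scrape_configs:
  - job_name: vector
    static_configs:
      - targets: ["<vector_host>:9598"] # the address exposed by the prometheus_exporter sink
remote_write:
  - url: <levitate_remote_write_url>
    basic_auth:
      username: <levitate_username>
      password: <levitate_write_token>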
Finally, our exposed metric data per log line looks like this.
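In the exposition format served by the exporter, one counter series per label set would look roughly like this (values and labels illustrative):

# TYPE http_requests_count counter
http_requests_count{app_name="orders",host="api.example.com",method="GET",org_id="acme",path="/v1/orders",proxy_upstream_name="default-orders-svc-80",status="200"} 1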