Grok is one of the most useful filters in Logstash, turning unstructured log data into structured, queryable information. It works by matching patterns against your logs and extracting information to fields you can use. Consider Grok as a way to teach Logstash how to read your messy logs and organize them neatly.
For DevOps professionals, Grok is the secret weapon that makes the Elastic Stack truly powerful for log analysis. It uses a combination of named regular expressions to parse logs into something meaningful that you can search, filter, and visualize in Kibana.
How Grok Pattern Syntax Works in Detail
Grok patterns follow this syntax:
%{PATTERN:field_name}
Where:
- PATTERN is a predefined pattern (like IP, NUMBER, or WORD)
- field_name is what you want to name the extracted data
You can also convert the extracted value during extraction by appending a data type:
%{PATTERN:field_name:data_type}
where data_type can be int or float (these are the only conversions grok supports).
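For example, a minimal filter that parses a hypothetical "method path status duration" log line and converts the numeric captures (field names are illustrative):
filter {
  grok {
    # e.g. "GET /health 200 0.012" -> method, path, status (int), duration (float)
    match => { "message" => "%{WORD:method} %{URIPATHPARAM:path} %{NUMBER:status:int} %{NUMBER:duration:float}" }
  }
}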
Logstash comes with over 120 patterns built-in, saving you from writing complex regex from scratch. These patterns are stored under the Logstash installation directory in vendor/bundle/jruby/x.x/gems/logstash-patterns-core-x.x.x/patterns (the exact path depends on your install method and version).
Essential Logstash Grok Examples for Common Log Formats
Parsing Apache Access Logs with Built-in Patterns
Apache logs are some of the most common logs you'll work with. Here's how to parse a standard Apache log format:
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
}
One line of code handles all this parsing! The COMBINEDAPACHELOG pattern extracts these fields:
- clientip: The IP address of the client
- ident: The identity information provided by the client (usually "-")
- auth: The user authentication information
- timestamp: When the request was received
- verb: The HTTP method (GET, POST, etc.)
- request: The requested resource path
- httpversion: HTTP version
- response: HTTP response code
- bytes: Size of the response in bytes
- referrer: The referring URL
- agent: The user agent string
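As a concrete illustration, a combined-format line like this made-up request would populate all of those fields:
203.0.113.10 - frank [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://example.com/start" "Mozilla/5.0"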
Creating Custom Patterns for Web Server Logs
For custom formats, you can build your own pattern:
filter {
grok {
match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes:int} %{NUMBER:duration:float}" }
}
}
This would match logs like:
55.3.244.1 GET /index.html 15824 0.043
And extract these fields:
- client: The IP address
- method: The HTTP method used
- request: The requested path
- bytes: Size in bytes (converted to integer)
- duration: Request processing time (converted to float)
Comprehensive Reference of Grok Patterns for DevOps Workflows
Below are some patterns you'll use constantly in your DevOps work:
Pattern | Description | Example Match | Common Usage |
---|---|---|---|
IP | IPv4 address | 192.168.1.1 | Client IPs, server addresses |
HOSTNAME | Host name | server-01.example.com | Server identification |
TIMESTAMP_ISO8601 | ISO8601 timestamp | 2023-04-10T13:25:00.123Z | Modern application logs |
HTTPDATE | HTTP date format | 01/Jan/2023:13:25:15 +0100 | Web server logs |
NUMBER | Any number | 12345 | Response times, status codes |
INT | Integer | 12345 | Counts, durations |
WORD | A word (letters, numbers, underscore) | server_01 | Service names, log levels |
GREEDYDATA | Everything until the end of line | any text here... | Message content |
DATA | Non-greedy capture | some text | Specific field extraction |
QUOTEDSTRING | String inside quotes | "example text" | JSON values, parameters |
LOGLEVEL | Log levels | INFO, ERROR, DEBUG | Application log severity |
UUID | Universal unique identifier | 5c2c2698-c2c8-4c3e-aab6-74c046cb719f | Request IDs, trace IDs |
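These building blocks are typically chained together in a single match. For instance, a sketch for an application log line carrying a request trace ID (field names are illustrative):
filter {
  grok {
    # e.g. "2023-04-10T13:25:00.123Z INFO [5c2c2698-c2c8-4c3e-aab6-74c046cb719f] payment accepted"
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[%{UUID:trace_id}\] %{GREEDYDATA:msg}" }
  }
}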
Effective Techniques for Debugging Grok Pattern Matches
When your Grok pattern isn't matching as expected (and this happens to everyone), try these steps:
- Use a Grok debugger: Test your patterns with an online Grok debugger or Kibana's built-in Grok Debugger in the Dev Tools section.
- Try multiple patterns with debugging enabled:
filter {
grok {
match => { "message" => "%{PATTERN1}" }
tag_on_failure => ["pattern1_failed"]
add_field => { "matched_by" => "pattern1" }
break_on_match => false
}
grok {
match => { "message" => "%{PATTERN2}" }
tag_on_failure => ["pattern2_failed"]
add_field => { "matched_by" => "pattern2" }
break_on_match => false
}
}
- Check for grok failures: Use the _grokparsefailure tag to identify logs that didn't match (grok adds it automatically when no pattern matches).
filter {
grok {
match => { "message" => "%{PATTERN1}" }
tag_on_failure => ["_grokparsefailure"]
}
}
Then search for that tag in Kibana to find problematic logs:
tags:_grokparsefailure
- Read the bundled pattern definitions: The quickest reference for what's available is the patterns directory shipped with Logstash (see above) or the logstash-patterns-core repository on GitHub, which contains every built-in pattern.
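A quick way to look up a definition on the box itself; the path below assumes a package install under /usr/share/logstash, so adjust it for your setup:
cd /usr/share/logstash/vendor/bundle/jruby/*/gems/logstash-patterns-core-*/patterns
grep -R "^COMBINEDAPACHELOG" .   # show how a composite pattern is built
grep -Rl "LOGLEVEL" .            # find which files define or use LOGLEVEL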
Strategies for Processing Multi-line Log Formats
Many applications produce multi-line logs, like Java stack traces. The classic approach uses the multiline filter shown below; note that current Logstash releases no longer ship this filter and recommend joining lines at the input instead (with the multiline codec or in Filebeat), but the pattern logic is the same either way:
filter {
multiline {
pattern => "^%{TIMESTAMP_ISO8601}"
negate => true
what => "previous"
}
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
}
}
This configuration:
- Combines related log lines that don't start with a timestamp
- Joins them with the previous line that had a timestamp
- Then extracts fields from the combined message
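For reference, an equivalent input-side configuration using the multiline codec might look like this (the log path is a placeholder):
input {
  file {
    path => "/var/log/app/app.log"
    codec => multiline {
      # any line that does not start with a timestamp is appended to the previous event
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}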
Java Exception Stack Trace Example
For Java stack traces, you might use:
filter {
multiline {
pattern => "^[\\t ]"
what => "previous"
}
multiline {
pattern => "^[a-zA-Z#]"
negate => true
what => "previous"
}
grok {
match => {
"message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[%{DATA:thread}\] %{JAVACLASS:class}: %{GREEDYDATA:message}"
}
}
}
This handles indentation and continuation lines in Java exceptions.
Advanced Grok Pattern Techniques for Complex Logs
Creating and Using Custom Pattern Definitions
You can define your own patterns for reuse:
filter {
grok {
pattern_definitions => {
"APPID" => "[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}"
"CUSTOM_TIMESTAMP" => "%{YEAR}-%{MONTHNUM}-%{MONTHDAY}T%{HOUR}:%{MINUTE}:%{SECOND},%{INT}"
}
match => {
"message" => "%{CUSTOM_TIMESTAMP:timestamp} %{APPID:application_id} %{GREEDYDATA:message}"
}
}
}
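A hypothetical line that this definition would match:
2023-04-10T13:25:00,123 a1b2c3d4-e5f6-7890-abcd-ef1234567890 Payment service started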
Implementing Conditional Pattern Matching for Different Log Types
Different log types? No problem:
filter {
if [source] == "api-server" {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
} else if [source] == "database" {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
}
} else if [source] =~ /^app-\d+$/ {
grok {
match => { "message" => "%{DATA:service}\[%{NUMBER:pid}\]: \[%{WORD:loglevel}\] %{GREEDYDATA:msg}" }
}
}
}
Using Oniguruma Regular Expressions for Complex Pattern Matching
For more complex matching, you can use inline regex patterns:
filter {
grok {
match => {
"message" => "(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}) (?<component>%{WORD}):%{SPACE}(?<level>%{LOGLEVEL}): (?:%{SPACE}\[(?<thread>[^\]]+)\]:)? (?<message>.*)"
}
}
}
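You can also mix raw Oniguruma captures with built-in patterns when no predefined pattern fits a field. A sketch for a syslog-style line carrying a hexadecimal queue ID (the field name queue_id is illustrative):
filter {
  grok {
    # capture a 10-11 character uppercase hex ID for which no built-in pattern exists
    match => { "message" => "%{SYSLOGBASE} (?<queue_id>[0-9A-F]{10,11}): %{GREEDYDATA:syslog_message}" }
  }
}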
Grok Examples for Popular DevOps Tools
Kubernetes Container Log Pattern Extraction
filter {
grok {
match => {
"message" => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:level} \[%{WORD:component}\] \[%{WORD:namespace}/%{WORD:pod}/%{WORD:container}\] %{GREEDYDATA:msg}"
}
}
date {
match => [ "timestamp", "ISO8601" ]
target => "@timestamp"
}
mutate {
add_field => {
"kubernetes.namespace" => "%{namespace}"
"kubernetes.pod" => "%{pod}"
"kubernetes.container" => "%{container}"
}
remove_field => [ "namespace", "pod", "container" ]
}
}
HAProxy Load Balancer Log Processing Pattern
filter {
grok {
match => {
"message" => "%{IP:client_ip}:%{NUMBER:client_port} \[%{HTTPDATE:timestamp}\] %{WORD:frontend_name} %{WORD:backend_name}/%{WORD:server_name} %{NUMBER:time_request:float}/%{NUMBER:time_queue:float}/%{NUMBER:time_backend_connect:float}/%{NUMBER:time_backend_response:float}/%{NUMBER:time_duration:float} %{NUMBER:http_status_code:int} %{NUMBER:bytes_read:int} %{DATA:captured_request_cookie} %{DATA:captured_response_cookie} %{WORD:termination_state} %{NUMBER:actconn:int}/%{NUMBER:feconn:int}/%{NUMBER:beconn:int}/%{NUMBER:srvconn:int}/%{NUMBER:retries:int} %{NUMBER:srv_queue:int}/%{NUMBER:backend_queue:int} \{%{DATA:request_headers}\} \{%{DATA:response_headers}\} \"%{WORD:http_verb} %{NOTSPACE:http_request} HTTP/%{NUMBER:http_version}\""
}
}
date {
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
target => "@timestamp"
}
mutate {
convert => {
"time_request" => "float"
"time_queue" => "float"
"time_backend_connect" => "float"
"time_backend_response" => "float"
"time_duration" => "float"
"http_status_code" => "integer"
"bytes_read" => "integer"
}
}
}
Jenkins Build Log Pattern Extraction
filter {
grok {
match => {
"message" => "\[%{TIMESTAMP_ISO8601:timestamp}\] %{WORD:level}: %{GREEDYDATA:msg}"
}
}
if [msg] =~ "^Started by " {
grok {
match => { "msg" => "^Started by (?<build_trigger>.*)" }
tag_on_failure => []
}
} else if [msg] =~ "^Building " {
grok {
match => { "msg" => "^Building (?<build_status>.*)" }
tag_on_failure => []
}
} else if [msg] =~ "^Finished: " {
grok {
match => { "msg" => "^Finished: (?<build_result>.*)" }
tag_on_failure => []
}
}
}
Nginx Access Log Pattern Matching
filter {
grok {
match => {
"message" => '%{IPORHOST:remote_ip} - %{DATA:user_name} \[%{HTTPDATE:time}\] "%{WORD:method} %{DATA:url} HTTP/%{NUMBER:http_version}" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} "%{DATA:referrer}" "%{DATA:agent}" "%{DATA:forwarded_for}" %{NUMBER:request_length} %{NUMBER:request_time} \[%{DATA:proxy_upstream_name}\] \[%{DATA:upstream_addr}\] %{NUMBER:upstream_response_length} %{NUMBER:upstream_response_time} %{NUMBER:upstream_status} %{DATA:req_id}'
}
}
}
Performance Optimization Strategies for Grok Pattern Matching
Grok is powerful but can be CPU-intensive. Keep these tips in mind:
- Use specific patterns: The more specific your pattern, the faster it matches. Don't use .* or %{GREEDYDATA} when you can use a more specific pattern.
- Limit named captures: Each named capture creates a field and consumes memory. Only capture what you need.
- Order patterns by frequency: List the most common patterns first for better performance:
filter {
grok {
match => {
"message" => [
"%{PATTERN1}", # Matches 80% of logs
"%{PATTERN2}", # Matches 15% of logs
"%{PATTERN3}" # Matches 5% of logs
]
}
break_on_match => true
}
}
- Use break_on_match wisely: Set it to true for mutually exclusive patterns and false when you want to apply multiple patterns.
- Avoid backtracking: Complex regex with lots of optional parts can cause backtracking, which hurts performance.
- Use anchors: Start patterns with ^ when possible to anchor to the start of the line.
- Pre-filter large datasets: Use simple patterns first to filter the dataset:
filter {
if [message] =~ "ERROR" {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} ERROR %{GREEDYDATA:error_message}" }
}
}
}
Integrating Grok with Other Logstash Filters for Complete Log Processing
Sometimes you need more than pattern matching. Combine Grok with other filters:
filter {
# Parse the log with Grok
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
}
# Convert timestamp string to a proper date
date {
match => [ "timestamp", "ISO8601" ]
target => "@timestamp"
}
# Convert field types
mutate {
convert => {
"duration" => "float"
"response_size" => "integer"
"status_code" => "integer"
}
}
# Add geo information for IP addresses
geoip {
source => "clientip"
target => "geo"
}
# Parse JSON in the message field
if [msg] =~ /^\{.*\}$/ {
json {
source => "msg"
target => "msg_json"
}
}
# Drop sensitive information
mutate {
remove_field => ["password", "credit_card", "auth_token"]
}
}
Building a Complete Logstash Pipeline with Grok Patterns
Here's how to put it all together in a complete pipeline:
input {
file {
path => "/var/log/application/*.log"
start_position => "beginning"
sincedb_path => "/var/lib/logstash/sincedb"
type => "application"
}
beats {
port => 5044
type => "beats"
}
}
filter {
if [type] == "application" {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{WORD:service}\] %{LOGLEVEL:level}: %{GREEDYDATA:msg}" }
}
date {
match => [ "timestamp", "ISO8601" ]
target => "@timestamp"
}
} else if [type] == "beats" and [fields][log_type] == "nginx" {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
}
if "_grokparsefailure" in [tags] {
mutate {
add_field => { "parsing_error" => "true" }
}
}
}
output {
elasticsearch {
hosts => ["elasticsearch:9200"]
index => "%{type}-%{+YYYY.MM.dd}"
}
if "_grokparsefailure" in [tags] {
file {
path => "/var/log/logstash/failed_events.log"
}
}
}
Conclusion
Grok is an essential tool in any DevOps professional's ELK Stack toolkit. It transforms chaotic logs into structured data that you can analyze, alert on, and visualize.
The key to mastering Grok is practice – start with the examples in this guide, adapt them to your specific log formats, and gradually build a library of patterns that work for your infrastructure.
FAQs
What's the difference between Grok and regular expressions?
Grok is built on top of regular expressions but makes them more reusable and readable. Instead of writing a complex regex like (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
for an IP address, you can simply use %{IP:client_ip}
. Grok provides a library of predefined patterns that you can combine and reference by name.
How do I handle logs with variable structures?
For logs with variable structures, use multiple pattern matching with the break_on_match
option set to true:
filter {
grok {
match => {
"message" => [
"%{PATTERN1}", # For log type A
"%{PATTERN2}", # For log type B
"%{PATTERN3}" # For log type C
]
}
break_on_match => true
}
}
This tries each pattern in order until one matches.
Can I use Grok patterns with JSON logs?
Yes, but it's often better to use the JSON filter for fully structured JSON logs:
filter {
json {
source => "message"
}
}
Use Grok when you have mixed formats or need to extract JSON from within a larger log entry:
filter {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:json_data}" }
}
json {
source => "json_data"
target => "parsed_data"
}
}
How can I handle custom date formats with Grok?
First extract the date string with Grok, then use the date filter to parse it:
filter {
grok {
# the custom pattern is defined inline here so the example is self-contained
pattern_definitions => {
"CUSTOM_DATE_PATTERN" => "%{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}"
}
match => { "message" => "%{CUSTOM_DATE_PATTERN:timestamp} %{GREEDYDATA:msg}" }
}
date {
match => [ "timestamp", "yyyy/MM/dd HH:mm:ss.SSS" ]
target => "@timestamp"
}
}
What should I do when a log format changes?
When log formats change:
- Create a new Grok pattern for the new format
- Use conditional matching to apply different patterns based on log characteristics
- Consider using version tags in your fields (e.g., message_v1, message_v2)
filter {
if [message] =~ "new_format_indicator" {
grok { match => { "message" => "%{NEW_PATTERN}" } }
} else {
grok { match => { "message" => "%{OLD_PATTERN}" } }
}
}
How do I troubleshoot Grok pattern performance issues?
- Replace complex patterns with simpler ones when possible
- Use the Grok Debugger to test and optimize patterns before deployment
- Check per-plugin timings with the monitoring API: the node stats endpoint reports how much time each filter spends on events, which quickly surfaces slow grok patterns (see the sketch after the slow log settings below).
- Enable the Logstash slow log by adding this to logstash.yml:
slowlog.threshold.warn: 2s
slowlog.threshold.info: 1s
slowlog.threshold.debug: 500ms
slowlog.threshold.trace: 100ms
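A minimal way to pull those per-plugin timings, assuming the monitoring API listens on the default port 9600:
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'
# each filter appears under pipelines.<name>.plugins.filters with events.duration_in_millis
# and events.out, so duration divided by event count gives the average cost per event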
How do I share Grok patterns across multiple Logstash instances?
Store common patterns in a centralized location. First, create a patterns directory:
mkdir -p /etc/logstash/patterns
Add your patterns to files in this directory:
# /etc/logstash/patterns/custom_patterns
CUSTOM_DATE_FORMAT %{YEAR}[/-]%{MONTHNUM}[/-]%{MONTHDAY} %{TIME}
APP_LOG_FORMAT \[%{TIMESTAMP_ISO8601:timestamp}\] %{LOGLEVEL:level} %{NOTSPACE:logger} - %{GREEDYDATA:message}
Then reference these patterns in your Logstash configuration:
filter {
grok {
patterns_dir => ["/etc/logstash/patterns"]
match => { "message" => "%{APP_LOG_FORMAT}" }
}
}
To keep multiple instances in sync, distribute this directory through whatever configuration management or image build process you already use for the Logstash configuration itself.
Can Grok extract nested fields?
Yes. Use Logstash field-reference (square-bracket) syntax in the capture name to create nested fields directly, or combine Grok with the mutate filter. Note that a dotted name such as client.ip produces a single field whose name literally contains a dot rather than a nested object:
filter {
grok {
match => { "message" => "%{IP:[client][ip]} %{WORD:[client][method]} %{PATH:[client][request]}" }
}
}
Or:
filter {
grok {
match => { "message" => "%{IP:clientip} %{WORD:method} %{PATH:request}" }
}
mutate {
add_field => {
"[client][ip]" => "%{clientip}"
"[client][method]" => "%{method}"
"[client][request]" => "%{request}"
}
remove_field => ["clientip", "method", "request"]
}
}
How can I validate my Grok patterns before deploying to production?
- Use Kibana's Grok Debugger in Dev Tools
- Set up a staging Logstash instance to validate patterns with real traffic
- Test with a small sample of logs first:
input {
file {
path => "/path/to/sample/logs"
start_position => "beginning"
sincedb_path => "/dev/null" # Don't track position for tests
}
}
- Use the --config.test_and_exit flag to check that the configuration compiles before deploying it:
bin/logstash --config.test_and_exit -f your_config.conf