Grok is one of the most useful filters in Logstash, turning unstructured log data into structured, queryable information. It works by matching patterns against your logs and extracting information to fields you can use. Consider Grok as a way to teach Logstash how to read your messy logs and organize them neatly.
For DevOps professionals, Grok is the secret weapon that makes the Elastic Stack truly powerful for log analysis. It uses a combination of named regular expressions to parse logs into something meaningful that you can search, filter, and visualize in Kibana.
How Grok Pattern Syntax Works in Detail
Grok patterns follow this syntax:
%{PATTERN:field_name}
Where:
- PATTERN is a predefined pattern (like IP, NUMBER, or WORD)
- field_name is what you want to name the extracted data
You can also convert the extracted value during extraction by appending a data type:
%{PATTERN:field_name:data_type}
where data_type can be int or float (these are the only conversions grok supports).
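For example, a minimal filter that parses a hypothetical "method path status duration" log line and converts the numeric captures (field names are illustrative):
filter {
  grok {
    # e.g. "GET /health 200 0.012" -> method, path, status (int), duration (float)
    match => { "message" => "%{WORD:method} %{URIPATHPARAM:path} %{NUMBER:status:int} %{NUMBER:duration:float}" }
  }
}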
Logstash comes with over 120 patterns built-in, saving you from writing complex regex from scratch. These patterns are stored under the Logstash installation directory in vendor/bundle/jruby/x.x/gems/logstash-patterns-core-x.x.x/patterns (the exact path depends on your install method and version).
Essential Logstash Grok Examples for Common Log Formats
Parsing Apache Access Logs with Built-in Patterns
Apache logs are some of the most common logs you'll work with. Here's how to parse a standard Apache log format:
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
}
One line of code handles all this parsing! The COMBINEDAPACHELOG pattern extracts these fields:
- clientip: The IP address of the client
- ident: The identity information provided by the client (usually "-")
- auth: The user authentication information
- timestamp: When the request was received
- verb: The HTTP method (GET, POST, etc.)
- request: The requested resource path
- httpversion: HTTP version
- response: HTTP response code
- bytes: Size of the response in bytes
- referrer: The referring URL
- agent: The user agent string
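As a concrete illustration, a combined-format line like this made-up request would populate all of those fields:
203.0.113.10 - frank [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://example.com/start" "Mozilla/5.0"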
Creating Custom Patterns for Web Server Logs
For custom formats, you can build your own pattern:
filter {
grok {
match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes:int} %{NUMBER:duration:float}" }
}
}
This would match logs like:
55.3.244.1 GET /index.html 15824 0.043
And extract these fields:
- client: The IP address
- method: The HTTP method used
- request: The requested path
- bytes: Size in bytes (converted to integer)
- duration: Request processing time (converted to float)
Comprehensive Reference of Grok Patterns for DevOps Workflows
Below are some patterns you'll use constantly in your DevOps work:
Pattern | Description | Example Match | Common Usage |
---|---|---|---|
IP | IPv4 address | 192.168.1.1 | Client IPs, server addresses |
HOSTNAME | Host name | server-01.example.com | Server identification |
TIMESTAMP_ISO8601 | ISO8601 timestamp | 2023-04-10T13:25:00.123Z | Modern application logs |
HTTPDATE | HTTP date format | 01/Jan/2023:13:25:15 +0100 | Web server logs |
NUMBER | Any number | 12345 | Response times, status codes |
INT | Integer | 12345 | Counts, durations |
WORD | A word (letters, numbers, underscore) | server_01 | Service names, log levels |
GREEDYDATA | Everything until the end of line | any text here... | Message content |
DATA | Non-greedy capture | some text | Specific field extraction |
QUOTEDSTRING | String inside quotes | "example text" | JSON values, parameters |
LOGLEVEL | Log levels | INFO, ERROR, DEBUG | Application log severity |
UUID | Universal unique identifier | 5c2c2698-c2c8-4c3e-aab6-74c046cb719f | Request IDs, trace IDs |
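These building blocks are typically chained together in a single match. For instance, a sketch for an application log line carrying a request trace ID (field names are illustrative):
filter {
  grok {
    # e.g. "2023-04-10T13:25:00.123Z INFO [5c2c2698-c2c8-4c3e-aab6-74c046cb719f] payment accepted"
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[%{UUID:trace_id}\] %{GREEDYDATA:msg}" }
  }
}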
Effective Techniques for Debugging Grok Pattern Matches
When your Grok pattern isn't matching as expected (and this happens to everyone), try these steps:
- Use a Grok debugger: Test your patterns with an online Grok debugger or Kibana's built-in Grok Debugger in the Dev Tools section.
- Try multiple patterns with debugging enabled:
filter {
grok {
match => { "message" => "%{PATTERN1}" }
tag_on_failure => ["pattern1_failed"]
add_field => { "matched_by" => "pattern1" }
break_on_match => false
}
grok {
match => { "message" => "%{PATTERN2}" }
tag_on_failure => ["pattern2_failed"]
add_field => { "matched_by" => "pattern2" }
break_on_match => false
}
}
- Check for grok failures: Use the _grokparsefailure tag to identify logs that didn't match (grok adds it automatically when no pattern matches).
filter {
grok {
match => { "message" => "%{PATTERN1}" }
tag_on_failure => ["_grokparsefailure"]
}
}
Then search for that tag in Kibana to find problematic logs:
tags:_grokparsefailure
- Read the bundled pattern definitions: The quickest reference for what's available is the patterns directory shipped with Logstash (see above) or the logstash-patterns-core repository on GitHub, which contains every built-in pattern.
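A quick way to look up a definition on the box itself; the path below assumes a package install under /usr/share/logstash, so adjust it for your setup:
cd /usr/share/logstash/vendor/bundle/jruby/*/gems/logstash-patterns-core-*/patterns
grep -R "^COMBINEDAPACHELOG" .   # show how a composite pattern is built
grep -Rl "LOGLEVEL" .            # find which files define or use LOGLEVEL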
Strategies for Processing Multi-line Log Formats
Many applications produce multi-line logs, like Java stack traces. The classic approach uses the multiline filter shown below; note that current Logstash releases no longer ship this filter and recommend joining lines at the input instead (with the multiline codec or in Filebeat), but the pattern logic is the same either way:
filter {
multiline {
pattern => "^%{TIMESTAMP_ISO8601}"
negate => true
what => "previous"
}
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
}
}
This configuration:
- Combines related log lines that don't start with a timestamp
- Joins them with the previous line that had a timestamp
- Then extracts fields from the combined message
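For reference, an equivalent input-side configuration using the multiline codec might look like this (the log path is a placeholder):
input {
  file {
    path => "/var/log/app/app.log"
    codec => multiline {
      # any line that does not start with a timestamp is appended to the previous event
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}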
Java Exception Stack Trace Example
For Java stack traces, you might use:
filter {
multiline {
pattern => "^[\\t ]"
what => "previous"
}
multiline {
pattern => "^[a-zA-Z#]"
negate => true
what => "previous"
}
grok {
match => {
"message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[%{DATA:thread}\] %{JAVACLASS:class}: %{GREEDYDATA:message}"
}
}
}
This handles indentation and continuation lines in Java exceptions.
Advanced Grok Pattern Techniques for Complex Logs
Creating and Using Custom Pattern Definitions
You can define your own patterns for reuse:
filter {
grok {
pattern_definitions => {
"APPID" => "[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}"
"CUSTOM_TIMESTAMP" => "%{YEAR}-%{MONTHNUM}-%{MONTHDAY}T%{HOUR}:%{MINUTE}:%{SECOND},%{INT}"
}
match => {
"message" => "%{CUSTOM_TIMESTAMP:timestamp} %{APPID:application_id} %{GREEDYDATA:message}"
}
}
}
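A hypothetical line that this definition would match:
2023-04-10T13:25:00,123 a1b2c3d4-e5f6-7890-abcd-ef1234567890 Payment service started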
Implementing Conditional Pattern Matching for Different Log Types
Different log types? No problem:
filter {
if [source] == "api-server" {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
} else if [source] == "database" {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
}
} else if [source] =~ /^app-\d+$/ {
grok {
match => { "message" => "%{DATA:service}\[%{NUMBER:pid}\]: \[%{WORD:loglevel}\] %{GREEDYDATA:msg}" }
}
}
}
Using Oniguruma Regular Expressions for Complex Pattern Matching
For more complex matching, you can use inline regex patterns:
filter {
grok {
match => {
"message" => "(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}) (?<component>%{WORD}):%{SPACE}(?<level>%{LOGLEVEL}): (?:%{SPACE}\[(?<thread>[^\]]+)\]:)? (?<message>.*)"
}
}
}
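You can also mix raw Oniguruma captures with built-in patterns when no predefined pattern fits a field. A sketch for a syslog-style line carrying a hexadecimal queue ID (the field name queue_id is illustrative):
filter {
  grok {
    # capture a 10-11 character uppercase hex ID for which no built-in pattern exists
    match => { "message" => "%{SYSLOGBASE} (?<queue_id>[0-9A-F]{10,11}): %{GREEDYDATA:syslog_message}" }
  }
}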
Grok Examples for Popular DevOps Tools
Kubernetes Container Log Pattern Extraction
filter {
grok {
match => {
"message" => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:level} \[%{WORD:component}\] \[%{WORD:namespace}/%{WORD:pod}/%{WORD:container}\] %{GREEDYDATA:msg}"
}
}
date {
match => [ "timestamp", "ISO8601" ]
target => "@timestamp"
}
mutate {
add_field => {
"kubernetes.namespace" => "%{namespace}"
"kubernetes.pod" => "%{pod}"
"kubernetes.container" => "%{container}"
}
remove_field => [ "namespace", "pod", "container" ]
}
}
HAProxy Load Balancer Log Processing Pattern
filter {
grok {
match => {
"message" => "%{IP:client_ip}:%{NUMBER:client_port} \[%{HTTPDATE:timestamp}\] %{WORD:frontend_name} %{WORD:backend_name}/%{WORD:server_name} %{NUMBER:time_request:float}/%{NUMBER:time_queue:float}/%{NUMBER:time_backend_connect:float}/%{NUMBER:time_backend_response:float}/%{NUMBER:time_duration:float} %{NUMBER:http_status_code:int} %{NUMBER:bytes_read:int} %{DATA:captured_request_cookie} %{DATA:captured_response_cookie} %{WORD:termination_state} %{NUMBER:actconn:int}/%{NUMBER:feconn:int}/%{NUMBER:beconn:int}/%{NUMBER:srvconn:int}/%{NUMBER:retries:int} %{NUMBER:srv_queue:int}/%{NUMBER:backend_queue:int} \{%{DATA:request_headers}\} \{%{DATA:response_headers}\} \"%{WORD:http_verb} %{NOTSPACE:http_request} HTTP/%{NUMBER:http_version}\""
}
}
date {
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
target => "@timestamp"
}
mutate {
convert => {
"time_request" => "float"
"time_queue" => "float"
"time_backend_connect" => "float"
"time_backend_response" => "float"
"time_duration" => "float"
"http_status_code" => "integer"
"bytes_read" => "integer"
}
}
}
Jenkins Build Log Pattern Extraction
filter {
grok {
match => {
"message" => "\[%{TIMESTAMP_ISO8601:timestamp}\] %{WORD:level}: %{GREEDYDATA:msg}"
}
}
if [msg] =~ "^Started by " {
grok {
match => { "msg" => "^Started by (?<build_trigger>.*)" }
tag_on_failure => []
}
} else if [msg] =~ "^Building " {
grok {
match => { "msg" => "^Building (?<build_status>.*)" }
tag_on_failure => []
}
} else if [msg] =~ "^Finished: " {
grok {
match => { "msg" => "^Finished: (?<build_result>.*)" }
tag_on_failure => []
}
}
}
Nginx Access Log Pattern Matching
filter {
grok {
match => {
"message" => '%{IPORHOST:remote_ip} - %{DATA:user_name} \[%{HTTPDATE:time}\] "%{WORD:method} %{DATA:url} HTTP/%{NUMBER:http_version}" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} "%{DATA:referrer}" "%{DATA:agent}" "%{DATA:forwarded_for}" %{NUMBER:request_length} %{NUMBER:request_time} \[%{DATA:proxy_upstream_name}\] \[%{DATA:upstream_addr}\] %{NUMBER:upstream_response_length} %{NUMBER:upstream_response_time} %{NUMBER:upstream_status} %{DATA:req_id}'
}
}
}
Performance Optimization Strategies for Grok Pattern Matching
Grok is powerful but can be CPU-intensive. Keep these tips in mind:
- Use specific patterns: The more specific your pattern, the faster it matches. Don't use .* or %{GREEDYDATA} when you can use a more specific pattern.
- Limit named captures: Each named capture creates a field and consumes memory. Only capture what you need.
- Order patterns by frequency: List the most common patterns first for better performance:
filter {
grok {
match => {
"message" => [
"%{PATTERN1}", # Matches 80% of logs
"%{PATTERN2}", # Matches 15% of logs
"%{PATTERN3}" # Matches 5% of logs
]
}
break_on_match => true
}
}
- Use break_on_match wisely: Set it to true for mutually exclusive patterns and false when you want to apply multiple patterns.
- Avoid backtracking: Complex regex with lots of optional parts can cause backtracking, which hurts performance.
- Use anchors: Start patterns with ^ when possible to anchor to the start of the line.
- Pre-filter large datasets: Use simple patterns first to filter the dataset:
filter {
if [message] =~ "ERROR" {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} ERROR %{GREEDYDATA:error_message}" }
}
}
}
Integrating Grok with Other Logstash Filters for Complete Log Processing
Sometimes you need more than pattern matching. Combine Grok with other filters:
filter {
# Parse the log with Grok
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
}
# Convert timestamp string to a proper date
date {
match => [ "timestamp", "ISO8601" ]
target => "@timestamp"
}
# Convert field types
mutate {
convert => {
"duration" => "float"
"response_size" => "integer"
"status_code" => "integer"
}
}
# Add geo information for IP addresses
geoip {
source => "clientip"
target => "geo"
}
# Parse JSON in the message field
if [msg] =~ /^\{.*\}$/ {
json {
source => "msg"
target => "msg_json"
}
}
# Drop sensitive information
mutate {
remove_field => ["password", "credit_card", "auth_token"]
}
}
Building a Complete Logstash Pipeline with Grok Patterns
Here's how to put it all together in a complete pipeline:
input {
file {
path => "/var/log/application/*.log"
start_position => "beginning"
sincedb_path => "/var/lib/logstash/sincedb"
type => "application"
}
beats {
port => 5044
type => "beats"
}
}
filter {
if [type] == "application" {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{WORD:service}\] %{LOGLEVEL:level}: %{GREEDYDATA:msg}" }
}
date {
match => [ "timestamp", "ISO8601" ]
target => "@timestamp"
}
} else if [type] == "beats" and [fields][log_type] == "nginx" {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
}
if "_grokparsefailure" in [tags] {
mutate {
add_field => { "parsing_error" => "true" }
}
}
}
output {
elasticsearch {
hosts => ["elasticsearch:9200"]
index => "%{type}-%{+YYYY.MM.dd}"
}
if "_grokparsefailure" in [tags] {
file {
path => "/var/log/logstash/failed_events.log"
}
}
}
Conclusion
Grok is an essential tool in any DevOps professional's ELK Stack toolkit. It transforms chaotic logs into structured data that you can analyze, alert on, and visualize.
The key to mastering Grok is practice – start with the examples in this guide, adapt them to your specific log formats, and gradually build a library of patterns that work for your infrastructure.
FAQs
What's the difference between Grok and regular expressions?
Grok is built on top of regular expressions but makes them more reusable and readable. Instead of writing a complex regex like (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
for an IP address, you can simply use %{IP:client_ip}
. Grok provides a library of predefined patterns that you can combine and reference by name.
How do I handle logs with variable structures?
For logs with variable structures, use multiple pattern matching with the break_on_match
option set to true:
filter {
grok {
match => {
"message" => [
"%{PATTERN1}", # For log type A
"%{PATTERN2}", # For log type B
"%{PATTERN3}" # For log type C
]
}
break_on_match => true
}
}
This tries each pattern in order until one matches.
Can I use Grok patterns with JSON logs?
Yes, but it's often better to use the JSON filter for fully structured JSON logs:
filter {
json {
source => "message"
}
}
Use Grok when you have mixed formats or need to extract JSON from within a larger log entry:
filter {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:json_data}" }
}
json {
source => "json_data"
target => "parsed_data"
}
}
How can I handle custom date formats with Grok?
First extract the date string with Grok, then use the date filter to parse it:
filter {
grok {
# the custom pattern is defined inline here so the example is self-contained
pattern_definitions => {
"CUSTOM_DATE_PATTERN" => "%{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}"
}
match => { "message" => "%{CUSTOM_DATE_PATTERN:timestamp} %{GREEDYDATA:msg}" }
}
date {
match => [ "timestamp", "yyyy/MM/dd HH:mm:ss.SSS" ]
target => "@timestamp"
}
}
What should I do when a log format changes?
When log formats change:
- Create a new Grok pattern for the new format
- Use conditional matching to apply different patterns based on log characteristics
- Consider using version tags in your fields (e.g., message_v1, message_v2)
filter {
if [message] =~ "new_format_indicator" {
grok { match => { "message" => "%{NEW_PATTERN}" } }
} else {
grok { match => { "message" => "%{OLD_PATTERN}" } }
}
}
How do I troubleshoot Grok pattern performance issues?
- Replace complex patterns with simpler ones when possible
- Use the Grok Debugger to test and optimize patterns before deployment
- Check per-plugin timings with the monitoring API: the node stats endpoint reports how much time each filter spends on events, which quickly surfaces slow grok patterns (see the sketch after the slow log settings below).
- Enable the Logstash slow log by adding this to logstash.yml:
slowlog.threshold.warn: 2s
slowlog.threshold.info: 1s
slowlog.threshold.debug: 500ms
slowlog.threshold.trace: 100ms
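A minimal way to pull those per-plugin timings, assuming the monitoring API listens on the default port 9600:
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'
# each filter appears under pipelines.<name>.plugins.filters with events.duration_in_millis
# and events.out, so duration divided by event count gives the average cost per event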
How do I share Grok patterns across multiple Logstash instances?
Store common patterns in a centralized location. First, create a patterns directory:
mkdir -p /etc/logstash/patterns
Add your patterns to files in this directory:
# /etc/logstash/patterns/custom_patterns
CUSTOM_DATE_FORMAT %{YEAR}[/-]%{MONTHNUM}[/-]%{MONTHDAY} %{TIME}
APP_LOG_FORMAT \[%{TIMESTAMP_ISO8601:timestamp}\] %{LOGLEVEL:level} %{NOTSPACE:logger} - %{GREEDYDATA:message}
Then reference these patterns in your Logstash configuration:
filter {
grok {
patterns_dir => ["/etc/logstash/patterns"]
match => { "message" => "%{APP_LOG_FORMAT}" }
}
}
To keep multiple instances in sync, distribute this directory through whatever configuration management or image build process you already use for the Logstash configuration itself.
Can Grok extract nested fields?
Yes. Use Logstash field-reference (square-bracket) syntax in the capture name to create nested fields directly, or combine Grok with the mutate filter. Note that a dotted name such as client.ip produces a single field whose name literally contains a dot rather than a nested object:
filter {
grok {
match => { "message" => "%{IP:[client][ip]} %{WORD:[client][method]} %{PATH:[client][request]}" }
}
}
Or:
filter {
grok {
match => { "message" => "%{IP:clientip} %{WORD:method} %{PATH:request}" }
}
mutate {
add_field => {
"[client][ip]" => "%{clientip}"
"[client][method]" => "%{method}"
"[client][request]" => "%{request}"
}
remove_field => ["clientip", "method", "request"]
}
}
How can I validate my Grok patterns before deploying to production?
- Use Kibana's Grok Debugger in Dev Tools
- Set up a staging Logstash instance to validate patterns with real traffic
- Test with a small sample of logs first:
input {
file {
path => "/path/to/sample/logs"
start_position => "beginning"
sincedb_path => "/dev/null" # Don't track position for tests
}
}
- Use the --config.test_and_exit flag to check that the configuration compiles before deploying it:
bin/logstash --config.test_and_exit -f your_config.conf