
Apr 10th, ‘25 / 9 min read

Logstash Grok Examples: A Detailed Guide to Pattern Matching

Learn how to use Logstash Grok with simple examples. Match and parse logs easily using patterns that are easy to understand.

Grok is one of the most useful filters in Logstash, turning unstructured log data into structured, queryable information. It works by matching patterns against your logs and extracting information to fields you can use. Consider Grok as a way to teach Logstash how to read your messy logs and organize them neatly.

For DevOps professionals, Grok is the secret weapon that makes the Elastic Stack truly powerful for log analysis. It uses a combination of named regular expressions to parse logs into something meaningful that you can search, filter, and visualize in Kibana.

How Grok Pattern Syntax Works in Detail

Grok patterns follow this syntax:

%{PATTERN:field_name}

Where:

  • PATTERN is a predefined pattern (like IP, NUMBER, or WORD)
  • field_name is what you want to name the extracted data

You can also convert the extracted value to a specific data type during extraction:

%{PATTERN:field_name:data_type}

Where data_type can be int or float.
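
For example, a hypothetical status code and request duration could be captured and typed in one step:

filter {
  grok {
    # status becomes an integer and duration a float instead of strings
    match => { "message" => "%{NUMBER:status:int} %{NUMBER:duration:float}" }
  }
}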

Logstash comes with over 120 patterns built-in, saving you from writing complex regex from scratch. These patterns are stored in the /logstash/vendor/bundle/jruby/x.x/gems/logstash-patterns-core-x.x.x/patterns directory.

💡
If you want to test your Grok patterns before using them, this Grok Debugger guide might come in handy.

Essential Logstash Grok Examples for Common Log Formats

Parsing Apache Access Logs with Built-in Patterns

Apache logs are some of the most common logs you'll work with. Here's how to parse a standard Apache log format:

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

One line of code handles all this parsing! The COMBINEDAPACHELOG pattern extracts these fields:

  • clientip: The IP address of the client
  • ident: The identity information provided by the client (usually "-")
  • auth: The user authentication information
  • timestamp: When the request was received
  • verb: The HTTP method (GET, POST, etc.)
  • request: The requested resource path
  • httpversion: HTTP version
  • response: HTTP response code
  • bytes: Size of the response in bytes
  • referrer: The referring URL
  • agent: The user agent string

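For reference, a typical combined-format line (values invented) that this pattern parses looks like this:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
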
Creating Custom Patterns for Web Server Logs

For custom formats, you can build your own pattern:

filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes:int} %{NUMBER:duration:float}" }
  }
}

This would match logs like:

55.3.244.1 GET /index.html 15824 0.043

And extract these fields:

  • client: The IP address
  • method: The HTTP method used
  • request: The requested path
  • bytes: Size in bytes (converted to integer)
  • duration: Request processing time (converted to float)
💡
If you're figuring out where Grok fits in the bigger logging setup, this guide to log shippers lays it out clearly.

Comprehensive Reference of Grok Patterns for DevOps Workflows

Below are some patterns you'll use constantly in your DevOps work:

| Pattern | Description | Example Match | Common Usage |
|---|---|---|---|
| IP | IPv4 or IPv6 address | 192.168.1.1 | Client IPs, server addresses |
| HOSTNAME | Host name | server-01.example.com | Server identification |
| TIMESTAMP_ISO8601 | ISO8601 timestamp | 2023-04-10T13:25:00.123Z | Modern application logs |
| HTTPDATE | HTTP date format | 01/Jan/2023:13:25:15 +0100 | Web server logs |
| NUMBER | Any number | 12345 | Response times, status codes |
| INT | Integer | 12345 | Counts, durations |
| WORD | A word (letters, numbers, underscore) | server_01 | Service names, log levels |
| GREEDYDATA | Everything until the end of line | any text here... | Message content |
| DATA | Non-greedy capture | some text | Specific field extraction |
| QUOTEDSTRING | String inside quotes | "example text" | JSON values, parameters |
| LOGLEVEL | Log levels | INFO, ERROR, DEBUG | Application log severity |
| UUID | Universal unique identifier | 5c2c2698-c2c8-4c3e-aab6-74c046cb719f | Request IDs, trace IDs |

Effective Techniques for Debugging Grok Pattern Matches

When your Grok pattern isn't matching as expected (and this happens to everyone), try these steps:

  1. Use the Grok Debugger: Test your patterns with a standalone Grok Debugger or Kibana's built-in Grok Debugger in the Dev Tools section.
  2. Try multiple patterns with debugging enabled:
filter {
  grok {
    match => { "message" => "%{PATTERN1}" }
    tag_on_failure => ["pattern1_failed"]
    add_field => { "matched_by" => "pattern1" }
    break_on_match => false
  }
  
  grok {
    match => { "message" => "%{PATTERN2}" }
    tag_on_failure => ["pattern2_failed"]
    add_field => { "matched_by" => "pattern2" }
    break_on_match => false
  }
}
  3. Check for grok failures: Use the _grokparsefailure tag to identify logs that didn't match.
filter {
  grok {
    match => { "message" => "%{PATTERN1}" }
    tag_on_failure => ["_grokparsefailure"]
  }
}

Then search for that tag in Kibana to find problematic logs:

tags:_grokparsefailure
  4. Review the built-in patterns: When a pattern isn't matching, double-check the definitions shipped with the logstash-patterns-core plugin (in the patterns directory mentioned earlier, or in the logstash-plugins/logstash-patterns-core repository on GitHub) to confirm a pattern captures what you expect.
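  5. Print events while testing: It also helps to send parsed events to the console so you can see exactly which fields were extracted; a minimal sketch using the stdout output with the rubydebug codec:
output {
  stdout { codec => rubydebug }
}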
💡
If your logs come from Java apps, this Log4j vs Log4j2 comparison can help you spot what’s worth tweaking.

Strategies for Processing Multi-line Log Formats

Many applications produce multi-line logs, like Java stack traces. Here's how to handle them:

filter {
  multiline {
    pattern => "^%{TIMESTAMP_ISO8601}"
    negate => true
    what => "previous"
  }
  
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}

This configuration:

  1. Combines related log lines that don't start with a timestamp
  2. Joins them with the previous line that had a timestamp
  3. Then extracts fields from the combined message

Java Exception Stack Trace Example

For Java stack traces, you might use:

filter {
  multiline {
    pattern => "^[\\t ]"
    what => "previous"
  }
  
  multiline {
    pattern => "^[a-zA-Z#]"
    negate => true
    what => "previous"
  }
  
  grok {
    match => { 
      "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[%{DATA:thread}\] %{JAVACLASS:class}: %{GREEDYDATA:message}"
    }
    overwrite => ["message"]   # replace the raw line instead of appending a second value
  }
}

This handles indentation and continuation lines in Java exceptions.
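
Note that on recent Logstash versions the multiline filter plugin has been deprecated in favor of the multiline codec, which does the same grouping at the input stage. A minimal sketch, assuming a file input (the path is only an example) and timestamp-prefixed lines:

input {
  file {
    path => "/var/log/app/app.log"
    codec => multiline {
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}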

Advanced Grok Pattern Techniques for Complex Logs

Creating and Using Custom Pattern Definitions

You can define your own patterns for reuse:

filter {
  grok {
    pattern_definitions => {
      "APPID" => "[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}"
      "CUSTOM_TIMESTAMP" => "%{YEAR}-%{MONTHNUM}-%{MONTHDAY}T%{HOUR}:%{MINUTE}:%{SECOND},%{INT}"
    }
    match => { 
      "message" => "%{CUSTOM_TIMESTAMP:timestamp} %{APPID:application_id} %{GREEDYDATA:message}" 
    }
    overwrite => ["message"]   # replace the raw line instead of appending a second value
  }
}
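
A made-up line that this configuration would match:

2025-04-10T13:25:00,123 5c2c2698-c2c8-4c3e-aab6-74c046cb719f Payment request accepted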

Implementing Conditional Pattern Matching for Different Log Types

Different log types? No problem:

filter {
  if [source] == "api-server" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  } else if [source] == "database" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
    }
  } else if [source] =~ /^app-\d+$/ {
    grok {
      match => { "message" => "%{DATA:service}\[%{NUMBER:pid}\]: \[%{WORD:loglevel}\] %{GREEDYDATA:msg}" }
    }
  }
}

Using Oniguruma Regular Expressions for Complex Pattern Matching

For more complex matching, you can use inline regex patterns:

filter {
  grok {
    match => { 
      "message" => "(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}) (?<component>%{WORD}):%{SPACE}(?<level>%{LOGLEVEL}): (?:%{SPACE}\[(?<thread>[^\]]+)\]:)? (?<message>.*)"
    }
    overwrite => ["message"]   # replace the raw line instead of appending a second value
  }
}
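
You can also mix in raw Oniguruma captures when no built-in pattern fits the field you need. A small sketch with a hypothetical uppercase hex transaction ID:

filter {
  grok {
    match => { "message" => "(?<txn_id>[0-9A-F]{8,16}) %{GREEDYDATA:detail}" }
  }
}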
💡
If you're wondering how much detail is too much in your logs, this trace-level logging explainer is worth a read.

Kubernetes Container Log Pattern Extraction

filter {
  grok {
    match => { 
      "message" => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:level} \[%{WORD:component}\] \[%{WORD:namespace}/%{WORD:pod}/%{WORD:container}\] %{GREEDYDATA:msg}" 
    }
  }
  
  date {
    match => [ "timestamp", "ISO8601" ]
    target => "@timestamp"
  }
  
  mutate {
    add_field => {
      "kubernetes.namespace" => "%{namespace}"
      "kubernetes.pod" => "%{pod}"
      "kubernetes.container" => "%{container}"
    }
    remove_field => [ "namespace", "pod", "container" ]
  }
}
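
A made-up line in this assumed format that the pattern would match is shown below. Keep in mind that %{WORD} only matches letters, digits, and underscores, so real pod names containing hyphens (like api-server-7d9f) would need a broader pattern such as %{NOTSPACE}:

2023-04-10T13:25:00.123Z INFO [controller] [production/api_7d9f/app] Readiness probe succeeded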

HAProxy Load Balancer Log Processing Pattern

filter {
  grok {
    match => { 
      "message" => "%{IP:client_ip}:%{NUMBER:client_port} \[%{HTTPDATE:timestamp}\] %{WORD:frontend_name} %{WORD:backend_name}/%{WORD:server_name} %{NUMBER:time_request:float}/%{NUMBER:time_queue:float}/%{NUMBER:time_backend_connect:float}/%{NUMBER:time_backend_response:float}/%{NUMBER:time_duration:float} %{NUMBER:http_status_code:int} %{NUMBER:bytes_read:int} %{DATA:captured_request_cookie} %{DATA:captured_response_cookie} %{WORD:termination_state} %{NUMBER:actconn:int}/%{NUMBER:feconn:int}/%{NUMBER:beconn:int}/%{NUMBER:srvconn:int}/%{NUMBER:retries:int} %{NUMBER:srv_queue:int}/%{NUMBER:backend_queue:int} \{%{DATA:request_headers}\} \{%{DATA:response_headers}\} \"%{WORD:http_verb} %{NOTSPACE:http_request} HTTP/%{NUMBER:http_version}\""
    }
  }
  
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    target => "@timestamp"
  }
  
  mutate {
    convert => {
      "time_request" => "float"
      "time_queue" => "float"
      "time_backend_connect" => "float"
      "time_backend_response" => "float"
      "time_duration" => "float"
      "http_status_code" => "integer"
      "bytes_read" => "integer"
    }
  }
}

Jenkins Build Log Pattern Extraction

filter {
  grok {
    match => { 
      "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\] %{WORD:level}: %{GREEDYDATA:msg}"
    }
  }
  
  if [msg] =~ "^Started by " {
    grok {
      match => { "msg" => "^Started by (?<build_trigger>.*)" }
      tag_on_failure => []
    }
  } else if [msg] =~ "^Building " {
    grok {
      match => { "msg" => "^Building (?<build_status>.*)" }
      tag_on_failure => []
    }
  } else if [msg] =~ "^Finished: " {
    grok {
      match => { "msg" => "^Finished: (?<build_result>.*)" }
      tag_on_failure => []
    }
  }
}

Nginx Access Log Pattern Matching

filter {
  grok {
    match => { 
      "message" => '%{IPORHOST:remote_ip} - %{DATA:user_name} \[%{HTTPDATE:time}\] "%{WORD:method} %{DATA:url} HTTP/%{NUMBER:http_version}" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} "%{DATA:referrer}" "%{DATA:agent}" "%{DATA:forwarded_for}" %{NUMBER:request_length} %{NUMBER:request_time} \[%{DATA:proxy_upstream_name}\] \[%{DATA:upstream_addr}\] %{NUMBER:upstream_response_length} %{NUMBER:upstream_response_time} %{NUMBER:upstream_status} %{DATA:req_id}'
    }
  }
}
💡
If you're staring at messy log files wondering what’s useful, this log file analysis guide breaks it down nicely.

Performance Optimization Strategies for Grok Pattern Matching

Grok is powerful but can be CPU-intensive. Keep these tips in mind:

  1. Use specific patterns: The more specific your pattern, the faster it matches. Don't use .* or %{GREEDYDATA} when you can use a more specific pattern.
  2. Limit named captures: Each named capture creates a field and consumes memory. Only capture what you need.
  3. Order patterns by frequency: List the most common patterns first for better performance.
filter {
  grok {
    match => { 
      "message" => [
        "%{PATTERN1}", # Matches 80% of logs
        "%{PATTERN2}", # Matches 15% of logs
        "%{PATTERN3}"  # Matches 5% of logs
      ]
    }
    break_on_match => true
  }
}
  4. Use break_on_match wisely: Set it to true for mutually exclusive patterns and false when you want to apply multiple patterns.
  5. Avoid backtracking: Complex regex with lots of optional parts can cause backtracking, which hurts performance.
  6. Use anchors: Start patterns with ^ when possible to anchor to the start of the line (see the short sketch after this list).
  7. Pre-filter large datasets: Use simple patterns first to filter the dataset.
filter {
  if [message] =~ "ERROR" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} ERROR %{GREEDYDATA:error_message}" }
    }
  }
}
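
For the anchoring tip (6), starting the pattern with ^ lets non-matching lines fail fast instead of being scanned repeatedly. A minimal sketch:

filter {
  grok {
    match => { "message" => "^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}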

Integrating Grok with Other Logstash Filters for Complete Log Processing

Sometimes you need more than pattern matching. Combine Grok with other filters:

filter {
  # Parse the log with Grok
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
  
  # Convert timestamp string to a proper date
  date {
    match => [ "timestamp", "ISO8601" ]
    target => "@timestamp"
  }
  
  # Convert field types
  mutate {
    convert => { 
      "duration" => "float"
      "response_size" => "integer"
      "status_code" => "integer"
    }
  }
  
  # Add geo information for IP addresses
  geoip {
    source => "clientip"
    target => "geo"
  }
  
  # Parse JSON in the message field
  if [msg] =~ /^\{.*\}$/ {
    json {
      source => "msg"
      target => "msg_json"
    }
  }
  
  # Drop sensitive information
  mutate {
    remove_field => ["password", "credit_card", "auth_token"]
  }
}

Building a Complete Logstash Pipeline with Grok Patterns

Here's how to put it all together in a complete pipeline:

input {
  file {
    path => "/var/log/application/*.log"
    start_position => "beginning"
    sincedb_path => "/var/lib/logstash/sincedb"
    type => "application"
  }
  
  beats {
    port => 5044
    type => "beats"
  }
}

filter {
  if [type] == "application" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{WORD:service}\] %{LOGLEVEL:level}: %{GREEDYDATA:msg}" }
    }
    
    date {
      match => [ "timestamp", "ISO8601" ]
      target => "@timestamp"
    }
  } else if [type] == "beats" and [fields][log_type] == "nginx" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
  
  if "_grokparsefailure" in [tags] {
    mutate {
      add_field => { "parsing_error" => "true" }
    }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "%{type}-%{+YYYY.MM.dd}"
  }
  
  if "_grokparsefailure" in [tags] {
    file {
      path => "/var/log/logstash/failed_events.log"
    }
  }
}
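
Before starting Logstash with a pipeline like this, it's worth checking the configuration syntax first (the config path below is just an example):

bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/pipeline.conf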

Conclusion

Grok is an essential tool in any DevOps professional's ELK Stack toolkit. It transforms chaotic logs into structured data that you can analyze, alert on, and visualize.

The key to mastering Grok is practice – start with the examples in this guide, adapt them to your specific log formats, and gradually build a library of patterns that work for your infrastructure.

💡
What log patterns are giving you trouble? Share your challenges with our Discord Community – we're always ready to help with tricky Grok patterns!

FAQs

What's the difference between Grok and regular expressions?

Grok is built on top of regular expressions but makes them more reusable and readable. Instead of writing a complex regex like (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) for an IP address, you can simply use %{IP:client_ip}. Grok provides a library of predefined patterns that you can combine and reference by name.

How do I handle logs with variable structures?

For logs with variable structures, use multiple pattern matching with the break_on_match option set to true:

filter {
  grok {
    match => {
      "message" => [
        "%{PATTERN1}", # For log type A
        "%{PATTERN2}", # For log type B
        "%{PATTERN3}"  # For log type C
      ]
    }
    break_on_match => true
  }
}

This tries each pattern in order until one matches.

Can I use Grok patterns with JSON logs?

Yes, but it's often better to use the JSON filter for fully structured JSON logs:

filter {
  json {
    source => "message"
  }
}

Use Grok when you have mixed formats or need to extract JSON from within a larger log entry:

filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:json_data}" }
  }
  
  json {
    source => "json_data"
    target => "parsed_data"
  }
}

How can I handle custom date formats with Grok?

First extract the date string with Grok, then use the date filter to parse it:

filter {
  grok {
    match => { "message" => "%{CUSTOM_DATE_PATTERN:timestamp} %{GREEDYDATA:msg}" }
  }
  
  date {
    match => [ "timestamp", "yyyy/MM/dd HH:mm:ss.SSS" ]
    target => "@timestamp"
  }
}
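
CUSTOM_DATE_PATTERN above is a placeholder for whatever matches your date string. Assuming a yyyy/MM/dd HH:mm:ss.SSS layout like the date filter expects here, one way to define it inline is with pattern_definitions:

filter {
  grok {
    pattern_definitions => {
      "CUSTOM_DATE_PATTERN" => "%{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{TIME}"
    }
    match => { "message" => "%{CUSTOM_DATE_PATTERN:timestamp} %{GREEDYDATA:msg}" }
  }
}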

What should I do when a log format changes?

When log formats change:

  1. Create a new Grok pattern for the new format
  2. Use conditional matching to apply different patterns based on log characteristics
  3. Consider using version tags in your fields (e.g., message_v1, message_v2)
filter {
  if [message] =~ "new_format_indicator" {
    grok { match => { "message" => "%{NEW_PATTERN}" } }
  } else {
    grok { match => { "message" => "%{OLD_PATTERN}" } }
  }
}

How do I troubleshoot Grok pattern performance issues?

  1. Replace complex patterns with simpler ones when possible
  2. Use the Grok Debugger to test and optimize patterns before deployment
  3. Check per-plugin timings through the Logstash monitoring API to find slow filters:

curl -s 'localhost:9600/_node/stats/pipelines?pretty'

  4. Enable the Logstash slow log by adding this to logstash.yml:

slowlog.threshold.warn: 2s
slowlog.threshold.info: 1s
slowlog.threshold.debug: 500ms
slowlog.threshold.trace: 100ms

How do I share Grok patterns across multiple Logstash instances?

Store common patterns in a centralized location. First, create a patterns directory:

mkdir -p /etc/logstash/patterns

Add your patterns to files in this directory:

# /etc/logstash/patterns/custom_patterns
CUSTOM_DATE_FORMAT %{YEAR}[/-]%{MONTHNUM}[/-]%{MONTHDAY} %{TIME}
APP_LOG_FORMAT \[%{TIMESTAMP_ISO8601:timestamp}\] %{LOGLEVEL:level} %{NOTSPACE:logger} - %{GREEDYDATA:message}

Then reference these patterns in your Logstash configuration:

filter {
  grok {
    patterns_dir => ["/etc/logstash/patterns"]
    match => { "message" => "%{APP_LOG_FORMAT}" }
  }
}

Can Grok extract nested fields?

Yes. You can write nested fields directly in the pattern using Logstash's [bracket] field-reference syntax (dots in field names are treated as literal characters, not nesting), or combine Grok with the mutate filter:

filter {
  grok {
    match => { "message" => "%{IP:client.ip} %{WORD:client.method} %{PATH:client.request}" }
  }
}

Or:

filter {
  grok {
    match => { "message" => "%{IP:clientip} %{WORD:method} %{PATH:request}" }
  }
  
  mutate {
    add_field => {
      "[client][ip]" => "%{clientip}"
      "[client][method]" => "%{method}"
      "[client][request]" => "%{request}"
    }
    remove_field => ["clientip", "method", "request"]
  }
}

How can I validate my Grok patterns before deploying to production?

  1. Use Kibana's Grok Debugger in Dev Tools
  2. Set up a staging Logstash instance to validate patterns with real traffic

  3. Test with a small sample of logs first:

input {
  file {
    path => "/path/to/sample/logs"
    start_position => "beginning"
    sincedb_path => "/dev/null"  # Don't track position for tests
  }
}

  4. Use the --config.test_and_exit flag:

bin/logstash --config.test_and_exit -f your_config.conf


Author

Anjali Udasi

Helping to make tech a little less intimidating. I love breaking down complex concepts into easy-to-understand terms.
Helping to make the tech a little less intimidating. I love breaking down complex concepts into easy-to-understand terms.