Kibana gives you a structured way to explore log data indexed in Elasticsearch. With the right queries and visualizations, you can identify anomalies, debug issues more quickly, and track trends across services.
This blog covers practical ways to query logs using Kibana’s Lucene and KQL syntax, build visualizations that surface meaningful signals, and set up dashboards for ongoing log-based monitoring.
Step-by-Step Process to Set Up Log Analysis in Kibana
Before you can search or visualize logs in Kibana, you need to define an index pattern that matches your log data. Most setups use a time-based pattern like logs-* or app-logs-*, depending on how logs are shipped into Elasticsearch.
Creating the Index Pattern
In Kibana, go to Stack Management → Index Patterns. Hit Create index pattern and enter a pattern like logs-* or logstash-*. Kibana will scan for matching indices and list the available fields.
If you’re setting up the index manually, here’s a basic mapping:
PUT /logs-2024.01.15
{
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "level": { "type": "keyword" },
      "message": { "type": "text" },
      "service": { "type": "keyword" },
      "trace_id": { "type": "keyword" },
      "user_id": { "type": "keyword" }
    }
  }
}
When prompted, select @timestamp as your Time Filter field. This is what lets Kibana slice log data by time; without it, time-range filters, charts, and general log exploration won't work properly.
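If you'd rather script this step, recent Kibana versions (8.x) also expose a Data Views API that creates the same pattern programmatically. Here's a minimal sketch, assuming Kibana is reachable at localhost:5601; add authentication (an API key or -u) as your setup requires:
curl -X POST "http://localhost:5601/api/data_views/data_view" \
  -H "kbn-xsrf: true" \
  -H "Content-Type: application/json" \
  -d '{ "data_view": { "title": "logs-*", "timeFieldName": "@timestamp" } }'
This is handy when you provision Kibana through CI or infrastructure-as-code rather than clicking through the UI.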
Use ECS to Make Logs Easier to Work With
Consistent field naming goes a long way when you're querying or building dashboards, and that's where the Elastic Common Schema (ECS) helps. It defines a standard set of field names, like log.level, service.name, and trace.id, so tools like Kibana can parse and link data automatically.
Here’s an example of a log that follows ECS:
{
  "message": "POST /api/users 200 145ms - 2.1KB",
  "log": { "level": "INFO", "logger": "http.server.response" },
  "trace": { "id": "9b99131a6f66587971ef085ef97dfd07" },
  "transaction": { "id": "d0c5bbf14f5febca" },
  "@timestamp": "2024-01-15T10:30:00.000Z",
  "service": { "name": "auth-service" }
}
With ECS in place, it’s easier to search across fields, correlate logs with traces, and use prebuilt dashboards without fiddling with mappings.
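If your applications already emit flat fields like level and service (as in the mapping above), you don't have to rewrite logging code to adopt ECS. An ingest pipeline can rename fields on the way in; here's a rough sketch, with the pipeline name chosen purely for illustration:
PUT _ingest/pipeline/ecs-rename
{
  "description": "Map flat log fields to their ECS equivalents",
  "processors": [
    { "rename": { "field": "level", "target_field": "log.level", "ignore_missing": true } },
    { "rename": { "field": "service", "target_field": "service.name", "ignore_missing": true } },
    { "rename": { "field": "trace_id", "target_field": "trace.id", "ignore_missing": true } }
  ]
}
Attach it by setting index.default_pipeline on the index or template, or pass ?pipeline=ecs-rename at index time.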
Keep Indices Manageable
If your app generates a lot of logs, daily or weekly index rollovers help. Instead of writing everything into a single logs index, use something like logs-2024.07.16. Smaller, time-based indices make queries faster and reduce memory usage during visualizations or aggregations.
They also make retention simpler; you can just delete older indices by pattern or timestamp.
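For example, dropping a whole month of daily indices is a single call. A sketch; note that newer Elasticsearch versions block wildcard deletes unless action.destructive_requires_name is set to false:
# Remove all daily log indices from June 2024
DELETE /logs-2024.06.*
The ILM section later in this post shows how to automate the same cleanup.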
Advanced Query Patterns for Log Search
Kibana supports two query languages: KQL (Kibana Query Language) and Lucene; KQL is the default in most modern Kibana versions, and it’s cleaner and easier for most day-to-day log searches. But for more advanced filtering, Lucene gives you extra control, especially when working with regex or proximity searches.
Understanding both helps you search logs more efficiently, whether you're debugging an incident or analyzing patterns over time.
KQL vs Lucene: Syntax Comparison
KQL is simple and readable:
service:checkout AND level:ERROR
Lucene gives you more precision for complex queries, like matching specific patterns in log messages:
service:checkout AND level:ERROR AND message:/timeout.*connection/
You can switch between the two from the Kibana query bar.
Free-Text Search Patterns
Free-text search applies across all analyzed fields, including _source. Here are a few patterns you'll use often:
# Case-insensitive keyword
timeout
# Wildcards
timeout*
time?ut
# Exact phrase
"connection timeout"
# Escaping characters
"connection\/timeout"
Field-Based Filters
You can target specific fields to filter logs by service, trace ID, response time, and more:
# Errors from a specific service
level:ERROR AND service:checkout
# Logs with a specific trace ID
trace_id:9b99131a6f66587971ef085ef97dfd07
# Logs within a time range
@timestamp:[2024-01-15T00:00:00 TO 2024-01-15T23:59:59] AND level:(ERROR OR WARN)
# Range query on numeric fields
response_time:[100 TO 500]
bytes:{1024 TO *}
# Check if a field exists
_exists_:user_id
These patterns help you slice large volumes of logs without overloading Kibana.
Boolean Logic and Complex Conditions
You can combine filters using AND, OR, and NOT, with grouping for clarity:
# Grouped error levels by service
(level:ERROR OR level:CRITICAL) AND service:(auth OR payment)
# Exclude test users
NOT environment:test AND NOT user_id:*_test_*
# High-duration queries, excluding background jobs
duration:>5000 AND NOT job_type:background
Lucene also supports shorthand negation:
-USA
NOT Chrome
(USA AND Firefox) OR Windows
Proximity Searches (Lucene Only)
When you need to find terms that appear near each other in log messages, use Lucene proximity:
# Within 2 words of each other
"database connection"~2
# Within 5-word distance
"error timeout"~5
Useful for searching loosely structured logs or tracing error contexts.
Special Characters and Escaping
Lucene treats certain characters as operators. To match them literally, escape them:
# Reserved characters: + - && || ! ( ) { } [ ] ^ " ~ * ? : \
url:https\:\/\/api\.example\.com
message:"Error: Connection [FAILED]"
Escaping mistakes can cause queries to fail silently or return no results—double-check when something feels off.
Combine Trace IDs with Timestamps
During incident response, one effective pattern is filtering by trace ID and narrowing by time:
trace_id:abc123def456 AND @timestamp:[now-1h TO now]
This lets you track the full path of a request across services and quickly spot failures in the chain.
Use Autocomplete to Speed Up Querying
Kibana's query bar includes autocomplete for fields and values. As you type, it suggests valid field names and previously seen values, making it faster to construct accurate queries, especially when you're not sure about field naming or formats.
Build Effective Log Visualizations in Kibana
Kibana makes it easier to spot trends and anomalies in your logs, especially the kind that are hard to catch by skimming raw text. The key is knowing which visualizations to use, how to structure them, and when they help.
Most useful log visualizations are time-based. They highlight error spikes, dips in activity, or service-level issues that may not show up in a single log line but become obvious when you look at aggregate patterns.
Time Series: Track Logs Over Time
A basic time-series chart is one of the most effective ways to monitor log flow. It shows how log volume changes over time, broken down by severity:
- X-axis: @timestamp (use auto interval)
- Y-axis: Count of documents
- Split series: level.keyword
This setup helps you answer questions like:
- When did the error rate start climbing?
- Did a specific service go quiet unexpectedly?
- Are INFO logs flooding the system and hiding more important events?
It’s a quick way to catch problems early, before they snowball.
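If you want the same breakdown outside a chart, say for a quick sanity check or an alert, the visualization corresponds roughly to this aggregation. It's a sketch: use level instead of level.keyword if your mapping already types the field as keyword:
GET /logs-*/_search
{
  "size": 0,
  "query": { "range": { "@timestamp": { "gte": "now-24h" } } },
  "aggs": {
    "logs_over_time": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "5m" },
      "aggs": {
        "by_level": { "terms": { "field": "level.keyword" } }
      }
    }
  }
}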
Service Health: Break Down Errors by Service
To isolate which services are misbehaving, a bar chart grouped by service.keyword can be much more useful than a flat log view. Here's an example Elasticsearch query to back a service-level error chart:
{
  "query": {
    "bool": {
      "must": [
        { "range": { "@timestamp": { "gte": "now-1h" } } },
        { "term": { "level": "ERROR" } }
      ]
    }
  },
  "aggs": {
    "services": {
      "terms": { "field": "service.keyword", "size": 20 }
    }
  }
}
This visualization quickly answers:
- Which services are throwing the most errors right now?
- Are the same services failing repeatedly?
- Is the issue widespread or isolated?
Pair this with alert thresholds or a historical baseline, and you’ve got a solid early-warning system.
Heat Maps: Spot Hidden Patterns
When logs get noisy, heat maps can help you see beyond the noise. They’re especially good for catching recurring failures—things like nightly job crashes, traffic spikes, or rate-limited API calls.
For a heat map showing error concentration over time:
- X-axis: @timestamp (hourly or daily buckets)
- Y-axis: service.keyword
- Color scale: Count of logs where level: ERROR
This helps answer:
- Are failures clustered during specific hours or days?
- Are certain services consistently failing under load?
- Did something break after the last deployment?
It’s great for postmortems and recurring issue detection.
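Under the hood, the heat map is just two nested buckets: time on one axis, service on the other, filtered to errors. A sketch of the equivalent query, using the same field names as the examples above:
GET /logs-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "level": "ERROR" } },
        { "range": { "@timestamp": { "gte": "now-7d" } } }
      ]
    }
  },
  "aggs": {
    "per_hour": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "1h" },
      "aggs": {
        "per_service": { "terms": { "field": "service.keyword", "size": 20 } }
      }
    }
  }
}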
Use Filters to Refine Views
Not every visualization needs a complex query. Kibana’s built-in filter panel lets you narrow results without writing any syntax.
To filter logs for a specific service:
- Click Add filter
- Set Field to service.keyword
- Set Operator to is
- Set Value to auth-service
You can also pin filters across dashboards, disable them temporarily, or invert them to exclude data. This makes it easy to toggle between views during incident triage or dashboard reviews.
Save Searches to Reuse and Share
If you’re running the same queries often, like “errors from payment-service in the last hour,” save them as a Saved Search from the Discover tab.
- Run your query in Discover
- Click Save and give it a name like Payment Errors - Last 1h
- Reuse that saved search across dashboards or share the link with your team
This keeps everyone aligned and saves time when creating visualizations or investigating recurring issues.
Troubleshoot Using Kibana Server Logs
Kibana doesn’t just visualize logs; it also generates its own. These internal logs are often overlooked but can be incredibly useful when you're debugging slow dashboards, failing plugins, or performance bottlenecks.
With a proper logging setup, you can trace what's happening behind the scenes, down to the individual request or Elasticsearch query.
Configure Kibana Logging for Targeted Debugging
Instead of enabling all logs (which gets noisy fast), you can configure Kibana to log specific areas, like HTTP responses, Elasticsearch queries, or plugin behavior.
Here's an example config in kibana.yml:
logging:
  appenders:
    file:
      type: file
      fileName: ./kibana.log
      layout:
        type: json
logging.loggers:
  # HTTP responses from Kibana server
  - name: http.server.response
    level: debug
    appenders: [file]
  # Elasticsearch queries issued by Kibana
  - name: elasticsearch.query
    level: debug
    appenders: [file]
  # Custom plugin logs
  - name: plugins.myPlugin
    level: debug
    appenders: [file]
This setup writes structured logs to kibana.log, with separate loggers for different components. Keeping them scoped like this makes the logs easier to query and less overwhelming during incident analysis.
Investigate Slow Kibana Responses
If Kibana feels sluggish, like dashboards taking too long to load or requests timing out, start with the http.server.response logs. These show how long each request took and what endpoint was involved.
Example log:
{
  "message": "POST /internal/telemetry/clusters/_stats 200 1014ms - 43.2KB",
  "log": { "level": "DEBUG", "logger": "http.server.response" },
  "trace": { "id": "9b99131a6f66587971ef085ef97dfd07" },
  "transaction": { "id": "d0c5bbf14f5febca" }
}
Use the trace.id to group all log entries for that request:
trace.id:"9b99131a6f66587971ef085ef97dfd07"
This lets you follow the full flow of a single request—what Kibana did, which Elasticsearch queries ran, and how long each step took.
Common Log Search Patterns
Here are a few useful queries to quickly zero in on common performance or startup issues:
# Find slow HTTP responses (1000ms or more; switch the query bar to Lucene for the regex)
log.logger:"http.server.response" AND message:/[0-9]{4,}ms/
# Identify slow Elasticsearch queries
log.logger:"elasticsearch.query" AND message:*took*
# Check plugin initialization
log.logger:plugins.* AND message:*initialized*
You can also combine these with time filters or saved searches to monitor Kibana behavior over time, for example, tracking response spikes after new deployments or plugin changes.
Monitor and Alert on Logs in Kibana
Kibana doesn’t just help you explore logs; it can actively monitor them. With features like Watcher (via Elastic Stack) or integrations with external alerting systems, you can catch problems early and trigger alerts based on patterns in your logs.
From error spikes to silent service failures, these signals often show up in logs before anything else.
Alert on Error Rate Spikes
A sudden increase in ERROR logs is usually the first sign that something's broken, whether it's a deployment gone wrong or a service failing under load.
Here’s a basic Watcher config that checks for more than 10 errors in a 5-minute window, evaluated every minute:
{
  "trigger": {
    "schedule": { "interval": "1m" }
  },
  "input": {
    "search": {
      "request": {
        "indices": ["logs-*"],
        "body": {
          "query": {
            "bool": {
              "must": [
                { "range": { "@timestamp": { "gte": "now-5m" } } },
                { "term": { "level": "ERROR" } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gt": 10 } }
  }
}
Triggering alerts on log volume alone is risky, but paired with level filtering, it becomes far more reliable.
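Note that the watch above has no actions block, so it evaluates the condition but notifies no one. A minimal sketch using Watcher's built-in logging action is below; in practice you'd swap in the email, Slack, or webhook action configured for your cluster (the action name is illustrative):
"actions": {
  "log_error_spike": {
    "logging": {
      "text": "More than 10 ERROR logs hit logs-* in the last 5 minutes"
    }
  }
}
This fragment slots into the watch body alongside trigger, input, and condition.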
Detect When a Service Goes Quiet
Sometimes, no logs are a problem, especially for services that should be producing regular output. To catch this, set up a Watch (again evaluated every minute) that fires when a service logs fewer than X events over a given time range:
{
  "trigger": {
    "schedule": { "interval": "1m" }
  },
  "input": {
    "search": {
      "request": {
        "indices": ["logs-*"],
        "body": {
          "query": {
            "bool": {
              "must": [
                { "range": { "@timestamp": { "gte": "now-5m" } } },
                { "term": { "service": "critical-service" } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "lt": 5 } }
  }
}
Low log volume from a core service could mean it's offline, stuck, or silently failing, especially if error rates elsewhere are climbing.
Combine Log Metrics for Better Signal
On their own, error counts and log volume changes only tell part of the story. Combined, they give you a better picture of service health:
- High errors + low log volume → likely service degradation
- Steady volume + increasing warnings → pre-failure symptoms
- No logs + no alerts → visibility gap you’ll regret later
Teams often build dashboards or alerts that track both dimensions together to reduce false positives and surface real issues faster.
Link Logs with Metrics and Traces
Kibana logs become even more valuable when tied into broader observability pipelines. Last9 helps you correlate logs with metrics and traces, without fighting high-cardinality limits or cost overruns.
For example, with trace context in your logs, you can run queries like:
# Find logs for a specific trace
trace_id:abc123 AND span_id:*
# Correlate error logs with specific spans
level:ERROR AND trace_id:* AND service:payment
This lets you follow a request across services and see where exactly it failed, whether it was a DB timeout, a bad deploy, or a third-party issue.
Aggregate Logs for Better Performance
High-volume logs can be expensive and slow to query in real time. Instead of querying raw logs on dashboards, consider rolling up data into time-based aggregates:
- Hourly error counts by service and error type
- Daily traffic patterns by endpoint
- Weekly trends to inform capacity planning
Aggregations reduce query load and make it easier to spot patterns across large time ranges, without losing visibility into operational signals.
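One way to produce these rollups is a continuous transform that pivots raw logs into an hourly summary index you can chart cheaply. A sketch under assumed names (logs-hourly-errors as the destination, error.type as a field in your logs), not a drop-in config:
PUT _transform/hourly-error-rollup
{
  "source": {
    "index": "logs-*",
    "query": { "term": { "level": "ERROR" } }
  },
  "dest": { "index": "logs-hourly-errors" },
  "pivot": {
    "group_by": {
      "service": { "terms": { "field": "service.keyword" } },
      "error_type": { "terms": { "field": "error.type" } },
      "hour": { "date_histogram": { "field": "@timestamp", "fixed_interval": "1h" } }
    },
    "aggregations": {
      "error_count": { "value_count": { "field": "@timestamp" } }
    }
  },
  "frequency": "5m",
  "sync": { "time": { "field": "@timestamp", "delay": "60s" } }
}
Start it with POST _transform/hourly-error-rollup/_start and point your dashboards at the summary index instead of the raw logs.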
Optimize Log Query Performance in Kibana
As log volume grows, Kibana dashboards can slow down, especially when queries hit large datasets or use inefficient filters. To keep things responsive, you need to optimize how logs are stored, queried, and visualized.
Here are some proven strategies to improve performance without losing observability.
Use Index Lifecycle Management (ILM)
Index Lifecycle Management (ILM) helps manage log data across its retention period, automatically rolling over, archiving, or deleting old indices based on usage.
Here’s a sample ILM policy that balances performance with storage cost:
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "10GB",
            "max_age": "1d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "allocate": { "number_of_replicas": 0 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {}
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
With rollover in place, Kibana queries stay fast by targeting smaller, newer indices. Older data moves into warm or cold tiers, keeping storage costs low without affecting daily workflows.
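For the rollover action to actually fire, new write indices have to reference the policy and a rollover alias, which is usually wired up in an index template. A sketch, assuming the policy above was saved as logs-policy and logs is the write alias:
PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "logs-policy",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}
Bootstrap the first index (for example, PUT /logs-000001 with the logs alias marked as the write index) and ILM takes it from there.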
Optimize Query Structure
The way you write queries affects both speed and resource usage. Aim to use keyword fields and time filters wherever possible:
# Good: Uses indexed keyword fields
service:auth AND level:ERROR
# Slower: Full-text match on an analyzed text field
message:"authentication failed"
# Best: Combines keyword match with time range
service:auth AND level:ERROR AND @timestamp:[now-1h TO now]
Kibana queries run faster when Elasticsearch can use filters instead of scoring relevance. Always include a bounded time range in dashboards and saved searches; open-ended queries (@timestamp:*) can crush performance at scale.
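If you're writing the same search as raw query DSL, for a watch or an API call, keep every clause under bool.filter: filter context skips relevance scoring and lets Elasticsearch cache the clauses. Here's the last example above in that form:
GET /logs-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "service": "auth" } },
        { "term": { "level": "ERROR" } },
        { "range": { "@timestamp": { "gte": "now-1h" } } }
      ]
    }
  }
}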
Log Analysis Best Practices
To make log analysis more effective and less reactive, structure your workflow around these habits:
- Start broad, then zoom in: Use service-level charts to get an overview, then filter down by severity, trace ID, or user.
- Standardize field names: Use consistent fields like service, level, trace_id, and user_id across all apps. This makes cross-service filtering and dashboarding much easier.
- Save reusable queries: Turn common searches into saved searches that teammates can share and build dashboards on top of.
- Create focused dashboards: Each service should have its own dashboard with metrics, recent errors, and operational indicators. Don’t try to cram everything into one view.
- Layer your alerts: Use alert thresholds that escalate based on severity or frequency. For example, alert after 5 errors in 1 minute, escalate at 50, and trigger paging at 100+.
Teams that adopt these patterns often catch issues earlier and spend less time scrambling during incidents. Clean logs and fast queries create a compounding effect: better signal, better monitoring, and faster debugging.
Final Thoughts
Kibana is great for slicing and visualizing logs, especially when paired with structured fields and reusable queries. But at scale, you start to hit limits: slow queries, timeouts, and dashboards that struggle under load.
Here’s how we handle it at Last9:
- Trace-log correlation: Logs are tied to spans and metrics, so it's easy to follow a request across services without jumping between tools or searching by hand.
- High-cardinality log support: We keep rich context in logs (user IDs, job names, API paths) without worrying about bloated indices or degraded performance.
- Streaming aggregations: Common views like error counts, request volumes, or service-level activity are pre-aggregated, so dashboards load quickly even during high-traffic windows.
These patterns help teams we work with stay fast and flexible, even with large, noisy log pipelines.
Book some time with us today to learn more, or if you'd like to explore at your own pace, start for free!
FAQs
Q: What's the difference between KQL and Lucene syntax in Kibana?
A: KQL is simpler and more intuitive for basic searches, with better autocompletion. Lucene offers advanced features like proximity searches and complex Boolean logic. Use KQL for most day-to-day searches, and switch to Lucene when you need more control over text analysis.
Q: How do I search for logs containing specific error messages?
A: Use KQL with quotes for exact matches: message:"Connection timeout". For pattern-based searches, use Lucene with regex: message:/timeout.*error/. For partial matches, wildcards work well: message:*timeout*.
Q: What's the best way to structure log indices for different environments?
A: Use separate index patterns like logs-prod-*, logs-staging-*, and logs-dev-*. This makes it easier to apply different retention policies and avoids accidental cross-environment queries.
Q: How can I correlate logs with distributed traces?
A: Make sure your logs include trace_id and span_id fields. Then query logs using trace_id:abc123def456 to follow a request across services.
Q: What's the most efficient way to query high-volume log data?
A: Always include a time range like @timestamp:[now-1h TO now]. Use keyword fields (e.g., service, level) instead of text fields, and structure your query to start with the most selective conditions first.
Q: How do I set up alerts for when services stop logging?
A: Use Kibana Watcher to monitor log volume.
Set a condition to trigger when the log count for a service drops below a threshold in a defined time window.
Q: How do I handle special characters in log searches?
A: Escape special characters using backslashes. For example: url:https\:\/\/api\.example\.com. Characters that need escaping include: + - && || ! ( ) { } [ ] ^ " ~ * ? : \
Q: Should I use Lucene or KQL for log queries?
A: KQL is easier to use and integrates better with Kibana’s UI.
Lucene is better when you need advanced features like proximity search or custom regex patterns.
Q: How do I optimize dashboard performance with large log datasets?
A: Use time-based index patterns and apply time filters to narrow queries.
Set up Index Lifecycle Management to manage old data.
Limit the timeframe in visualizations, and use filters to reduce the data scope before querying. For high-volume views, consider using streaming or pre-aggregated metrics.