Log Format Standards: JSON, XML, and Key-Value Explained

A practical look at common log format standards, how JSON, XML, and key-value logs work, and when to use each in production systems.

Aug 6th, ‘25

Your log format defines how your application records events. The structure you choose shapes how logs get parsed, indexed, and queried. It affects how quickly you can debug issues, build alerts, or control storage usage.

In this guide, we'll take a look at the log formats developers typically use, the essential fields to include, and what trade-offs to consider before locking down a format for your system.

What Is a Log Format?

A log format defines how your application turns events into structured records: what gets logged, how it's typed, and how it's organized. It acts as the contract between your application and every system that consumes those logs, from tail -f in a terminal to your full observability stack.

Structured formats like JSON make it easier to automate parsing, build dashboards, and route logs into tools like OpenTelemetry, Fluent Bit, or log aggregators. In contrast, plain text logs might be fine during local debugging, but they don’t scale when you need to:

  • search logs across services,
  • filter by fields like user_id or status_code,
  • or correlate log entries with traces and metrics.

Your choice of format impacts:

  • Storage efficiency – verbose formats like XML consume more disk, while key-value pairs or binary formats save space.
  • Query performance – structured fields allow for faster filtering and aggregation.
  • Tooling compatibility – observability pipelines expect consistent field names and types.
  • Developer overhead – well-structured logs reduce the need for custom parsing or regex filters.
💡
If you're figuring out how to structure your logs in a way that helps during debugging, this structured logging guide is a great place to start.

Why Structured Logs Improve Troubleshooting

Structured logs follow a consistent schema: predefined field names, data types, and value formats. This makes them machine-parsable and compatible with log processors, query engines, and metric extractors.

When logs include fields like trace_id, user_id, or status_code, you can query and filter directly by those values without relying on string parsing or brittle regex. This consistency enables cross-service correlation, alert generation, and ad hoc analysis at scale.

Example:

A structured log entry from a payment service:

{
  "timestamp": "2025-08-06T08:10:15.203Z",
  "level": "error",
  "service": "payment-api",
  "message": "Payment processing failed",
  "user_id": "92a17e",
  "order_id": "a1034f",
  "trace_id": "cd302f98c3f84124"
}

This log entry supports:

  • Trace-based search across distributed services using trace_id
  • Scoped filtering for a specific user_id or order_id
  • Error aggregation by service or status_code
  • Metric generation from log volume or error frequency

In contrast, unstructured logs make it harder to perform these operations without custom parsing logic. Format inconsistencies and missing fields often lead to partial or inaccurate results during incident response.

4 Common Log Formats and When to Use Them

Different log formats serve different needs. Some are designed for quick scanning during development. Others work better when logs need to be parsed, indexed, and analyzed at scale.

Here’s a breakdown of the formats you’ll commonly see in production systems and tooling setups:

Common Log Format (CLF)

Used heavily by web servers like Apache and Nginx, the Common Log Format is still a default for HTTP access logs.

192.168.1.100 - - [15/Jan/2024:14:30:22 +0000] "GET /api/users HTTP/1.1" 200 1234

This line gives you the basics: client IP, timestamp, request method and path, HTTP version, status code, and response size.

The downside is rigidity. There's no easy way to add custom fields like request_id, user_id, or duration_ms. That makes it harder to extract structured data or plug into modern log processing pipelines.

Best used for: Lightweight HTTP logging where you don’t need to enrich logs with additional context.

JSON Logs

JSON has become the default format for structured logs in most application environments. It’s easy to generate, machine-readable, and widely supported by log collectors, processors, and storage systems.

{
  "timestamp": "2024-01-15T14:30:22.123Z",
  "level": "ERROR",
  "service": "user-service",
  "message": "User authentication failed",
  "user_id": "usr_12345",
  "request_id": "req_abc123",
  "error_code": "INVALID_TOKEN",
  "duration_ms": 156
}

JSON logs make it simple to include structured context, like user or request identifiers, without breaking downstream parsing. They’re also flexible enough to support nested fields, which can be helpful when capturing metadata.

Best used for: Applications that need structured logging, trace correlation, or metric extraction from logs.

XML Logs

While uncommon in modern app stacks, XML is still used in some enterprise systems where schema validation and strong typing are required.

<logEntry>
  <timestamp>2024-01-15T14:30:22.123Z</timestamp>
  <level>ERROR</level>
  <service>user-service</service>
  <message>User authentication failed</message>
  <metadata>
    <userId>usr_12345</userId>
    <errorCode>INVALID_TOKEN</errorCode>
  </metadata>
</logEntry>

XML logs offer strong structure and support schema validation, but the verbosity adds overhead, both in storage and in parsing time. They're rarely used in high-throughput or latency-sensitive systems.

Best used for: Systems where strict schema enforcement is needed, or where XML is already the standard.

Plain Text Logs

This format is still common in development environments and older systems. It's human-readable and quick to output, but lacks structure.

2024-01-15 14:30:22 [ERROR] user-service: User authentication failed for usr_12345 (INVALID_TOKEN)

These logs are fine when scanning terminal output or debugging locally. But as soon as you need to search or aggregate by fields like user_id, you'll need to write custom parsers or rely on brittle regex.

Best used for: Development or systems where logs aren’t ingested into structured pipelines.

Structured vs Unstructured Logs

Structured logs follow a defined schema, with the same fields in the same structure, which makes them ideal for querying, alerting, and visualizing system behavior. You can filter by values like trace_id, group errors by service, or generate metrics directly from logs.

Unstructured logs give you flexibility in how messages are written, but that flexibility often leads to inconsistency. Parsing becomes fragile, and automation gets harder as log volume grows.

For most production systems, structured logging is a safer choice. It scales well, integrates with observability tools, and improves both operational visibility and incident response.

💡
If you're working with system logs and need a quick reference for syslog formats, this guide breaks it down with practical examples.

Anatomy of a Log Entry

A log entry is a record of what your system saw at a particular moment. The more context it carries, the easier it becomes to debug issues, trace behavior, or feed logs into monitoring pipelines.

Here are the core fields you’ll usually want to include, along with format options that make them easier to parse and query.

Timestamps

Every log should capture when something happened. Timestamps help with ordering, correlation across services, and identifying delays or spikes in activity.

{
  "timestamp": "2024-01-15T14:30:22.123Z",
  "@timestamp": "2024-01-15T14:30:22.123Z"
}

ISO 8601 format with timezone info (usually UTC) is a good default. Some tools expect specific field names like @timestamp, so it’s worth checking what your stack prefers.
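A small sketch of producing that timestamp in Python, using only the standard library; the field names mirror the example above:

from datetime import datetime, timezone

# ISO 8601 in UTC with millisecond precision, e.g. "2024-01-15T14:30:22.123+00:00"
timestamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")

log_entry = {"timestamp": timestamp, "@timestamp": timestamp}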

IP Address and Hostname

Including network details helps identify which machine emitted the log, which server handled the request, and where the request originated.

{
  "client_ip": "192.168.1.100",
  "hostname": "web-server-01.production.example.com",
  "server_ip": "10.0.1.50"
}

This becomes especially useful in distributed systems, load-balanced environments, or when multiple services share the same log backend.

Log Level

Log levels describe how important or severe an event is. This helps filter logs during analysis and route them to the right destination.

{
  "level": "ERROR",
  "severity": "high"
}

Most setups use levels like DEBUG, INFO, WARN, ERROR, and FATAL. Sticking to a consistent set across services helps when searching or setting up alerts.
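If different services or libraries emit different names for the same level (WARNING vs WARN, CRITICAL vs FATAL), a small normalization step keeps searches and alerts consistent. A rough sketch; the canonical set below is a choice, not a standard:

# Map common aliases onto one canonical set: DEBUG, INFO, WARN, ERROR, FATAL
LEVEL_ALIASES = {"WARNING": "WARN", "CRITICAL": "FATAL", "ERR": "ERROR"}

def normalize_level(level: str) -> str:
    level = level.upper()
    return LEVEL_ALIASES.get(level, level)

print(normalize_level("warning"))  # WARN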

Message and Metadata

The message tells you what happened. Additional fields give you the surrounding context.

{
  "message": "Payment processing failed",
  "payment_id": "pay_12345",
  "amount": 99.99,
  "currency": "USD",
  "error_code": "INSUFFICIENT_FUNDS",
  "retry_count": 2
}

Messages are easier to scan when they’re written for humans. Context fields, on the other hand, are what let you search logs, build dashboards, or extract metrics. Both have their place; it helps to keep them separate.

JSON Logs vs Key-Value Pairs

The format you choose depends on how you plan to query, forward, or store your logs. Two common options are JSON and key-value strings.

JSON

JSON works well when logs carry nested data or need to support structured fields.

{
  "timestamp": "2024-01-15T14:30:22.123Z",
  "request": {
    "id": "req_abc123",
    "method": "POST",
    "url": "/api/payments",
    "headers": {
      "user-agent": "mobile-app/1.2.3"
    }
  },
  "response": {
    "status": 400,
    "duration_ms": 245
  }
}

This structure is easy to parse with modern log shippers and observability tools. It also makes it easier to query specific fields or correlate logs with traces.

Key-Value Pairs

A flatter alternative, key-value logs are easier to scan or process with shell tools and simple log agents.

timestamp=2024-01-15T14:30:22.123Z request_id=req_abc123 method=POST url=/api/payments status=400 duration_ms=245

This format works well when you don’t need nested data but still want searchable fields. It’s especially useful in systems that ship logs as raw text streams.
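A rough sketch of emitting this style from a flat dict in Python; real implementations (logfmt libraries, for example) handle quoting and escaping more carefully:

def to_key_value(fields: dict) -> str:
    """Render a flat dict as a single key=value log line."""
    parts = []
    for key, value in fields.items():
        value = str(value)
        if " " in value:  # quote values with spaces so parsers keep them intact
            value = f'"{value}"'
        parts.append(f"{key}={value}")
    return " ".join(parts)

print(to_key_value({"timestamp": "2024-01-15T14:30:22.123Z", "request_id": "req_abc123",
                    "method": "POST", "url": "/api/payments", "status": 400, "duration_ms": 245}))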

Both formats are valid. The right choice depends on your tooling, your data shape, and how much structure your downstream systems can handle. What matters most is consistency; once a format is in place, sticking to it makes logs far easier to use across teams and tools.

💡
Structured logs often end up in long-term storage; this Parquet vs CSV guide breaks down which format handles that better.

The Function of Log Formats

Log formats shape how your application records, serializes, and ships event data. The format defines how log fields are extracted, how data types are encoded, and how records are passed downstream to storage, processors, or dashboards.

Formatters in Action

When you call a logging function, a formatter decides how to structure the final output. Here’s an example using Python’s logging module with a custom JSON formatter:

import json
import logging

class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_entry = {
            'timestamp': self.formatTime(record),
            'level': record.levelname,
            'message': record.getMessage(),
            'module': record.module,
            'line': record.lineno
        }
        return json.dumps(log_entry)

In this case, each log record is serialized as a JSON object. The formatter extracts fields like timestamp, level, and module, and outputs a structured string that can be parsed later.

This process happens before the log is sent to a file, stdout, or any remote target.
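Wiring this formatter into the standard logging setup is a short step. A minimal sketch, assuming the JSONFormatter class above is in scope:

import logging

logger = logging.getLogger("payment-api")
logger.setLevel(logging.INFO)

handler = logging.StreamHandler()      # writes to stderr by default
handler.setFormatter(JSONFormatter())  # the formatter defined above
logger.addHandler(handler)

logger.error("Payment processing failed")  # emitted as one JSON object per line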

Parsing and Schema Definitions

Once logs are written, the next step is reading them, and this is where parsing strategies vary.

JSON Parsing

JSON logs are easy to work with in most languages. They let you parse and extract fields directly without regex.

const logEntry = JSON.parse(logLine);
const timestamp = new Date(logEntry.timestamp);
const level = logEntry.level;

This structure allows for straightforward ingestion into tools like OpenTelemetry, Elasticsearch, or any log processing pipeline that expects key-value pairs.

Text Log Parsing with Regex

Unstructured text logs require pattern matching. Here’s an example of parsing a plain-text log line using Python and regular expressions:

import re

pattern = r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.+): (.+)'
match = re.match(pattern, log_line)
if match:
    timestamp, level, service, message = match.groups()

Regex parsing works, but can be fragile; any format change in the log line may break parsing. It also adds overhead at ingestion time, especially when dealing with high log volumes.

Schema Validation

If your log format is structured (like JSON or XML), you can apply a schema to validate log entries before they’re indexed or processed. This helps catch missing fields, unexpected types, or format drift early in the pipeline.
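Here's a minimal sketch using the jsonschema package (one option among several) to reject entries that are missing required fields:

import json
from jsonschema import validate, ValidationError  # pip install jsonschema

LOG_SCHEMA = {
    "type": "object",
    "properties": {
        "timestamp": {"type": "string"},
        "level": {"type": "string"},
        "service": {"type": "string"},
        "message": {"type": "string"},
    },
    "required": ["timestamp", "level", "service", "message"],
}

def is_valid(log_line: str) -> bool:
    try:
        validate(instance=json.loads(log_line), schema=LOG_SCHEMA)
        return True
    except (json.JSONDecodeError, ValidationError):
        return False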

Real-Time Log Processing

Format choice has a direct impact on real-time processing speed and CPU usage.

Structured formats like JSON are faster to parse than free-form text, especially at scale. However, they’re still slower than binary formats (like Protobuf or MessagePack), which are sometimes used in high-throughput logging pipelines.

Here’s a simple example of scanning a JSON log stream in real time:

import json

# `log_stream` yields raw log lines; `alert_system` is whatever notifier you use
for line in log_stream:
    try:
        entry = json.loads(line)
        if entry.get('level') == 'ERROR':
            alert_system.notify(entry)
    except json.JSONDecodeError:
        # Skip malformed entries rather than breaking the stream
        pass

If you’re processing logs for alerting, tracing, or streaming analytics, choosing a format with consistent field names and predictable structure reduces downstream friction.

Choose the Right Format

There’s no one-size-fits-all format. If you need structured logs that support field-level querying and parsing, JSON or key-value logs are easier to work with. For high-speed ingestion or constrained environments, binary formats might offer performance advantages. What matters is aligning format complexity with your parsing, alerting, and monitoring requirements.

💡
Logs are often your first clue when something goes wrong, or when you’re sizing a new endpoint. This access log guide walks through the format every web service should log and why it matters in real systems.

Advantages of Log Format in Logging

Log formats aren’t just about readability. The structure you choose directly shapes how logs behave across services, tools, and teams. A consistent format simplifies downstream processing, improves debugging speed, and enables better automation.

Consistency Across Services

When all services follow the same log schema, with the same field names, types, and conventions, it becomes much easier to correlate events across systems.

// Service A
{"timestamp": "2024-01-15T14:30:22.123Z", "service": "auth", "request_id": "req_123"}

// Service B
{"timestamp": "2024-01-15T14:30:23.456Z", "service": "payment", "request_id": "req_123"}

This level of consistency allows you to:

  • Trace requests end to end using a shared request_id
  • Build dashboards that aggregate across services
  • Define alerting rules that apply uniformly to multiple teams

Machine Readability

Structured formats like JSON, Protobuf, or key-value pairs enable logs to be parsed and queried automatically. You can extract metrics, run anomaly detection, or trigger alerts based on individual field values without writing custom parsers.

For example:

{ "level": "ERROR", "service": "billing", "error_code": "TIMEOUT", "retry_count": 3 }

This structure allows you to group by error_code, track retry patterns, or create metrics from log volume without needing to dig through raw strings.
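As a rough illustration, counting errors by error_code takes only a few lines once the field exists; log_lines here is an assumed iterable of JSON strings:

import json
from collections import Counter

error_counts = Counter(
    entry["error_code"]
    for entry in map(json.loads, log_lines)  # log_lines: iterable of JSON log strings (assumed)
    if entry.get("level") == "ERROR" and "error_code" in entry
)

print(error_counts.most_common(5))  # e.g. [("TIMEOUT", 42), ("PAYMENT_DECLINED", 17)]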

Storage Efficiency

Format choice also affects how much space your logs consume. JSON is readable and flexible, but can be verbose. Binary formats like Protocol Buffers or MessagePack reduce size and improve parsing performance, especially in high-throughput environments.

Key-value formats strike a middle ground: they’re easier to scan than JSON and more compact than raw text.
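As a rough comparison, assuming the msgpack package is installed, you can measure the difference for a sample entry directly:

import json
import msgpack  # pip install msgpack

entry = {
    "timestamp": "2024-01-15T14:30:22.123Z",
    "level": "ERROR",
    "service": "billing",
    "error_code": "TIMEOUT",
    "retry_count": 3,
}

json_size = len(json.dumps(entry).encode("utf-8"))
msgpack_size = len(msgpack.packb(entry))

# Exact numbers depend on the payload; the binary encoding is usually smaller
print(json_size, msgpack_size)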

Why Format Matters for Server Logs and Error Logs

Server Logs

Logs that track API or backend traffic benefit from a consistent field structure — timestamps, status codes, request/response sizes, durations, and identifiers.

{
  "timestamp": "2024-01-15T14:30:22.123Z",
  "method": "POST",
  "url": "/api/orders",
  "status": 201,
  "duration_ms": 156,
  "user_id": "usr_789",
  "request_size": 2048,
  "response_size": 512
}

With this structure, it’s easier to monitor latency, flag large payloads, and analyze usage patterns.
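For example, here's a quick sketch that flags slow requests from these entries; the 500 ms threshold and the server_log_lines iterable are both assumptions for illustration:

import json

SLOW_MS = 500  # arbitrary threshold for this example

slow_requests = [
    entry for entry in map(json.loads, server_log_lines)  # server_log_lines: assumed iterable of JSON strings
    if entry.get("duration_ms", 0) > SLOW_MS
]

for entry in slow_requests:
    print(entry["method"], entry["url"], entry["duration_ms"])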

Error Logs

Error logs benefit from diagnostic context: stack traces, error categories, retry counts, and the operation being performed.

{
  "timestamp": "2024-01-15T14:30:22.123Z",
  "level": "ERROR",
  "message": "Database connection failed",
  "error_type": "ConnectionTimeout",
  "database_host": "db-primary.internal",
  "retry_count": 3,
  "stack_trace": "...",
  "operation": "user_lookup",
  "affected_user": "usr_789"
}

Structured error logs simplify grouping and deduplication, and make it easier to identify recurring issues.

Optimization for Debugging and Root Cause Analysis

Request Tracing

Logs that include a shared request_id allow you to reconstruct the full path of a request through multiple systems.

// API Gateway
{"timestamp": "2024-01-15T14:30:22.100Z", "service": "gateway", "request_id": "req_abc123", "message": "Request received"}

// Auth Service
{"timestamp": "2024-01-15T14:30:22.150Z", "service": "auth", "request_id": "req_abc123", "message": "User authenticated"}

// Business Logic
{"timestamp": "2024-01-15T14:30:22.200Z", "service": "orders", "request_id": "req_abc123", "message": "Order created"}

This makes it possible to follow a request across services and pinpoint where delays or failures occur.
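A small sketch of that reconstruction, assuming the entries from each service have already been parsed into dicts (all_entries is a placeholder name):

def request_timeline(entries, request_id):
    """Return every entry for one request, ordered by timestamp."""
    matched = [e for e in entries if e.get("request_id") == request_id]
    # ISO 8601 UTC timestamps sort correctly as plain strings
    return sorted(matched, key=lambda e: e["timestamp"])

for entry in request_timeline(all_entries, "req_abc123"):
    print(entry["timestamp"], entry["service"], entry["message"])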

Performance Analysis

When logs include timing data per operation, you can spot bottlenecks or slow components easily.

{
  "operation": "create_order",
  "duration_ms": 245,
  "database_time_ms": 123,
  "api_call_time_ms": 89,
  "processing_time_ms": 33
}

This allows for breakdowns by system layer and more precise alert thresholds.

Error Correlation

Structured error fields allow you to group and analyze failures by category, provider, or retry behavior.

{
  "error_code": "PAYMENT_DECLINED",
  "error_category": "business_logic",
  "retry_count": 0,
  "payment_provider": "stripe",
  "decline_reason": "insufficient_funds"
}

You can then group errors by provider, track common decline reasons, or spot upstream issues faster during an incident.

Final Thoughts

Once you’ve chosen a structured log format, the next step is making sure your observability stack can work with it.

If you're including fields like trace_id, error_code, or duration_ms, you should be able to filter by them, build alerts, or correlate across services without extra tooling. That only works if your backend supports structured log ingestion end-to-end.

Here’s how Last9 helps you get real value from structured logs:

  • Query your logs by field, not just text
    Use Last9’s visual query builder to filter logs by fields like service, level, user_id, or request_id. For deeper control, switch to LogQL — both modes work natively with structured formats.
  • Skip the parsing layer
    Since Last9 understands structured formats like JSON out of the box, you don’t need to pre-process logs or extract fields manually. Everything’s queryable as-is.
  • Follow requests across logs, metrics, and traces
    When logs include request_id or trace_id, you can move from a single log entry to its related trace or metric instantly. That’s useful during incident investigations when you’re jumping between signals.

When you’ve already put effort into formatting your logs well, Last9 gives you a way to query, visualize, and correlate them, without adding extra layers to your stack.

And if you're setting up observability or refining your log pipeline, we’d be glad to walk through your setup and see where Last9 can support it.

FAQs

Q: What is a log format?

A log format defines the structure and encoding of log entries—how your application transforms events into searchable records. It specifies field names, data types, and organization to make logs either human-readable or machine-parseable.

Q: What format is a log?

Logs can be in various formats, including JSON (most common for structured logging), plain text, XML, key-value pairs, or binary formats like Protocol Buffers. The format determines how data is organized and encoded.

Q: What is the standard format of logs?

There's no single standard, but JSON has become widely adopted for application logs due to its balance of readability and machine parsing. Web servers often use Common Log Format (CLF), while system logs typically use syslog format.

Q: What is the format of a log function?

A log function typically accepts a severity level, message, and optional structured data. For example: log.info("User login", {"user_id": "123", "ip": "192.168.1.1"}). The function then formats this data according to your chosen log format.

Q: Why shoot in LOG format?

This appears to be asking about the LOG format in a different context. If you meant logging format, structured formats like JSON help with automated analysis, debugging, and monitoring by making log data queryable and consistent across services.

Q: What is in a log file?

Log files typically contain timestamps, severity levels, source information (service/module), messages describing events, and metadata like user IDs, request IDs, or error codes. Structured logs organize this data in consistent fields.

Q: Is syslog and SIEM the same?

No, syslog is a standard format and protocol for log messages, while SIEM (Security Information and Event Management) is a system that collects, analyzes, and correlates logs from multiple sources. SIEM tools often consume syslog-formatted data.

Q: How many types of logs are there?

Common log types include application logs, system logs, security logs, access logs, error logs, audit logs, and performance logs. Each serves different purposes and may use different formats optimized for their use case.

Q: How do I parse logs after generating them?

Parsing depends on the format. JSON logs use JSON.parse(), text logs use regex patterns, and structured formats like syslog have dedicated parsers. Tools like Fluent Bit, Vector, or custom scripts can parse and transform logs before sending them to aggregation systems.

Q: Why is structured logging important for application development?

Structured logging makes logs machine-readable, enabling automated analysis, alerting, and correlation across services. It supports debugging with consistent field names and helps with performance monitoring by making metrics extractable from log data.

Q: How do I choose the right log format for my application?

Consider your use case: JSON for general application logging, binary formats for high-volume systems, plain text for development readability. Factor in storage costs, parsing performance, tooling compatibility, and whether you need human readability or automated analysis.

Q: How can I convert log files to a different format?

Use log processing tools like Fluent Bit, Vector, or Logstash to parse existing logs and output them in a new format. You can also write custom scripts that read the original format and write structured output.

Q: What are the common log formats used in web servers?

Web servers commonly use Common Log Format (CLF), Extended Log Format, JSON for API servers, and custom formats for specific needs. Modern applications often use JSON or structured text formats that include request IDs, user context, and performance metrics.
