
Build Log Automation with Last9's Query API

Here's how you can build automated log analysis workflows with Last9's Query Logs API

Jul 16th, ‘25

Manual log investigation is one of those engineering tasks that quietly drains hours without offering much real value.

You're debugging an incident. Monitoring shows elevated error rates. Now begins the familiar drill:

  • Open the logging UI
  • Navigate to the right timeframe
  • Filter by service
  • Search for error patterns
  • Export results
  • Repeat for every affected service

It’s a tedious cycle, and it doesn’t scale.

The whole process breaks down when you’re trying to automate incident response, run continuous security monitoring, or generate compliance reports. Runbooks can’t click through UIs, and manual checks won’t cut it when your infrastructure keeps growing.

And while most log APIs promise to help, they come with their own set of headaches:

  • Proprietary query languages that take weeks to learn
  • Rate limits that choke any automation efforts
  • Incomplete responses that still force you back into the UI

LogQL That Works

Last9’s Query Logs API is LogQL and Loki compatible. If you’ve used Grafana or Loki, the syntax will feel familiar: no new language to learn, no vendor-specific DSL.

Example: fetch recent error logs from your API Gateway.

curl -X GET \
  "https://otlp.last9.io/loki/logs/api/v2/query_range?query=%7Bservice%3D%22api-gateway%22%7D%20%7C%3D%20%22error%22&start=1743500000&end=1743510000&limit=50" \
  -H "Authorization: Basic $(echo -n "$username:$password" | base64)"

Response:

{
  "status": "success",
  "data": {
    "result": [{
      "stream": {"service": "api-gateway", "level": "error"},
      "values": [
        ["1743505000000000000", "Connection timeout to payment service"],
        ["1743504990000000000", "Rate limit exceeded for user 12345"]
      ]
    }]
  }
}

Each log line is returned as:

  • A timestamp (a string of nanoseconds since epoch)
  • The raw log message
  • A stream with associated labels (service, level, etc.)

The structure is designed for machine processing. Use it in automation scripts, monitoring pipelines, or downstream systems, without needing to scrape HTML UIs or manually convert formats.
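Because the payload is plain JSON, turning it into rows takes only a few lines. Here's a minimal sketch that flattens the sample response above into (timestamp, labels, message) tuples:

```python
# Flatten a Loki-style query_range payload into simple tuples.
# `response` mirrors the sample JSON shown above.
response = {
    "status": "success",
    "data": {
        "result": [{
            "stream": {"service": "api-gateway", "level": "error"},
            "values": [
                ["1743505000000000000", "Connection timeout to payment service"],
                ["1743504990000000000", "Rate limit exceeded for user 12345"],
            ],
        }]
    },
}

def flatten_logs(payload):
    """Yield (epoch_seconds, labels, message) for every log line."""
    for stream in payload.get("data", {}).get("result", []):
        labels = stream.get("stream", {})
        for ts_ns, message in stream.get("values", []):
            yield int(ts_ns) // 1_000_000_000, labels, message

for ts, labels, msg in flatten_logs(response):
    print(ts, labels.get("service"), msg)
```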

Getting Started with the Logs API

Step 1: Get Your Credentials

Head to the Last9 OpenTelemetry Integration page to generate your API credentials. You’ll use these for authenticated requests to the Logs API.

Step 2: Test Basic Connectivity

Use the /label/service/values endpoint to verify access and discover available services:

curl -X GET \
  "https://otlp.last9.io/loki/logs/api/v1/label/service/values?start=1743000000&end=1743600000" \
  -H "Authorization: Basic $(echo -n "$username:$password" | base64)"

Step 3: Query Your First Logs

Replace your-service with a real service name and set the time range in seconds since epoch:

curl -X GET \
  "https://otlp.last9.io/loki/logs/api/v2/query_range?query=%7Bservice%3D%22your-service%22%7D&start=1743500000&end=1743510000&limit=10" \
  -H "Authorization: Basic $(echo -n "$username:$password" | base64)"

Step 4: Build Your First Automation

Use the Python examples in the sections below to:

  • Query logs for specific services
  • Parse and categorize errors
  • Trigger alerts or build CI/CD gates

Start by adapting the investigate_service_errors() or analyze_performance_patterns() examples.
💡
If you're already using Grafana for dashboards, here's how to handle login and access control securely.

3 Ways to Use the Last9 Query Logs API

1. Automate Log Triage During Incidents

Instead of jumping between logging dashboards during an outage, use the API to pull recent errors across services and categorize them automatically.

import base64
import os
from datetime import datetime, timedelta

import requests

# Basic auth credentials, built once from environment variables
# (the LAST9_* variable names here are placeholders)
encoded_creds = base64.b64encode(
    f"{os.environ['LAST9_USERNAME']}:{os.environ['LAST9_PASSWORD']}".encode()
).decode()

def investigate_service_errors(service_name, minutes_back=30):
    end_time = int(datetime.now().timestamp())
    start_time = int((datetime.now() - timedelta(minutes=minutes_back)).timestamp())
    
    # Regex line filter: matches "error" case-insensitively
    query = f'{{service="{service_name}"}} |~ "(?i)error"'
    
    response = requests.get(
        'https://otlp.last9.io/loki/logs/api/v2/query_range',
        params={
            'query': query,
            'start': start_time,
            'end': end_time,
            'limit': 100
        },
        headers={'Authorization': f'Basic {encoded_creds}'}
    )
    
    if response.status_code == 200:
        return categorize_errors(response.json())
    return None

def categorize_errors(log_data):
    patterns = {'timeouts': 0, 'rate_limits': 0, 'db_errors': 0}
    
    for stream in log_data.get('data', {}).get('result', []):
        for _, message in stream.get('values', []):
            msg = message.lower()
            if 'timeout' in msg:
                patterns['timeouts'] += 1
            elif 'rate limit' in msg:
                patterns['rate_limits'] += 1
            elif 'database' in msg or 'sql' in msg:
                patterns['db_errors'] += 1
    
    return patterns

# Example usage
error_breakdown = investigate_service_errors('payment-service')
if error_breakdown and error_breakdown['timeouts'] > 10:
    print("High timeout rate detected - check downstream services")

Use this as part of your incident response pipeline to spot patterns early—timeouts, rate limits, or database issues—without switching contexts.

2. Detect Authentication and Access Anomalies Continuously

Build lightweight security checks by querying logs for patterns like failed logins or unauthorized access, on a rolling schedule.

def monitor_authentication_anomalies():
    failed_logins = query_logs('{service="auth-service"} |= "login failed"', minutes_back=5)
    auth_errors = query_logs('{service="api-gateway"} |~ "401|403"', minutes_back=5)

    alerts = []

    if failed_logins and len(failed_logins) > 20:
        alerts.append({
            'type': 'failed_logins',
            'count': len(failed_logins),
            'severity': 'high'
        })
    
    if auth_errors and len(auth_errors) > 50:
        alerts.append({
            'type': 'auth_errors',
            'count': len(auth_errors),
            'severity': 'medium'
        })
    
    return alerts

def query_logs(query, minutes_back=5):
    end_time = int(datetime.now().timestamp())
    start_time = int((datetime.now() - timedelta(minutes=minutes_back)).timestamp())
    
    response = requests.get(
        'https://otlp.last9.io/loki/logs/api/v2/query_range',
        params={
            'query': query,
            'start': start_time,
            'end': end_time,
            'limit': 100
        },
        headers={'Authorization': f'Basic {encoded_creds}'}
    )
    
    if response.status_code == 200:
        return extract_messages(response.json())
    return []

def extract_messages(log_data):
    messages = []
    for stream in log_data.get('data', {}).get('result', []):
        for _, message in stream.get('values', []):
            messages.append(message)
    return messages

You can plug this into a cron job or serverless function to catch threats like brute-force attempts or unusual access patterns without a full-blown SIEM.

3. Analyze Performance Issues from Logs Automatically

Use logs to surface bottlenecks like slow queries, long response times, or memory warnings, especially those that don’t trigger metrics-based alerts.

def analyze_performance_patterns(service_name):
    queries = [
        f'{{service="{service_name}"}} |= "slow query"',
        f'{{service="{service_name}"}} |~ "response_time.*[5-9][0-9][0-9][0-9]"',  # >5000ms
        f'{{service="{service_name}"}} |= "memory" |= "warning"'
    ]
    
    analysis = {}
    
    for query in queries:
        results = query_logs(query, minutes_back=60)
        if results:
            analysis[query] = {
                'count': len(results),
                'samples': results[:3]
            }
    
    return analysis

# Example usage
performance_issues = analyze_performance_patterns('user-service')
for query, data in performance_issues.items():
    print(f"Found {data['count']} issues for pattern: {query}")

This is useful for catching early warning signs, especially for backend services where degraded performance may go unnoticed until it snowballs.

💡
You can pair LogQL queries with the Grafana rate() function to analyze log frequency over time.

Advanced Query Patterns with Logs API

The API supports advanced LogQL constructs, making it flexible enough to handle multi-dimensional filtering, regex-based searches, and dynamic service discovery. Here are a few examples:

1. Filter Logs by Multiple Labels

Use structured queries to scope logs down to specific services, environments, or versions.

Example: production-only errors from the payment-service.

curl -X GET \
  "https://otlp.last9.io/loki/logs/api/v2/query_range?query=%7Bservice%3D%22payment-service%22%2C%20env%3D%22production%22%7D" \
  -H "Authorization: Basic $(echo -n "$username:$password" | base64)"

This targets logs that match both service="payment-service" and env="production", ideal for narrowing down incidents in large deployments.

2. Use Regex Matching in LogQL

Search logs using regex when you need flexible pattern matching.

Example: fetch timeout-related errors from the API Gateway.

curl -X GET \
  "https://otlp.last9.io/loki/logs/api/v2/query_range?query=%7Bservice%3D%22api-gateway%22%7D%20%7C~%20%22error.*timeout%22" \
  -H "Authorization: Basic $(echo -n "$username:$password" | base64)"

This matches log lines containing an error message that includes the word "timeout", even if the phrasing or structure varies.

3. Discover Available Services Dynamically

Query labels directly from the API to discover what services are emitting logs.

Example: list all unique service label values in a given time window.

curl -X GET \
  "https://otlp.last9.io/loki/logs/api/v1/label/service/values?start=1743000000&end=1743600000" \
  -H "Authorization: Basic $(echo -n "$username:$password" | base64)"

This can be used to build dashboards dynamically or validate service-to-label mappings programmatically.
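A small Python sketch of this pattern: fetch the service list, then fan out per-service queries. The helper names and the encoded_creds variable follow the earlier examples; the response shape ({"status": ..., "data": [...]}) matches the standard Loki label-values endpoint.

```python
import requests

def parse_label_values(payload):
    """The label-values endpoint returns {"status": "...", "data": [...]}."""
    return payload.get("data") or []

def list_services(start, end, encoded_creds):
    """Fetch distinct `service` label values for a window (epoch seconds)."""
    resp = requests.get(
        "https://otlp.last9.io/loki/logs/api/v1/label/service/values",
        params={"start": start, "end": end},
        headers={"Authorization": f"Basic {encoded_creds}"},
        timeout=10,
    )
    resp.raise_for_status()
    return parse_label_values(resp.json())

# Example: sweep every discovered service for recent errors
# for svc in list_services(1743000000, 1743600000, encoded_creds):
#     print(svc, investigate_service_errors(svc, minutes_back=15))
```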

Plug the Logs API Into Your Workflow

The Last9 Logs API isn’t just for ad-hoc queries. You can integrate it directly into CI pipelines, monitoring setups, and existing observability tools with minimal overhead.

1. Integrate Log Checks in CI/CD Pipelines

Use logs as a signal before or after deploys. For example, you can block deployments if recent logs show spikes in errors.

GitHub Actions Example:

- name: Check deployment logs
  run: |
    python scripts/check_deployment_logs.py \
      --service ${{ github.event.repository.name }} \
      --since "5 minutes ago"

This script can call the Query Logs API and exit with a non-zero status if critical error patterns are detected during or after deployment.
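The post doesn't include scripts/check_deployment_logs.py itself, so here is a hypothetical sketch of what it might look like. The --since parsing, the error threshold default, and the LAST9_* environment variable names are all assumptions:

```python
"""Hypothetical sketch of scripts/check_deployment_logs.py."""
import argparse
import base64
import os
import re
from datetime import datetime, timedelta

import requests

def parse_since(text):
    """Parse a simple window like "5 minutes ago" into minutes."""
    m = re.match(r"(\d+)\s+minute", text)
    if not m:
        raise ValueError(f"unsupported --since value: {text!r}")
    return int(m.group(1))

def count_error_lines(payload):
    """Count log lines across every stream in a query_range response."""
    streams = payload.get("data", {}).get("result", [])
    return sum(len(s.get("values", [])) for s in streams)

def fetch_recent_error_count(service, minutes, encoded_creds):
    end = int(datetime.now().timestamp())
    start = int((datetime.now() - timedelta(minutes=minutes)).timestamp())
    resp = requests.get(
        "https://otlp.last9.io/loki/logs/api/v2/query_range",
        params={
            "query": f'{{service="{service}"}} |~ "(?i)error"',
            "start": start,
            "end": end,
            "limit": 1000,
        },
        headers={"Authorization": f"Basic {encoded_creds}"},
        timeout=30,
    )
    resp.raise_for_status()
    return count_error_lines(resp.json())

def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--service", required=True)
    parser.add_argument("--since", default="5 minutes ago")
    parser.add_argument("--threshold", type=int, default=10)  # assumed default
    args = parser.parse_args(argv)

    creds = f"{os.environ['LAST9_USERNAME']}:{os.environ['LAST9_PASSWORD']}"
    encoded = base64.b64encode(creds.encode()).decode()

    count = fetch_recent_error_count(args.service, parse_since(args.since), encoded)
    print(f"{count} error lines for {args.service} in {args.since!r}")
    return 1 if count > args.threshold else 0

# The real script would end with: raise SystemExit(main())
# so a non-zero return fails the CI job.
```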

2. Trigger Alerts Based on Log Volume

Send alerts based on recent log activity—especially useful for catching issues not surfaced by metrics.

Slack Alert on Error Spike:

import os

SLACK_WEBHOOK = os.environ['SLACK_WEBHOOK_URL']  # your incoming webhook URL

def send_slack_alert_if_errors_spike(service_name, threshold=10):
    errors = investigate_service_errors(service_name, minutes_back=10)
    if not errors:
        return
    total_errors = sum(errors.values())
    
    if total_errors > threshold:
        requests.post(SLACK_WEBHOOK, json={
            'text': f'🚨 {service_name} error spike: {total_errors} errors in last 10 minutes',
            'attachments': [{
                'color': 'danger',
                'fields': [
                    {'title': 'Timeouts', 'value': errors['timeouts'], 'short': True},
                    {'title': 'Rate Limits', 'value': errors['rate_limits'], 'short': True},
                    {'title': 'DB Errors', 'value': errors['db_errors'], 'short': True}
                ]
            }]
        })

This is especially helpful for catching intermittent issues that don’t trip Prometheus thresholds but show up clearly in logs.

3. Connect Existing Grafana Dashboards

Since Last9’s API is Loki-compatible, you can plug it into existing Grafana dashboards by simply updating the data source URL. No query changes, no new plugins. Just swap the endpoint.
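As a sketch, a Grafana provisioning file for this might look like the following; the exact base URL for your account (here assumed to be the /loki/logs path used in the curl examples) should be verified against your Last9 integration page:

```yaml
# provisioning/datasources/last9.yaml (hypothetical example)
apiVersion: 1
datasources:
  - name: Last9 Logs
    type: loki
    access: proxy
    url: https://otlp.last9.io/loki/logs   # assumed base URL; check your account
    basicAuth: true
    basicAuthUser: ${LAST9_USERNAME}
    secureJsonData:
      basicAuthPassword: ${LAST9_PASSWORD}
```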

Error Handling and Debugging

The Logs API returns informative error messages, but when things fail silently or unexpectedly, these are the most common issues and how to troubleshoot them.

No Data Returned

The API request succeeds (HTTP 200), but the data.result field is empty.

Possible causes:

  • Time range mismatch
    The start and end parameters must correspond to periods where logs exist.
    • Check whether you're querying a future time window
    • If you're using datetime.now().timestamp() in Python, make sure the values are in seconds, not milliseconds or nanoseconds
  • Retention policy limitations
    If your logs are older than the configured retention period, the query returns no results. Check your account’s retention settings.
  • Incorrect service label
    Labels are case-sensitive: payment-service is different from Payment-Service. Use the label discovery endpoint to verify exact label values:

curl -X GET \
  'https://otlp.last9.io/loki/logs/api/v1/label/service/values' \
  -H 'Authorization: Basic <base64-creds>'

LogQL Syntax Errors

The API returns a 400 Bad Request, usually with a message like parse error or invalid query.

How to fix:

  • Avoid special characters unless they’re properly URL-encoded. For example:
    • " → %22
    • { → %7B
    • |= → %7C%3D

Ensure your query matches the LogQL syntax. For example:

{service="auth-service"} |= "error"

Tip: If you’re building queries dynamically in code, always use a URL encoding library instead of manually constructing the string.
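In Python, the standard library handles this; note that requests does it automatically when you pass params, as in the earlier examples:

```python
from urllib.parse import urlencode

# Build a correctly encoded query string from a raw LogQL expression
query = '{service="auth-service"} |= "error"'
qs = urlencode({"query": query, "start": 1743500000, "end": 1743510000})
print(qs)
# query=%7Bservice%3D%22auth-service%22%7D+%7C%3D+%22error%22&start=1743500000&end=1743510000
```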

Invalid or Malformed Time Ranges

The API may return a 400 or simply an empty result set if time values are invalid.

What to check:

  • Timestamps must be in seconds since epoch unless using nanoseconds for the values field (as in the response).
  • The start time must be less than the end time.

Some clients (especially JavaScript) return milliseconds by default. Use integer division or appropriate conversion:

int(time.time())  # returns seconds
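A quick sanity check of these conversions in Python:

```python
import time
from datetime import datetime, timezone

now_s = int(time.time())       # seconds since epoch (what start/end expect)
assert len(str(now_s)) == 10   # millisecond values would be 13 digits

ms_from_js = 1743505000123     # e.g. a JavaScript Date.now() value
seconds = ms_from_js // 1000   # integer-divide milliseconds down to seconds

# Converting a human-readable UTC time to epoch seconds:
start = int(datetime(2025, 4, 1, tzinfo=timezone.utc).timestamp())
```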

Authentication Failures

Most authentication issues result in a 401 Unauthorized or 403 Forbidden.

Checklist:

  • Confirm that the credentials you’re using are still valid and not expired or rotated.
  • If using CI/CD, verify that secrets are loaded properly into environment variables before constructing the header.

Ensure your credentials (username:password) are Base64 encoded correctly. Use:

echo -n 'user:pass' | base64

Don’t include line breaks or extra whitespace in the encoded string.

Header format:

-H 'Authorization: Basic <base64-encoded-user:pass>'
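For scripts that build the header themselves, the same encoding in Python:

```python
import base64

# b64encode produces no trailing newline, unlike piping through the
# shell's base64 utility without echo -n
creds = base64.b64encode(b"user:pass").decode()
print(creds)  # dXNlcjpwYXNz
headers = {"Authorization": f"Basic {creds}"}
```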

Debugging Tips

  • Use a known-good query from earlier examples to rule out query logic.
  • Validate time inputs using tools like date -d @<timestamp> or online epoch converters.
  • Capture and inspect the full response, including headers, when scripting with Python or curl. You might be ignoring useful metadata.
💡
Use Last9 MCP to check production logs right from your IDE. See logs, metrics, and traces in real time to fix issues faster, without switching tools.

Why This Approach Works

  • Loki-compatible
    You can reuse existing LogQL queries, Grafana dashboards, and tooling without changes. No need to learn a new query language or vendor-specific syntax.
  • Standard HTTP/JSON
    The API uses standard HTTP methods and JSON responses, making it easy to integrate across languages, scripts, CI tools, and monitoring systems.
  • Structured responses
    Logs come back with consistent schemas, timestamps, labels, and messages, so you can feed them directly into automation pipelines or downstream processors without additional parsing.
  • Production-ready behavior
    The API is designed for operational use:
    • Authenticated requests via standard headers
    • Proper handling of rate limits
    • Informative error messages and support for large queries

This means you can plug it into real workflows with confidence, whether that’s automated log checks, alerting systems, or dashboard backends.

💡
Check out the full documentation or grab your credentials from the Last9 console and start querying. And, if you want to discuss your specific use case, our Discord community is always open!
Authors
Prathamesh Sonpatki

Prathamesh works as an evangelist at Last9, runs SRE Stories (where SRE and DevOps folks share their stories), and maintains o11y.wiki, a glossary of all terms related to observability.
