Manual log investigation is one of those engineering tasks that quietly drains hours without offering much real value.
You're debugging an incident. Monitoring shows elevated error rates. Now begins the familiar drill:
- Open the logging UI
- Navigate to the right timeframe
- Filter by service
- Search for error patterns
- Export results
- Repeat for every affected service
It’s a tedious cycle, and it doesn’t scale.
The whole process breaks down when you’re trying to automate incident response, run continuous security monitoring, or generate compliance reports. Runbooks can’t click through UIs, and manual checks won’t cut it when your infrastructure keeps growing.
And while most log APIs promise to help, they come with their own set of headaches:
- Proprietary query languages that take weeks to learn
- Rate limits that choke any automation efforts
- Incomplete responses that still force you back into the UI
LogQL That Works
Last9’s Query Logs API is LogQL and Loki compatible. If you’ve used Grafana or Loki, the syntax will feel familiar: no new language to learn, no vendor-specific DSL.
Example: fetch recent error logs from your API Gateway.
curl -X GET \
  'https://otlp.last9.io/loki/logs/api/v2/query_range?query=%7Bservice%3D%22api-gateway%22%7D%20%7C%3D%20%22error%22&start=1743500000&end=1743510000&limit=50' \
  -H "Authorization: Basic $(echo -n $username:$password | base64)"
Response:
{
  "status": "success",
  "data": {
    "result": [{
      "stream": {"service": "api-gateway", "level": "error"},
      "values": [
        ["1743505000000000000", "Connection timeout to payment service"],
        ["1743504990000000000", "Rate limit exceeded for user 12345"]
      ]
    }]
  }
}
Each log line is returned as:
- A timestamp (in nanoseconds since epoch)
- The raw log message
- A stream with associated labels (service, level, etc.)
The structure is designed for machine processing. Use it in automation scripts, monitoring pipelines, or downstream systems, without needing to scrape HTML UIs or manually convert formats.
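For example, here’s a minimal sketch of flattening that response into plain Python tuples; it assumes the exact shape shown above, with timestamps returned as nanosecond strings:

from datetime import datetime, timezone

def flatten_log_response(payload):
    # Turn the nested query_range payload into (timestamp, labels, message) rows
    rows = []
    for stream in payload.get("data", {}).get("result", []):
        labels = stream.get("stream", {})
        for ts_ns, message in stream.get("values", []):
            # values carry nanosecond-precision epoch timestamps as strings
            ts = datetime.fromtimestamp(int(ts_ns) / 1e9, tz=timezone.utc)
            rows.append((ts, labels, message))
    return rows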
Getting Started with the Logs API
Step 1: Get Your Credentials
Head to the Last9 OpenTelemetry Integration page to generate your API credentials. You’ll use these for authenticated requests to the Logs API.
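The curl and Python examples in this guide send those credentials as a Base64-encoded Basic auth header (the encoded_creds variable used later). Here’s a minimal sketch of building it, assuming a username/password pair; the placeholder values are yours to replace:

import base64

# Placeholder credentials from the OpenTelemetry Integration page
username = "your-username"
password = "your-password"

# Base64-encode "username:password" for the Basic auth header
encoded_creds = base64.b64encode(f"{username}:{password}".encode()).decode()
headers = {"Authorization": f"Basic {encoded_creds}"}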
Step 2: Build Your First Automation
Use the Python examples from the sections below to:
- Query logs for specific services
- Parse and categorize errors
- Trigger alerts or build CI/CD gates
You can start by modifying the investigate_service_errors() or analyze_performance_patterns() examples.
Step 3: Query Your First Logs
Replace your-service with a real service name and set the time range in seconds since epoch:
curl -X GET \
  'https://otlp.last9.io/loki/logs/api/v2/query_range?query=%7Bservice%3D%22your-service%22%7D&start=1743500000&end=1743510000&limit=10' \
  -H "Authorization: Basic $(echo -n $username:$password | base64)"
Step 4: Test Basic Connectivity
Use the /label/service/values endpoint to verify access and discover available services:
curl -X GET \
  'https://otlp.last9.io/loki/logs/api/v1/label/service/values?start=1743000000&end=1743600000' \
  -H "Authorization: Basic $(echo -n $username:$password | base64)"
3 Ways to Use the Last9 Query Logs API
1. Automate Log Triage During Incidents
Instead of jumping between logging dashboards during an outage, use the API to pull recent errors across services and categorize them automatically.
import requests
from datetime import datetime, timedelta

def investigate_service_errors(service_name, minutes_back=30):
    end_time = int(datetime.now().timestamp())
    start_time = int((datetime.now() - timedelta(minutes=minutes_back)).timestamp())
    # Case-insensitive match for "error"/"ERROR" using a regex line filter
    query = f'{{service="{service_name}"}} |~ "(?i)error"'
    response = requests.get(
        'https://otlp.last9.io/loki/logs/api/v2/query_range',
        params={
            'query': query,
            'start': start_time,
            'end': end_time,
            'limit': 100
        },
        headers={'Authorization': f'Basic {encoded_creds}'}  # Base64 creds from Step 1
    )
    if response.status_code == 200:
        return categorize_errors(response.json())
    return None

def categorize_errors(log_data):
    patterns = {'timeouts': 0, 'rate_limits': 0, 'db_errors': 0}
    for stream in log_data.get('data', {}).get('result', []):
        for _, message in stream.get('values', []):
            msg = message.lower()
            if 'timeout' in msg:
                patterns['timeouts'] += 1
            elif 'rate limit' in msg:
                patterns['rate_limits'] += 1
            elif 'database' in msg or 'sql' in msg:
                patterns['db_errors'] += 1
    return patterns

# Example usage
error_breakdown = investigate_service_errors('payment-service')
if error_breakdown and error_breakdown['timeouts'] > 10:
    print("High timeout rate detected - check downstream services")
Use this as part of your incident response pipeline to spot patterns early—timeouts, rate limits, or database issues—without switching contexts.
2. Detect Authentication and Access Anomalies Continuously
Build lightweight security checks by querying logs for patterns like failed logins or unauthorized access, on a rolling schedule.
def monitor_authentication_anomalies():
    failed_logins = query_logs('{service="auth-service"} |= "login failed"', minutes_back=5)
    # Regex line filter matches either status code
    auth_errors = query_logs('{service="api-gateway"} |~ "401|403"', minutes_back=5)
    alerts = []
    if failed_logins and len(failed_logins) > 20:
        alerts.append({
            'type': 'failed_logins',
            'count': len(failed_logins),
            'severity': 'high'
        })
    if auth_errors and len(auth_errors) > 50:
        alerts.append({
            'type': 'auth_errors',
            'count': len(auth_errors),
            'severity': 'medium'
        })
    return alerts

def query_logs(query, minutes_back=5):
    end_time = int(datetime.now().timestamp())
    start_time = int((datetime.now() - timedelta(minutes=minutes_back)).timestamp())
    response = requests.get(
        'https://otlp.last9.io/loki/logs/api/v2/query_range',
        params={
            'query': query,
            'start': start_time,
            'end': end_time,
            'limit': 100
        },
        headers={'Authorization': f'Basic {encoded_creds}'}
    )
    if response.status_code == 200:
        return extract_messages(response.json())
    return []

def extract_messages(log_data):
    messages = []
    for stream in log_data.get('data', {}).get('result', []):
        for _, message in stream.get('values', []):
            messages.append(message)
    return messages
You can plug this into a cron job or serverless function to catch threats like brute-force attempts or unusual access patterns without a full-blown SIEM.
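As a rough sketch of what that scheduling could look like without extra infrastructure, here is a simple polling loop around monitor_authentication_anomalies(); in practice a cron job or serverless trigger would replace the while loop:

import time

if __name__ == "__main__":
    while True:
        # Check the last five minutes of auth activity, then sleep until the next cycle
        for alert in monitor_authentication_anomalies():
            print(f"[{alert['severity']}] {alert['type']}: {alert['count']} events in the last 5 minutes")
        time.sleep(300)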
3. Analyze Performance Issues from Logs Automatically
Use logs to surface bottlenecks like slow queries, long response times, or memory warnings, especially those that don’t trigger metrics-based alerts.
def analyze_performance_patterns(service_name):
    queries = [
        f'{{service="{service_name}"}} |= "slow query"',
        f'{{service="{service_name}"}} |~ "response_time.*[5-9][0-9][0-9][0-9]"',  # 5000-9999ms
        f'{{service="{service_name}"}} |= "memory" |= "warning"'
    ]
    analysis = {}
    for query in queries:
        results = query_logs(query, minutes_back=60)
        if results:
            analysis[query] = {
                'count': len(results),
                'samples': results[:3]
            }
    return analysis

# Example usage
performance_issues = analyze_performance_patterns('user-service')
for query, data in performance_issues.items():
    print(f"Found {data['count']} issues for pattern: {query}")
This is useful for catching early warning signs, especially for backend services where degraded performance may go unnoticed until it snowballs.
Advanced Query Patterns with Logs API
The API supports advanced LogQL constructs, making it flexible enough to handle multi-dimensional filtering, regex-based searches, and dynamic service discovery. Here are a few examples:
1. Filter Logs by Multiple Labels
Use structured queries to scope logs down to specific services, environments, or versions.
Example: production-only errors from the payment-service.
curl -X GET \
  'https://otlp.last9.io/loki/logs/api/v2/query_range?query=%7Bservice%3D%22payment-service%22%2C%20env%3D%22production%22%7D' \
  -H "Authorization: Basic $(echo -n $username:$password | base64)"
This targets logs that match both service="payment-service" and env="production", which is ideal for narrowing down incidents in large deployments.
2. Use Regex Matching in LogQL
Search logs using regex when you need flexible pattern matching.
Example: fetch timeout-related errors from the API Gateway.
curl -X GET \
  'https://otlp.last9.io/loki/logs/api/v2/query_range?query=%7Bservice%3D%22api-gateway%22%7D%20%7C~%20%22error.*timeout%22' \
  -H "Authorization: Basic $(echo -n $username:$password | base64)"
This matches log lines where an error is followed by the word timeout, even if the exact phrasing or structure varies.
3. Discover Available Services Dynamically
Query labels directly from the API to discover what services are emitting logs.
Example: list all unique service label values in a given time window.
curl -X GET \
  'https://otlp.last9.io/loki/logs/api/v1/label/service/values?start=1743000000&end=1743600000' \
  -H "Authorization: Basic $(echo -n $username:$password | base64)"
This can be used to build dashboards dynamically or validate service-to-label mappings programmatically.
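For instance, here is a sketch of wrapping that endpoint in Python to confirm that the services you expect are actually emitting logs; the expected set and time window are illustrative, and it reuses encoded_creds from Step 1:

def list_services(start, end):
    # Returns the unique service label values seen in the window (seconds since epoch)
    response = requests.get(
        'https://otlp.last9.io/loki/logs/api/v1/label/service/values',
        params={'start': start, 'end': end},
        headers={'Authorization': f'Basic {encoded_creds}'}
    )
    response.raise_for_status()
    return response.json().get('data', [])

# Example: flag services that should be logging but aren't
expected = {'payment-service', 'api-gateway', 'auth-service'}
missing = expected - set(list_services(1743000000, 1743600000))
if missing:
    print(f"No logs found for: {', '.join(sorted(missing))}")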
Plug the Logs API Into Your Workflow
The Last9 Logs API isn’t just for ad-hoc queries. You can integrate it directly into CI pipelines, monitoring setups, and existing observability tools with minimal overhead.
1. Integrate Log Checks in CI/CD Pipelines
Use logs as a signal before or after deploys. For example, you can block deployments if recent logs show spikes in errors.
GitHub Actions Example:
- name: Check deployment logs
  run: |
    python scripts/check_deployment_logs.py \
      --service ${{ github.event.repository.name }} \
      --since "5 minutes ago"
This script can call the Query Logs API and exit with a non-zero status if critical error patterns are detected during or after deployment.
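The script itself isn’t shown in this guide; here is one possible sketch of scripts/check_deployment_logs.py that reuses investigate_service_errors() from earlier and simplifies the --since flag to a leading minute count (the thresholds are illustrative):

# scripts/check_deployment_logs.py (hypothetical sketch)
import argparse
import sys

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--service', required=True)
    parser.add_argument('--since', default='5 minutes ago')
    args = parser.parse_args()

    # Assumes "--since" looks like "N minutes ago"; take the leading number
    minutes_back = int(args.since.split()[0])
    breakdown = investigate_service_errors(args.service, minutes_back=minutes_back)
    if breakdown is None:
        print("Log query failed; not blocking the deploy")
        sys.exit(0)
    if breakdown['db_errors'] > 0 or breakdown['timeouts'] > 5:
        print(f"Critical error patterns detected: {breakdown}")
        sys.exit(1)  # non-zero exit fails the pipeline step

if __name__ == '__main__':
    main()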
2. Trigger Alerts Based on Log Volume
Send alerts based on recent log activity—especially useful for catching issues not surfaced by metrics.
Slack Alert on Error Spike:
def send_slack_alert_if_errors_spike(service_name, threshold=10):
    errors = investigate_service_errors(service_name, minutes_back=10)
    if not errors:
        return
    total_errors = sum(errors.values())
    if total_errors > threshold:
        requests.post(SLACK_WEBHOOK, json={
            'text': f'🚨 {service_name} error spike: {total_errors} errors in last 10 minutes',
            'attachments': [{
                'color': 'danger',
                'fields': [
                    {'title': 'Timeouts', 'value': errors['timeouts'], 'short': True},
                    {'title': 'Rate Limits', 'value': errors['rate_limits'], 'short': True},
                    {'title': 'DB Errors', 'value': errors['db_errors'], 'short': True}
                ]
            }]
        })
This is especially helpful for catching intermittent issues that don’t trip Prometheus thresholds but show up clearly in logs.
3. Connect Existing Grafana Dashboards
Since Last9’s API is Loki-compatible, you can plug it into existing Grafana dashboards by simply updating the data source URL. No query changes, no new plugins. Just swap the endpoint.
Error Handling and Debugging
The Logs API is designed to return informative error messages, but when things fail silently or unexpectedly, these are the most common issues and how to troubleshoot them.
No Data Returned
The API request succeeds (HTTP 200), but the data.result field is empty.
Possible causes:
- Time range mismatch: The start and end parameters must correspond to periods where logs exist.
  - Check whether you're querying for a future time window
  - If you're using datetime.now().timestamp() in Python, make sure the values are in seconds, not milliseconds or nanoseconds
- Retention policy limitations: If your logs are older than the configured retention period, the query will return no results. Check your account’s retention settings.
- Incorrect service label: Labels are case-sensitive. payment-service is different from Payment-Service. Use the label discovery endpoint to verify exact label values:
curl -X GET \
'https://otlp.last9.io/loki/logs/api/v1/label/service/values' \
-H 'Authorization: Basic <base64-creds>'
LogQL Syntax Errors
The API returns a 400 Bad Request, usually with a message like parse error or invalid query.
How to fix:
- Avoid special characters unless they’re properly encoded. For example:
  - " → %22
  - { → %7B
  - |= → %7C%3D
- Ensure your query matches the LogQL syntax. For example:
  {service="auth-service"} |= "error"
Tip: If you’re building queries dynamically in code, always use a URL encoding library instead of manually constructing the string.
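For example, with Python’s standard library (reusing the endpoint and parameters from earlier examples):

from urllib.parse import urlencode, quote

query = '{service="auth-service"} |= "error"'
params = urlencode({
    'query': query,
    'start': 1743500000,
    'end': 1743510000,
    'limit': 50
}, quote_via=quote)  # quote_via=quote percent-encodes spaces and special characters
url = f'https://otlp.last9.io/loki/logs/api/v2/query_range?{params}'
# requests does the same encoding automatically when you pass a params= dict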
Invalid or Malformed Time Ranges
The API may return a 400 or simply an empty result set if time values are invalid.
What to check:
- Timestamps must be in seconds since epoch unless using nanoseconds for the values field (as in the response).
- The start time must be less than the end time.
- Some clients (especially JavaScript) return milliseconds by default. Use integer division or appropriate conversion:
int(time.time())  # returns seconds
Authentication Failures
Most authentication issues result in a 401 Unauthorized or 403 Forbidden.
Checklist:
- Confirm that the credentials you’re using are still valid and not expired or rotated.
- If using CI/CD, verify that secrets are loaded properly into environment variables before constructing the header.
- Ensure your credentials (username:password) are Base64 encoded correctly, and don’t include line breaks or extra whitespace in the encoded string:
echo -n 'user:pass' | base64
Header format:
-H 'Authorization: Basic <base64-encoded-user:pass>'
Debugging Tips
- Use a known-good query from earlier examples to rule out query logic.
- Validate time inputs using tools like date -d @<timestamp> or online epoch converters.
- Capture and inspect the full response, including headers, when scripting with Python or curl; you might be ignoring useful metadata (see the sketch below).
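A small sketch of that last tip, reusing encoded_creds and a known-good query from earlier:

response = requests.get(
    'https://otlp.last9.io/loki/logs/api/v2/query_range',
    params={'query': '{service="api-gateway"}', 'start': 1743500000, 'end': 1743510000, 'limit': 10},
    headers={'Authorization': f'Basic {encoded_creds}'}
)
print(response.status_code)
print(dict(response.headers))  # metadata such as rate-limit or request-id headers may show up here
print(response.text)           # raw body, useful when JSON parsing fails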
Why This Approach Works
- Loki-compatible: You can reuse existing LogQL queries, Grafana dashboards, and tooling without changes. No need to learn a new query language or vendor-specific syntax.
- Standard HTTP/JSON: The API uses standard HTTP methods and JSON responses, making it easy to integrate across languages, scripts, CI tools, and monitoring systems.
- Structured responses: Logs come back with consistent schemas (timestamps, labels, and messages), so you can feed them directly into automation pipelines or downstream processors without additional parsing.
- Production-ready behavior: The API is designed for operational use:
  - Authenticated requests via standard headers
  - Proper handling of rate limits
  - Informative error messages and support for large queries
This means you can plug it into real workflows with confidence, whether that’s automated log checks, alerting systems, or dashboard backends.