
Apr 30th, ‘25 / 9 min read

The Ultimate Guide to GCP Logs for DevOps Engineers

Discover everything DevOps engineers need to know about GCP logs, from collection to analysis, to optimize performance and troubleshooting.

Navigating Google Cloud Platform logs can be a challenge, especially when you're trying to pinpoint that one error affecting production. For DevOps engineers, mastering GCP logs is crucial for effective troubleshooting and system performance.

This guide covers everything you need to know, from setting up your logs to using them for better insights and smoother workflows.

Understanding GCP Logs: Types and Structure

GCP logs are records of events that happen within your Google Cloud Platform environment. Think of them as your system's journal—they track who did what, when, and how it went down.

These logs capture everything from API requests to resource changes, security events, and application behavior. When something goes wrong (and let's be honest, something always does), these logs become your first line of defense.

Google Cloud organizes logs into several categories:

  • Platform logs: System-generated logs from GCP services
  • User-written logs: Custom logs from your applications
  • Component logs: Service-specific logs (like App Engine, Compute Engine)
  • Audit logs: Records of who accessed what and when
💡
For more on monitoring and analyzing logs in Java applications, check out our article on Java GC logs.

Why Effective GCP Logs Management Is Critical for DevOps Success

For DevOps teams, logging isn't just a nice-to-have—it's a must-have. Here's why:

  • Troubleshooting: Find and fix issues faster by seeing exactly what happened
  • Security monitoring: Detect unusual activity before it becomes a problem
  • Compliance: Meet regulatory requirements with audit trails
  • Performance optimization: Identify bottlenecks and opportunities to improve
  • Continuous improvement: Learn from patterns and trends over time

Without solid logging, you're flying blind. With it, you've got a GPS for your entire infrastructure.

Setting Up GCP Logs: Step-by-Step Configuration Guide

Setting up effective logging in GCP doesn't have to be complicated. Here's how to get up and running:

How to Enable and Configure the Cloud Logging API

First things first—make sure the Cloud Logging API is enabled for your project:

  1. Go to the Google Cloud Console
  2. Navigate to "APIs & Services" > "Library"
  3. Search for "Cloud Logging API"
  4. Click "Enable"

Installing the Cloud Logging Agent

For VM instances, you'll need the Cloud Logging agent to collect and send logs:

For Windows VMs:

  • Download the installer from Google Cloud
  • Run the MSI package as an administrator
  • The agent automatically starts after installation

For Compute Engine VMs running Linux:

curl -sSO https://dl.google.com/cloudagents/add-logging-agent-repo.sh
sudo bash add-logging-agent-repo.sh
sudo apt-get update
sudo apt-get install google-fluentd
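
Once installed, a quick sanity check confirms the agent is running (this assumes the legacy google-fluentd agent installed above):

sudo service google-fluentd status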
💡
To learn more about sending logs efficiently across your systems, check out our article on log shippers.

Creating Custom Log Routing for Storage and Analysis

By default, GCP sends logs to Cloud Logging, but you can customize where they go:

  1. Go to "Logging" > "Logs Router"
  2. Create a sink to route specific logs to destinations like:
    • Cloud Storage (for long-term archiving)
    • BigQuery (for analysis)
    • Pub/Sub (for real-time processing)
    • Other GCP projects

Example sink filter for routing only error logs:

severity>=ERROR
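
The same sink can be created from the CLI; a minimal sketch, assuming a hypothetical Cloud Storage bucket named my-error-log-bucket:

gcloud logging sinks create error-logs-archive \
  storage.googleapis.com/my-error-log-bucket \
  --log-filter='severity>=ERROR'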

Building Actionable Log-Based Metrics for Monitoring

Turn your logs into actionable metrics that help you monitor system health:

  1. Go to "Logging" > "Logs-based Metrics"
  2. Click "CREATE METRIC"
  3. For a counter metric (counting occurrences):
    • Select "Counter" as the metric type
    • Name your metric (e.g., "auth_failures")
    • Write a filter like: resource.type="gce_instance" AND textPayload:"Authentication failed"
    • Click "Create Metric"
  4. For a distribution metric (tracking numerical values):
    • Select "Distribution" as the metric type
    • Name your metric (e.g., "request_latency")
    • Write a filter to find relevant logs
    • Specify which field contains your numerical value (e.g., jsonPayload.latency_ms)
    • Click "Create Metric"

These metrics can then power dashboards and alerts in Cloud Monitoring, giving you visibility into what matters. You can create charts, set thresholds, and get notified when issues arise.
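
The counter metric from step 3 can also be created from the CLI; a quick sketch using the same hypothetical name and filter:

gcloud logging metrics create auth_failures \
  --description="Count of failed authentication attempts" \
  --log-filter='resource.type="gce_instance" AND textPayload:"Authentication failed"'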

Using the Log Explorer Effectively

The Log Explorer is your command center for finding and analyzing logs:

  1. Access it through "Logging" > "Logs Explorer"
  2. Save frequent searches as "Saved Queries"
  3. Use the histogram to identify spikes in log volume

Use the query builder or write custom queries:

resource.type="gce_instance"AND severity>=ERRORAND timestamp>"2025-04-29T00:00:00Z"

The Log Explorer supports both simple filtering and complex queries using Google's query language.

💡
For more insights on monitoring your GCP environment, check out our article on GCP monitoring.

Essential Best Practices for Maximizing GCP Logs Value

Now that you're set up, let's talk about doing it right. Follow these best practices to get the most out of your GCP logs:

How to Structure JSON Logs for Better Searchability

Structured logs make searching and analysis much easier:

{
  "severity": "ERROR",
  "user": "user-123",
  "action": "login",
  "status": "failed",
  "reason": "invalid_credentials",
  "timestamp": "2025-04-30T12:34:56Z"
}

This format lets you filter precisely and build better metrics.
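
To see how such an entry lands in Cloud Logging, you can write a test entry from the CLI (my-app-log is a hypothetical log name; note that the entry's severity comes from the flag, not from a field in the payload):

gcloud logging write my-app-log \
  '{"user":"user-123","action":"login","status":"failed","reason":"invalid_credentials"}' \
  --payload-type=json --severity=ERROR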

Using Severity Levels Correctly to Prioritize Issues

Don't cry wolf—use severity levels correctly:

  • DEBUG: Detailed info for debugging
  • INFO: Confirmation that things are working
  • WARNING: Something unexpected but not critical
  • ERROR: Something failed
  • CRITICAL: System is unusable

When everything is an error, nothing is.

Creating Exclusion Filters to Reduce Noise and Costs

Not all logs are created equal. Save money and reduce noise by excluding logs you don't need:

  1. Go to "Logging" > "Logs Router"
  2. Click "CREATE EXCLUSION"
  3. Name your exclusion (e.g., "low-priority-app-logs")
  4. Write a filter for logs you want to exclude, such as:
    • resource.type="gce_instance" AND severity<="INFO"
    • resource.type="k8s_container" AND resource.labels.namespace_name="dev" AND severity<="WARNING"
    • logName:"projects/my-project/logs/requests" AND jsonPayload.status=200
  5. Set the exclusion percentage (often 100%)

This keeps your logging costs under control while focusing on what matters.
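
If you manage configuration from the command line, an exclusion can also be attached to the _Default sink; a sketch, assuming your gcloud version supports the --add-exclusion flag:

gcloud logging sinks update _Default \
  --add-exclusion=name=low-priority-app-logs,filter='resource.type="gce_instance" AND severity<=INFO'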

Building Custom Log Views for Faster Troubleshooting

Custom log views save time when troubleshooting:

  1. Run a query that shows what you need
  2. Click "Save as view"
  3. Name it something clear like "Production Errors Last 24h"

Next time there's an issue, you're one click away from the relevant logs.

💡
To better understand the difference between logging and monitoring, check out our article on logging vs monitoring.

Advanced GCP Logging Strategies for Production Environments

Ready to take your logging to the next level? Try these advanced techniques:

Using GCP Error Reporting Service

GCP's Error Reporting automatically groups and analyzes errors in your logs:

  1. Go to "Error Reporting" in the GCP console
  2. Review automatically detected errors from your logs
  3. Set up notifications for new error types or spikes
  4. Link errors directly to the logs that generated them

This service works with logs from App Engine, Compute Engine, Cloud Functions, and custom applications that use proper error formatting.

Working with Client Libraries for Application Logging

Google provides client libraries for various programming languages to integrate with Cloud Logging:

// Node.js example
const {Logging} = require('@google-cloud/logging');
const logging = new Logging();
const log = logging.log('my-custom-log');

// Write a log entry
const entry = log.entry({resource: {type: 'global'}}, {
  message: 'User login successful',
  userId: 'user-123',
  action: 'login'
});

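// write() returns a promise; await or handle it in production code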
log.write(entry);

Similar libraries exist for Python, Java, Go, Ruby, PHP, and C#.

Creating Proactive Log-Based Alerting Systems

Set up alerts to notify you when things go wrong:

  1. Create a log-based metric for the condition you want to monitor
  2. Go to "Monitoring" > "Alerting"
  3. Create an alert based on your metric
  4. Configure notification channels (email, Slack, PagerDuty)

Now you'll know about problems before users do.
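
Alert policies can also be defined as code and pushed with gcloud; a rough sketch, assuming the auth_failures log-based metric from earlier and the (currently alpha) policies command:

cat > auth-failure-policy.json <<'EOF'
{
  "displayName": "Auth failures spike",
  "combiner": "OR",
  "conditions": [{
    "displayName": "auth_failures above threshold",
    "conditionThreshold": {
      "filter": "metric.type=\"logging.googleapis.com/user/auth_failures\" AND resource.type=\"gce_instance\"",
      "comparison": "COMPARISON_GT",
      "thresholdValue": 5,
      "duration": "300s",
      "aggregations": [{"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_RATE"}]
    }
  }]
}
EOF
gcloud alpha monitoring policies create --policy-from-file=auth-failure-policy.json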

Implementing Request Tracing with Correlation IDs

Track requests across services by including a correlation ID in all logs:

{
  "message": "Processing payment",
  "correlation_id": "req-abc-123",
  "service": "payment-processor"
}

This makes it easy to follow a request's journey through your system.
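
In the Logs Explorer, pulling up a request's full journey is then a one-line filter (assuming your services log the ID under jsonPayload.correlation_id as above):

jsonPayload.correlation_id="req-abc-123"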

Creating a Comprehensive Logging Policy for Different Data Types

Log Category       | Retention Period | Storage Location         | Access Control
Application Errors | 30 days          | Cloud Logging            | DevOps Team
Security Events    | 1 year           | Cloud Storage + BigQuery | Security Team
Audit Logs         | 7 years          | Cloud Storage (archived) | Compliance Team
Debug Logs         | 3 days           | Cloud Logging            | Developers

This kind of policy helps balance cost, compliance, and utility.
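
The retention column can be enforced through Cloud Logging bucket settings; a minimal sketch that mirrors the 30-day application-error row against the _Default bucket:

gcloud logging buckets update _Default --location=global --retention-days=30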

💡
For a deeper understanding of log levels and how they help organize your logs, check out our article on log levels explained.

Effective Troubleshooting Techniques Using GCP Logs

When things go wrong, logs are your best friend. Here's how to use them effectively:

Recognizing Common Error Patterns in Cloud Environments

Learn to recognize these common patterns:

  • Sudden spikes in error rates: Often indicate a deployment issue
  • Gradual increase in latency: May signal a resource constraint
  • Periodic errors: Could point to cron jobs or scheduled tasks
  • Cascading failures: One service fails, triggering failures in dependent services

Advanced Query Techniques for Pinpointing Issues Fast

Master these query techniques to find what you need:

Context expansion: Once you find an error, look around it

timestamp >= "2025-04-30T12:30:00Z" AND timestamp <= "2025-04-30T12:35:00Z"

Error focusing: Filter to see just the problems

severity >= "ERROR" OR textPayload:"exception"

Service isolation: Focus on one component at a time

resource.type = "k8s_container" AND resource.labels.namespace_name = "production"

Time range narrowing: Start broad, then zoom in

timestamp >= "2025-04-29T00:00:00Z" AND timestamp <= "2025-04-30T00:00:00Z"

Using Real-Time Log Analysis During Incidents

For ongoing issues, use the Logs Explorer to watch logs in real-time:

  1. Set up your query
  2. Toggle the "Stream logs" button
  3. Watch for patterns or specific errors as they happen

This can be invaluable during incidents or deployments.
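
If you'd rather stay in a terminal during an incident, recent gcloud versions can stream logs too (depending on your SDK version the command may sit under alpha or beta and need an extra component installed):

gcloud alpha logging tail 'severity>=ERROR AND resource.type="k8s_container"'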

💡
To learn more about how traces and spans fit into observability, check out our article on traces and spans in observability.

Enhancing GCP Logs with Third-Party Observability Tools

While GCP's built-in tools are good, connecting to specialized observability platforms can be even better. Last9 stands out as a particularly strong option here.

Why Last9 Outperforms Native Solutions for GCP Log Management

Last9 offers a managed observability solution that works beautifully with GCP logs. Their platform connects seamlessly with your GCP environment, providing:

  • High-cardinality observability that scales with your infrastructure
  • Unified views that bring together metrics, logs, and traces
  • Cost-effective storage and processing compared to native solutions
  • Real-time correlation between events across your stack

Industry leaders like Probo, CleverTap, and Replit trust Last9 to monitor their critical systems. Last9 has monitored 11 of the 20 largest live-streaming events in history—proof that it can handle serious scale.

Step-by-Step Guide to Integrating Last9 with GCP Logs

Integration is straightforward:

  1. Export GCP logs to Pub/Sub
  2. Connect Last9 to your Pub/Sub topic
  3. Configure visualization and alerting in Last9's interface
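
A rough sketch of the GCP side of steps 1 and 2, with hypothetical topic, sink, and project names:

gcloud pubsub topics create last9-logs
gcloud logging sinks create last9-export \
  pubsub.googleapis.com/projects/my-project/topics/last9-logs \
  --log-filter='severity>=WARNING'
# Then grant the sink's writer identity (shown by `gcloud logging sinks describe last9-export`)
# the Pub/Sub Publisher role on the topic before pointing Last9 at it.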

Last9 works with both OpenTelemetry and Prometheus, making it a flexible choice regardless of your existing tooling.

Comparing Top Observability Tools for GCP Environments

While Last9 offers excellent capabilities, there are other tools worth considering for your monitoring stack:

Tool          | Strengths                                            | Best For
Last9         | High-cardinality, unified monitoring, cost-effective | Teams needing comprehensive observability with budget constraints
Grafana       | Visualization, open-source                           | Teams already using Prometheus
Elasticsearch | Full-text search, analysis                           | Complex log searching and analysis
Jaeger        | Distributed tracing                                  | Microservice architectures
Loki          | Log aggregation, Grafana integration                 | Kubernetes environments

The right choice depends on your specific needs, but Last9's combination of features, performance, and value makes it worth serious consideration.

💡
For a better understanding of trace-level logging and its benefits, check out our article on trace-level logging.

How to Analyze GCP Logs with BigQuery

BigQuery offers powerful analysis capabilities for your logs. Here's how to leverage it:

Setting Up Log Export to BigQuery

  1. Create a BigQuery dataset to receive your logs
  2. Go to "Logging" > "Logs Router"
  3. Create a sink with BigQuery as the destination
  4. Select the dataset you created
  5. Define your filter (or leave it blank for all logs)
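
If you script your setup, the same export can be created from the CLI; a sketch with hypothetical project and dataset names:

bq mk --dataset my-project:gcp_logs
gcloud logging sinks create bq-log-export \
  bigquery.googleapis.com/projects/my-project/datasets/gcp_logs
# Grant the sink's writer identity the BigQuery Data Editor role on the dataset
# so exported entries can be written.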

Running Powerful Log Analytics Queries

Once your logs are in BigQuery, you can run SQL queries like this:

-- Find the top 10 most frequent error messages
SELECT
  jsonPayload.message AS error_message,
  COUNT(*) AS error_count
FROM
  `your-project.your_dataset.your_log_table`
WHERE
  severity = "ERROR"
  AND timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY
  error_message
ORDER BY
  error_count DESC
LIMIT 10;

-- Calculate 95th percentile response time by service
SELECT
  jsonPayload.service AS service_name,
  APPROX_QUANTILES(jsonPayload.latency_ms, 100)[OFFSET(95)] AS p95_latency
FROM
  `your-project.your_dataset.your_log_table`
WHERE
  timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
GROUP BY
  service_name
ORDER BY
  p95_latency DESC;

These queries enable deep analysis of your logging data at scale.

Wrapping Up

Mastering GCP logs can greatly improve your troubleshooting and system visibility. Start with the basics—enable logging, set up filters, and create alerts—and gradually incorporate more advanced techniques. Over time, you'll appreciate the efficiency and insight good logging provides.

💡
If you have questions or tips about GCP logs, join our Discord community to connect with other DevOps professionals and share your experiences.

FAQs

How long does GCP store logs by default?

By default, GCP stores audit logs for 400 days and other logs for 30 days. You can customize these retention periods based on your needs.

Can I export GCP logs to my storage?

Yes! You can export logs to Cloud Storage, BigQuery, or Pub/Sub using log sinks. From there, you can move them anywhere you need.

How can I search across multiple GCP projects?

Create an aggregated sink that collects logs from multiple projects into a central project or storage location. This gives you a unified view.

Are GCP logs encrypted?

Yes, GCP logs are encrypted both in transit and at rest. For additional security, you can use customer-managed encryption keys (CMEK).

How can I limit who can see sensitive logs?

Use IAM roles to control access. Create custom roles that grant access only to specific log categories or views, and assign these roles carefully.

What's the difference between Cloud Logging and Operations Suite?

Cloud Logging is specifically for log management, while Operations Suite (formerly Stackdriver) is a broader solution that includes monitoring, logging, and APM features.

Authors

Preeti Dewani
Technical Product Manager at Last9