
Trace Go Apps Using Runtime Tracing and OpenTelemetry

Instrument Go apps with runtime tracing and OpenTelemetry to spot goroutine issues, lock contention, and performance bottlenecks early.

Jul 17th, ‘25

When your Go service hits 500ms latencies but CPU usage is flat, tracing gives you visibility into what the profiler misses.

  • Profiling shows where time is spent.
  • Tracing shows where time is lost—blocked goroutines, lock contention, scheduler stalls.

With 1–2% runtime overhead, Go’s built-in tracing tools help you:

  • Inspect goroutines that are stuck or never finish
  • Measure time spent waiting on locks or network IO
  • Understand scheduler behavior across tasks

This makes it easier to debug performance regressions that don’t leave a clear footprint.

Quick Start: Capture a Go Trace in 3 Lines

Add this to your main() function:

import "runtime/trace"

func main() {
    f, _ := os.Create("trace.out")
    trace.Start(f)
    defer trace.Stop()

    yourExistingService()
}

Then run your service for ~30 seconds under typical load and execute:

go tool trace trace.out

What You’ll See:

  • Blocked Goroutines — Identify where and why they’re stuck.
  • Network I/O Waits — Surface slow reads/writes invisible to CPU profiles.
  • Scheduler Delays — Spot latency from thread scheduling stalls.
  • Leaked Goroutines — Find goroutines that never exit and keep growing.

Use this trace data to pinpoint inefficiencies that profiling alone won’t catch.

💡
If your Go app uses an ORM, tracing slow queries and transaction bottlenecks becomes easier; here's a practical guide on getting started.

Why Go’s Runtime Tracing Outperforms Traditional Profiling

CPU profiling tools like pprof tell you what consumes CPU cycles.
But most performance regressions in production aren’t due to CPU usage—they’re caused by goroutines waiting: on locks, channels, network I/O, or resource starvation.

Runtime tracing captures these wait states and exposes what your program is doing when it’s not running.

Comparative View

Scenario | CPU Profile Output | Runtime Trace Output
API latency ~500ms | Normal CPU usage | Goroutine blocked on DB connection pool
Unbounded memory growth | High allocation rate | Leaked goroutines stuck in retry loops
Sporadic response slowness | Minor GC pauses observed | Timeout on upstream call causing downstream blocking

What runtime/trace Records

The runtime/trace package captures a precise event stream across the Go runtime. This enables end-to-end visibility into goroutine execution and coordination.

Captured Events Include:

  • Goroutine activity
    • Creation, start, stop, block, unblock
  • Scheduling events
    • Preemption, sleep, wake-up latency
  • Synchronization
    • Channel sends/receives, mutex contention, select behavior
  • System interactions
    • Syscalls, network I/O blocking, GC phase transitions
  • User-defined regions
    • Instrumented code segments using trace.WithRegion() or trace.NewTask()

This data is time-ordered and correlated, enabling post-mortem analysis of long-tail latencies, deadlocks, and concurrency bugs, where traditional profiling tools remain blind.
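
Of these, the user-defined pieces are the ones you add yourself. As a minimal sketch, here's what wrapping a block of work in trace.WithRegion might look like (the function and region names are illustrative, and context, io, and runtime/trace are assumed to be imported):

// Sketch: mark the response write as a named region so it appears
// under the current task in the trace viewer.
func writeResponse(ctx context.Context, w io.Writer, payload []byte) error {
    var err error
    trace.WithRegion(ctx, "encode-response", func() {
        _, err = w.Write(payload)
    })
    return err
}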

Example:

The following Go program spawns multiple workers to handle fake orders, but exits before all goroutines complete. This causes leaked goroutines and memory buildup over time.

package main

import (
    "os"
    "runtime/trace"
    "time"
)

func main() {
    f, err := os.Create("trace.out")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    trace.Start(f)
    defer trace.Stop()

    processOrders()
}

func processOrders() {
    for i := 0; i < 10; i++ {
        go func(workerID int) {
            for j := 0; j < 50; j++ {
                handleOrder(workerID, j)
            }
        }(i)
    }

    // Main exits early; workers still running
    time.Sleep(2 * time.Second)
}

func handleOrder(workerID, orderID int) {
    delay := time.Duration(orderID%10) * 10 * time.Millisecond
    time.Sleep(delay)
}

What the Trace Reveals

Running go tool trace trace.out surfaces:

  • Goroutines still active at exit
  • Blocked goroutines waiting on timers or scheduler
  • Memory not reclaimed due to active worker stacks

These signals don’t show up in pprof CPU profiles.

Fix: Use sync.WaitGroup

Proper synchronization ensures all worker goroutines complete before exit:

var wg sync.WaitGroup
wg.Add(10)

for i := 0; i < 10; i++ {
    go func(workerID int) {
        defer wg.Done()
        for j := 0; j < 50; j++ {
            handleOrder(workerID, j)
        }
    }(i)
}

wg.Wait()

This eliminates the leak, stabilizes memory usage, and gives clearer visibility into idle or blocked time via the trace.

💡
To get more visibility into your Go services, start by setting up structured logs; this logging guide breaks it down for developers.

Add Custom Instrumentation for Application Logic

Go's runtime/trace is great for visualizing goroutine scheduling, I/O wait, and other system-level events. But to capture what your application is doing (validating orders, calling APIs, updating databases), you need custom instrumentation.

Use trace.NewTask to Track End-to-End Operations

Wrapping major operations in trace.NewTask creates structured entries in the trace viewer, allowing you to follow domain-specific execution across functions.

func processOrder(ctx context.Context, order *Order) error {
    ctx, task := trace.NewTask(ctx, "process-order")
    defer task.End()

    if err := validateOrder(ctx, order); err != nil {
        return err
    }

    return fulfillOrder(ctx, order)
}

func validateOrder(ctx context.Context, order *Order) error {
    ctx, task := trace.NewTask(ctx, "validate-order")
    defer task.End()

    if err := checkInventory(order.Items); err != nil {
        return err
    }

    return validatePayment(order.Payment)
}

Key detail: Always propagate ctx. If you skip it, tasks will show up disconnected in the trace timeline, making analysis harder.
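
For contrast, here's a minimal sketch of the failure mode (shipItems is a hypothetical downstream call): starting the child task from a fresh context instead of the one that carries the parent task.

func fulfillOrderBroken(ctx context.Context, order *Order) error {
    // context.Background() carries no parent task, so "fulfill-order"
    // shows up disconnected from "process-order" in the trace viewer.
    tctx, task := trace.NewTask(context.Background(), "fulfill-order")
    defer task.End()
    return shipItems(tctx, order)
}

func fulfillOrder(ctx context.Context, order *Order) error {
    // Reusing ctx links "fulfill-order" as a subtask of "process-order".
    ctx, task := trace.NewTask(ctx, "fulfill-order")
    defer task.End()
    return shipItems(ctx, order)
}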

Highlight Critical Code Paths with trace.StartRegion

Use trace.StartRegion to measure latency within smaller sections, like outbound API calls or DB updates. These regions show up as labeled spans nested under the active task.

func processPayment(ctx context.Context, payment *Payment) error {
    ctx, task := trace.NewTask(ctx, "process-payment")
    defer task.End()

    region := trace.StartRegion(ctx, "stripe-api-call")
    result, err := stripe.ProcessPayment(payment)
    region.End()

    if err != nil {
        trace.Log(ctx, "stripe-error", err.Error())
        return err
    }

    region = trace.StartRegion(ctx, "db-update")
    err = db.UpdatePaymentStatus(payment.ID, result.Status)
    region.End()

    return err
}

Attach Structured Data with trace.Log

Add context to your trace with structured logs. These show inline within the trace and are helpful for things like:

  • Logging retry counts
  • Capturing error messages
  • Recording feature flags or environment data

trace.Log(ctx, "retry-count", fmt.Sprint(retryCount))
trace.Log(ctx, "customer-tier", user.Tier)

This instrumentation turns runtime traces from system snapshots into actionable application observability tools.

💡
If you're using Logrus with your Go app, this guide explains how to configure and get the most out of it.

Analyze Runtime Behavior Using go tool trace

The go tool trace web interface provides several specialized views. These three are most relevant for identifying latency, concurrency, and synchronization issues:

1. Goroutine Timeline

Tracks creation, scheduling, execution, and blocking of goroutines.

What to look for:

  • Goroutines in a blocked state for extended periods
  • Long gaps between scheduled execution and actual execution
  • Sudden spikes in goroutine creation, often indicating retry loops or leaks

Use this view to correlate latency with scheduler delays or excessive concurrency.

2. User-Defined Tasks View

Displays custom instrumentation added via trace.NewTask. Visualizes the logical execution flow across components and goroutines.

Useful for:

  • Locating slow or stalled operations
  • Tracing flow breaks due to missing context propagation
  • Measuring high-level business logic latency

Ensures visibility into application-specific workflows, not just system-level events.

3. Blocking Profile View

Captures blocking events due to:

  • Network I/O (e.g., HTTP, gRPC, DB connections)
  • Synchronization primitives (e.g., channels, mutexes, RWLocks)

Common issues:

  • Contention on shared resources
  • High wait times on external systems
  • Coordination delays between goroutines

Note: These blockers don’t show up in CPU profiles since blocked goroutines don’t consume CPU time.
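
If you prefer working outside the browser UI, go tool trace can also export this blocking data as pprof-compatible profiles:

go tool trace -pprof=sync trace.out > sync.pprof       # synchronization blocking
go tool trace -pprof=net trace.out > net.pprof         # network blocking
go tool trace -pprof=syscall trace.out > syscall.pprof # syscall blocking
go tool trace -pprof=sched trace.out > sched.pprof     # scheduler latency

go tool pprof -http=:8080 sync.pprof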

Instrument Go Apps with Flight Recorder Tracing

In production, you don’t want to trace every request. Go’s x/exp/trace offers a flight recorder: a ring buffer that captures trace data in memory and only flushes it when needed, say, on a slow request or an error.

Here’s how to set it up:

import "golang.org/x/exp/trace"

var fr = trace.NewFlightRecorder()

func init() {
    fr.Start()
}

Use it in request handlers like this:

func handleRequest(w http.ResponseWriter, r *http.Request) {
    // Tasks come from runtime/trace; the flight recorder only buffers the data.
    ctx, task := trace.NewTask(r.Context(), "http.request")
    defer task.End()

    start := time.Now()
    process(w, r.WithContext(ctx))
    duration := time.Since(start)

    // Save a snapshot of the in-memory trace if the request took too long
    if duration > 300*time.Millisecond {
        go saveTrace(fr, duration)
    }
}
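
The saveTrace helper isn't defined above; here's a minimal sketch of what it could do, assuming it simply dumps the recorder's current in-memory window to a file with WriteTo (fmt, log, os, and time are assumed to be imported; the file-naming scheme is illustrative):

func saveTrace(fr *exptrace.FlightRecorder, duration time.Duration) {
    name := fmt.Sprintf("trace-%dms-%d.out", duration.Milliseconds(), time.Now().UnixNano())
    f, err := os.Create(name)
    if err != nil {
        log.Printf("create trace snapshot: %v", err)
        return
    }
    defer f.Close()

    // WriteTo flushes the flight recorder's current ring buffer to the writer.
    if _, err := fr.WriteTo(f); err != nil {
        log.Printf("write trace snapshot: %v", err)
    }
}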

What this gives you:

  • Minimal, constant overhead: trace data for normal requests never leaves the in-memory ring buffer
  • Full trace data for just the slow or failing ones
  • No need for external trace samplers or exporters until needed

Apply Intelligent Sampling in Your Handlers

In high-throughput services, you’ll want to trace:

  • Only a percentage of the total traffic
  • All requests from VIP customers
  • On-demand debugging traffic (via headers)

Here’s a practical sampling function:

func shouldTrace(r *http.Request) bool {
    if r.Header.Get("X-Debug-Trace") == "true" {
        return true
    }

    if rand.Float64() < 0.01 {
        return true
    }

    return isInternalUser(r.Header.Get("X-User-ID"))
}

Use this to wrap your tracing logic conditionally:

if shouldTrace(r) {
    ctx, task := trace.NewTask(r.Context(), "sampled.request")
    defer task.End()
    processRequest(ctx, r)
} else {
    processRequest(r.Context(), r)
}

Combine Runtime Traces with OpenTelemetry Spans

OpenTelemetry gives you distributed trace spans across services. Go’s runtime trace adds low-level detail like goroutine states and scheduler delays. Used together, you can:

  • Debug API latency down to blocked goroutines
  • See exactly where time is spent inside a handler
  • Capture both business-level and runtime-level events

ctx, span := tracer.Start(ctx, "checkout")
defer span.End()

// runtime/trace task nested inside the OpenTelemetry span
ctx, task := trace.NewTask(ctx, "checkout.task")
defer task.End()

err := doCheckout(ctx)
if err != nil {
    span.RecordError(err)
    span.SetStatus(codes.Error, err.Error())
}

This dual-tracing pattern helps bridge the gap between:

  • What your system did (OpenTelemetry spans)
  • How the runtime executed it (Go trace tasks)

💡
When combining Go runtime tracing with OpenTelemetry, following semantic conventions helps keep traces consistent and easier to query.

Export These Traces

You can plug in a simple trace sink or use an OpenTelemetry Collector to forward spans and Go runtime trace dumps to a backend like Last9, Jaeger, or your storage of choice.
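
As a starting point, here's a minimal sketch of wiring the OpenTelemetry Go SDK to an OTLP endpoint over gRPC; the collector address (localhost:4317 by default) and the rest of your service wiring are assumptions to adapt:

package main

import (
    "context"
    "log"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
    ctx := context.Background()

    // Exports spans over OTLP/gRPC; defaults to localhost:4317 unless
    // OTEL_EXPORTER_OTLP_ENDPOINT points elsewhere (e.g., your collector).
    exp, err := otlptracegrpc.New(ctx)
    if err != nil {
        log.Fatalf("create OTLP exporter: %v", err)
    }

    tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
    otel.SetTracerProvider(tp)
    defer func() { _ = tp.Shutdown(ctx) }()

    // ... run your service; spans created via otel.Tracer(...) now flow to
    // the collector, which can forward them to Last9, Jaeger, or elsewhere.
}

Go runtime trace dumps (trace.out files or flight-recorder snapshots) are separate artifacts; ship them alongside the spans and correlate the two by service, time window, or a shared request ID.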

Common Debugging Patterns Using runtime/trace

Go's runtime traces help uncover subtle performance issues that standard profiling tools often miss. Below are common patterns worth watching for, along with code to make them traceable.

Spotting Goroutine Leaks

Use trace.NewTask to track lifecycle and shutdown

Background workers often survive longer than expected, especially if they’re missing shutdown logic. Here's how to wrap one with trace events:

func startBackgroundWorker(ctx context.Context) {
    ctx, task := trace.NewTask(ctx, "background-worker")
    defer task.End()

    trace.Log(ctx, "status", "starting")

    for {
        select {
        case <-ctx.Done():
            trace.Log(ctx, "status", "shutting-down")
            return
        case work := <-workChan:
            processWork(ctx, work)
        }
    }
}

Look for tasks that start (background-worker) but never call End(). This usually means the goroutine leaked or shutdown was never triggered.

Measuring Lock Contention

Track both wait and work regions inside mutex blocks

Instead of guessing where your code is slowing down, explicitly mark time spent acquiring and holding locks:

func updateCounter(ctx context.Context, delta int) {
    ctx, task := trace.NewTask(ctx, "update-counter")
    defer task.End()

    wait := trace.StartRegion(ctx, "wait-for-mutex")
    mu.Lock()
    wait.End()

    work := trace.StartRegion(ctx, "increment-counter")
    counter += delta
    work.End()

    mu.Unlock()
}

If "wait-for-mutex" spans much longer than "increment-counter", you’re seeing lock contention. This isn't visible in CPU profiles — only traces capture it accurately.

💡
Now, debug Go latency issues directly in your IDE with Last9 MCP. Access runtime traces, logs, and metrics from production to pinpoint bottlenecks fast.

Scale Trace Analysis Across Environments

Once you've validated tracing locally, the next challenge is making this workflow scale across your entire infrastructure. Managing trace files across dozens of services, correlating distributed traces with runtime data, and alerting on performance regressions requires purpose-built tooling.

This is where teams typically graduate from local trace files to observability platforms that can:

  • Handle high-cardinality trace data without billing surprises
  • Correlate runtime traces with distributed tracing automatically
  • Alert on performance regressions detected in trace patterns
  • Store and query weeks of trace data for trend analysis

Last9 is built specifically for this challenge:

  • Native support for both OpenTelemetry and Go runtime traces
  • High-cardinality data handling without cost explosions
  • Automated correlation between distributed and runtime traces
  • Performance regression detection using trace patterns

Get started with Last9 for free today!

FAQs

Q: Will this slow down my production service?
A: With Go 1.21+, runtime tracing overhead is around 1–2% (compared to 10–20% in older versions). Use flight recording or selective sampling to reduce the impact further.

Q: How much trace data should I store?
A: Trace files grow quickly. For production, set up rotation policies and retain data for hours or days. Archive only the traces that surfaced real issues—avoid storing everything.

Q: Can I integrate this with my existing monitoring?
A: Yes. Use OpenTelemetry for distributed traces and Go’s runtime tracing for in-process performance. Correlate both using trace IDs and context propagation.

Q: How do I avoid overwhelming my team with trace data?
A: Start with high-impact paths like external API calls or background jobs. Avoid instrumenting every function—focus on where slowdowns or stalls occur.

Q: What should I look for in traces?
A: Watch for goroutines spending more time blocked than running, unexpected latencies in known operations, or consistent waiting patterns. GC pauses often correlate with spikes—check timestamps.

Authors
Preeti Dewani

Technical Product Manager at Last9