
Trace Go Apps Using Runtime Tracing and OpenTelemetry

Instrument Go apps with runtime tracing and OpenTelemetry to spot goroutine issues, lock contention, and performance bottlenecks early.

Jul 17th, ‘25

When your Go service hits 500ms latencies but CPU usage is flat, tracing gives you visibility into what the profiler misses.

  • Profiling shows where time is spent.
  • Tracing shows where time is lost—blocked goroutines, lock contention, scheduler stalls.

With 1–2% runtime overhead, Go’s built-in tracing tools help you:

  • Inspect goroutines that are stuck or never finish
  • Measure time spent waiting on locks or network IO
  • Understand scheduler behavior across tasks

This makes it easier to debug performance regressions that don’t leave a clear footprint.

Quick Start: Capture a Go Trace in 3 Lines

Add this to your main() function:

import "runtime/trace"

func main() {
    f, _ := os.Create("trace.out")
    trace.Start(f)
    defer trace.Stop()

    yourExistingService()
}

Then run your service for ~30 seconds under typical load and execute:

go tool trace trace.out

What You’ll See:

  • Blocked Goroutines — Identify where and why they’re stuck.
  • Network I/O Waits — Surface slow reads/writes invisible to CPU profiles.
  • Scheduler Delays — Spot latency from thread scheduling stalls.
  • Leaked Goroutines — Find goroutines that never exit and keep growing.

Use this trace data to pinpoint inefficiencies that profiling alone won’t catch.

💡
If your Go app uses an ORM, tracing slow queries and transaction bottlenecks becomes easier; here's a practical guide on getting started.

Why Go’s Runtime Tracing Outperforms Traditional Profiling

CPU profiling tools like pprof tell you what consumes CPU cycles.
But most performance regressions in production aren’t due to CPU usage—they’re caused by goroutines waiting: on locks, channels, network I/O, or resource starvation.

Runtime tracing captures these wait states and exposes what your program is doing when it’s not running.

Comparative View

Scenario | CPU Profile Output | Runtime Trace Output
API latency ~500ms | Normal CPU usage | Goroutine blocked on DB connection pool
Unbounded memory growth | High allocation rate | Leaked goroutines stuck in retry loops
Sporadic response slowness | Minor GC pauses observed | Timeout on upstream call causing downstream blocking

What runtime/trace Records

The runtime/trace package captures a precise event stream across the Go runtime. This enables end-to-end visibility into goroutine execution and coordination.

Captured Events Include:

  • Goroutine activity
    • Creation, start, stop, block, unblock
  • Scheduling events
    • Preemption, sleep, wake-up latency
  • Synchronization
    • Channel sends/receives, mutex contention, select behavior
  • System interactions
    • Syscalls, network I/O blocking, GC phase transitions
  • User-defined regions
    • Instrumented code segments using trace.WithRegion() or trace.NewTask()

This data is time-ordered and correlated, enabling post-mortem analysis of long-tail latencies, deadlocks, and concurrency bugs, where traditional profiling tools remain blind.
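
Of these, the user-defined pieces are the ones you add yourself. As a minimal sketch, here's what wrapping a block of work in trace.WithRegion might look like (the function and region names are illustrative, and context, io, and runtime/trace are assumed to be imported):

// Sketch: mark the response write as a named region so it appears
// under the current task in the trace viewer.
func writeResponse(ctx context.Context, w io.Writer, payload []byte) error {
    var err error
    trace.WithRegion(ctx, "encode-response", func() {
        _, err = w.Write(payload)
    })
    return err
}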

Example:

The following Go program spawns multiple workers to handle fake orders, but exits before all goroutines complete. This causes leaked goroutines and memory buildup over time.

package main

import (
    "os"
    "runtime/trace"
    "time"
)

func main() {
    f, err := os.Create("trace.out")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    trace.Start(f)
    defer trace.Stop()

    processOrders()
}

func processOrders() {
    for i := 0; i < 10; i++ {
        go func(workerID int) {
            for j := 0; j < 50; j++ {
                handleOrder(workerID, j)
            }
        }(i)
    }

    // Main exits early; workers still running
    time.Sleep(2 * time.Second)
}

func handleOrder(workerID, orderID int) {
    delay := time.Duration(orderID%10) * 10 * time.Millisecond
    time.Sleep(delay)
}

What the Trace Reveals

Running go tool trace trace.out surfaces:

  • Goroutines still active at exit
  • Blocked goroutines waiting on timers or scheduler
  • Memory not reclaimed due to active worker stacks

These signals don’t show up in pprof CPU profiles.

Fix: Use sync.WaitGroup

Proper synchronization ensures all worker goroutines complete before exit:

var wg sync.WaitGroup
wg.Add(10)

for i := 0; i < 10; i++ {
    go func(workerID int) {
        defer wg.Done()
        for j := 0; j < 50; j++ {
            handleOrder(workerID, j)
        }
    }(i)
}

wg.Wait()

This eliminates the leak, stabilizes memory usage, and gives clearer visibility into idle or blocked time via the trace.

💡
To get more visibility into your Go services, start by setting up structured logs; this logging guide breaks it down for developers.

Add Custom Instrumentation for Application Logic

Go's runtime/trace is great for visualizing goroutine scheduling, I/O wait, and other system-level events. But to capture what your application is doing (validating orders, calling APIs, updating databases), you need custom instrumentation.

Use trace.NewTask to Track End-to-End Operations

Wrapping major operations in trace.NewTask creates structured entries in the trace viewer, allowing you to follow domain-specific execution across functions.

func processOrder(ctx context.Context, order *Order) error {
    ctx, task := trace.NewTask(ctx, "process-order")
    defer task.End()

    if err := validateOrder(ctx, order); err != nil {
        return err
    }

    return fulfillOrder(ctx, order)
}

func validateOrder(ctx context.Context, order *Order) error {
    ctx, task := trace.NewTask(ctx, "validate-order")
    defer task.End()

    if err := checkInventory(order.Items); err != nil {
        return err
    }

    return validatePayment(order.Payment)
}

Key detail: Always propagate ctx. If you skip it, tasks will show up disconnected in the trace timeline, making analysis harder.
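
For contrast, here's a minimal sketch of the failure mode (shipItems is a hypothetical downstream call): starting the child task from a fresh context instead of the one that carries the parent task.

func fulfillOrderBroken(ctx context.Context, order *Order) error {
    // context.Background() carries no parent task, so "fulfill-order"
    // shows up disconnected from "process-order" in the trace viewer.
    tctx, task := trace.NewTask(context.Background(), "fulfill-order")
    defer task.End()
    return shipItems(tctx, order)
}

func fulfillOrder(ctx context.Context, order *Order) error {
    // Reusing ctx links "fulfill-order" as a subtask of "process-order".
    ctx, task := trace.NewTask(ctx, "fulfill-order")
    defer task.End()
    return shipItems(ctx, order)
}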

Highlight Critical Code Paths with trace.StartRegion

Use trace.StartRegion to measure latency within smaller sections, like outbound API calls or DB updates. These regions show up as labeled spans nested under the active task.

func processPayment(ctx context.Context, payment *Payment) error {
    ctx, task := trace.NewTask(ctx, "process-payment")
    defer task.End()

    region := trace.StartRegion(ctx, "stripe-api-call")
    result, err := stripe.ProcessPayment(payment)
    region.End()

    if err != nil {
        trace.Log(ctx, "stripe-error", err.Error())
        return err
    }

    region = trace.StartRegion(ctx, "db-update")
    err = db.UpdatePaymentStatus(payment.ID, result.Status)
    region.End()

    return err
}

Attach Structured Data with trace.Log

Add context to your trace with structured logs. These show inline within the trace and are helpful for things like:

  • Logging retry counts
  • Capturing error messages
  • Recording feature flags or environment data

trace.Log(ctx, "retry-count", fmt.Sprint(retryCount))
trace.Log(ctx, "customer-tier", user.Tier)

This instrumentation turns runtime traces from system snapshots into actionable application observability tools.

💡
If you're using Logrus with your Go app, this guide explains how to configure and get the most out of it.

Analyze Runtime Behavior Using go tool trace

The go tool trace web interface provides several specialized views. These three are most relevant for identifying latency, concurrency, and synchronization issues:

1. Goroutine Timeline

Tracks creation, scheduling, execution, and blocking of goroutines.

What to look for:

  • Goroutines in a blocked state for extended periods
  • Long gaps between scheduled execution and actual execution
  • Sudden spikes in goroutine creation, often indicating retry loops or leaks

Use this view to correlate latency with scheduler delays or excessive concurrency.

2. User-Defined Tasks View

Displays custom instrumentation added via trace.NewTask. Visualizes the logical execution flow across components and goroutines.

Useful for:

  • Locating slow or stalled operations
  • Tracing flow breaks due to missing context propagation
  • Measuring high-level business logic latency

Ensures visibility into application-specific workflows, not just system-level events.

3. Blocking Profile View

Captures blocking events due to:

  • Network I/O (e.g., HTTP, gRPC, DB connections)
  • Synchronization primitives (e.g., channels, mutexes, RWLocks)

Common issues:

  • Contention on shared resources
  • High wait times on external systems
  • Coordination delays between goroutines

Note: These blockers don’t show up in CPU profiles since blocked goroutines don’t consume CPU time.
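
If you prefer working outside the browser UI, go tool trace can also export this blocking data as pprof-compatible profiles:

go tool trace -pprof=sync trace.out > sync.pprof       # synchronization blocking
go tool trace -pprof=net trace.out > net.pprof         # network blocking
go tool trace -pprof=syscall trace.out > syscall.pprof # syscall blocking
go tool trace -pprof=sched trace.out > sched.pprof     # scheduler latency

go tool pprof -http=:8080 sync.pprof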

Instrument Go Apps with Flight Recorder Tracing

In production, you don’t want to trace every request. Go’s x/exp/trace offers a flight recorder: a ring buffer that captures trace data in memory and only flushes it when needed, say, on a slow request or an error.

Here’s how to set it up:

import "golang.org/x/exp/trace"

var fr = trace.NewFlightRecorder()

func init() {
    fr.Start()
}

Use it in request handlers like this:

func handleRequest(w http.ResponseWriter, r *http.Request) {
    // Tasks come from runtime/trace; the flight recorder only buffers the data.
    ctx, task := trace.NewTask(r.Context(), "http.request")
    defer task.End()

    start := time.Now()
    process(w, r.WithContext(ctx))
    duration := time.Since(start)

    // Save a snapshot of the in-memory trace if the request took too long
    if duration > 300*time.Millisecond {
        go saveTrace(fr, duration)
    }
}
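
The saveTrace helper isn't defined above; here's a minimal sketch of what it could do, assuming it simply dumps the recorder's current in-memory window to a file with WriteTo (fmt, log, os, and time are assumed to be imported; the file-naming scheme is illustrative):

func saveTrace(fr *exptrace.FlightRecorder, duration time.Duration) {
    name := fmt.Sprintf("trace-%dms-%d.out", duration.Milliseconds(), time.Now().UnixNano())
    f, err := os.Create(name)
    if err != nil {
        log.Printf("create trace snapshot: %v", err)
        return
    }
    defer f.Close()

    // WriteTo flushes the flight recorder's current ring buffer to the writer.
    if _, err := fr.WriteTo(f); err != nil {
        log.Printf("write trace snapshot: %v", err)
    }
}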

What this gives you:

  • Minimal, constant overhead: trace data for normal requests never leaves the in-memory ring buffer
  • Full trace data for just the slow or failing ones
  • No need for external trace samplers or exporters until needed

Apply Intelligent Sampling in Your Handlers

In high-throughput services, you’ll want to trace:

  • Only a percentage of the total traffic
  • All requests from VIP customers
  • On-demand debugging traffic (via headers)

Here’s a practical sampling function:

func shouldTrace(r *http.Request) bool {
    if r.Header.Get("X-Debug-Trace") == "true" {
        return true
    }

    if rand.Float64() < 0.01 {
        return true
    }

    return isInternalUser(r.Header.Get("X-User-ID"))
}

Use this to wrap your tracing logic conditionally:

if shouldTrace(r) {
    ctx, task := trace.NewTask(r.Context(), "sampled.request")
    defer task.End()
    processRequest(ctx, r)
} else {
    processRequest(r.Context(), r)
}

Combine Runtime Traces with OpenTelemetry Spans

OpenTelemetry gives you distributed trace spans across services. Go’s runtime trace adds low-level detail like goroutine states and scheduler delays. Used together, you can:

  • Debug API latency down to blocked goroutines
  • See exactly where time is spent inside a handler
  • Capture both business-level and runtime-level events

ctx, span := tracer.Start(ctx, "checkout")
defer span.End()

// runtime/trace task nested inside the OpenTelemetry span
ctx, task := trace.NewTask(ctx, "checkout.task")
defer task.End()

err := doCheckout(ctx)
if err != nil {
    span.RecordError(err)
    span.SetStatus(codes.Error, err.Error())
}

This dual-tracing pattern helps bridge the gap between:

  • What your system did (OpenTelemetry spans)
  • How the runtime executed it (Go trace tasks)

💡
When combining Go runtime tracing with OpenTelemetry, following semantic conventions helps keep traces consistent and easier to query.

Export These Traces

You can plug in a simple trace sink or use an OpenTelemetry Collector to forward spans and Go runtime trace dumps to a backend like Last9, Jaeger, or your storage of choice.
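
As a starting point, here's a minimal sketch of wiring the OpenTelemetry Go SDK to an OTLP endpoint over gRPC; the collector address (localhost:4317 by default) and the rest of your service wiring are assumptions to adapt:

package main

import (
    "context"
    "log"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
    ctx := context.Background()

    // Exports spans over OTLP/gRPC; defaults to localhost:4317 unless
    // OTEL_EXPORTER_OTLP_ENDPOINT points elsewhere (e.g., your collector).
    exp, err := otlptracegrpc.New(ctx)
    if err != nil {
        log.Fatalf("create OTLP exporter: %v", err)
    }

    tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
    otel.SetTracerProvider(tp)
    defer func() { _ = tp.Shutdown(ctx) }()

    // ... run your service; spans created via otel.Tracer(...) now flow to
    // the collector, which can forward them to Last9, Jaeger, or elsewhere.
}

Go runtime trace dumps (trace.out files or flight-recorder snapshots) are separate artifacts; ship them alongside the spans and correlate the two by service, time window, or a shared request ID.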

Common Debugging Patterns Using runtime/trace

Go's runtime traces help uncover subtle performance issues that standard profiling tools often miss. Below are common patterns worth watching for, along with code to make them traceable.

Spotting Goroutine Leaks

Use trace.NewTask to track lifecycle and shutdown

Background workers often survive longer than expected, especially if they’re missing shutdown logic. Here's how to wrap one with trace events:

func startBackgroundWorker(ctx context.Context) {
    ctx, task := trace.NewTask(ctx, "background-worker")
    defer task.End()

    trace.Log(ctx, "status", "starting")

    for {
        select {
        case <-ctx.Done():
            trace.Log(ctx, "status", "shutting-down")
            return
        case work := <-workChan:
            processWork(ctx, work)
        }
    }
}

Look for tasks that start (background-worker) but never call End(). This usually means the goroutine leaked or shutdown was never triggered.

Measuring Lock Contention

Track both wait and work regions inside mutex blocks

Instead of guessing where your code is slowing down, explicitly mark time spent acquiring and holding locks:

func updateCounter(ctx context.Context, delta int) {
    ctx, task := trace.NewTask(ctx, "update-counter")
    defer task.End()

    wait := trace.StartRegion(ctx, "wait-for-mutex")
    mu.Lock()
    wait.End()

    work := trace.StartRegion(ctx, "increment-counter")
    counter += delta
    work.End()

    mu.Unlock()
}

If "wait-for-mutex" spans much longer than "increment-counter", you’re seeing lock contention. This isn't visible in CPU profiles — only traces capture it accurately.

💡
Now, debug Go latency issues directly in your IDE with Last9 MCP. Access runtime traces, logs, and metrics from production to pinpoint bottlenecks fast.

Scale Trace Analysis Across Environments

Once you've validated tracing locally, the next challenge is making this workflow scale across your entire infrastructure. Managing trace files across dozens of services, correlating distributed traces with runtime data, and alerting on performance regressions requires purpose-built tooling.

This is where teams typically graduate from local trace files to observability platforms that can:

  • Handle high-cardinality trace data without billing surprises
  • Correlate runtime traces with distributed tracing automatically
  • Alert on performance regressions detected in trace patterns
  • Store and query weeks of trace data for trend analysis

Last9 is built specifically for this challenge:

  • Native support for both OpenTelemetry and Go runtime traces
  • High-cardinality data handling without cost explosions
  • Automated correlation between distributed and runtime traces
  • Performance regression detection using trace patterns

Get started with Last9 for free today!

FAQs

Q: Will this slow down my production service?
A: With Go 1.21+, runtime tracing overhead is around 1–2% (compared to 10–20% in older versions). Use flight recording or selective sampling to reduce the impact further.

Q: How much trace data should I store?
A: Trace files grow quickly. For production, set up rotation policies and retain data for hours or days. Archive only the traces that surfaced real issues—avoid storing everything.

Q: Can I integrate this with my existing monitoring?
A: Yes. Use OpenTelemetry for distributed traces and Go’s runtime tracing for in-process performance. Correlate both using trace IDs and context propagation.

Q: How do I avoid overwhelming my team with trace data?
A: Start with high-impact paths like external API calls or background jobs. Avoid instrumenting every function—focus on where slowdowns or stalls occur.

Q: What should I look for in traces?
A: Watch for goroutines spending more time blocked than running, unexpected latencies in known operations, or consistent waiting patterns. GC pauses often correlate with spikes—check timestamps.

Authors
Preeti Dewani

Technical Product Manager at Last9