
Apr 24th, ‘25 / 7 min read

How to Use OpenTelemetry with Your GraphQL Stack

Learn how to add observability to your GraphQL APIs using OpenTelemetry—track requests, monitor performance, and troubleshoot faster.

Monitoring GraphQL applications presents unique challenges due to their complex execution model. The nested structure of queries, parallel resolver execution, and variable response payloads create observability gaps that traditional monitoring approaches struggle to address.

OpenTelemetry offers a robust solution for instrumenting GraphQL services, providing visibility into the full request lifecycle. This guide provides practical steps for implementing, optimizing, and troubleshooting OpenTelemetry in GraphQL environments, helping you build a comprehensive observability strategy for your API layer.

What Makes GraphQL Monitoring Different?

GraphQL isn't your standard REST API, and that makes all the difference when it comes to monitoring. With GraphQL, a single request can trigger dozens of resolvers, touch multiple data sources, and return wildly different response sizes based on the query.

Traditional API monitoring tools often miss the mark because:

  • They track requests at the endpoint level, not the resolver level
  • They can't show how nested fields impact performance
  • They struggle to correlate resolver execution with database queries
  • They lack visibility into the GraphQL parsing and validation phases

This is why pairing GraphQL with OpenTelemetry makes so much sense - you get granular insight into every step of the query execution.

💡
If you're working with custom metrics, this intro to OpenTelemetry custom metrics can help you get the basics right before layering on complexity.

Setting Up OpenTelemetry in Your GraphQL Server

Getting OpenTelemetry running with your GraphQL server isn't as complicated as it might seem. Here's how to set it up step by step.

Installing OpenTelemetry Packages for GraphQL Integration

For a Node.js environment with Apollo Server, you'll need these packages:

npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-proto @opentelemetry/instrumentation-graphql

For a Java environment with GraphQL Java:

implementation 'io.opentelemetry:opentelemetry-api:1.28.0'
implementation 'io.opentelemetry:opentelemetry-sdk:1.28.0'
implementation 'io.opentelemetry:opentelemetry-exporter-otlp:1.28.0'
implementation 'io.opentelemetry.instrumentation:opentelemetry-graphql-java-12.0:1.23.0-alpha'

Configuring Basic OpenTelemetry SDK for GraphQL Server

Here's a minimal setup for Node.js that will get you started:

// tracing.js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-proto');
const { GraphQLInstrumentation } = require('@opentelemetry/instrumentation-graphql');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    // URL of your collector
    url: 'http://localhost:4318/v1/traces',
  }),
  instrumentations: [
    getNodeAutoInstrumentations(),
    new GraphQLInstrumentation({
      // Merge spans for individual list items into a single span
      mergeItems: true,
      // Include argument and variable values in span attributes
      allowValues: true,
    }),
  ],
});

sdk.start();

Make sure to import this file at the very beginning of your application:

// index.js
require('./tracing'); // Must be first import
const { ApolloServer } = require('apollo-server');
// Rest of your server setup
💡
If your setup includes Postgres, here’s how you can use OpenTelemetry with Postgres to get better visibility into query performance and database health.

Capturing Meaningful GraphQL Telemetry

Setting up basic instrumentation is just the start. To get real value from OpenTelemetry and GraphQL, you need to capture the right signals.

Creating Custom Spans for GraphQL Resolver Performance Tracking

Resolvers are the heart of GraphQL performance. Here's how to instrument them properly:

const resolvers = {
  Query: {
    users: async (parent, args, context, info) => {
      // Create a custom span for this resolver
      // (assumes a tracer instance was attached to the context during server setup)
      const span = context.tracer.startSpan('users.resolver');
      
      // Add useful attributes
      span.setAttribute('graphql.args.limit', args.limit);
      span.setAttribute('graphql.args.offset', args.offset);
      
      try {
        // Your existing resolver logic
        const users = await getUsersFromDatabase(args);
        
        // Record result metadata
        span.setAttribute('result.count', users.length);
        return users;
      } catch (error) {
        // Record errors
        span.recordException(error);
        throw error;
      } finally {
        span.end();
      }
    }
  }
};

Monitoring GraphQL Query Parsing and Validation Phases

The parsing and validation phases can be performance bottlenecks, too. Here's how to track them with Apollo Server:

const server = new ApolloServer({
  typeDefs,
  resolvers,
  plugins: [
    {
      async requestDidStart(requestContext) {
        const { request, context } = requestContext;
        const span = context.tracer.startSpan('graphql.request');
        span.setAttribute('graphql.query', request.query);
        context.requestSpan = span;
        
        return {
          async parsingDidStart() {
            const parseSpan = context.tracer.startSpan('graphql.parse');
            return () => {
              parseSpan.end();
            };
          },
          async validationDidStart() {
            const validationSpan = context.tracer.startSpan('graphql.validate');
            return () => {
              validationSpan.end();
            };
          },
          async executionDidStart() {
            const executionSpan = context.tracer.startSpan('graphql.execute');
            return () => {
              executionSpan.end();
            };
          },
          async didEncounterErrors({ errors }) {
            for (const err of errors) {
              context.requestSpan.recordException(err);
            }
          },
          async willSendResponse() {
            context.requestSpan.end();
          }
        };
      }
    }
  ]
});

Common OpenTelemetry GraphQL Problems and Solutions

Even with a solid setup, you're likely to run into issues. Here are solutions to common problems.

Preventing High Cardinality Issues in GraphQL Query Tracing

GraphQL queries can be virtually infinite in their variations, which can cause a cardinality explosion in your monitoring system.

Solution: Filter and normalize queries before recording them as span attributes:

function normalizeQuery(query) {
  // Replace string and numeric literals with placeholders.
  // A regex pass is a rough heuristic; an AST-based normalizer
  // is safer if your field names contain digits.
  return query
    .replace(/"[^"]*"/g, '"?"')
    .replace(/\b\d+\b/g, '?');
}

// Usage
span.setAttribute('graphql.query.normalized', normalizeQuery(query));

Implementing Distributed Tracing in Federated GraphQL Architectures

If you're using Apollo Federation or other GraphQL federation approaches, tracing across services gets complex.

Solution: Ensure proper context propagation:

// In your gateway service
const { propagation, context: otelContext } = require('@opentelemetry/api');

const gateway = new ApolloGateway({
  serviceList: [
    /* your services */
  ],
  buildService({ name, url }) {
    return new RemoteGraphQLDataSource({
      url,
      willSendRequest({ request }) {
        // Inject the active trace context (traceparent/tracestate headers)
        // using the globally configured W3C propagator
        propagation.inject(otelContext.active(), request.http.headers, {
          set: (headers, key, value) => headers.set(key, value),
        });
      }
    });
  }
});

Connecting GraphQL Resolver Spans with Database Operation Telemetry

Seeing GraphQL resolver times without a database query context isn't very helpful.

Solution: Link database spans with resolver spans:

async function getUsersFromDatabase(args) {
  const span = tracer.startSpan('db.query.users');
  span.setAttribute('db.statement', 'SELECT * FROM users LIMIT ? OFFSET ?');
  span.setAttribute('db.parameters', JSON.stringify([args.limit, args.offset]));
  
  try {
    const result = await db.query('SELECT * FROM users LIMIT ? OFFSET ?', [args.limit, args.offset]);
    return result;
  } catch (error) {
    span.recordException(error);
    throw error;
  } finally {
    span.end();
  }
}
💡
Now, fix production GraphQL issues instantly—right from your IDE, with AI and Last9 MCP.

Analyzing OpenTelemetry GraphQL Data Effectively

Collecting data is only useful if you can make sense of it. Here's how to analyze your OpenTelemetry GraphQL data.

Essential GraphQL Performance Metrics for Operational Monitoring

| Metric | Description | Why It Matters |
|---|---|---|
| Resolver Duration | Time taken by each resolver | Identifies slow resolvers |
| Parse/Validate Time | Time spent in parsing and validation | Can indicate complex queries |
| N+1 Query Count | Number of duplicate database queries | Common GraphQL performance issue |
| Resolver Error Rate | Percentage of resolvers that throw errors | Shows reliability issues |
| Query Complexity | Calculated complexity score of queries | Helps identify abuse or optimization opportunities |
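The Query Complexity metric above can be approximated even before you adopt a dedicated scoring library. A deliberately naive sketch that uses brace-nesting depth as a stand-in for complexity — a production setup would score a parsed AST instead (e.g. with a query-complexity plugin), so treat the function name and approach here as illustrative only:

```javascript
// Naive "complexity" estimate: maximum selection-set nesting depth,
// counted from brace nesting in the raw query string.
function queryDepth(query) {
  let depth = 0;
  let maxDepth = 0;
  for (const ch of query) {
    if (ch === '{') {
      depth += 1;
      maxDepth = Math.max(maxDepth, depth);
    } else if (ch === '}') {
      depth -= 1;
    }
  }
  return maxDepth;
}

// Recorded as a span attribute, this gives a cheap signal for
// spotting unusually deep (potentially abusive) queries.
```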

Designing GraphQL-Specific Observability Dashboards

A good GraphQL + OpenTelemetry dashboard should include:

  1. Top-level query response times
  2. Resolver timings by field
  3. Error rates by resolver
  4. Database query correlation
  5. Cache hit/miss rates
  6. Parsing and validation times
💡
If you're exploring how OpenTelemetry stacks up against traditional APM tools, this piece breaks it down with real-world context.

Integrating with Last9 for Advanced Observability

If you're looking for a budget-friendly managed observability solution that doesn’t compromise on features, Last9 pairs perfectly with your OpenTelemetry GraphQL setup.

We specialize in handling high-cardinality data — just like the data GraphQL generates — without the cost penalties you'd face with other vendors. Our pricing is based on event ingestion, so your costs stay predictable even as your GraphQL API usage grows.

Last9 integrates smoothly with your OpenTelemetry data and provides:

  • Correlation between GraphQL operations and the underlying infrastructure
  • Pre-built dashboards tailored for GraphQL workloads
  • Smart alerting that understands GraphQL context
  • A unified view across metrics, logs, and traces

Teams like Clevertap, Probo, and others trust Last9 for their OpenTelemetry needs, especially for how well we handle the high-cardinality nature of GraphQL telemetry data.

Last9 Review from JioStar

Best Practices for OpenTelemetry in GraphQL Production Environments

Moving to production requires some additional considerations:

Implementing Efficient Sampling for High-Volume GraphQL APIs

You likely don't need to trace every single GraphQL operation. Implement a smart sampling strategy:

const { ParentBasedSampler, TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-node');

// Sample 10% of traces by default
const rootSampler = new TraceIdRatioBasedSampler(0.1);

// For GraphQL operations, use parent-based sampling
const sampler = new ParentBasedSampler({
  root: rootSampler,
});

// Add to your SDK config
const sdk = new NodeSDK({
  sampler,
  // other config...
});

Optimizing OpenTelemetry Resource Consumption in GraphQL Services

OpenTelemetry adds overhead. Manage it with these tips:

  1. Be selective about which resolvers you instrument manually
  2. Use attribute limits to prevent memory bloat
  3. Consider batching span exports in high-throughput environments
  4. Implement circuit breakers to disable tracing if the system is under heavy load
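Tip 2 — attribute limits — can be applied directly in the SDK configuration. A minimal sketch, assuming the `spanLimits` option of the Node SDK (verify the option names against the SDK version you're running):

```javascript
// Sketch: capping span attribute volume to keep memory in check.
const { NodeSDK } = require('@opentelemetry/sdk-node');

const sdk = new NodeSDK({
  spanLimits: {
    // Drop attributes beyond this count per span
    attributeCountLimit: 64,
    // Truncate long attribute values (e.g. raw query strings)
    attributeValueLengthLimit: 512,
  },
  // ...exporter and instrumentations as configured earlier
});
```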

Protecting Sensitive Data in GraphQL Telemetry Collection

GraphQL queries often contain sensitive data. Protect it:

  1. Always redact authentication tokens from headers
  2. Filter out sensitive fields from query variables
  3. Hash user identifiers before recording them as span attributes
  4. Consider field-level policies for what can be recorded in traces
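Step 2 above — filtering sensitive fields from query variables — can be a small recursive pass run before variables are attached to a span. A sketch; the key list is illustrative and should come from your own data-classification policy:

```javascript
// Keys to redact — illustrative; replace with your own policy.
const SENSITIVE_KEYS = new Set(['password', 'token', 'ssn', 'creditCard']);

// Recursively copy a variables object, masking sensitive keys.
function redactVariables(value) {
  if (Array.isArray(value)) {
    return value.map(redactVariables);
  }
  if (value && typeof value === 'object') {
    const out = {};
    for (const [key, val] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key) ? '[REDACTED]' : redactVariables(val);
    }
    return out;
  }
  return value;
}

// Usage: span.setAttribute('graphql.variables',
//   JSON.stringify(redactVariables(request.variables)));
```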

Wrapping Up

Setting up OpenTelemetry with GraphQL gives you x-ray vision into your API's performance and behavior. Remember that the goal isn't just to collect data – it's to make your GraphQL API more reliable, performant, and maintainable. Let the telemetry guide your optimization efforts and architecture decisions.

💡
And if you'd like to keep the conversation going, our Discord community is open. We have a dedicated channel where you can chat with other developers about your specific use case.

FAQs

How much overhead does OpenTelemetry add to my GraphQL server?

When properly configured, OpenTelemetry typically adds 3-5% overhead in terms of latency and CPU usage. You can reduce this by implementing sampling (tracing only a percentage of requests) or by selectively instrumenting only critical paths.

Can OpenTelemetry help identify N+1 query problems in GraphQL?

Yes! This is one of the biggest benefits. By correlating database spans with resolver spans, you can easily spot when a resolver is triggering multiple similar database queries that could be batched. Tools like DataLoader become much easier to implement effectively when you can see the N+1 problems.
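To make the fix concrete: what DataLoader does is collect all keys requested in the same tick and issue one batched fetch, which collapses the N duplicate database spans into one. A minimal hand-rolled version of that idea (not the DataLoader library itself — the function name and flush strategy are illustrative):

```javascript
// Minimal batching loader: keys requested in the same microtask
// cycle are collected and passed to batchFn in a single call.
function createBatchLoader(batchFn) {
  let queue = [];
  return function load(key) {
    return new Promise((resolve, reject) => {
      queue.push({ key, resolve, reject });
      if (queue.length === 1) {
        // Flush on the next microtask so sibling resolvers
        // get a chance to enqueue their keys first.
        queueMicrotask(async () => {
          const batch = queue;
          queue = [];
          try {
            const results = await batchFn(batch.map((item) => item.key));
            batch.forEach((item, i) => item.resolve(results[i]));
          } catch (err) {
            batch.forEach((item) => item.reject(err));
          }
        });
      }
    });
  };
}
```

With tracing in place, the batched fetch shows up as one database span parented under the request, instead of N near-identical spans — which is exactly how the N+1 pattern becomes visible in the first place.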

How do I handle sensitive data in GraphQL queries when using OpenTelemetry?

Implement a sanitization layer that processes GraphQL queries and variables before they're attached to spans. You can write middleware that redacts sensitive fields (like passwords or personal information) before they're recorded in your telemetry data.

Is OpenTelemetry suitable for both monolithic and federated GraphQL architectures?

Absolutely. For monoliths, the setup is simpler but still valuable. For federated architectures, OpenTelemetry shines as it can trace requests across service boundaries, giving you end-to-end visibility that's otherwise very difficult to achieve.

How does batching affect OpenTelemetry tracing in GraphQL?

When using batching techniques like DataLoader, you'll want to ensure your custom spans correctly represent the batched nature of operations. This usually means creating spans for both individual resolver calls and the batched data loading operations, with proper parent-child relationships between them.

Authors
Anjali Udasi

Helping to make tech a little less intimidating. I love breaking down complex concepts into easy-to-understand terms.