"Our API latency spiked to 800ms, but why?"
"Is that new feature actually driving conversions, or just burning CPU?"
"Which microservices are becoming bottlenecks during peak hours?"
Standard monitoring tools can leave these critical questions unanswered. That's where OpenTelemetry custom metrics help—they let you measure exactly what matters to your specific application and business.
This guide cuts through the complexity of OpenTelemetry to show you how to build metrics that speak your language.
What Are OpenTelemetry Custom Metrics?
OpenTelemetry custom metrics are user-defined measurements that track specific aspects of your application's performance and behavior that aren't covered by default metrics.
While standard metrics give you visibility into common system parameters, custom metrics let you monitor the specific things that matter to your business and application.
Custom metrics in OpenTelemetry follow the same structure as standard metrics but allow you to define:
- Your own metric names
- Application-specific measurements
- Business-relevant data points
- Internal process statistics
Think of them as your observability Swiss Army knife—adaptable to whatever monitoring challenge you're facing.
Why Custom Metrics Matter for DevOps Teams
For DevOps engineers, custom metrics aren't just another tool in the toolbox—they're often the difference between reactive firefighting and proactive problem solving.
Custom metrics give you:
- Business context: Connect technical performance to actual business outcomes
- Application-specific insights: Monitor the unique aspects of your stack
- Early warning signals: Create custom indicators that catch issues before they become outages
- Validation of changes: Measure the impact of deployments and infrastructure modifications
A well-designed custom metric strategy helps bridge the gap between "the system is slow" and "the payment processing service is experiencing 2x normal latency when handling transactions over $1000."
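To make that concrete, here's a small, hypothetical sketch of what such a metric could look like in Node.js. The instrument name and attributes are illustrative only, and it assumes a meter obtained from the OpenTelemetry SDK as shown in the setup steps below.
// Illustrative only: a payment-latency histogram tagged with a coarse
// transaction-size bucket, so large transactions can be compared to the baseline.
// Assumes `meter` was created as shown later in this guide.
const paymentLatency = meter.createHistogram('payment_processing_duration_seconds', {
  description: 'Time to process a payment transaction',
  unit: 's',
});

paymentLatency.record(1.84, {
  'service': 'payment-processor',
  'amount_bucket': 'over_1000', // e.g. transactions over $1,000
});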
Setting Up Your OpenTelemetry Environment
Before diving into custom metrics, you need a functioning OpenTelemetry pipeline. Here's a quick setup guide:
Step 1: Install the OpenTelemetry SDK
For a Node.js application, this looks like:
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/sdk-metrics @opentelemetry/exporter-metrics-otlp-http
For Python:
pip install opentelemetry-sdk opentelemetry-api opentelemetry-exporter-otlp-proto-http
Step 2: Configure the OpenTelemetry Exporter
You'll need an exporter to send your telemetry data to an observability backend. Last9 offers a streamlined integration here—more on that later.
// Node.js example for metrics exporter
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-http');

const metricExporter = new OTLPMetricExporter({
  url: 'https://your-collector-endpoint/v1/metrics',
  headers: {}, // Add any required authentication headers
  concurrencyLimit: 10, // Adjust based on your throughput needs
});
Step 3: Set Up a Metrics Provider
const { MeterProvider, PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');

// Create a metric reader that exports on a fixed interval
const metricReader = new PeriodicExportingMetricReader({
  exporter: metricExporter,
  exportIntervalMillis: 15000, // Export metrics every 15 seconds
});

// Configure resource attributes for better context
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');

const resource = Resource.default().merge(
  new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'my-service',
    [SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
    [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: 'production',
  })
);

// Create the meter provider with the resource and reader
const meterProvider = new MeterProvider({
  resource: resource,
  readers: [metricReader],
});

// Register it as the global meter provider via the API package
const { metrics } = require('@opentelemetry/api');
metrics.setGlobalMeterProvider(meterProvider);

// Get a meter to create metrics
const meter = meterProvider.getMeter('my-service-metrics');
Step 4: Set Up the OpenTelemetry Collector (Optional)
For production environments, you'll often want to use the OpenTelemetry Collector as an intermediary:
- Download the OpenTelemetry Collector binary from GitHub
- Create a configuration file:
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

exporters:
  otlp:
    endpoint: "https://your-backend.last9.io"
    headers:
      "api-key": "your-api-key"
  prometheusremotewrite:
    endpoint: "https://prometheus.example.com/api/v1/write"
    tls:
      insecure: false

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, prometheusremotewrite]
- Run the collector with this configuration
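If you grabbed the contrib distribution and saved the file above as config.yaml (both names are placeholders for whatever you actually downloaded and created), the start command looks something like:
./otelcol-contrib --config=config.yaml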
With this foundation in place, you're ready to start creating custom metrics.
Creating Your First Custom Metrics
OpenTelemetry supports several types of custom metrics. Let's look at how to implement each one:
Counter Metrics
Counters track values that only increase over time, like request counts or error tallies.
// Use the meter from the provider you configured in Step 3
// (a bare `new MeterProvider()` without a reader won't export anything)

// Create a counter
const requestCounter = meter.createCounter('requests_total', {
  description: 'Total number of requests',
});

// Increment the counter
requestCounter.add(1, { 'endpoint': '/api/users', 'method': 'GET' });
Gauge Metrics
Gauge-style metrics track values that can go up or down, such as memory usage or concurrent connections. In OpenTelemetry, the UpDownCounter is the instrument for values you adjust incrementally (a login adds one, a logout subtracts one); if you want to sample a current value on a schedule instead, use an observable gauge (meter.createObservableGauge).
// Create an up/down counter for active users
const activeUsersGauge = meter.createUpDownCounter('active_users', {
  description: 'Number of active users',
});

// Update the value
activeUsersGauge.add(1, { 'region': 'us-west' });  // User logged in
activeUsersGauge.add(-1, { 'region': 'us-west' }); // User logged out
Histogram Metrics
Histograms track the distribution of values, perfect for monitoring response times.
// Create a histogram
const responseTimeHistogram = meter.createHistogram('response_time_seconds', {
  description: 'Response time in seconds',
});

// Record a measurement
responseTimeHistogram.record(0.327, { 'endpoint': '/api/products' });
Advanced Custom Metrics Strategies
Once you've mastered the basics, you can implement more sophisticated metrics strategies:
Using Views to Transform Metrics
OpenTelemetry views let you customize how metrics are collected and aggregated before export:
// API shown for @opentelemetry/sdk-metrics 1.x
const { View, InstrumentType, ExplicitBucketHistogramAggregation, MeterProvider } = require('@opentelemetry/sdk-metrics');

// Create a view that switches the histogram to explicit bucket boundaries
const responseTimeView = new View({
  instrumentName: 'http.server.duration',
  instrumentType: InstrumentType.HISTOGRAM,
  meterName: 'my-service-metrics',
  description: 'A custom view of response time with specific buckets',
  aggregation: new ExplicitBucketHistogramAggregation(
    [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
  ),
});

// Apply the view when creating the meter provider
const meterProvider = new MeterProvider({
  resource: resource,
  readers: [metricReader],
  views: [responseTimeView],
});
Creating SLI/SLO Metrics
Service Level Indicators (SLIs) and Service Level Objectives (SLOs) help you track reliability. Here's how to create custom metrics for them:
// Define an error rate SLI
const requestCounter = meter.createCounter('requests_total', {
  description: 'Total number of requests',
  unit: '1',
});

const errorCounter = meter.createCounter('errors_total', {
  description: 'Total number of failed requests',
  unit: '1',
});

// When handling requests
function handleRequest() {
  const labels = { 'service': 'payment-processor' };
  requestCounter.add(1, labels);

  try {
    // Process request
  } catch (error) {
    errorCounter.add(1, labels);
    throw error; // Re-throw so callers still see the failure
  }
}

// Your monitoring system can then calculate error rate as:
// error_rate = errors_total / requests_total
Business Metrics
Connect technical monitoring to business outcomes with custom business metrics:
const orderValueHistogram = meter.createHistogram('order_value_dollars', {
  description: 'Distribution of order values',
  unit: 'USD',
});

function processOrder(orderValue, productCategory) {
  orderValueHistogram.record(orderValue, {
    'category': productCategory,
    'channel': 'mobile-app',
  });

  // Process the order...
}
Metric Aggregation Strategies
Different metrics benefit from different aggregation methods:
// For a histogram with custom buckets to track API response time distribution.
// Bucket advice requires a recent @opentelemetry/api; alternatively, set the
// boundaries with a View as shown above.
const apiLatencyHistogram = meter.createHistogram('api_response_seconds', {
  description: 'API response time in seconds',
  unit: 's',
  advice: {
    explicitBucketBoundaries: [0.01, 0.05, 0.1, 0.5, 1, 5, 10], // Custom bucket boundaries
  },
});

// For tracking multiple dimensions with attribute filtering
const databaseOperations = meter.createCounter('db_operations_total', {
  description: 'Database operations count',
  unit: '1',
});

// Record with multiple attributes for later analysis
databaseOperations.add(1, {
  'operation': 'query',
  'table': 'users',
  'status': 'success',
  'region': 'us-west2',
});
Integrating Custom Metrics with Observability Platforms
Your custom metrics are most valuable when they're integrated with a robust observability platform. Last9 stands out here for its efficient approach to telemetry data.
Our event-based pricing keeps costs predictable even as your metrics collection grows, and the platform is built to handle high-cardinality data—a common challenge when custom metrics carry many different label combinations.
Last9 integrates with OpenTelemetry and Prometheus effortlessly, bringing metrics, logs, and traces together in a unified view. That correlation is especially powerful for troubleshooting complex issues where custom metrics provide the context you need.
Other tools worth considering include:
- Grafana Mimir for metrics
- Elastic for log management
Each has its strengths, but few offer the integrated approach and budget-friendly pricing model of Last9.
Best Practices for OpenTelemetry Custom Metrics
To get the most value from your custom metrics while avoiding common pitfalls:
1. Be Strategic About Cardinality
High cardinality—having metrics with many unique label combinations—can quickly become expensive and slow down your monitoring system.
// Bad: Unbounded cardinality
requestCounter.add(1, { 'user_id': userId }); // Could create millions of series
// Good: Limited cardinality
requestCounter.add(1, { 'user_type': userType }); // Limited to a few user types
2. Create Consistent Naming Conventions
Establish a clear naming pattern for your metrics to make them easier to discover and use:
[component]_[measure]_[unit]
Examples:
api_request_duration_seconds
database_connection_count
queue_message_count
3. Document Your Custom Metrics
Maintain a registry of custom metrics with descriptions, expected ranges, and what actions to take when thresholds are crossed.
| Metric Name | Description | Labels | Normal Range | Alert Threshold |
|---|---|---|---|---|
| api_error_rate | Percentage of API requests resulting in errors | endpoint, method | 0-2% | >5% for 5min |
| order_processing_time | Time to process an order end-to-end | order_type | 1-5s | >10s for any order |
4. Use Labels Effectively
Labels (or tags) add dimensions to your metrics, making them more useful for filtering and grouping:
// Add relevant context with labels
responseTimeHistogram.record(0.042, {
  'service': 'payment',
  'endpoint': '/api/charge',
  'customer_tier': 'premium',
  'region': 'eu-central',
});
Troubleshooting Common Custom Metrics Issues
Even experienced DevOps engineers can run into challenges with custom metrics. Here are solutions to common problems:
Metric Data Not Appearing
If your custom metrics aren't showing up:
- Verify the exporter configuration is correct
- Check that you're actually calling the .add() or .record() methods
- Ensure your collector is properly configured to receive metrics
- Look for any network issues between your application and collector
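While working through these checks, it helps to turn on OpenTelemetry's internal diagnostic logging so SDK and exporter errors show up in your console. A minimal sketch using the standard @opentelemetry/api logger:
// Surface SDK and exporter errors in the console while debugging
const { diag, DiagConsoleLogger, DiagLogLevel } = require('@opentelemetry/api');
diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG);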
High Cardinality Problems
If your metrics system is slowing down or costs are spiraling:
- Review metric labels for high-cardinality dimensions
- Consider sampling for high-volume metrics
- Aggregate data at the application level when possible
- Use bucketing for continuous values (like response times)
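The last point is worth a sketch: collapse continuous values into a small, fixed set of buckets before using them as attributes. The helper below is hypothetical, but the pattern keeps the number of series bounded:
// Hypothetical helper: map a continuous latency value to one of a few bucket labels
function latencyBucket(seconds) {
  if (seconds < 0.1) return 'lt_100ms';
  if (seconds < 0.5) return 'lt_500ms';
  if (seconds < 1) return 'lt_1s';
  return 'gte_1s';
}

// A handful of possible label values instead of an unbounded set
requestCounter.add(1, { 'endpoint': '/api/users', 'latency_bucket': latencyBucket(0.327) });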
Inconsistent Metric Values
If metrics seem inconsistent:
- Check for reset conditions in your application
- Verify that metrics are being initialized properly on startup
- Look for clock synchronization issues if comparing across services
- Ensure all instances are properly instrumented
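For the initialization point in particular, one approach carried over from Prometheus practice is to record a zero measurement for the label sets you expect as soon as the instance starts, so every instance exports the same series from the beginning. A sketch, reusing the counters defined earlier:
// On startup: emit the expected series with a value of zero so dashboards and
// rate-style queries see every instance, even before the first real event
const startupLabels = { 'service': 'payment-processor' };
requestCounter.add(0, startupLabels);
errorCounter.add(0, startupLabels);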
Conclusion
OpenTelemetry custom metrics let you track exactly what matters for your app and business. Start simple, and as you go, you can add more advanced metrics to build an observability setup that improves reliability and helps you fix issues faster.
FAQs
Q: How many custom metrics should I create?
A: Start with metrics that directly tie to user experience and business outcomes. Usually 10-15 well-chosen metrics provide better insight than 100 poorly conceived ones.
Q: Can custom metrics impact application performance?
A: Yes, but the impact is usually minimal. If you're concerned, start with a small set of high-value metrics and use sampling for high-volume data points.
Q: How do OpenTelemetry custom metrics compare to Prometheus custom metrics?
A: OpenTelemetry metrics are vendor-neutral and can be exported to multiple backends, including Prometheus. The main difference is in collection architecture and metadata handling.
Q: Should I use custom metrics or logs for tracking business events?
A: For countable events that you'll want to aggregate and alert on, metrics are usually better. For detailed context around specific occurrences, logs are more appropriate.
Q: How do I migrate existing metrics to OpenTelemetry?
A: OpenTelemetry provides compatibility layers for many existing systems. Start by implementing dual reporting, then gradually transition to OpenTelemetry as the source of truth.
Q: What's the difference between metrics and traces in OpenTelemetry?
A: Metrics are aggregated numerical measurements over time, while traces track the journey of requests through your system. Both are complementary: metrics show what's happening, traces help you understand why.
Q: How do I handle metric cardinality issues at scale?
A: Use Last9's high-cardinality handling capabilities, limit the number of label combinations, use sampling for high-volume metrics, and consider using metric views to pre-aggregate data.