"Our API latency spiked to 800ms, but why?"
"Is that new feature actually driving conversions, or just burning CPU?"
"Which microservices are becoming bottlenecks during peak hours?"
Standard monitoring tools can leave these critical questions unanswered. That's where OpenTelemetry custom metrics help—they let you measure exactly what matters to your specific application and business.
This guide cuts through the complexity of OpenTelemetry to show you how to build metrics that speak your language.
What Are OpenTelemetry Custom Metrics?
OpenTelemetry custom metrics are user-defined measurements that track specific aspects of your application's performance and behavior that aren't covered by default metrics.
While standard metrics give you visibility into common system parameters, custom metrics let you monitor the specific things that matter to your business and application.
Custom metrics in OpenTelemetry follow the same structure as standard metrics but allow you to define:
- Your own metric names
- Application-specific measurements
- Business-relevant data points
- Internal process statistics
Think of them as your observability Swiss Army knife—adaptable to whatever monitoring challenge you're facing.
Why Custom Metrics Matter for DevOps Teams
For DevOps engineers, custom metrics aren't just another tool in the toolbox—they're often the difference between reactive firefighting and proactive problem solving.
Custom metrics give you:
- Business context: Connect technical performance to actual business outcomes
- Application-specific insights: Monitor the unique aspects of your stack
- Early warning signals: Create custom indicators that catch issues before they become outages
- Validation of changes: Measure the impact of deployments and infrastructure modifications
A well-designed custom metric strategy helps bridge the gap between "the system is slow" and "the payment processing service is experiencing 2x normal latency when handling transactions over $1000."
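To make that concrete, here's a small, hypothetical sketch of what such a metric could look like in Node.js. The instrument name and attributes are illustrative only, and it assumes a meter obtained from the OpenTelemetry SDK as shown in the setup steps below.
// Illustrative only: a payment-latency histogram tagged with a coarse
// transaction-size bucket, so large transactions can be compared to the baseline.
// Assumes `meter` was created as shown later in this guide.
const paymentLatency = meter.createHistogram('payment_processing_duration_seconds', {
  description: 'Time to process a payment transaction',
  unit: 's',
});

paymentLatency.record(1.84, {
  'service': 'payment-processor',
  'amount_bucket': 'over_1000', // e.g. transactions over $1,000
});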
Setting Up Your OpenTelemetry Environment
Before diving into custom metrics, you need a functioning OpenTelemetry pipeline. Here's a quick setup guide:
Step 1: Install the OpenTelemetry SDK
For a Node.js application, this looks like:
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/sdk-metrics @opentelemetry/exporter-metrics-otlp-http
For Python:
pip install opentelemetry-sdk opentelemetry-api opentelemetry-exporter-otlp-proto-http
Step 2: Configure the OpenTelemetry Exporter
You'll need an exporter to send your telemetry data to an observability backend. Last9 offers a streamlined integration here—more on that later.
// Node.js example for metrics exporter
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-http');

const metricExporter = new OTLPMetricExporter({
  url: 'https://your-collector-endpoint/v1/metrics',
  headers: {}, // Add any required authentication headers
  concurrencyLimit: 10, // Adjust based on your throughput needs
});
Step 3: Set Up a Metrics Provider
const { MeterProvider, PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');

// Create a metric reader that exports on a fixed interval
const metricReader = new PeriodicExportingMetricReader({
  exporter: metricExporter,
  exportIntervalMillis: 15000, // Export metrics every 15 seconds
});

// Configure resource attributes for better context
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');

const resource = Resource.default().merge(
  new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'my-service',
    [SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
    [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: 'production',
  })
);

// Create the meter provider with the resource and reader
const meterProvider = new MeterProvider({
  resource: resource,
  readers: [metricReader],
});

// Register it as the global meter provider via the API package
const { metrics } = require('@opentelemetry/api');
metrics.setGlobalMeterProvider(meterProvider);

// Get a meter to create metrics
const meter = meterProvider.getMeter('my-service-metrics');
Step 4: Set Up the OpenTelemetry Collector (Optional)
For production environments, you'll often want to use the OpenTelemetry Collector as an intermediary:
- Download the OpenTelemetry Collector binary from GitHub
- Create a configuration file:
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

exporters:
  otlp:
    endpoint: "https://your-backend.last9.io"
    headers:
      "api-key": "your-api-key"
  prometheusremotewrite:
    endpoint: "https://prometheus.example.com/api/v1/write"
    tls:
      insecure: false

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, prometheusremotewrite]
- Run the collector with this configuration
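If you grabbed the contrib distribution and saved the file above as config.yaml (both names are placeholders for whatever you actually downloaded and created), the start command looks something like:
./otelcol-contrib --config=config.yaml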
With this foundation in place, you're ready to start creating custom metrics.
Creating Your First Custom Metrics
OpenTelemetry supports several types of custom metrics. Let's look at how to implement each one:
Counter Metrics
Counters track values that only increase over time, like request counts or error tallies.
// Use the meter from the provider you configured in Step 3
// (a bare `new MeterProvider()` without a reader won't export anything)

// Create a counter
const requestCounter = meter.createCounter('requests_total', {
  description: 'Total number of requests',
});

// Increment the counter
requestCounter.add(1, { 'endpoint': '/api/users', 'method': 'GET' });
Gauge Metrics
Gauge-style metrics track values that can go up or down, such as memory usage or concurrent connections. In OpenTelemetry, the UpDownCounter is the instrument for values you adjust incrementally (a login adds one, a logout subtracts one); if you want to sample a current value on a schedule instead, use an observable gauge (meter.createObservableGauge).
// Create an up/down counter for active users
const activeUsersGauge = meter.createUpDownCounter('active_users', {
  description: 'Number of active users',
});

// Update the value
activeUsersGauge.add(1, { 'region': 'us-west' });  // User logged in
activeUsersGauge.add(-1, { 'region': 'us-west' }); // User logged out
Histogram Metrics
Histograms track the distribution of values, perfect for monitoring response times.
// Create a histogram
const responseTimeHistogram = meter.createHistogram('response_time_seconds', {
  description: 'Response time in seconds',
});

// Record a measurement
responseTimeHistogram.record(0.327, { 'endpoint': '/api/products' });
Advanced Custom Metrics Strategies
Once you've mastered the basics, you can implement more sophisticated metrics strategies:
Using Views to Transform Metrics
OpenTelemetry views let you customize how metrics are collected and aggregated before export:
// API shown for @opentelemetry/sdk-metrics 1.x
const { View, InstrumentType, ExplicitBucketHistogramAggregation, MeterProvider } = require('@opentelemetry/sdk-metrics');

// Create a view that switches the histogram to explicit bucket boundaries
const responseTimeView = new View({
  instrumentName: 'http.server.duration',
  instrumentType: InstrumentType.HISTOGRAM,
  meterName: 'my-service-metrics',
  description: 'A custom view of response time with specific buckets',
  aggregation: new ExplicitBucketHistogramAggregation(
    [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
  ),
});

// Apply the view when creating the meter provider
const meterProvider = new MeterProvider({
  resource: resource,
  readers: [metricReader],
  views: [responseTimeView],
});
Creating SLI/SLO Metrics
Service Level Indicators (SLIs) and Service Level Objectives (SLOs) help you track reliability. Here's how to create custom metrics for them:
// Define an error rate SLI
const requestCounter = meter.createCounter('requests_total', {
  description: 'Total number of requests',
  unit: '1',
});

const errorCounter = meter.createCounter('errors_total', {
  description: 'Total number of failed requests',
  unit: '1',
});

// When handling requests
function handleRequest() {
  const labels = { 'service': 'payment-processor' };
  requestCounter.add(1, labels);

  try {
    // Process request
  } catch (error) {
    errorCounter.add(1, labels);
    throw error; // Re-throw so callers still see the failure
  }
}

// Your monitoring system can then calculate error rate as:
// error_rate = errors_total / requests_total
Business Metrics
Connect technical monitoring to business outcomes with custom business metrics:
const orderValueHistogram = meter.createHistogram('order_value_dollars', {
  description: 'Distribution of order values',
  unit: 'USD',
});

function processOrder(orderValue, productCategory) {
  orderValueHistogram.record(orderValue, {
    'category': productCategory,
    'channel': 'mobile-app',
  });

  // Process the order...
}
Metric Aggregation Strategies
Different metrics benefit from different aggregation methods:
// For a histogram with custom buckets to track API response time distribution.
// Bucket advice requires a recent @opentelemetry/api; alternatively, set the
// boundaries with a View as shown above.
const apiLatencyHistogram = meter.createHistogram('api_response_seconds', {
  description: 'API response time in seconds',
  unit: 's',
  advice: {
    explicitBucketBoundaries: [0.01, 0.05, 0.1, 0.5, 1, 5, 10], // Custom bucket boundaries
  },
});

// For tracking multiple dimensions with attribute filtering
const databaseOperations = meter.createCounter('db_operations_total', {
  description: 'Database operations count',
  unit: '1',
});

// Record with multiple attributes for later analysis
databaseOperations.add(1, {
  'operation': 'query',
  'table': 'users',
  'status': 'success',
  'region': 'us-west2',
});
Integrating Custom Metrics with Observability Platforms
Your custom metrics are most valuable when they're integrated with a robust observability platform. Last9 stands out here for its efficient approach to telemetry data.
Our event-based pricing keeps costs predictable even as your metrics collection grows, and the platform is built to handle high-cardinality data—a common challenge when custom metrics carry many different label combinations.
Last9 integrates with OpenTelemetry and Prometheus effortlessly, bringing metrics, logs, and traces together in a unified view. That correlation is especially powerful for troubleshooting complex issues where custom metrics provide the context you need.
Other tools worth considering include:
- Grafana Mimir for metrics
- Elastic for log management
Each has its strengths, but few offer the integrated approach and budget-friendly pricing model of Last9.
Best Practices for OpenTelemetry Custom Metrics
To get the most value from your custom metrics while avoiding common pitfalls:
1. Be Strategic About Cardinality
High cardinality—having metrics with many unique label combinations—can quickly become expensive and slow down your monitoring system.
// Bad: Unbounded cardinality
requestCounter.add(1, { 'user_id': userId }); // Could create millions of series
// Good: Limited cardinality
requestCounter.add(1, { 'user_type': userType }); // Limited to a few user types
2. Create Consistent Naming Conventions
Establish a clear naming pattern for your metrics to make them easier to discover and use:
[component]_[measure]_[unit]
Examples:
api_request_duration_seconds
database_connection_count
queue_message_count
3. Document Your Custom Metrics
Maintain a registry of custom metrics with descriptions, expected ranges, and what actions to take when thresholds are crossed.
| Metric Name | Description | Labels | Normal Range | Alert Threshold |
|---|---|---|---|---|
| api_error_rate | Percentage of API requests resulting in errors | endpoint, method | 0-2% | >5% for 5min |
| order_processing_time | Time to process an order end-to-end | order_type | 1-5s | >10s for any order |
4. Use Labels Effectively
Labels (or tags) add dimensions to your metrics, making them more useful for filtering and grouping:
// Add relevant context with labels
responseTimeHistogram.record(0.042, {
  'service': 'payment',
  'endpoint': '/api/charge',
  'customer_tier': 'premium',
  'region': 'eu-central',
});
Troubleshooting Common Custom Metrics Issues
Even experienced DevOps engineers can run into challenges with custom metrics. Here are solutions to common problems:
Metric Data Not Appearing
If your custom metrics aren't showing up:
- Verify the exporter configuration is correct
- Check that you're actually calling the .add() or .record() methods
- Ensure your collector is properly configured to receive metrics
- Look for any network issues between your application and collector
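While working through these checks, it helps to turn on OpenTelemetry's internal diagnostic logging so SDK and exporter errors show up in your console. A minimal sketch using the standard @opentelemetry/api logger:
// Surface SDK and exporter errors in the console while debugging
const { diag, DiagConsoleLogger, DiagLogLevel } = require('@opentelemetry/api');
diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG);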
High Cardinality Problems
If your metrics system is slowing down or costs are spiraling:
- Review metric labels for high-cardinality dimensions
- Consider sampling for high-volume metrics
- Aggregate data at the application level when possible
- Use bucketing for continuous values (like response times)
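The last point is worth a sketch: collapse continuous values into a small, fixed set of buckets before using them as attributes. The helper below is hypothetical, but the pattern keeps the number of series bounded:
// Hypothetical helper: map a continuous latency value to one of a few bucket labels
function latencyBucket(seconds) {
  if (seconds < 0.1) return 'lt_100ms';
  if (seconds < 0.5) return 'lt_500ms';
  if (seconds < 1) return 'lt_1s';
  return 'gte_1s';
}

// A handful of possible label values instead of an unbounded set
requestCounter.add(1, { 'endpoint': '/api/users', 'latency_bucket': latencyBucket(0.327) });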
Inconsistent Metric Values
If metrics seem inconsistent:
- Check for reset conditions in your application
- Verify that metrics are being initialized properly on startup
- Look for clock synchronization issues if comparing across services
- Ensure all instances are properly instrumented
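For the initialization point in particular, one approach carried over from Prometheus practice is to record a zero measurement for the label sets you expect as soon as the instance starts, so every instance exports the same series from the beginning. A sketch, reusing the counters defined earlier:
// On startup: emit the expected series with a value of zero so dashboards and
// rate-style queries see every instance, even before the first real event
const startupLabels = { 'service': 'payment-processor' };
requestCounter.add(0, startupLabels);
errorCounter.add(0, startupLabels);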
Conclusion
OpenTelemetry custom metrics let you track exactly what matters for your app and business. Start simple, and as you go, you can add more advanced metrics to build an observability setup that improves reliability and helps you fix issues faster.
FAQs
Q: How many custom metrics should I create?
A: Start with metrics that directly tie to user experience and business outcomes. Usually 10-15 well-chosen metrics provide better insight than 100 poorly conceived ones.
Q: Can custom metrics impact application performance?
A: Yes, but the impact is usually minimal. If you're concerned, start with a small set of high-value metrics and use sampling for high-volume data points.
Q: How do OpenTelemetry custom metrics compare to Prometheus custom metrics?
A: OpenTelemetry metrics are vendor-neutral and can be exported to multiple backends, including Prometheus. The main difference is in collection architecture and metadata handling.
Q: Should I use custom metrics or logs for tracking business events?
A: For countable events that you'll want to aggregate and alert on, metrics are usually better. For detailed context around specific occurrences, logs are more appropriate.
Q: How do I migrate existing metrics to OpenTelemetry?
A: OpenTelemetry provides compatibility layers for many existing systems. Start by implementing dual reporting, then gradually transition to OpenTelemetry as the source of truth.
Q: What's the difference between metrics and traces in OpenTelemetry?
A: Metrics are aggregated numerical measurements over time, while traces track the journey of requests through your system. Both are complementary: metrics show what's happening, traces help you understand why.
Q: How do I handle metric cardinality issues at scale?
A: Use Last9's high-cardinality handling capabilities, limit the number of label combinations, use sampling for high-volume metrics, and consider using metric views to pre-aggregate data.