Effective Node.js monitoring requires tracking runtime metrics (memory, CPU), application metrics (request rates, response times), and business metrics (user actions, conversion rates). This guide covers what to track, how to collect it, and how to set up meaningful alerts.
Why Do Node.js Metrics Matter?
You've built a Node.js application and deployed it to production. Without proper metrics, troubleshooting becomes difficult when users report that "the app feels slow."
Good metrics transform vague complaints into actionable data points like "the payment service is experiencing 500ms response times, up from a 120ms baseline."
What Runtime Metrics Should You Track?
Runtime metrics show how Node.js itself is performing. They provide early warning signs of problems.
Monitor Memory Usage
Node.js memory management and garbage collection can be tricky. Watch these metrics to identify memory issues:
- Heap Used vs Heap Total: When used memory grows without returning to baseline after garbage collection, you may have a memory leak.
- External Memory: Memory used by C++ objects bound to JavaScript objects.
- Garbage Collection Frequency: Frequent garbage collection can reduce performance.
- RSS (Resident Set Size): Total memory allocated for the Node process in RAM.
- Full GC Per Min: Number of full garbage collection cycles per minute.
- Incremental GC Per Min: Number of incremental garbage collection cycles per minute.
- Heap Size Changed: Percentage of memory reclaimed by garbage collection cycles.
- GC Duration: Time spent in garbage collection (longer durations impact performance).
```javascript
// Basic way to log memory usage in your app
const toMB = (bytes) => `${Math.round(bytes / 1024 / 1024)} MB`;

const logMemoryUsage = () => {
  const memUsage = process.memoryUsage();
  console.log({
    rss: toMB(memUsage.rss),
    heapTotal: toMB(memUsage.heapTotal),
    heapUsed: toMB(memUsage.heapUsed),
    external: toMB(memUsage.external)
  });
};

// Log once a minute; unref() keeps this timer from holding the process open
setInterval(logMemoryUsage, 60_000).unref();
```
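`process.memoryUsage()` does not report GC activity. A minimal sketch of the GC frequency and duration metrics, assuming Node.js's built-in `perf_hooks` `gc` performance entries; `gcStats` and `flushGcStats` are hypothetical names, and "major" here is V8's full collection:

```javascript
const { PerformanceObserver, constants } = require('perf_hooks');

// Hypothetical counters; flush them on a fixed interval to get per-minute rates
const gcStats = { major: 0, minor: 0, incremental: 0, weakcb: 0, totalDurationMs: 0, maxDurationMs: 0 };

// Map V8's GC kinds to readable names ("major" = full mark-sweep-compact)
const gcKindNames = {
  [constants.NODE_PERFORMANCE_GC_MAJOR]: 'major',
  [constants.NODE_PERFORMANCE_GC_MINOR]: 'minor',
  [constants.NODE_PERFORMANCE_GC_INCREMENTAL]: 'incremental',
  [constants.NODE_PERFORMANCE_GC_WEAKCB]: 'weakcb'
};

const gcObserver = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // Node.js 16+ exposes the kind on entry.detail; older versions use entry.kind
    const kind = entry.detail ? entry.detail.kind : entry.kind;
    const name = gcKindNames[kind];
    if (name) gcStats[name] += 1;
    gcStats.totalDurationMs += entry.duration;
    gcStats.maxDurationMs = Math.max(gcStats.maxDurationMs, entry.duration);
  }
});
gcObserver.observe({ entryTypes: ['gc'] });

// Return the counts and durations gathered since the last call, then reset them
function flushGcStats() {
  const snapshot = { ...gcStats };
  for (const key of Object.keys(gcStats)) gcStats[key] = 0;
  return snapshot;
}
```

Calling `flushGcStats()` from a 60-second `setInterval` yields numbers comparable to "Full GC Per Min" and "Incremental GC Per Min".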
Measure CPU Utilization
Node.js runs your JavaScript on a single main thread by default, so one CPU-heavy task can delay every other request. CPU metrics help you understand resource usage:
- CPU Usage Percentage: How much CPU your Node process is using.
- Event Loop Lag: The delay between when a task should run and when it runs.
- Active Handles: Count of active handles (sockets, timers, etc.) – high numbers can indicate resource leaks.
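A minimal sketch of the CPU percentage and active-handle metrics, using `process.cpuUsage()`, `process.hrtime.bigint()`, and `process.getActiveResourcesInfo()` (available in Node.js 16.14+ and 17.3+). The function names are hypothetical, and CPU is expressed relative to one core:

```javascript
// Measure this process's CPU usage between two samples, as a percentage of one core
function createCpuSampler() {
  let lastUsage = process.cpuUsage();     // microseconds of user/system CPU time
  let lastTime = process.hrtime.bigint(); // monotonic clock, nanoseconds

  return function sampleCpuPercent() {
    const usage = process.cpuUsage();
    const now = process.hrtime.bigint();
    const cpuMicros = (usage.user - lastUsage.user) + (usage.system - lastUsage.system);
    const wallMicros = Number(now - lastTime) / 1000;
    lastUsage = usage;
    lastTime = now;
    // Can exceed 100% because libuv and V8 helper threads also consume CPU
    return wallMicros > 0 ? (cpuMicros / wallMicros) * 100 : 0;
  };
}

// Count active resources (sockets, timers, etc.) by type; null on older Node.js versions
function countActiveResources() {
  if (typeof process.getActiveResourcesInfo !== 'function') return null;
  const counts = {};
  for (const type of process.getActiveResourcesInfo()) {
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}
```

A count for one type (for example `TCPSocketWrap` or `Timeout`) that climbs steadily under constant traffic is the kind of resource leak the list above describes.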
Analyze Event Loop Metrics
The event loop is central to Node.js performance. These metrics help monitor its health:
- Event Loop Lag: How long callbacks wait past their scheduled time before the event loop runs them.
- Event Loop Utilization: Fraction of time the event loop is running code vs idle.
- Average Tick Length: Average amount of time between event loop ticks.
- Maximum Tick Length: Longest amount of time between event loop ticks (indicates blocking operations).
- Minimum Tick Length: Shortest amount of time between ticks.
- Tick Count: Number of times the event loop has ticked.
- Average IO Time: Average milliseconds per event loop tick spent processing IO callbacks.
| Metric | Warning Threshold | Critical Threshold | What It Means |
|---|---|---|---|
| Event Loop Lag | > 100ms | > 500ms | The application is struggling to process callbacks quickly enough |
| CPU Usage | > 70% sustained | > 90% sustained | Approaching CPU limits |
| Memory Growth | Steady increase over hours | Near heap limit | Possible memory leak |
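A minimal sketch of collecting event loop lag and utilization with Node.js's built-in `perf_hooks.monitorEventLoopDelay()` and `performance.eventLoopUtilization()`, classified against the lag thresholds in the table above. The 20 ms resolution, the p99 choice, and the function names are assumptions:

```javascript
const { monitorEventLoopDelay, performance } = require('perf_hooks');

// Histogram of event loop delay, sampled every 20 ms; values are in nanoseconds
const loopDelay = monitorEventLoopDelay({ resolution: 20 });
loopDelay.enable();

let lastElu = performance.eventLoopUtilization();

// Thresholds in milliseconds, matching the warning/critical columns of the table
function classifyLag(lagMs, warnMs = 100, critMs = 500) {
  if (lagMs > critMs) return 'critical';
  if (lagMs > warnMs) return 'warning';
  return 'ok';
}

// Call periodically (e.g. every 10 s); reports the window since the previous call
function sampleEventLoop() {
  const elu = performance.eventLoopUtilization(lastElu); // delta since lastElu
  lastElu = performance.eventLoopUtilization();
  const p99LagMs = loopDelay.percentile(99) / 1e6;
  const result = {
    p99LagMs,
    maxLagMs: loopDelay.max / 1e6,
    // 0 = idle, 1 = fully busy; guard against an empty window
    utilization: Number.isFinite(elu.utilization) ? elu.utilization : 0,
    status: classifyLag(p99LagMs)
  };
  loopDelay.reset();
  return result;
}
```

Judging lag by a high percentile rather than a single reading keeps one slow tick from paging anyone, while sustained blocking still shows up.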
How Can Application Metrics Improve Performance?
Runtime metrics show how Node.js is performing. Application metrics reveal how your code is performing.
Track HTTP Request Metrics
For web applications and APIs, monitor:
- Request Rate: Requests per second, broken down by endpoint.
- Response Time: P95/P99 latencies (not just averages).
- Error Rate: Percentage of requests resulting in errors (4xx/5xx).
- Concurrent Connections: Number of active connections.
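A minimal, dependency-free sketch of these HTTP metrics as Express-style `(req, res, next)` middleware. It assumes Express sets `req.route` once a route matches. It tracks in-flight requests as a stand-in for concurrent connections (Node's `server.getConnections()` reports actual sockets). It keeps raw durations in memory, which a production system would replace with histograms:

```javascript
// In-memory request metrics keyed by "METHOD route" (hypothetical names)
const httpStats = new Map();
let inFlight = 0; // requests currently being processed

// Nearest-rank percentile
function percentile(values, p) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

function recordRequest(key, durationMs, statusCode) {
  let s = httpStats.get(key);
  if (!s) {
    s = { count: 0, errors: 0, durations: [] };
    httpStats.set(key, s);
  }
  s.count += 1;
  if (statusCode >= 400) s.errors += 1; // 4xx and 5xx both count as errors
  s.durations.push(durationMs);
}

function httpMetrics(req, res, next) {
  const start = process.hrtime.bigint();
  inFlight += 1;
  // 'finish' fires once the response is sent; 'close' also fires if the client disconnects early
  res.once('finish', () => {
    const durationMs = Number(process.hrtime.bigint() - start) / 1e6;
    // Prefer the route pattern (e.g. /users/:id) over the raw URL to limit cardinality
    const route = req.route ? req.baseUrl + req.route.path : (req.path || req.url);
    recordRequest(`${req.method} ${route}`, durationMs, res.statusCode);
  });
  res.once('close', () => { inFlight -= 1; });
  next();
}

function httpSnapshot() {
  const out = {};
  for (const [key, s] of httpStats) {
    out[key] = {
      count: s.count,
      errorRate: s.errors / s.count,
      p95Ms: percentile(s.durations, 95),
      p99Ms: percentile(s.durations, 99)
    };
  }
  return out;
}
```

Registered with `app.use(httpMetrics)` before your routes, it records every request. Reading `httpSnapshot()` on an interval gives rate, error rate, and tail latency per endpoint.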
Monitor Database and External Service Performance
Most apps also depend on databases and third-party services. Track:
- Query Execution Time: How long database operations take.
- Connection Pool Utilization: Current vs maximum allowed connections.
- External API Response Times: How quickly third-party services respond.
- Failed Transactions: Rate of database or API call failures.
- KB Read/Written Per Second: Rate of disk operations.
- Network I/O: KB/sec received and sent.
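Query times, external API times, and failure rates can all come from one wrapper around Promise-returning calls. A minimal sketch follows. `timed` and `dependencyStats` are hypothetical names, and connection pool utilization is left out because each database driver exposes pool statistics differently:

```javascript
// Latency and failure counters per dependency, e.g. "db.orders.find" or "api.payments"
const dependencyStats = new Map();

// Wrap any Promise-returning call (database query, HTTP request) to time it
async function timed(name, fn) {
  const start = process.hrtime.bigint();
  let succeeded = false;
  try {
    const result = await fn();
    succeeded = true;
    return result;
  } finally {
    // Runs on success and failure, so failed calls are timed too
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    const s = dependencyStats.get(name) || { calls: 0, failures: 0, totalMs: 0, maxMs: 0 };
    s.calls += 1;
    if (!succeeded) s.failures += 1;
    s.totalMs += ms;
    s.maxMs = Math.max(s.maxMs, ms);
    dependencyStats.set(name, s);
  }
}
```

For example, `await timed('db.users.find', () => pool.query(sql))`, where `pool` stands in for your own database client. Errors still propagate to the caller after being counted.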
Last9 helps correlate these metrics across services to identify how database slowdowns affect your Node.js application's performance.
How Do Metrics Connect to Business Value?
To demonstrate value to non-technical stakeholders, track business metrics:
- Conversion Rate: Share of sessions that complete a purchase or sign-up; compare it with latency to see how technical performance affects sales.
- User Engagement: Session duration, pages per visit.
- Cart Abandonment: Often correlates with slow checkout API responses.
- Revenue Impact: Financial impact during degraded performance periods.
How Can You Implement Metrics Collection?
Here's how to start collecting metrics from your Node.js app:
Use Built-in Node.js APIs
Node.js has built-in tools for basic metrics:
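```javascript
// health-check.js
const os = require('os');
const express = require('express');

const app = express();

app.get('/health', (req, res) => {
  res.json({
    uptime: process.uptime(),      // seconds since the process started
    memory: process.memoryUsage(), // bytes: rss, heapTotal, heapUsed, external, arrayBuffers
    cpu: {
      load: os.loadavg(),          // 1/5/15-minute load averages (always [0, 0, 0] on Windows)
      cores: os.cpus().length
    }
  });
});

app.listen(3000);
```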
Implement an Observability Client
For production applications, use a more comprehensive solution. The OpenTelemetry Node.js SDK can export telemetry to Last9 or any other OTLP-compatible backend:
```javascript
// Example using OpenTelemetry with Last9
// tracing.js: load before your app code, e.g. `node -r ./tracing.js app.js`
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  // Spans are batched and sent over OTLP/HTTP.
  // Replace the URL and credentials with the ones from your Last9 account.
  traceExporter: new OTLPTraceExporter({
    url: 'https://ingest.last9.io',
    headers: {
      'x-api-key': 'YOUR_API_KEY'
    }
  }),
  // Auto-instrumentation for HTTP, Express, MongoDB, etc.
  instrumentations: [getNodeAutoInstrumentations()]
});

sdk.start();

// Flush pending spans on shutdown
process.on('SIGTERM', () => sdk.shutdown());
```
This setup captures HTTP requests, database calls, and external service interactions automatically.
What Custom Metrics Should You Consider?
Generic metrics provide a foundation, but every app has unique aspects worth tracking:
Capture User Experience Metrics
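```javascript
// Track time spent on checkout process.
// Assumes `app` is an Express app with express.json() enabled, `metrics` is a
// StatsD-style client (increment/timing), and `processOrder` returns a Promise.
app.post('/api/checkout', (req, res) => {
  const startTime = Date.now();
  processOrder(req.body)
    .then(result => {
      // Record checkout time (ms) and count the success
      metrics.timing('checkout.time', Date.now() - startTime);
      metrics.increment('checkout.success');
      res.json(result);
    })
    .catch(err => {
      metrics.increment('checkout.error');
      res.status(500).json({ error: err.message });
    });
});
```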
Measure Cache Effectiveness
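```javascript
// Track cache hit/miss ratio.
// Assumes `cache.get` returns a Promise that resolves to null/undefined on a miss,
// and `fetchAndCacheData` is your own loader that fills the cache.
function getCachedData(key) {
  return cache.get(key)
    .then(data => {
      // Only null/undefined count as a miss, so cached falsy values (0, '') are hits
      if (data != null) {
        metrics.increment('cache.hit');
        return data;
      }
      metrics.increment('cache.miss');
      return fetchAndCacheData(key);
    });
}
```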
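Both examples above assume a `metrics` object with `increment` and `timing` methods, in the style of a StatsD client. For local experiments, a minimal in-memory stand-in is enough. The object below is hypothetical, as is the `cacheHitRatio` helper that turns the hit and miss counters into the ratio you would chart:

```javascript
// Hypothetical in-memory stand-in for a StatsD-style `metrics` client
const counters = new Map();
const timings = new Map();

const metrics = {
  increment(name, value = 1) {
    counters.set(name, (counters.get(name) || 0) + value);
  },
  timing(name, ms) {
    if (!timings.has(name)) timings.set(name, []);
    timings.get(name).push(ms);
  }
};

// Cache hit ratio = hits / (hits + misses); null until the cache has seen traffic
function cacheHitRatio() {
  const hits = counters.get('cache.hit') || 0;
  const misses = counters.get('cache.miss') || 0;
  const total = hits + misses;
  return total === 0 ? null : hits / total;
}
```

In production you would swap this for a real client and compute the ratio in your dashboard, but the arithmetic is the same.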
How Should You Configure Effective Alerts?
The goal of monitoring is to know when something's wrong. Last9 helps set up intelligent alerts:
Establish Multi-Threshold Alerting
Use graduated alert levels rather than binary good/bad:
- Warning: "API latency above 200ms for 5 minutes"
- Error: "API latency above 500ms for 3 minutes"
- Critical: "API error rate above 5% for 1 minute"
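The "above X for N minutes" part of each level is what separates a real problem from a blip. A minimal sketch of evaluating the graduated levels above against a series of samples, assuming each sample carries a timestamp, a latency, and an error-rate percentage. The rule format and function names are hypothetical, and the breach duration is measured from the first sample in the current unbroken run:

```javascript
// Rules ordered from most to least severe, mirroring the levels above
const alertRules = [
  { level: 'critical', metric: 'errorRatePct', above: 5, forMs: 1 * 60_000 },
  { level: 'error', metric: 'latencyMs', above: 500, forMs: 3 * 60_000 },
  { level: 'warning', metric: 'latencyMs', above: 200, forMs: 5 * 60_000 }
];

// How long the metric has been continuously above the threshold, ending at the newest sample.
// samples: [{ t: epochMs, latencyMs, errorRatePct }] sorted by t
function breachedForMs(samples, metric, above, now) {
  let since = null;
  for (let i = samples.length - 1; i >= 0; i--) {
    if (samples[i][metric] > above) since = samples[i].t;
    else break;
  }
  return since === null ? 0 : now - since;
}

// Return the most severe level whose condition has held long enough
function evaluateAlerts(samples, now) {
  for (const rule of alertRules) {
    if (breachedForMs(samples, rule.metric, rule.above, now) >= rule.forMs) return rule.level;
  }
  return 'ok';
}
```

Checking rules from most to least severe means a single evaluation reports only the highest level that applies, which keeps routing simple.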
Design Correlation-based Alerts
Alert on patterns instead of single thresholds:
- "Database connection time increased AND API latency increased" suggests a database issue affecting your app.
- "CPU spiked but throughput didn't change" might indicate an inefficient background process.
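Both patterns compare a recent window against a baseline for two signals at once. A minimal sketch follows. The 50% rise and 10% "unchanged" tolerances are illustrative assumptions, not recommended values, and the function names are hypothetical:

```javascript
const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

// Percentage change of the current window's mean relative to the baseline window's mean
function pctChange(baseline, current) {
  const b = mean(baseline);
  const c = mean(current);
  if (b === 0) return c === 0 ? 0 : Infinity;
  return ((c - b) / b) * 100;
}

// "Database connection time increased AND API latency increased"
function dbSlowingApi({ dbConnMs, apiLatencyMs }, minRisePct = 50) {
  return pctChange(dbConnMs.baseline, dbConnMs.current) >= minRisePct &&
    pctChange(apiLatencyMs.baseline, apiLatencyMs.current) >= minRisePct;
}

// "CPU spiked but throughput didn't change"
function cpuWithoutThroughput({ cpuPct, requestsPerSec }, minCpuRisePct = 50, maxThroughputChangePct = 10) {
  return pctChange(cpuPct.baseline, cpuPct.current) >= minCpuRisePct &&
    Math.abs(pctChange(requestsPerSec.baseline, requestsPerSec.current)) <= maxThroughputChangePct;
}
```

Requiring both conditions suppresses alerts when only one signal moves, which is what keeps correlation rules quieter than single thresholds.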
Implement Anomaly Detection
Especially useful for Node.js applications with variable load:
- Set dynamic thresholds based on time of day
- Alert on sudden changes rather than fixed values
- Detect when metrics deviate from historical patterns
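One simple way to get time-of-day thresholds is a per-hour baseline with a z-score check. This is a minimal sketch of that statistical technique, not a description of how Last9's anomaly detection works. It assumes historical samples of `{ t, value }`, UTC hour buckets, and a 3-standard-deviation cutoff:

```javascript
// Build a per-hour-of-day baseline (mean and sample standard deviation) from history.
// history: [{ t: epochMs, value }]; hours are bucketed in UTC
function hourlyBaseline(history) {
  const buckets = Array.from({ length: 24 }, () => []);
  for (const { t, value } of history) buckets[new Date(t).getUTCHours()].push(value);
  return buckets.map((values) => {
    if (values.length < 2) return null; // not enough history for this hour
    const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
    const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (values.length - 1);
    return { mean, std: Math.sqrt(variance) };
  });
}

// Flag values more than `zThreshold` standard deviations from that hour's norm
function isAnomalous(baseline, t, value, zThreshold = 3) {
  const b = baseline[new Date(t).getUTCHours()];
  if (!b) return false; // no baseline yet: don't alert
  if (b.std === 0) return value !== b.mean;
  return Math.abs(value - b.mean) / b.std > zThreshold;
}
```

Because each hour has its own mean, a nightly batch spike is compared with previous nights instead of with quiet afternoon traffic.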
Last9's anomaly detection can learn your app's normal behavior patterns, reducing false alarms while catching real issues.
What Common Mistakes Should You Avoid?
Avoid these common monitoring mistakes:
Interpret Memory "Sawtooth" Patterns Correctly
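Node.js garbage collection creates a sawtooth pattern in memory usage. This is normal! Judge the trend across many cycles rather than individual peaks or dips: if the low point after each collection keeps climbing over hours, memory is not being reclaimed.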
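A minimal sketch of that trend check. It assumes `heapUsed` is sampled at a fixed interval and treats the local minima of the sawtooth as the post-GC level. A least-squares slope over those minima then gives bytes gained per sample interval. The function names are hypothetical:

```javascript
// samples: heapUsed readings (bytes) taken at a fixed interval
function localMinima(samples) {
  const minima = [];
  for (let i = 1; i < samples.length - 1; i++) {
    if (samples[i] <= samples[i - 1] && samples[i] < samples[i + 1]) minima.push({ x: i, y: samples[i] });
  }
  return minima;
}

// Least-squares slope of the points (y units per x unit)
function slope(points) {
  if (points.length < 2) return 0;
  const mx = points.reduce((s, p) => s + p.x, 0) / points.length;
  const my = points.reduce((s, p) => s + p.y, 0) / points.length;
  let num = 0;
  let den = 0;
  for (const p of points) {
    num += (p.x - mx) * (p.y - my);
    den += (p.x - mx) ** 2;
  }
  return den === 0 ? 0 : num / den;
}

// A persistently positive trend in the post-GC lows suggests a leak
const postGcTrend = (samples) => slope(localMinima(samples));
```

A slope near zero over hours is the healthy sawtooth; a steadily positive one is worth a heap snapshot.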
Correlate Event Loop Metrics With Performance
A busy event loop doesn't always mean trouble. Correlate with response times and error rates before optimizing.
Choose Percentiles Over Mean Values
Average response time can hide problems. A mean of 100ms could mean all requests take 100ms, or that 95% take 50ms while 5% take 1,050ms. Always look at p95/p99 values.
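A worked version of that arithmetic, using a hypothetical sample of 100 requests (95 at 50ms, 5 at 1,050ms) and the nearest-rank percentile method:

```javascript
// 100 hypothetical requests: 95 at 50 ms and 5 at 1,050 ms
const latencies = [...Array(95).fill(50), ...Array(5).fill(1050)];

const mean = latencies.reduce((sum, ms) => sum + ms, 0) / latencies.length;

// Nearest-rank percentile: smallest value with at least p% of samples at or below it
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

console.log({ mean, p50: percentile(latencies, 50), p95: percentile(latencies, 95), p99: percentile(latencies, 99) });
// → { mean: 100, p50: 50, p95: 50, p99: 1050 }
```

The mean matches the uniform 100ms case exactly, and p95 still sits on the fast group. Only p99 exposes the slow 5%, which is why tracking both percentiles is worthwhile.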
Monitor Socket.IO Communication
For applications using Socket.IO, also monitor:
- Number of current connections
- Total connections since startup
- Number and size of messages exchanged
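A minimal sketch of these three metrics. It assumes `io` is a Socket.IO server instance, v3 or later since it relies on `socket.onAny()`. It counts only incoming messages and approximates message size via JSON encoding, which is an assumption that undercounts binary payloads. `instrumentSocketIO` is a hypothetical name:

```javascript
// Hypothetical Socket.IO instrumentation; `io` is a socket.io Server instance
function instrumentSocketIO(io) {
  const stats = { current: 0, total: 0, messagesIn: 0, bytesIn: 0 };

  io.on('connection', (socket) => {
    stats.current += 1; // connections open right now
    stats.total += 1;   // connections since startup

    socket.on('disconnect', () => {
      stats.current -= 1;
    });

    // onAny fires for every incoming event on this socket
    socket.onAny((event, ...args) => {
      stats.messagesIn += 1;
      try {
        stats.bytesIn += Buffer.byteLength(JSON.stringify([event, ...args]));
      } catch {
        // payload not JSON-serializable (e.g. circular); count the message, skip the size
      }
    });
  });

  return stats;
}
```

Call it once at startup, for example `const socketStats = instrumentSocketIO(io);`, and export `socketStats` alongside your other metrics.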
When Should You Use Metrics Outside Production?
Metrics aren't just for live systems:
Benchmark New Features
Compare metrics before and after new code to identify performance changes early.
Perform Load Testing with Metrics
Correlate load test results with internal metrics to find bottlenecks before they reach production.
Compare A/B Testing Performance
Use metrics to compare different implementations of the same feature.
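For before/after or A/B comparisons of a single function, a small harness built on `perf_hooks.performance.now()` can report latency percentiles instead of a single average. A minimal sketch; the two `join` implementations are hypothetical stand-ins for "old" and "new" code. For very fast functions, timer overhead dominates each measurement, so batch several calls per timing in real benchmarks:

```javascript
const { performance } = require('perf_hooks');

// Time `fn` many times and report latency percentiles in milliseconds
function benchmark(name, fn, iterations = 10_000) {
  // Warm-up runs give V8's JIT a chance to optimize fn before measuring
  for (let i = 0; i < Math.min(1_000, iterations); i++) fn();

  const timings = new Array(iterations);
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    fn();
    timings[i] = performance.now() - start;
  }
  timings.sort((a, b) => a - b);
  const pick = (p) => timings[Math.max(0, Math.ceil((p * iterations) / 100) - 1)];
  return { name, p50: pick(50), p95: pick(95), p99: pick(99) };
}

// Two hypothetical implementations of the same feature
const joinA = (parts) => parts.join(',');
const joinB = (parts) => parts.reduce((acc, p, i) => (i === 0 ? p : `${acc},${p}`), '');
const input = Array.from({ length: 100 }, (_, i) => `item${i}`);

console.table([benchmark('join', () => joinA(input)), benchmark('reduce', () => joinB(input))]);
```

Running both variants in the same process on the same input keeps the comparison fair. Still, treat micro-benchmark results as a hint to confirm with load tests.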
How Do You Get Started With Node.js Metrics?
Here's your action plan for effective Node.js monitoring:
- Configure basic runtime metrics (memory, CPU, event loop)
- Implement application-level metrics (HTTP, database, external services)
- Define business metrics that connect performance to outcomes
- Set up smart, multi-threshold alerts
- Create dashboards for different stakeholders
- Regularly review and refine your metrics based on actual incidents
Wrapping Up
There’s no one-size-fits-all approach to monitoring Node.js, but knowing which metrics reflect real issues in your app is a good start. Focus on what helps you debug faster and make informed decisions—everything else is noise.
FAQs
How does Node.js's garbage collection affect my metrics?
Garbage collection causes periodic pauses in execution, visible as spikes in latency metrics and a sawtooth pattern in memory usage. These are normal, but excessive GC can indicate memory problems. Monitor both GC frequency and duration.
What's the overhead of collecting these metrics?
Modern monitoring solutions add minimal overhead (typically <1% CPU). Start with the essential metrics, then expand. Last9's agent is designed for minimal impact while providing good visibility.
Should I instrument all my API endpoints?
Start with your most critical paths. For an e-commerce site, that might be product browsing, cart, and checkout flows. Add more based on customer impact.
How do I track Node.js metrics in a microservices architecture?
Correlation is key. Use consistent tracing IDs across services and a platform that can connect related metrics. Last9 helps with this, letting you trace requests across your entire system.
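Inside a single Node.js service, `AsyncLocalStorage` can carry a request ID through async calls so it reaches logs, metrics, and outgoing requests. A minimal sketch, assuming an `x-request-id` header convention (OpenTelemetry instead propagates the W3C `traceparent` header for you). The helper names are hypothetical:

```javascript
const { AsyncLocalStorage } = require('async_hooks');
const { randomUUID } = require('crypto');

const requestContext = new AsyncLocalStorage();

// Reuse the caller's ID when present so the same ID follows the request across services
function withRequestId(incomingHeaders, fn) {
  const id = incomingHeaders['x-request-id'] || randomUUID();
  return requestContext.run({ requestId: id }, fn);
}

// Available anywhere inside the async call chain started by withRequestId
function currentRequestId() {
  return requestContext.getStore()?.requestId;
}

// Headers to attach to calls to downstream services
function outgoingHeaders() {
  const id = currentRequestId();
  return id ? { 'x-request-id': id } : {};
}
```

In an Express app, a middleware such as `(req, res, next) => withRequestId(req.headers, next)` would start the context for each request.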
How many metrics should I track?
Focus on actionable metrics. If a metric doesn't help you decide what action to take, it's probably not worth tracking. Quality over quantity.
What's the difference between logs, metrics, and traces?
- Logs are records of discrete events ("User X logged in")
- Metrics are numeric measurements over time ("5 logins per minute")
- Traces follow operations across multiple services ("User login → Auth service → Database")
How do I correlate Node.js metrics with user experience?
Implement Real User Monitoring (RUM) alongside your backend metrics. This captures actual user experiences and lets you correlate backend performance with frontend metrics like page load time.
What's a good alerting strategy for Node.js applications?
Multi-level alerting based on impact:
- Info: Anomalies worth noting but not urgent
- Warning: Issues that need attention soon
- Critical: Problems affecting users right now
Route different levels to appropriate channels (email for warnings, PagerDuty for critical alerts).