When your MongoDB database slows down, it affects your entire application stack. Performance issues can range from minor inconveniences to major outages, making a solid understanding of MongoDB metrics essential for any DevOps engineer.
This guide covers the key performance metrics you need to monitor in MongoDB, how to interpret what you're seeing, and practical steps to resolve common issues.
MongoDB Architecture for Better Monitoring
Before jumping into metrics, let's quickly review MongoDB's architecture to better understand what we're monitoring. MongoDB uses a document-oriented model where data is stored in flexible JSON-like documents. The core components include:
- mongod: The primary database process that handles data requests and manages data files
- WiredTiger: The default storage engine since MongoDB 3.2, managing how data is stored on disk
- Collections: Similar to tables in relational databases, collections store related documents
- Replica Sets: Groups of mongod instances that maintain identical data copies for redundancy
- Sharding: The method MongoDB uses to horizontally partition data across multiple servers
Each component generates specific metrics that provide insights into database health and performance.
What Makes MongoDB Performance Metrics Different?
MongoDB's document-oriented structure means its performance profile differs significantly from traditional relational databases. Instead of tables and rows, MongoDB stores data in flexible JSON-like documents, which changes how we approach performance monitoring.
The distributed nature of MongoDB also introduces unique metrics around replication lag, shard distribution, and cluster health that SQL databases typically don't have. This requires a specialized approach to performance monitoring.
Critical MongoDB Metrics You Should Track
Let's break down the most critical metrics you should track for optimal MongoDB performance:
Measure Query Performance
Query performance is often the first indicator of database health. Slow queries can bottleneck your entire application. Key metrics include:
- Query execution time: How long queries take to complete
- Query scanning efficiency: Documents scanned vs. returned
- Index usage: Whether queries use indexes effectively
- globalLock metrics: Time operations spend waiting for locks
- currentOp: Currently running operations and their execution time
MongoDB's `explain()` method gives you visibility into how queries execute:
db.collection.find({status: "active"}).explain("executionStats")
This command shows execution time, documents examined, and whether indexes were used—all crucial data points for understanding query efficiency.
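A quick way to act on that output is to compare documents examined against documents returned. The sketch below assumes a plain object shaped like the `executionStats` section of `explain()` output; the sample numbers are illustrative:

```javascript
// Sketch: judge query efficiency from explain("executionStats")-shaped output.
// A well-indexed query examines roughly as many documents as it returns.
function scanEfficiency(stats) {
  const { totalDocsExamined, nReturned } = stats.executionStats;
  return totalDocsExamined === 0 ? 1 : nReturned / totalDocsExamined;
}

// Hypothetical output for an unindexed query: 50,000 docs scanned for 1,000 results.
const explainOutput = {
  executionStats: { nReturned: 1000, totalDocsExamined: 50000, executionTimeMillis: 420 }
};

console.log(scanEfficiency(explainOutput)); // 0.02 → only 2% of scanned docs were returned
```

A ratio far below 1 is the classic signature of a missing or unused index.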
Analyze Connection Patterns
Connection issues can lead to cascading failures in your application. Keep an eye on:
- Current connections: The number of active client connections
- Available connections: How many more connections can MongoDB accept
- Connection creation rate: Sudden spikes may indicate connection leaks
- Active clients: Count of clients with active operations
- Network traffic: Bytes in/out per second to identify bandwidth issues
When connection counts approach your configured limit (default: 65,536), new client requests get rejected, causing application errors that can be hard to diagnose without proper metrics.
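The headroom check is simple arithmetic over the `current` and `available` fields reported by `db.serverStatus().connections`. This sketch mimics those fields with sample numbers; the 80% alerting threshold mentioned later in this guide is a practical choice, not a MongoDB default:

```javascript
// Sketch: flag connection exhaustion risk from serverStatus-style counts.
function connectionPressure({ current, available }) {
  const limit = current + available;            // configured connection ceiling
  return { limit, usedPct: (current / limit) * 100 };
}

// Hypothetical snapshot approaching the default 65,536 ceiling.
const { limit, usedPct } = connectionPressure({ current: 52000, available: 13536 });
console.log(limit, usedPct.toFixed(1)); // 65536 '79.3'
```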
Optimize CPU & Memory Usage
MongoDB's performance depends heavily on having enough CPU and memory resources:
- CPU utilization: MongoDB is CPU-intensive during query execution and sorting
- System context switches: Excessive context switching indicates CPU contention
- Working set size: Data MongoDB needs to keep in RAM
- WiredTiger cache usage: Percentage of the cache being used
- Page faults: When MongoDB needs to fetch data from disk
- Memory fragmentation: Wasted memory due to fragmentation
High page fault rates usually mean your working set doesn't fit in RAM, which dramatically slows performance as MongoDB must read from disk.
Evaluate Disk Performance
Since MongoDB ultimately stores data on disk, I/O performance directly impacts database speed:
- Disk utilization: Percentage of time the disk is busy
- Read/write latency: Time taken for disk operations
- IOPS (Input/Output Operations Per Second): Rate of disk operations
- WiredTiger block manager metrics: File allocation patterns
- Journaling stats: Write-ahead log performance metrics
- Disk queue depth: Number of pending I/O operations
Storage bottlenecks often manifest as high disk utilization with relatively few operations, indicating your disks can't keep up with MongoDB's demands.
Quantify Database Operations
MongoDB's per-operation counters help identify specific bottlenecks:
- Insert/update/delete rates: Volume of write operations
- Read rates: Volume of read operations
- Getmore operations: Cursor operations for retrieving batches of results
- Command operations: Rate of database commands being executed
- Queued operations: Operations waiting to be processed
- Scan and order operations: Queries that require in-memory sorting
Unexpected changes in operation rates can signal application issues or inefficient query patterns that need attention.
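Because `db.serverStatus().opcounters` values are cumulative since startup, spotting rate changes means diffing two snapshots over a known interval. A minimal sketch, with made-up counter values:

```javascript
// Sketch: turn two opcounters snapshots into per-second operation rates.
// opcounters are cumulative, so rates come from deltas over the sample interval.
function opRates(prev, curr, intervalSec) {
  const rates = {};
  for (const op of Object.keys(curr)) {
    rates[op] = (curr[op] - prev[op]) / intervalSec;
  }
  return rates;
}

// Two hypothetical snapshots taken 60 seconds apart.
const t0 = { insert: 10000, query: 50000, update: 8000 };
const t1 = { insert: 10600, query: 53000, update: 8300 };
console.log(opRates(t0, t1, 60)); // { insert: 10, query: 50, update: 5 }
```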
Ensure Replication Health
For replica sets, these metrics are crucial for data consistency:
- Replication lag: Delay between operations on the primary and their application on secondaries
- Oplog window: Time range of operations stored in the oplog
- Replication buffer usage: Memory used for storing operations not yet applied
- Election metrics: Frequency and duration of primary elections
- Heartbeat latency: Time taken for replica set members to communicate
High replication lag can lead to stale reads and potential data loss during failover events.
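Replication lag is the gap between the primary's last applied operation time and each secondary's. This sketch assumes an array of member objects with the `stateStr` and `optimeDate` fields that recent `rs.status()` output exposes; hostnames and timestamps are invented:

```javascript
// Sketch: compute per-secondary replication lag from rs.status()-style members.
function replicationLagSec(members) {
  const primary = members.find(m => m.stateStr === 'PRIMARY');
  return members
    .filter(m => m.stateStr === 'SECONDARY')
    .map(m => ({
      host: m.name,
      lagSec: (primary.optimeDate - m.optimeDate) / 1000  // Date diff is in ms
    }));
}

const status = [
  { name: 'db1:27017', stateStr: 'PRIMARY',   optimeDate: new Date('2024-01-01T00:01:00Z') },
  { name: 'db2:27017', stateStr: 'SECONDARY', optimeDate: new Date('2024-01-01T00:00:55Z') }
];
console.log(replicationLagSec(status)); // [ { host: 'db2:27017', lagSec: 5 } ]
```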
Balance Sharded Clusters
If you're using sharded clusters, monitor these additional metrics:
- Chunk distribution: How evenly the data is distributed across shards
- Balancer activity: Frequency and duration of chunk migrations
- Jumbo chunks: Chunks that exceed the maximum size and can't be split
- Split operations: Rate at which chunks are being split
- Query targeting: Whether queries go to all shards or are targeted correctly
Unbalanced shards can lead to hotspots where some servers work much harder than others.
Deploy Basic MongoDB Monitoring Tools
Before jumping into advanced tools, start with MongoDB's built-in monitoring capabilities:
Run mongostat for Real-time Insights
The `mongostat` utility gives you a real-time view of database operations:
mongostat --port 27017 --authenticationDatabase admin -u username -p password
This command displays metrics like operations per second, memory usage, and connection counts updated every second.
Apply mongotop to Find Hotspots
To see which collections receive the most read/write activity, use `mongotop`:
mongotop --port 27017 --authenticationDatabase admin -u username -p password
This helps identify hot collections that might benefit from additional optimization.
Configure Database Profiling
For deeper insights into slow queries, enable MongoDB's profiler:
db.setProfilingLevel(1, { slowms: 100 })
This logs all operations taking longer than 100ms to the `system.profile` collection, which you can query to find problematic operations:
db.system.profile.find().sort({millis: -1}).limit(10)
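Once you pull those profile documents out, a small summary pass makes the worst offenders obvious. The sketch below stands in for `db.system.profile.find().toArray()` with a hard-coded array; the `op`, `ns`, and `millis` fields match what the profiler records, but the namespaces and timings are invented:

```javascript
// Sketch: find the slowest recorded operation per namespace
// from an array of system.profile-style documents.
function slowestByNamespace(profileDocs) {
  const worst = {};
  for (const doc of profileDocs) {
    if (!worst[doc.ns] || doc.millis > worst[doc.ns].millis) {
      worst[doc.ns] = { op: doc.op, millis: doc.millis };
    }
  }
  return worst;
}

const profileDocs = [
  { op: 'query',  ns: 'shop.orders', millis: 350 },
  { op: 'update', ns: 'shop.orders', millis: 120 },
  { op: 'query',  ns: 'shop.users',  millis: 95 }
];
console.log(slowestByNamespace(profileDocs));
// { 'shop.orders': { op: 'query', millis: 350 }, 'shop.users': { op: 'query', millis: 95 } }
```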
Extract serverStatus Metrics
The `serverStatus` command provides comprehensive metrics about your MongoDB instance:
db.adminCommand('serverStatus')
This returns detailed information about:
- WiredTiger cache statistics
- Connection counts
- Operation counters
- Memory usage
- Replication status
- Global locks
Review this output regularly to spot trends and anomalies.
Advanced Performance Techniques
Once you've mastered the basics, try these advanced techniques to squeeze more performance from your MongoDB deployment:
Design Efficient Indexes
Indexes dramatically improve query performance, but each index adds overhead to write operations. The key is finding the right balance:
- Compound indexes: Create indexes that support multiple query patterns
- Covered queries: Design indexes so queries can be satisfied entirely from the index
- Partial indexes: Index only the documents that match certain criteria
For example, instead of separate indexes on `user_id` and `created_at`, consider a compound index:
db.orders.createIndex({ user_id: 1, created_at: -1 })
This supports queries filtering by user, sorting by date, or both — making it more versatile than two separate indexes.
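The reason one compound index covers both patterns is MongoDB's prefix rule: a query can use a compound index when its fields form a prefix of the index key order. The sketch below is a simplified model of that rule for equality-only predicates (it ignores sorts and range conditions, which the real query planner also considers):

```javascript
// Sketch: can an equality-only query use this compound index?
// Simplified prefix check; the real planner handles sorts and ranges too.
function usesIndexPrefix(indexKeys, queryFields) {
  const wanted = new Set(queryFields);
  let matched = 0;
  for (const key of indexKeys) {
    if (wanted.has(key)) matched++;
    else break;                        // prefix broken at the first unused key
  }
  return matched === wanted.size;
}

const index = ['user_id', 'created_at'];
console.log(usesIndexPrefix(index, ['user_id']));               // true  (prefix)
console.log(usesIndexPrefix(index, ['user_id', 'created_at'])); // true  (full key)
console.log(usesIndexPrefix(index, ['created_at']));            // false (not a prefix)
```

The `false` case is why field order matters when designing compound indexes.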
Structure Your Schema for Speed
Your document structure significantly impacts MongoDB performance:
- Right-size documents: Excessively large documents increase memory usage and network overhead
- Denormalize strategically: Include related data in a single document when it makes sense
- Use appropriate data types: Storing data with the correct type improves index efficiency
Consider this example of a denormalized schema that reduces the need for joins:
// Instead of separate collections, embed the category in the product document
{
  "_id": ObjectId("5f8d"),
  "title": "Widget Pro",
  "price": 29.99,
  "category": {
    "name": "Electronics",
    "tax_rate": 0.07
  }
}
Configure Connection Pools
Properly configured connection pools reduce the overhead of establishing new connections:
- Size pools appropriately: Too small causes queuing, too large wastes resources
- Monitor pool metrics: Track utilization and wait times
- Use separate pools for different workloads: Isolate read-heavy from write-heavy operations
Most MongoDB drivers support connection pooling, but you need to configure it correctly:
// Node.js example (driver 4.x+; older drivers used poolSize instead)
const client = new MongoClient(uri, {
  maxPoolSize: 50,
  maxIdleTimeMS: 30000
});
Solve Common MongoDB Performance Problems
Even with proper monitoring, you'll occasionally face performance challenges. Here's how to diagnose and fix common issues:
Accelerate Slow Queries
When queries run slowly:
- Check if appropriate indexes exist
- Examine query patterns for potential optimizations
- Look for excessive document scanning
For example, if you see a query scanning millions of documents but returning only a few, you likely need an index:
// Before adding an index
db.users.find({status: "active", region: "europe"}).explain("executionStats")
// Shows 1,000,000 documents scanned for 1,000 results
// Add a compound index
db.users.createIndex({status: 1, region: 1})
// After adding the index
db.users.find({status: "active", region: "europe"}).explain("executionStats")
// Shows 1,000 documents scanned for 1,000 results
Reduce Memory Pressure
If you're experiencing high page fault rates:
- Increase available RAM
- Add shards to distribute data
- Implement data archiving strategies
You can check memory usage patterns with:
db.serverStatus().mem
Eliminate Lock Contention
Lock contention occurs when operations wait for access to resources:
- Identify operations causing locks
- Break large operations into smaller batches
- Schedule maintenance during off-peak hours
Monitor lock metrics with:
db.serverStatus().locks
Unify Metrics with Last9 Monitoring
MongoDB's built-in monitoring tools provide valuable data, but they often leave you assembling pieces of the puzzle separately. Last9 brings these fragments together by connecting your MongoDB performance metrics with related traces and logs.
Last9 enhances MongoDB monitoring by:
- Correlating query performance metrics with application transactions
- Tracking WiredTiger cache statistics alongside memory usage patterns
- Visualizing replication lag metrics together with application impact
- Connecting high-latency MongoDB operations with affected user journeys
- Providing historical context for MongoDB metrics to identify performance trends
When troubleshooting MongoDB issues, this unified approach proves invaluable. Instead of jumping between different tools to understand why your queries are slow, Last9 shows the complete picture - from storage engine metrics to application response times.
For instance, when query execution time spikes, Last9 automatically links this MongoDB metric to the corresponding application endpoints. You can instantly see which collections are experiencing lock contention, how the WiredTiger cache is performing, and which user flows are impacted - information that would normally require coordinating data from multiple monitoring systems.
Comparing MongoDB Monitoring Solutions
While Last9 offers comprehensive MongoDB monitoring, let's look at the broader landscape of monitoring options:
Evaluate Built-in MongoDB Tools
MongoDB provides several built-in tools for basic monitoring:
- MongoDB Compass: GUI for exploring data and monitoring performance
- MongoDB Atlas: Cloud service with integrated monitoring dashboards
- Server Status Commands: Direct database commands to retrieve metrics
These native tools are great for quick checks but lack the depth and correlation capabilities needed for production environments.
Use Open Source Options
Several open-source tools can monitor MongoDB:
- Prometheus + MongoDB Exporter: Collects MongoDB metrics in Prometheus format
- Grafana: Creates visualization dashboards for MongoDB metrics
- Percona Monitoring and Management: Specialized for MongoDB performance monitoring
Open-source solutions offer flexibility but require significant setup and maintenance effort.
How to Configure Effective MongoDB Alerts
Instead of alerting on everything, focus on actionable metrics:
Define Actionable Alert Thresholds
Set up alerts for conditions that truly require attention:
- Replication lag exceeding 60 seconds: Indicates potential data consistency issues
- Query execution time above historical baseline: Shows performance degradation
- Connection usage above 80% of maximum: Provides time to address before connections are exhausted
Focus on Percentiles Over Averages
Averages hide problems affecting a subset of operations. Track the 95th and 99th percentiles for more accurate alerts:
- 95th percentile query time > 100ms: Catches slow queries affecting 5% of operations
- 99th percentile write latency > 50ms: Identifies issues affecting 1% of writes
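To see why percentiles beat averages, consider a batch of query times where a single outlier barely moves the mean but dominates tail latency. This sketch uses the nearest-rank method on synthetic data; real monitoring systems may interpolate between ranks instead:

```javascript
// Sketch: nearest-rank percentile, the basis for p95/p99 alert thresholds.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Synthetic sample: 99 query times from 10ms to 108ms, plus one 900ms outlier.
const queryTimesMs = Array.from({ length: 99 }, (_, i) => 10 + i);
queryTimesMs.push(900);

console.log(percentile(queryTimesMs, 95)); // 104
console.log(percentile(queryTimesMs, 99)); // 108
console.log(percentile(queryTimesMs, 100)); // 900 → the tail the average hides
```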
Correlate Related Alerts
Single alerts often don't tell the full story. Last9 helps correlate related alerts to identify root causes:
- Connect high CPU usage with increased query times
- Link memory pressure to specific collection growth
- Correlate network issues with replication lag
This approach reduces noise and helps you focus on solving the underlying problem rather than addressing symptoms.
MongoDB Performance Best Practices
After working with hundreds of MongoDB deployments, here are the best practices that consistently deliver results:
Match Hardware to Workload
Match your infrastructure to your workload:
- RAM: Provision enough to hold your working set (typically 1.1x its size)
- CPU: Scale based on query complexity and concurrency
- Disk: Use SSDs for production MongoDB deployments
Schedule Regular Maintenance
Proactive maintenance prevents performance degradation:
- Compact databases: Run periodic compaction to reclaim space
- Update indexes: Rebuild indexes periodically to reduce fragmentation
- Monitor for fragmentation: Check and address both data and index fragmentation
Simulate Production Loads
Don't wait for production issues—simulate load to find bottlenecks:
- Use tools like `mongoperf` to test disk performance
- Create realistic test data that matches production patterns
- Simulate peak workloads with tools like JMeter or custom load tests
Conclusion
Keeping an eye on a few key MongoDB metrics can go a long way. Focus on what helps you spot slowdowns or odd behavior early. Start simple, adjust as your needs grow, and let real issues guide what you monitor next.
FAQs
How often should I check MongoDB performance metrics?
For production systems, check key metrics at least every 5 minutes. Set up dashboards for real-time monitoring and review daily performance trends to spot gradual degradation.
Which MongoDB metrics are most important for my API service?
Focus on query response times, index usage efficiency, and connection counts. These directly impact API performance. Also, monitor memory usage to ensure your working set fits in RAM.
Can I use the same monitoring approach for MongoDB Atlas?
MongoDB Atlas provides its own monitoring interface, but the same metrics matter. You can also integrate Atlas with Last9 to correlate database performance with your application metrics.
How do I tell if my indexes are effective?
Check `totalKeysExamined` vs. `totalDocsExamined` in query explain plans. Effective indexes show similar numbers for both metrics. Large differences indicate collection scans rather than index usage.
What's the impact of replica set elections on performance?
Primary elections typically cause 5-30 seconds of write unavailability. Monitor `replSetGetStatus` to track election events and measure their impact on your application.
How can I predict when I'll need to scale my MongoDB deployment?
Track growth trends in data size, operation counts, and resource utilization. When any metric consistently exceeds 70% of capacity, it's time to plan for scaling.
What are the key WiredTiger cache metrics to monitor?
Watch the "cache dirty %" (ideally under 10%) and "cache used %" (ideally under 80%). Also monitor "pages read into cache" and "pages written from cache" to understand cache efficiency.
How do I monitor the MongoDB oplog window?
Use `db.getReplicationInfo()` to see the oplog time range. Ensure it's large enough to accommodate your longest expected primary downtime, typically at least 24 hours.
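The window itself is just the span between the oldest and newest oplog entries. This sketch mirrors the first/last timestamps that `db.getReplicationInfo()` reports; the dates are invented, and the 24-hour floor reflects the guidance above rather than anything MongoDB enforces:

```javascript
// Sketch: derive the oplog window from first/last oplog entry timestamps.
function oplogWindowHours(tFirst, tLast) {
  return (tLast - tFirst) / (1000 * 60 * 60);   // ms → hours
}

const windowH = oplogWindowHours(
  new Date('2024-01-01T00:00:00Z'),
  new Date('2024-01-01T18:00:00Z')
);
console.log(windowH, windowH >= 24 ? 'ok' : 'below 24h target'); // 18 'below 24h target'
```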
How do I identify network issues affecting MongoDB?
Monitor network metrics like bytes in/out, network latency between cluster members, and connection errors. Correlate spikes in network latency with query performance degradation.