System monitoring is no longer optional, and Prometheus has become a go-to tool for many teams. The metrics endpoint plays a central role, acting as the gateway for all the monitoring data.
Grasping how this endpoint works is crucial for anyone looking to set up effective observability and improve their infrastructure’s performance.
What Is a Prometheus Metrics Endpoint?
A Prometheus metrics endpoint is an HTTP endpoint (usually `/metrics`) that exposes monitoring data in a format Prometheus can scrape. This endpoint serves as the interface between your applications and the Prometheus server, allowing it to collect time-series data about your system's performance.
Think of it as your application's vital signs monitor—constantly broadcasting health data that Prometheus can check and record.
The metrics endpoint follows a specific text-based format that looks something like this:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027
http_requests_total{method="post",code="400"} 3
http_requests_total{method="get",code="200"} 9836
Each line represents a different measurement (or the same measurement with different labels), allowing Prometheus to build a detailed picture of your system's performance over time.
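To see how this format breaks down programmatically, here's a minimal Python sketch of a parser for the subset shown above. It's deliberately simplified: real parsers also handle escaping, label values containing commas, and other edge cases this one skips.

```python
def parse_exposition(text):
    """Parse lines of the Prometheus text format into (name, labels, value) tuples.

    Minimal sketch: handles only the simple subset shown above.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, value = line.rsplit(" ", 1)
        labels = {}
        if "{" in name_and_labels:
            name, raw = name_and_labels.split("{", 1)
            for pair in raw.rstrip("}").split(","):
                key, val = pair.split("=", 1)
                labels[key] = val.strip('"')
        else:
            name = name_and_labels
        samples.append((name, labels, float(value)))
    return samples

sample_text = """\
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027
http_requests_total{method="get",code="200"} 9836
"""

for name, labels, value in parse_exposition(sample_text):
    print(name, labels, value)
```

Each sample becomes a metric name, a label set, and a numeric value, which is exactly the shape Prometheus stores as a time series.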
Why Prometheus Metrics Endpoints Matter for Your Systems
Before we jump into the technical details, let's talk about why you should care about Prometheus metrics endpoints:
- Real-time visibility: Get immediate insight into how your systems are performing
- Problem detection: Catch issues before they become major outages
- Capacity planning: Understand usage patterns to plan for future growth
- Performance optimization: Identify bottlenecks and areas for improvement
When your application exposes a well-designed metrics endpoint, you're not just collecting data—you're creating the foundation for a proactive approach to system management.
Getting Started with Prometheus Metrics Endpoints
Setting Up Your First Metrics Endpoint
If you're new to Prometheus, setting up your first metrics endpoint might seem like a big task. But it's pretty straightforward.
For a Go application, you can use the official Prometheus client library:
package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // Create a counter metric
    counter := prometheus.NewCounter(prometheus.CounterOpts{
        Name: "my_app_requests_total",            // Metric names follow the convention of using underscores
        Help: "Total number of requests received", // Help text describes what the metric measures
    })

    // Register the counter with Prometheus's default registry
    prometheus.MustRegister(counter)

    // Increment the counter in your application code.
    // You would typically call this in your request handlers.
    counter.Inc()

    // Expose the metrics endpoint on /metrics
    http.Handle("/metrics", promhttp.Handler())

    // Start the HTTP server on port 8080
    http.ListenAndServe(":8080", nil)
}
This Go code sets up a basic Prometheus metrics endpoint. It creates a counter named `my_app_requests_total`, registers it with Prometheus, and exposes it through an HTTP endpoint. The counter increments with each call to `counter.Inc()`, which you would typically place in your application's request handling logic.
With just this small snippet of code, your application now has a `/metrics` endpoint that Prometheus can scrape. When you visit this endpoint in your browser, you'll see a formatted output of all metrics your application is exposing.
Client Libraries for Different Languages
You're not limited to Go. Prometheus has official client libraries for several languages:
Language | Client Library | Features
---|---|---
Java | client_java | Full featured, Spring Boot integration
Python | prometheus-client | Simple API, WSGI middleware
Go | client_golang | First-class support
Ruby | client_ruby | Supports custom collectors
Node.js | prom-client | Event loop metrics
There are also many community-maintained client libraries for other languages. The key is finding one that fits well with your tech stack.
Prometheus Metric Types
Prometheus defines four core metric types, each suited to different kinds of measurements:
Counter
Counters only go up (or reset to zero when the application restarts). They're perfect for tracking things like:
- Total number of requests
- Errors encountered
- Tasks completed
# Example counter
http_requests_total{method="GET"} 12345
Gauge
Gauges can go up and down, making them ideal for:
- Current memory usage
- CPU utilization
- Queue size
# Example gauge
memory_usage_bytes{instance="server-01"} 1024000000
Histogram
Histograms sample observations and count them in configurable buckets, also tracking a sum of all observed values:
- Request duration
- Response size
# Example histogram
http_request_duration_seconds_bucket{le="0.1"} 2000
http_request_duration_seconds_bucket{le="0.5"} 3000
http_request_duration_seconds_bucket{le="1"} 3500
http_request_duration_seconds_bucket{le="+Inf"} 4000
http_request_duration_seconds_sum 2500
http_request_duration_seconds_count 4000
Summary
Summaries are similar to histograms but calculate streaming quantiles on the client side:
- Request duration percentiles
- Response size percentiles
# Example summary
rpc_duration_seconds{quantile="0.5"} 0.045
rpc_duration_seconds{quantile="0.9"} 0.075
rpc_duration_seconds{quantile="0.99"} 0.125
rpc_duration_seconds_sum 1.346
rpc_duration_seconds_count 42
Choosing the right metric type is crucial for effective monitoring. Use counters for things that only increase, gauges for values that go up and down, and histograms or summaries when you need to understand the distribution of values.
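To make the four types concrete, here's a small sketch using the official `prometheus_client` Python library. The metric names are illustrative, and a throwaway registry is used so the example doesn't touch the global default one.

```python
from prometheus_client import (
    CollectorRegistry, Counter, Gauge, Histogram, Summary, generate_latest,
)

# A separate registry keeps this sketch isolated from the default registry
registry = CollectorRegistry()

requests_total = Counter("demo_requests_total", "Total requests", registry=registry)
queue_size = Gauge("demo_queue_size", "Current queue size", registry=registry)
latency = Histogram("demo_request_duration_seconds", "Request latency", registry=registry)
response_size = Summary("demo_response_size_bytes", "Response size", registry=registry)

requests_total.inc()        # counters only go up
queue_size.set(42)          # gauges can be set to any value, up or down
latency.observe(0.25)       # histograms count each observation into buckets
response_size.observe(512)  # summaries track a running count and sum

# This is the text Prometheus would see when scraping /metrics
print(generate_latest(registry).decode())
```

Printing the registry's exposition shows each type's distinct output shape: a single value for counters and gauges, `_bucket`/`_sum`/`_count` lines for histograms, and `_sum`/`_count` lines for summaries.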
Advanced Techniques for Prometheus Metrics Endpoints
Custom Collectors
While the basic metrics provided by client libraries are useful, you'll often need to expose application-specific metrics. Client libraries make this straightforward, and for more advanced cases (such as computing values at scrape time) you can implement custom collectors.
Here's a simple example in Python:
from prometheus_client import start_http_server, Summary, Counter
import random
import time

# Create metrics
REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request')
REQUESTS = Counter('hello_worlds_total', 'Hello Worlds requested')

# Decorate function with metric
@REQUEST_TIME.time()
def process_request():
    # Simulate a random processing time
    time.sleep(random.random())
    REQUESTS.inc()

if __name__ == '__main__':
    # Start server
    start_http_server(8000)
    # Generate some requests
    while True:
        process_request()
This Python example creates two application-specific metrics: a Summary to track request processing time and a Counter to track the number of requests. The `@REQUEST_TIME.time()` decorator automatically times the function execution and records it in the Summary metric.
The Counter increases with each function call, tracking the total number of requests. The `start_http_server(8000)` call starts a metrics endpoint on port 8000 that Prometheus can scrape.
Using Labels Effectively
Labels add dimensions to your metrics, allowing you to slice and dice your data in powerful ways:
api_requests_total{path="/users", method="GET", status="200"} 93724
api_requests_total{path="/users", method="GET", status="404"} 14
api_requests_total{path="/users", method="POST", status="201"} 10342
With these labels, you can create queries that show:
- Requests by path
- Success rates by method
- Error patterns across endpoints
But be careful—too many label combinations can lead to "cardinality explosion," which can overload your Prometheus server. As a rule of thumb, keep your label combinations under 10,000 per metric.
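A quick back-of-the-envelope check helps here: the worst-case series count for one metric is the product of each label's distinct values. The numbers below are illustrative.

```python
# Worst-case series count is the product of each label's distinct values
label_values = {
    "path": 50,    # distinct API paths (illustrative)
    "method": 4,   # GET, POST, PUT, DELETE
    "status": 10,  # distinct status codes seen in practice
}

cardinality = 1
for count in label_values.values():
    cardinality *= count

print(cardinality)  # 2000 series: comfortably under the ~10,000 rule of thumb

# A user_id label with 100,000 distinct values would multiply this by
# 100,000 -- exactly the "cardinality explosion" to avoid.
```

Running this multiplication before adding a label is much cheaper than discovering the explosion in production.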
Customizing Exposition Format
While the default Prometheus exposition format works well, you sometimes need more control. Most client libraries allow you to customize how metrics are exposed:
// Create a custom registry
registry := prometheus.NewRegistry()

// Register metrics to this registry only
counter := prometheus.NewCounter(prometheus.CounterOpts{
    Name: "my_subsystem_requests_total",
    Help: "Total number of requests handled by my subsystem",
})
registry.MustRegister(counter)

// Use a custom handler with this registry
http.Handle("/custom_metrics", promhttp.HandlerFor(registry, promhttp.HandlerOpts{}))
This code snippet demonstrates how to create a custom Prometheus registry instead of using the default global registry. This approach lets you maintain separate collections of metrics that you can expose through different endpoints.
In this example, the counter is registered only with the custom registry and exposed at the `/custom_metrics` endpoint. This lets you expose different sets of metrics on different endpoints, which is particularly useful when you want to separate internal metrics from those exposed to external monitoring systems.
Best Practices for Prometheus Metrics Endpoints
Naming Conventions
Good metric names make your monitoring system more intuitive. Follow these guidelines:
- Use a prefix for your application (e.g., `myapp_`)
- Add units as suffixes (e.g., `_seconds`, `_bytes`)
- Use snake_case for metric names
- Be consistent across your organization

Bad: `get_users`
Good: `myapp_http_requests_total`
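Beyond style, Prometheus enforces a character set for metric names: letters, digits, underscores, and colons, not starting with a digit (and colons are reserved for recording rules). A quick Python check against that pattern; note that `get_users` is technically valid, just poorly named:

```python
import re

# Valid Prometheus metric names match this pattern from the data model docs
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")

def is_valid_metric_name(name: str) -> bool:
    return METRIC_NAME_RE.fullmatch(name) is not None

print(is_valid_metric_name("myapp_http_requests_total"))  # True
print(is_valid_metric_name("get-users"))                  # False: hyphens are not allowed
print(is_valid_metric_name("2xx_responses"))              # False: can't start with a digit
```

A check like this is handy in CI if teams define metric names in config files.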
Security Considerations
Your metrics endpoint can contain sensitive information. Consider:
- Using basic auth or TLS for the metrics endpoint
- Exposing the endpoint only on a private network
- Using a separate port for metrics
- Filtering sensitive metrics before exposition
// Example of basic auth for metrics endpoint
func metricsHandler(w http.ResponseWriter, r *http.Request) {
    user, pass, ok := r.BasicAuth()
    if !ok || user != "prometheus" || pass != "secret" {
        w.Header().Set("WWW-Authenticate", `Basic realm="Restricted"`)
        http.Error(w, "Unauthorized.", http.StatusUnauthorized)
        return
    }
    promhttp.Handler().ServeHTTP(w, r)
}
This security-focused example shows how to add HTTP Basic Authentication to your metrics endpoint. The function checks for valid credentials before allowing access to the metrics.
If authentication fails, it returns a 401 Unauthorized status code. In a production environment, you would replace the hardcoded credentials with more secure options like environment variables or a configuration file.
You would use this handler function with `http.HandleFunc("/metrics", metricsHandler)` instead of directly exposing the Prometheus handler.
Performance Optimization
Metrics collection should be lightweight. Keep these tips in mind:
- Cache expensive metrics calculations
- Use appropriate metric types to minimize overhead
- Consider using push gateways for batch jobs
- Monitor the performance of your metrics endpoint itself
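One way to cache an expensive metric calculation is a custom collector that memoizes its value between scrapes. Here's a sketch using the `prometheus_client` Python library; the stat, its name, and the TTL are all stand-ins for whatever your application actually needs.

```python
import time

from prometheus_client import CollectorRegistry, generate_latest
from prometheus_client.core import GaugeMetricFamily

class CachedStatsCollector:
    """Recomputes an expensive value at most once every `ttl` seconds.

    `_load_expensive_stat` is a hypothetical stand-in for costly work
    (a database COUNT, a directory walk) the real metric would need.
    """

    def __init__(self, ttl=30):
        self._ttl = ttl
        self._cached_at = 0.0
        self._value = 0.0

    def _load_expensive_stat(self):
        return 123.0  # pretend this hits a slow backend

    def collect(self):
        now = time.time()
        if now - self._cached_at > self._ttl:  # refresh only when stale
            self._value = self._load_expensive_stat()
            self._cached_at = now
        yield GaugeMetricFamily(
            "myapp_expensive_stat", "Cached expensive statistic", value=self._value
        )

registry = CollectorRegistry()
registry.register(CachedStatsCollector())
print(generate_latest(registry).decode())
```

Because `collect()` runs on every scrape, the TTL guard keeps a 5-second scrape interval from hammering the slow backend 17,000 times a day.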
Prometheus with Other Observability Tools
Using Last9 for Advanced Monitoring
If you're looking for a managed observability solution that's easy on your budget without sacrificing performance, I recommend giving Last9 a try. With a pricing model based on events ingested, costs are predictable and transparent.
Last9 has been trusted by industry leaders and has even monitored some of the largest live-streaming events in history. It integrates with OpenTelemetry and Prometheus, bringing together metrics, logs, and traces for a complete view of your system’s performance.
OpenTelemetry and Prometheus
The observability landscape is evolving, with OpenTelemetry emerging as a standard for instrumentation. You can use both:
import (
    "go.opentelemetry.io/otel"

    "github.com/prometheus/client_golang/prometheus"
)

func setupMetrics() {
    // OpenTelemetry metrics
    meter := otel.GetMeterProvider().Meter("my-service")
    otelCounter, _ := meter.Int64Counter("requests_total")
    _ = otelCounter // increment with otelCounter.Add(ctx, 1) in request handlers

    // Prometheus metrics
    promCounter := prometheus.NewCounter(prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total number of HTTP requests",
    })
    prometheus.MustRegister(promCounter)
}
This code demonstrates how to use both OpenTelemetry and Prometheus together in the same application. It creates two counters: one using OpenTelemetry's API (`meter.Int64Counter`) and another using Prometheus's API (`prometheus.NewCounter`).
This dual instrumentation approach lets you transition gradually to OpenTelemetry while maintaining compatibility with existing Prometheus monitoring. In a real application, you would increment both counters in your request handling code, ensuring metrics are available through both systems.
With this setup, you're future-proofing your instrumentation while still getting the benefits of Prometheus.
Grafana Dashboards
Prometheus metrics are only useful if you can visualize them. Grafana is the perfect companion:
Dashboard Type | Use Case | Examples
---|---|---
System | CPU, memory, disk, network | Node Exporter Full
Application | Request rates, errors, durations | RED Method Dashboard
Business | User signups, purchases, engagement | Custom Business Metrics
Start with templates, then customize them to fit your specific needs.
Troubleshooting Common Prometheus Issues
Missing Metrics
If you're not seeing the metrics you expect:
- Check that your application is exposing the metric
- Verify Prometheus is scraping the correct endpoint
- Look for errors in the Prometheus logs
- Check for metric naming conflicts or registration issues
High Cardinality Problems
Too many unique time series can crash your monitoring:
- Reduce the number of label combinations
- Avoid using high-cardinality values like user IDs or request IDs as label values
- Consider aggregating certain metrics client-side
- Use recording rules to pre-aggregate common queries
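Recording rules live in a rules file loaded by the Prometheus server. A minimal sketch, assuming the `api_requests_total` metric and the `path` label from earlier (the group and rule names are illustrative):

```yaml
groups:
  - name: api_aggregations
    rules:
      # Pre-aggregate request rate per path so dashboards don't have to
      # sum over every label combination at query time
      - record: path:api_requests:rate5m
        expr: sum by (path) (rate(api_requests_total[5m]))
```

The `level:metric:operations` naming convention for recorded series makes it obvious at query time that the data is pre-aggregated.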
Inconsistent Data
If your data seems inconsistent:
- Check your application's metrics collection logic
- Verify that counters aren't being decreased
- Look for application restarts that might reset counters
- Check for clock synchronization issues across your infrastructure
Practical Prometheus Instrumentation Examples
Instrumenting a Web Service
For a typical web service, you might want to track:
from flask import Flask
from prometheus_client import Counter, Histogram, generate_latest
import time

app = Flask(__name__)

# Define metrics
REQUEST_COUNT = Counter('http_requests_total', 'Total HTTP Requests', ['method', 'endpoint', 'status'])
REQUEST_LATENCY = Histogram('http_request_duration_seconds', 'HTTP request latency in seconds', ['method', 'endpoint'])

@app.route('/metrics')
def metrics():
    return generate_latest()

@app.route('/')
def home():
    # Start timer
    start_time = time.time()

    # Your actual logic here
    result = "Hello, World!"

    # Record metrics
    duration = time.time() - start_time
    REQUEST_LATENCY.labels(method='GET', endpoint='/').observe(duration)
    REQUEST_COUNT.labels(method='GET', endpoint='/', status=200).inc()

    return result

if __name__ == '__main__':
    app.run(port=8080)
This Flask example shows how to instrument a web service with Prometheus metrics. It creates two metrics: a Counter that tracks the total number of HTTP requests (with labels for method, endpoint, and status code), and a Histogram that measures request duration.
The `/metrics` endpoint returns all metrics in Prometheus format using the `generate_latest()` function. In the home route handler, the code measures how long it takes to process the request, then records that duration in the histogram using the `.observe()` method and increments the request counter using `.inc()`. The labels allow you to filter and aggregate metrics in your Prometheus queries.
This simple instrumentation captures request counts and latencies, giving you visibility into your service's performance.
Monitoring Database Connections
For database monitoring:
package main

import (
    "database/sql"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"

    _ "github.com/lib/pq"
)

var (
    dbConnections = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "db_connections_current",
        Help: "Current number of database connections",
    })
    dbQueryDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "db_query_duration_seconds",
            Help:    "Database query duration in seconds",
            Buckets: prometheus.DefBuckets, // Default buckets: .005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10
        },
        []string{"query_type"}, // Label to differentiate between types of queries (select, insert, etc.)
    )
)

func main() {
    // Register metrics
    prometheus.MustRegister(dbConnections)
    prometheus.MustRegister(dbQueryDuration)

    // Connect to database
    db, err := sql.Open("postgres", "postgres://user:password@localhost/dbname")
    if err != nil {
        panic(err)
    }

    // Update metrics in a goroutine
    go func() {
        for {
            stats := db.Stats()
            dbConnections.Set(float64(stats.InUse)) // Update the gauge with the current connection count
            time.Sleep(15 * time.Second)            // Poll every 15 seconds
        }
    }()

    // Example query with instrumentation
    http.HandleFunc("/query", func(w http.ResponseWriter, r *http.Request) {
        // Create a timer that will observe the duration when the handler returns
        timer := prometheus.NewTimer(dbQueryDuration.With(prometheus.Labels{"query_type": "select"}))
        defer timer.ObserveDuration()

        // Perform database query
        rows, err := db.Query("SELECT * FROM users LIMIT 10")
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        defer rows.Close() // Always close rows to release the connection

        w.Write([]byte("Query executed"))
    })

    // Expose metrics endpoint
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":8080", nil)
}
This example demonstrates how to instrument database connections and query performance in Go. It creates two metrics: a Gauge that tracks the current number of database connections and a Histogram that measures query duration.
The background goroutine periodically polls the database connection stats and updates the gauge. For query monitoring, the code uses `prometheus.NewTimer()` and `defer timer.ObserveDuration()` to automatically measure and record how long each query takes.
This pattern is reliable: the deferred call records the duration even if the handler returns early or panics. The `query_type` label allows you to track different types of database operations separately.
This code tracks both current connections and query durations, giving you insight into database performance.
Conclusion
To wrap up, understanding Prometheus metrics endpoints is essential for building a strong monitoring and observability strategy. These endpoints enable you to collect and expose critical metrics, giving you the insights needed to keep your system performing at its best.
FAQ
What's the difference between Prometheus and other monitoring solutions?
Prometheus uses a pull-based model that scrapes metrics from your applications, unlike push-based systems where applications send metrics to a collector. This approach gives you more control and reliability, as the monitoring system isn't dependent on the health of your applications to receive data.
How often should Prometheus scrape my metrics endpoint?
The default is every 15 seconds, which works well for most applications. For high-traffic services, you might want to increase this to 30 or 60 seconds to reduce overhead. For critical systems requiring real-time monitoring, you could go as low as 5 seconds, but be aware of the increased load.
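The scrape interval is set in your Prometheus configuration, either globally or per job. A minimal sketch of `prometheus.yml` (the job name and target are placeholders):

```yaml
global:
  scrape_interval: 15s  # default for all jobs

scrape_configs:
  - job_name: "my-app"
    scrape_interval: 30s  # override for this high-traffic job
    static_configs:
      - targets: ["localhost:8080"]
```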
Can I have multiple metrics endpoints in one application?
Yes, you can expose different metrics on different endpoints. This is useful for separating internal metrics from those you want to expose publicly, or for organizing metrics by subsystem.
How do I handle metrics for short-lived processes?
Short-lived processes don't work well with Prometheus's pull model. Use the Pushgateway, which allows ephemeral jobs to push their metrics to an intermediate service that Prometheus can then scrape.
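With the `prometheus_client` Python library, a batch job pushes its metrics roughly like this. The Pushgateway address and job name are placeholders; the actual push is commented out since it needs a running Pushgateway.

```python
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
last_run_duration = Gauge(
    "batch_job_duration_seconds",
    "How long the last batch run took",
    registry=registry,
)
last_run_duration.set(42.5)

# Requires a Pushgateway at this (placeholder) address; uncomment to push:
# push_to_gateway("localhost:9091", job="nightly_batch", registry=registry)
```

Prometheus then scrapes the Pushgateway on its normal schedule, so the batch job's metrics survive after the process exits.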
What's the recommended way to monitor Kubernetes with Prometheus?
Use the Prometheus Operator, which makes deploying and managing Prometheus on Kubernetes much easier. It provides custom resources for defining monitoring targets and automatically generates scrape configurations.
How can I reduce the load of metrics collection on my services?
Consider using a separate process or sidecar for metrics collection, caching expensive metric calculations, using appropriate metric types (histograms can be expensive), and being selective about what you measure.