
Prometheus Logging Explained for Developers

Understand how Prometheus logging captures structured metrics, improves query performance, and scales observability in production systems.

Jun 20th, ‘25

Running apps in production? You need visibility fast. Traditional logging gives you scattered events. Prometheus gives you structured, queryable data that scales.

In this guide, we’ll break down how to use Prometheus for logging-style observability, where it fits in your stack, and how to plug it into tools like Grafana or your cloud-native setup.

What Makes Prometheus Logging Different?

Prometheus isn’t your usual log-to-file setup. It moves you from dumping text lines to tracking structured, real-time metrics.

Here’s the key difference:

  • Logs are unstructured strings.
  • Prometheus metrics are structured time-series data.

Instead of writing:

User logged in at 2025-06-19 10:30:15

You're tracking:

user_logins_total{method="oauth"} 1547

That’s not just cleaner, it’s queryable, measurable, and easier to work with when debugging or spotting anomalies.

Why it's important:

  • Real-time visibility: Prometheus scrapes your services on a schedule (pull model), so you always have fresh data.
  • Low overhead: No agents tailing logs. Just an HTTP endpoint (/metrics) that Prometheus pulls from.
  • Powerful queries: Use PromQL to calculate rates, percentiles, or even set up custom alerts without parsing logs.
  • Built to scale: Especially in dynamic environments like Kubernetes, where services start and stop often.

This isn't just a different logging format. It's a shift to treating observability as metrics-first. And when you need to visualize or correlate that data, tools like Grafana plug in easily.

💡
If you're also thinking about how Prometheus fits into tracing workflows, this guide on using Prometheus with distributed tracing breaks it down with practical examples.

How It Works Behind the Scenes

Prometheus doesn’t log events line by line. Instead, it collects metrics: numerical representations of system behavior, sampled at fixed intervals. This shift enables better aggregation, alerting, and analysis.

Application exposes metrics over HTTP

Your application needs to expose an HTTP endpoint (usually /metrics). This endpoint returns all available metrics in Prometheus’ text-based exposition format.

These metrics don’t represent individual events. Instead, they expose current state: counters, gauges, and histograms that accumulate or change over time.

Prometheus scrapes the endpoint periodically

The Prometheus server polls each /metrics endpoint on a fixed schedule (default: every 15s).

  • Each scrape captures the latest values.
  • These are stored as time series in Prometheus’ internal database.
  • PromQL (Prometheus Query Language) lets you run queries against this data for dashboards, alerts, or debugging.
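
As a minimal sketch, a scrape job for a service exposing /metrics on port 8080 looks like this in prometheus.yml (the job name and target are illustrative):

global:
  scrape_interval: 15s          # how often Prometheus polls each target

scrape_configs:
  - job_name: 'api-service'              # illustrative job name
    metrics_path: /metrics               # the default path, shown for clarity
    static_configs:
      - targets: ['api-service:8080']    # host:port of the metrics endpoint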

Native support for dynamic environments like Kubernetes

Prometheus integrates tightly with Kubernetes:

  • Uses Kubernetes service discovery to automatically find new pods or services.
  • Scrape behavior is controlled via annotations like
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
  • No need to manually update configurations as workloads change.
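
Those annotations aren’t built into Prometheus itself; they’re a convention honored through relabel rules in a pod-discovery scrape job. A common sketch of that pattern:

scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod                        # discover every pod via the Kubernetes API
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use prometheus.io/path if set, otherwise fall back to /metrics
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Rewrite the scrape address to the port given in prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__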

Example: What exposed metrics look like

# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200",service="api"} 12450
http_requests_total{method="POST",status="201",service="api"} 892
http_requests_total{method="GET",status="404",service="api"} 23

Each metric is:

  • Named (http_requests_total)
  • Typed (counter)
  • Labeled with key-value pairs for flexible filtering (e.g., method, status, service)
  • Valued with the latest count at the time of the scrape

This format makes it easy to aggregate by status code, service, or method, something traditional logs aren’t built to do efficiently.

💡
To understand how your applications should expose metrics for Prometheus to collect, this guide on setting up Prometheus metrics endpoints is a good place to start.

Prometheus vs Log-Centric Tools: How to Choose the Right Approach

Understanding how Prometheus-style metrics compare with log-centric observability helps clarify when each approach makes sense.

Data Models: Metrics vs Logs

  • Log-centric tools focus on capturing and analyzing unstructured event data: application logs, system logs, audit trails, and so on. They’re useful for reconstructing incidents or drilling into specific sequences of events.
  • Prometheus, on the other hand, collects structured, numeric time-series data. It’s designed for tracking service performance, resource usage, and system behavior over time.

If you’re troubleshooting a specific error or investigating a security event, logs are helpful. For monitoring long-term trends, setting SLOs, or triggering alerts, metrics give you faster, more scalable answers.

Querying and Analysis

  • Log tools usually involve search queries that filter through event records.
  • Prometheus uses PromQL, a purpose-built language for time-series math. Calculating error rates, percentiles, or resource saturation is fast and efficient.

When to Use Which

Use Case | Best Fit
Auditing, security analysis, compliance | Log-centric tools
Debugging a specific request or user session | Log-centric tools
Real-time monitoring and proactive alerting | Prometheus
Tracking SLIs/SLOs and trend analysis | Prometheus
Kubernetes-native infrastructure | Prometheus

For most teams, the right solution isn’t binary. Metrics and logs often work best together: metrics to detect and alert, logs to debug and explain.

Cost and Operational Tradeoffs

  • Log-centric platforms often charge based on ingestion volume. If logs are verbose or high-frequency, costs can escalate.
  • Prometheus is open-source and self-managed. While that shifts operational overhead to your team, you control storage, retention, and scaling.

What Kind of Data Does Prometheus Capture?

Prometheus doesn’t capture logs or raw events. Instead, it collects structured, numeric metrics that represent system state over time.

This approach works well for observability use cases like monitoring performance, tracking system behavior, and triggering alerts.

Application Metrics

Your application can expose custom metrics to report what's happening inside: things like request counts, error rates, response durations, or queue lengths. These metrics are updated directly in code and scraped by Prometheus at regular intervals.

Here’s a quick Python example using the Prometheus client:

from prometheus_client import Counter, Histogram

# Track request counts
requests_total = Counter('http_requests_total', 'Total requests', ['method', 'endpoint'])

# Track response time distributions
response_time = Histogram('http_request_duration_seconds', 'Request duration in seconds')

# Inside your request handler
requests_total.labels(method='GET', endpoint='/api/users').inc()
response_time.observe(0.142)

This gives you structured, label-rich data you can query, visualize, or alert on.

Infrastructure Metrics with Exporters

Prometheus uses exporters to monitor infrastructure components. The most common is the Node Exporter, which exposes system-level metrics like:

  • CPU and memory usage
  • Disk I/O and filesystem stats
  • Network throughput

Other exporters cover databases, load balancers, message queues, and more. Each runs as a sidecar or daemon, exposing a /metrics endpoint that Prometheus scrapes, just like any application.
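
Scraping the Node Exporter is just another job in prometheus.yml; the hostname below is illustrative, and 9100 is the exporter’s default port:

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter-host:9100']   # Node Exporter default port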

Kubernetes Metrics

In Kubernetes, Prometheus integrates directly with the API server to auto-discover pods, services, and nodes. It collects:

  • Resource usage (CPU, memory, etc.)
  • Pod and container lifecycles
  • Cluster state and deployment health

You can also annotate your pods to expose app-level metrics:

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"

This makes it easy to collect metrics without hardcoding static scrape targets.

Container Metrics

If you're running Docker or any container runtime, Prometheus can track:

  • CPU throttling and limits
  • Memory usage per container
  • Network and block I/O
  • Container restarts and uptime

These metrics help diagnose performance bottlenecks and resource constraints in containerized environments.

Business Metrics

Prometheus isn’t just for infrastructure. You can expose application-level business metrics like:

  • User sign-ups
  • Completed purchases
  • API usage per customer
  • Feature flags or A/B test events

These metrics give product and engineering teams a shared source of truth and let you tie system behavior to user impact.

💡
If you're wondering how to expose metrics properly in your applications, this guide on Prometheus metrics endpoints walks through the essentials.

Metric Types: Choosing the Right One

Prometheus supports different metric types, each designed for a specific pattern:

  • Counter: Monotonic values that only increase (e.g., http_requests_total)
  • Gauge: Values that go up and down (e.g., current_queue_length)
  • Histogram: Track distributions like request latency across buckets
  • Summary: Similar to histograms, but includes quantiles like the 95th percentile

Choosing the right type helps with accurate aggregations, alerting, and long-term trend analysis.
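
To make the differences concrete, here’s a small sketch using the official Python client; the metric names are illustrative:

from prometheus_client import Counter, Gauge, Histogram, Summary

# Counter: only increases; use rate() in PromQL to get per-second values
jobs_processed = Counter('jobs_processed_total', 'Jobs processed')

# Gauge: can go up and down; reflects current state
queue_length = Gauge('current_queue_length', 'Items waiting in the queue')

# Histogram: counts observations into configurable buckets
request_latency = Histogram('request_duration_seconds', 'Request duration',
                            buckets=[0.05, 0.1, 0.25, 0.5, 1, 2.5])

# Summary: tracks count and sum of observations
# (the Python client does not compute quantiles)
payload_size = Summary('payload_size_bytes', 'Request payload size')

# Typical usage inside application code
jobs_processed.inc()
queue_length.set(42)
request_latency.observe(0.18)
payload_size.observe(2048)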

How to Get Insights from Prometheus Data

Prometheus becomes truly valuable when you start querying your metrics and turning them into alerts or dashboards. Its time-series data model and PromQL make this possible with precision and flexibility.

The Time-Series Model

Each Prometheus metric is stored as a unique time series, identified by:

  • The metric name (e.g., cpu_usage_percent)
  • A set of labels (key-value pairs that add context)
  • A timestamp-value pair for each data point

For example:

cpu_usage_percent{instance="server1", core="0"} 45.2 @1687123456
cpu_usage_percent{instance="server1", core="1"} 32.1 @1687123456
cpu_usage_percent{instance="server2", core="0"} 78.9 @1687123456

This structure lets you easily group, filter, and aggregate data by server, region, deployment group, or any label you define.

Querying with PromQL

PromQL (Prometheus Query Language) is built specifically for working with time-series metrics. It supports powerful operations like rates, aggregations, and percentile calculations.

Here are some common examples:

# Average response time over 5 minutes
avg(rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m]))

# Error rate as a percentage
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100

# 95th percentile response time
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

These queries help answer operational questions like:

  • Is latency increasing?
  • Are error rates spiking?
  • What’s the performance across services or environments?

PromQL is also used as the base for alerting.

Defining Alerts with Alertmanager

Prometheus integrates with Alertmanager to evaluate alert conditions and handle notifications. You write alert rules using PromQL, and Alertmanager takes care of routing and delivery.

groups:
- name: example
  rules:
  - alert: HighErrorRate
    expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.1
    for: 5m
    annotations:
      summary: "High error rate detected"
      description: "Error rate is {{ $value | humanizePercentage }}"

You can send alerts to:

  • Email
  • Slack
  • PagerDuty
  • Webhooks

Alertmanager also supports deduplication, grouping, silencing, and escalation policies.
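
For completeness, a minimal Alertmanager routing config might look like the sketch below; the Slack webhook URL and channel are placeholders:

route:
  receiver: 'slack-oncall'           # default receiver for all alerts
  group_by: ['alertname', 'job']     # batch related alerts into one notification
  group_wait: 30s
  repeat_interval: 4h

receivers:
  - name: 'slack-oncall'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'   # placeholder webhook
        channel: '#alerts'
        send_resolved: true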

💡
To set up alerting on your Prometheus metrics, this guide on Prometheus Alertmanager explains how to configure rules and send notifications.

Scaling with Observability Platforms

As your usage grows, self-hosted Prometheus setups often hit limitations, especially around high-cardinality metrics and data retention.

Last9 offers managed solutions that work with Prometheus and extend it with:

  • High-cardinality support at scale
  • Budget-aware usage controls
  • Long-term storage and efficient querying
  • Full OpenTelemetry integration (metrics, logs, and traces)

Teams at CleverTap, Replit, and Probo trust Last9 to go beyond just metrics, combining traces and logs while keeping infrastructure costs predictable.

Prometheus and Grafana: How They Work Together (and Why You Need Both)

Prometheus and Grafana often show up together in observability stacks, but they serve very different roles. One collects and stores metrics. The other makes those metrics understandable at a glance.

Let’s break down how they complement each other and when each one takes the lead.

Prometheus: Collects, Stores, and Queries Metrics

Prometheus is your system’s metrics engine. It scrapes data from /metrics endpoints, stores it in a time-series format, and makes it queryable through PromQL.

It’s great at answering questions like:

  • How many requests per second is this service handling?
  • What’s the memory usage across all pods?
  • What’s the error rate over the last 10 minutes?

Prometheus includes a basic UI for query testing, but it’s primarily designed for machines and automation, not for dashboards or reporting.

Grafana: Turns Metrics into Dashboards

Grafana is built to visualize time-series data. It connects to Prometheus and transforms raw metrics into something humans can work with—graphs, tables, gauges, heatmaps, and so on.

Here’s what that workflow typically looks like:

  • Applications expose metrics in Prometheus format
  • Prometheus scrapes and stores those metrics
  • Grafana queries Prometheus
  • Dashboards display real-time system behavior

Each dashboard panel runs a PromQL query behind the scenes. For example:

sum(rate(http_requests_total[5m])) by (service)

You can group multiple panels, apply filters, and use dashboard variables to switch views across services or environments.
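
For example, a panel query that respects a hypothetical $service dashboard variable might look like:

sum(rate(http_requests_total{service=~"$service"}[5m])) by (status)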

Grafana Supports Multiple Data Sources

Unlike Prometheus, which only handles metrics, Grafana can connect to many observability tools.

You might pull in:

  • Prometheus for metrics
  • Elasticsearch or Loki for logs
  • CloudWatch or GCP Monitoring for cloud metrics
  • Tempo or Jaeger for traces

This makes Grafana a single-pane view across all telemetry types, while Prometheus stays laser-focused on time-series metrics.

Alerting: Rules in Prometheus, Visual Setup in Grafana

Both tools offer alerting, but they differ in how they're used.

Prometheus alerts are:

  • Defined as code (PromQL rules)
  • Integrated with Alertmanager
  • Ideal for system-level alerts (e.g., CPU > 90%, error rate > 5%)

Grafana alerts are:

  • Created visually from dashboards
  • Easier for non-engineering teams to set up
  • Great for business metrics or ad-hoc alerting

In practice, many teams run both: Prometheus for critical infrastructure alerts, Grafana for dashboard-level insights.

💡
For a closer look at how Prometheus works with Grafana to power observability dashboards, this guide on Prometheus and Grafana covers what each tool does and how they fit together.

Advanced Prometheus Patterns That Scale

Once you're comfortable with scraping, querying, and dashboarding, Prometheus offers several ways to scale, optimize, and tailor your monitoring setup for growing systems and complex use cases.

Use Recording Rules to Speed Up Dashboards and Alerts

Some queries, especially those involving rates, histograms, or high-cardinality labels, can get expensive to compute repeatedly. That’s where recording rules help.

Recording rules precompute PromQL expressions and store the result as a new metric. This boosts performance for both dashboards and alerts by reducing query load at runtime.

groups:
  - name: performance_rules
    interval: 30s
    rules:
      - record: job:http_request_rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)

      - record: job:http_error_rate5m
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)
              / sum(rate(http_requests_total[5m])) by (job)

These new job:-prefixed metrics are faster to query and easier to reuse across dashboards.

Scale Horizontally with Federation

As you scale out services, a single Prometheus instance can become a bottleneck. Federation helps you split the load across multiple Prometheus servers and roll up relevant metrics to a central view.

You might have local Prometheus instances for each region, service, or environment, and a global Prometheus scraping summaries via federation.

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    metrics_path: '/federate'
    honor_labels: true
    params:
      'match[]':
        - '{job=~"prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          - 'prometheus-us-west:9090'
          - 'prometheus-eu-central:9090'

Federation avoids pulling in every metric and instead focuses on rollups, keeping your global Prometheus lightweight.

Write Custom Exporters for Non-Instrumented Systems

When no official exporter exists for a service you want to monitor, you can write your own using a client library. Python, Go, and Java clients make this straightforward.

Here’s a quick example using Python and psutil to export system-level CPU and memory stats:

from prometheus_client import start_http_server, Gauge
import time, psutil

cpu_usage = Gauge('system_cpu_usage_percent', 'CPU usage percent')
memory_usage = Gauge('system_memory_usage_percent', 'Memory usage percent')

def collect_metrics():
    while True:
        cpu_usage.set(psutil.cpu_percent())
        memory_usage.set(psutil.virtual_memory().percent)
        time.sleep(15)

if __name__ == '__main__':
    start_http_server(8000)
    collect_metrics()

Just expose this on /metrics, and Prometheus can scrape it like any other target.

Manage High Cardinality Before It Becomes a Problem

High-cardinality metrics, where labels like user_id, session_id, or url have thousands of unique values, can quickly overwhelm Prometheus’s storage and query performance.

Here’s how to reduce the impact:

  • Aggregate at ingest: Don’t track per-user metrics unless necessary. Aggregate by region, user type, or status code.
  • Pre-aggregate with recording rules: Store daily or hourly rollups that reduce label combinations.
  • Use remote write for offloading: Send high-cardinality metrics to external systems built for it. Last9 handles this kind of load without punishing performance or cost.
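
If a specific metric has already picked up a runaway label, one option is to drop that metric at scrape time with metric_relabel_configs before it reaches storage; the job and metric names here are hypothetical:

scrape_configs:
  - job_name: 'api-service'
    static_configs:
      - targets: ['api-service:8080']
    metric_relabel_configs:
      # Drop a per-user debug metric entirely before it is ingested
      - source_labels: [__name__]
        action: drop
        regex: 'debug_per_user_requests_total'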

Running Prometheus in Production Without the Burnout

Getting Prometheus into production is easy. Keeping it performant, secure, and maintainable over time? That takes a bit more planning.

Here are the key areas to focus on so you don’t end up firefighting your monitoring stack.

Plan for Resources, Especially RAM and Disk

Prometheus stores time-series data in memory and on disk, so capacity planning matters. A rough estimate: 1–2 bytes of disk per sample, plus memory that grows with the number of active time series. That adds up fast when you’re dealing with high-cardinality metrics or short scrape intervals.

Storage usage scales with:

  • How much data you’re ingesting
  • How long you’re retaining it
  • The number of unique time series

If you’re pushing a lot of metrics, especially with many dynamic labels, disk usage can spike quickly. Keep an eye on the ingestion rate and active series count.
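
Prometheus exposes both numbers as internal metrics, so you can watch them with ordinary PromQL, for example:

# Samples ingested per second
rate(prometheus_tsdb_head_samples_appended_total[5m])

# Number of active (in-memory) time series
prometheus_tsdb_head_series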

Lock Down What You're Exposing

Your /metrics endpoint is often public inside your cluster, and that can be risky.

  • Use authentication, IP allowlists, or Kubernetes network policies to restrict access.
  • Sanitize your labels and values. Don’t expose user emails, IDs, or tokens as metric labels. It’s both a security risk and a cardinality nightmare.

A good practice: audit your metrics endpoint like you would any API.
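
If you’re on Kubernetes, a network policy is one way to enforce this; the sketch below (labels, namespace, and port are illustrative) only lets pods in a monitoring namespace reach the metrics port:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-metrics
spec:
  podSelector:
    matchLabels:
      app: api-service             # the pods exposing /metrics
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring   # where Prometheus runs
      ports:
        - protocol: TCP
          port: 8080               # the metrics port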

Don't Skip Backup and Long-Term Storage

Prometheus stores data locally by default, and local storage is ephemeral. For anything critical:

  • Use remote write to forward data to a long-term storage backend.
  • Snapshot your TSDB if you're managing state across restarts.
  • Consider solutions like Last9 that give you managed, durable metric storage out of the box, especially for compliance, audits, or historical trends.
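
Enabling remote write is a small change to prometheus.yml; the endpoint URL and credentials below are placeholders for whichever backend you use:

remote_write:
  - url: "https://metrics-backend.example.com/api/v1/write"   # placeholder endpoint
    basic_auth:
      username: "prometheus"       # placeholder credentials
      password: "REPLACE_ME"
    queue_config:
      max_samples_per_send: 5000   # tune batching to your backend's limits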

Optimize for Performance and Scalability

Scrape interval, service discovery, and query load all affect performance.

  • Start with 15s scrape intervals and tune from there based on granularity and system load.
  • Use service discovery (like in Kubernetes) instead of static scrape targets. This reduces config churn and helps auto-scale your monitoring with your infrastructure.
  • Profile your queries. Long-running PromQL expressions can overload your server. Use recording rules for anything expensive or used frequently.

If you’re noticing dashboard latency or alert delays, your queries are often the first place to look.

💡
If you're using counters in Prometheus and want to calculate things like requests per second, this guide on the Prometheus rate function explains how it works with clear examples.

Diagnosing Problems in Prometheus Logging

Even with a strong setup, Prometheus can run into problems: missing data, slow dashboards, or storage bloat.

Here's a practical checklist to get things back on track when Prometheus isn't behaving the way you expect.

Metrics Aren’t Showing Up in Prometheus

If you don’t see metrics you expect:

  1. Check the /metrics endpoint.
    Run a quick curl http://<your-app>:<port>/metrics and confirm that metrics are being exposed correctly. Look out for malformed output or missing labels.
  2. Look at the Prometheus targets UI.
    Navigate to http://<prometheus-host>:9090/targets. Check if your job is listed, whether it’s up, and look for scrape errors.
  3. Review your scrape config.
    A missing port, wrong path, or typo in job_name or static_configs can silently break scrapes.

High Memory Usage or OOMs

Prometheus' memory usage grows with the number of active time series, especially if you're exposing dynamic or user-level labels.

Things to check:

  • Are you tracking unique labels like user_id, email, or uuid? That’s a fast path to high cardinality.
  • Use this PromQL query to find the top memory-hogging metrics:
topk(10, count by (__name__)({__name__=~".+"}))

This gives you a rough idea of which metrics are generating the most unique time series.

Slow or Timing-Out Queries

PromQL is powerful, but not always fast. If dashboards or alerts feel sluggish:

  • Limit the query range. Avoid asking for 30-day data on graphs meant to show 5-minute trends.
  • Use recording rules to precompute anything complex that runs frequently.
  • Monitor query execution times in the Prometheus UI under the “/graph” tab or enable query logging.

Prometheus isn’t optimized for ad-hoc exploration at massive scale—optimize for what you need, not everything you can collect.

Storage Filling Up Too Fast

Running out of disk? Prometheus stores data in block files that get compacted periodically, but high ingestion rates and long retention can fill storage quickly.

To fix or prevent issues:

  • Check your retention settings. Use flags like --storage.tsdb.retention.time=15d to control how long data is kept.
  • Enable remote_write to ship data to external long-term storage (especially useful for compliance or historical analysis).
  • Monitor disk I/O and latency. If compaction is falling behind, you might need faster disks or to reduce scrape frequency.
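
For reference, retention is controlled with command-line flags when launching Prometheus; a typical invocation capping both age and size might look like this:

./prometheus \
  --config.file=prometheus.yml \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.retention.size=200GB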

Wrapping Up

Prometheus logging flips the script. Instead of digging through logs, you track the metrics that matter: clean, queryable, and built for scale.

And when things get complex? High cardinality, long-term storage, rising costs—Last9 makes Prometheus production-ready, without the overhead.

Just better observability. Book some time with us to learn more, or get started for free today!

FAQs

What is Prometheus logging?
Prometheus logging refers to capturing structured metrics instead of unstructured log lines. It enables time-series analysis using metrics exposed via HTTP endpoints, rather than writing logs to files.

Is Prometheus similar to Splunk?
Not really. Splunk ingests unstructured log data for indexing and searching. Prometheus focuses on numeric, time-series metrics optimized for monitoring and alerting, not log aggregation.

What events does Prometheus log?
Prometheus doesn't log individual events like a traditional log aggregator. Instead, it captures metrics that reflect system state—things like request counts, response times, or error rates sampled at regular intervals.

If you want to track specific events (e.g., user signups, payment failures), you expose them as structured metrics using counters or labeled gauges. This makes it easier to query and alert on patterns without dealing with raw logs.

For deeper event-level visibility, Prometheus is often paired with tools like Grafana, Loki, or Last9, which provide log-level detail and full-stack observability alongside metrics and traces.

What does Prometheus track?
It tracks any numeric data exposed as metrics: CPU usage, request latency, memory consumption, application-specific counters, histograms, and gauges.

What is the difference between Grafana and Prometheus logging?
Prometheus collects and stores time-series metrics. Grafana visualizes those metrics through dashboards. Prometheus is the backend; Grafana is the frontend.

When to use Prometheus?
Use Prometheus when you need real-time monitoring, alerting, and trend analysis of structured metrics, especially in Kubernetes or microservice environments.

Should I use Prometheus as a log aggregator?
No. Prometheus is not built for raw log ingestion. Use it for metrics. For log aggregation, consider tools like Grafana Loki or Elasticsearch.

Is anyone using Grafana for their network monitoring?
Yes, many teams use Grafana with Prometheus or other data sources to visualize network metrics like bandwidth, packet loss, and latency.

What Metrics Do You Use for Alerts?
Common alerting metrics include error rates, latency percentiles, request throughput, CPU/memory usage, and custom business SLIs like checkout failures.

How does Grafana Loki work?
Loki is a log aggregation system that works like Prometheus, but for logs. It indexes log metadata (labels) and streams logs for querying via LogQL.

How does Prometheus monitoring work?
Prometheus scrapes metrics from instrumented applications via HTTP endpoints, stores them in a time-series database, and exposes them for querying and alerting with PromQL.

How can I integrate Prometheus with logging tools?
You can correlate Prometheus metrics with logs by using tools like Grafana (with Loki) or structured logging libraries that expose metrics alongside logs.

How can Prometheus be used for logging and monitoring?
While it’s not a log tool, Prometheus provides monitoring through metrics. You can emulate log-style signals using counters and labels for event tracking.

Authors
Prathamesh Sonpatki

Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.
