
How to Monitor and Manage Grafana Memory

Understand how to monitor and manage Grafana memory usage to keep your dashboards running smoothly and avoid crashes or slowdowns.

Jun 3rd, ‘25

It’s late, you get an alert, and Grafana is down. The reason? It ran out of memory. If you’ve ever watched Grafana slowly eat up RAM until it just stops responding, you know how frustrating that can be.

Memory can spike quickly, especially with complex dashboards and multiple data sources. This guide will help you understand what’s going on and how to keep Grafana running without surprises.

Why Grafana’s Memory Use Can Get Out of Hand

Grafana isn’t just showing charts; it’s constantly querying data sources, processing results, caching data, and managing real-time connections.

You might start with a simple dashboard using around 100MB of memory. Then you add more panels, create alerts, and connect extra data sources, and suddenly, Grafana is using several gigabytes.

The issue gets worse when multiple users access different dashboards at the same time. Each user session keeps its state, so with many panels and users, memory use grows quickly.

💡
If you want to learn more about managing Grafana access and authentication, check out our detailed guide on Grafana login and security!

How to Monitor Grafana’s Memory Use

The first step is to keep an eye on how much memory Grafana is using. Grafana exposes detailed metrics you can check anytime at the /metrics endpoint.

Here’s a quick way to get a snapshot of your Grafana server’s memory usage:

curl http://your-grafana-instance:3000/metrics | grep memory

Running this command will return output like this:

go_memstats_alloc_bytes 2.5165824e+08
go_memstats_sys_bytes 7.3400328e+08
process_resident_memory_bytes 4.1234432e+08

Let’s break down what each of these means:

  • go_memstats_alloc_bytes: How much memory Grafana has currently allocated and is actively using (about 251 MB in this example). Think of it as the working set Grafana needs to keep things running.
  • go_memstats_sys_bytes: The total amount of memory Grafana has requested from the operating system (roughly 734 MB here). It includes both the actively used memory and some overhead reserved for future needs.
  • process_resident_memory_bytes: How much physical RAM the Grafana process is holding at this moment (about 412 MB in the example). This number fluctuates with workload and garbage collection.

If you want to keep a continuous eye on these numbers, you can set up Prometheus to scrape the /metrics endpoint and create alerts when memory usage crosses a threshold. This way, you’ll catch spikes early—before your dashboards go down or slow to a crawl.
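Here’s a minimal sketch of what that scrape job could look like in prometheus.yml. The target address is a placeholder; adjust it to your environment. Keeping the job name as "grafana" means the queries later in this guide, which filter on job="grafana", work as written.

# prometheus.yml (snippet): scrape Grafana's built-in /metrics endpoint
scrape_configs:
  - job_name: "grafana"
    metrics_path: /metrics
    scrape_interval: 30s
    static_configs:
      - targets: ["your-grafana-instance:3000"]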

Monitoring memory regularly helps you understand patterns: Is usage climbing steadily over days? Does it spike when certain dashboards load? Are specific data sources causing heavier loads? These insights are key to knowing when and where to optimize.

💡
For a practical look at how to work with Grafana beyond the UI, including automating tasks through its API, take a look at our getting started guide.

Grafana Memory vs System Memory: What’s the Difference?

Tracking Grafana’s memory is important, but you also need to monitor the system’s total memory usage and how much is free for applications like Grafana.

If you’re using Prometheus with node_exporter, you can collect system-level metrics alongside Grafana-specific ones.

Here are a few PromQL queries to give you a fuller picture:

# Total system memory usage as a percentage
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) 
/ node_memory_MemTotal_bytes * 100

This shows how much of the system’s memory is in use, across all processes.

# Memory still available for user-space applications
node_memory_MemAvailable_bytes

This is more useful than just looking at MemFree since it accounts for memory used in disk caches that the system can reclaim if needed.

# Memory used by the Grafana process
process_resident_memory_bytes{job="grafana"}

This gives you the memory footprint of Grafana itself—what it’s holding in physical RAM.

How to Set Basic Alerts to Track Memory Usage

Here’s an example of how to define memory-related alerts for Grafana using Prometheus alerting rules:

groups:
  - name: grafana_memory
    rules:
      - alert: GrafanaHighMemoryUsage
        expr: process_resident_memory_bytes{job="grafana"} > 1e+09  # 1GB
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Grafana memory usage is high"

      - alert: GrafanaMemoryGrowth
        expr: delta(process_resident_memory_bytes{job="grafana"}[1h]) > 1e+08  # grew by more than 100MB in 1h (delta, not increase, since this is a gauge)
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "Grafana memory usage is growing rapidly"

These two alerts work together: one catches when memory use crosses a hard threshold, and the other warns you when usage is steadily increasing, often a sign of memory leaks or dashboards that are getting too heavy.

But to understand behavior, you need to observe how memory changes over time. Prometheus makes this easy:

# Average memory usage over the past hour
avg_over_time(process_resident_memory_bytes{job="grafana"}[1h])

Use this to spot slow, creeping memory growth that might not trigger alerts but could lead to problems later.

# How fast memory usage is changing (per second) over 24 hours
deriv(process_resident_memory_bytes{job="grafana"}[24h])

Great for detecting long-term trends or load-related spikes.

# Highest memory usage in the last day
max_over_time(process_resident_memory_bytes{job="grafana"}[24h])

This helps you answer questions like: Has Grafana ever crossed 1.5GB? If so, when?

💡
To make your Grafana dashboards more dynamic and easier to manage, see our guide on using Grafana variables effectively.

Advanced Memory Monitoring Techniques

Basic memory stats show how much Grafana is using, but not why or when usage increases. To get the full picture, you need to look at container memory, Grafana’s activity, and overall system memory.

If You're Running Grafana in a Container

If Grafana runs in a container (Docker or Kubernetes), it doesn’t have access to the full system memory. The container has limits, and if Grafana crosses them, it can get killed even if the host system has RAM to spare.

Here's how to keep an eye on that:

# Memory Grafana is using inside the container
container_memory_usage_bytes{name="grafana"}

# Memory limit assigned to the container
container_spec_memory_limit_bytes{name="grafana"}

# Usage as a percentage of the limit
(container_memory_usage_bytes{name="grafana"} / container_spec_memory_limit_bytes{name="grafana"}) * 100

# Memory actively in use, excluding page cache
container_memory_working_set_bytes{name="grafana"}

If you're close to the limit (say 90% or more), it's time to consider increasing the limit or trimming down the workload, especially if you're seeing OOM kills.
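On Kubernetes, that limit lives in the pod spec. Here’s a rough sketch of a Grafana container with a memory request and limit; the 1Gi/2Gi values are illustrative and should be sized against the usage you actually observe:

# Deployment snippet: container-level memory request and limit
containers:
  - name: grafana
    image: grafana/grafana:latest
    resources:
      requests:
        memory: "1Gi"   # what the scheduler reserves for the pod
      limits:
        memory: "2Gi"   # crossing this gets the container OOM-killed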

What’s Contributing to Memory Growth?

Grafana doesn’t give a detailed memory breakdown by feature, but its activity can be a strong signal. If your dashboards are querying a lot, running alerts, or loading multiple panels, that adds up fast.

Here are some useful proxy metrics:

# Tracks number of queries to data sources
grafana_datasource_request_total

# Total number of dashboards loaded
grafana_dashboards

# Number of times alert rules were evaluated
grafana_alerting_rule_evaluations_total

If memory usage rises alongside spikes in these metrics, there’s your clue. For example, an alert rule with a wide query range or high frequency can use a surprising amount of memory.
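An easy way to confirm the correlation is to graph memory next to activity on the same dashboard. For example:

# Per-second data source request rate; plot alongside
# process_resident_memory_bytes{job="grafana"}
rate(grafana_datasource_request_total[5m])

# Alert rule evaluation rate, for the same comparison
rate(grafana_alerting_rule_evaluations_total[5m])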

Don’t Ignore the Host System

Sometimes Grafana’s memory issues aren’t caused by Grafana. If the host system is running out of memory or swapping heavily, Grafana might slow down or get killed even if its usage looks normal.

On Linux:

# Overall system memory usage (%)
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100

# Actual free memory available to applications
node_memory_MemAvailable_bytes

# Swap usage (can indicate memory pressure)
node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes

If you're seeing high swap usage or consistently low MemAvailable, you're likely pushing the system too hard.

On Windows:

# System memory usage in %
(windows_memory_physical_total - windows_memory_physical_available) / windows_memory_physical_total * 100

# Free memory available
windows_memory_physical_available

These help you spot if the root cause is at the OS level, not Grafana itself.

💡
If you’re running Grafana in containers, our post on Grafana and Docker covers key tips for managing resources and performance.

The Usual Suspects Behind High Memory Use

Here are some of the most common culprits behind memory blowups in Grafana—no theory, just real-world patterns that many teams run into.

1. Dashboards With Too Many Panels

Let’s say you’ve built a big "overview" dashboard: CPU, memory, disk, network, app health, all in one place. Sounds useful, right? But now you’ve got 40+ panels, and each one is firing off its own query every 30 seconds.

If you’re monitoring 20 servers, and each panel touches time series data for each host, you’re suddenly dealing with hundreds—sometimes thousands—of active series per refresh.

Here’s a (simplified) idea of what that looks like:

{
  "dashboard": {
    "panels": [
      {
        "title": "CPU Usage by Host",
        "targets": [
          {
            "expr": "100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)"
          }
        ]
      }
      // ...plus 39 more panels just like this
    ]
  }
}

With 20 hosts and 40 panels, that’s 800+ queries every 30 seconds. Each one eats up memory for caching, rendering, and storing state per user session. Now add a few users refreshing that dashboard at the same time, and memory usage can spike into the gigabytes.

2. Heavy Queries That Pull Too Much Data

Even if you don’t have many panels, a single expensive query can hog memory fast.

For example, here’s a query that calculates the 95th percentile response time for all services, grouped by HTTP method:

histogram_quantile(0.95, 
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service, method)
)

Looks fine, until you realize what it’s doing:

  • 24-hour time range at 15s resolution = 5,760 data points per series
  • 50 services × 10 methods = 500 series
  • That’s nearly 3 million data points processed just for this one panel

Now, imagine running that across multiple dashboards or users. Without limits, Grafana will happily try to handle it all—until memory runs out.
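One way to take the pressure off Grafana is to precompute the heavy aggregation with a Prometheus recording rule, so panels query a much smaller, already-reduced set of series. A sketch (the group and rule names are illustrative):

groups:
  - name: latency_precompute
    rules:
      # Precompute the per-service bucket rates once in Prometheus
      - record: service_method:http_request_duration_seconds_bucket:rate5m
        expr: sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service, method)

The panel then runs histogram_quantile(0.95, service_method:http_request_duration_seconds_bucket:rate5m), which touches far fewer samples per refresh.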

3. Misconfigured Data Source Connections

Here’s the issue: the data source itself may be leaking memory because of bad connection settings.

Example from a real config:

database:
  max_open_conns: 1000
  max_idle_conns: 1000
  conn_max_lifetime: 0  # Never closes connections

Each open connection uses memory, even idle ones. If Grafana keeps hundreds or thousands of them open with no timeout, memory usage grows continuously. This especially hurts when dashboards pull from multiple sources (e.g., Prometheus, PostgreSQL, Elastic, etc.) at once.
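A more conservative starting point might look like the sketch below. The exact key names depend on where the setting lives (Grafana’s database config vs. an individual data source), so treat these as placeholders to tune rather than drop-in values:

database:
  max_open_conns: 25        # cap concurrent connections
  max_idle_conns: 5         # keep only a small idle pool
  conn_max_lifetime: 14400  # recycle connections every 4 hours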

💡
Now, fix Grafana memory issues instantly—right from your monitoring setup, with AI and Last9 MCP. Bring real-time production context — logs, metrics, and traces — into your environment to spot and solve problems faster.

How Different Parts of Grafana Use Memory

Here's what you can expect from different parts of your Grafana setup:

| Component | Light Usage | Heavy Usage | What Causes Heavy Usage |
| --- | --- | --- | --- |
| Base Grafana Process | 100-200MB | 500MB+ | Many plugins, complex configurations |
| Per Dashboard Panel | 5-15MB | 50-100MB+ | Complex queries, large datasets, long time ranges |
| Data Source Connection | 10-30MB | 100MB+ | Poor connection pooling, too many concurrent connections |
| Alert Engine | 50-100MB | 200-500MB+ | Hundreds of alert rules, complex expressions |
| User Sessions | 1-5MB per user | 20-50MB+ per user | Multiple open dashboards, real-time panels |

How to Track Down Memory Issues in Grafana

When Grafana’s memory usage unexpectedly increases, it can be challenging to identify the cause. These steps will help you investigate and pinpoint the source of the problem.

Step 1: Enable Detailed Logging

Start by increasing the log verbosity to capture more information about memory-related events. Add the following to your grafana.ini configuration file:

[log]
level = debug
mode = console

[log.console]
level = debug
format = json

With detailed logging enabled, review the logs for any messages related to memory, cache, or query issues by running:

docker logs grafana 2>&1 | grep -i "memory\|cache\|query" | tail -20

Look for repeated warnings or errors that could indicate memory problems.

Step 2: Use Grafana’s Built-In Profiler

If the logs don’t provide enough information, use Grafana’s built-in profiling tool to get a detailed view of memory usage.

Enable profiling by updating your configuration with:

[feature_toggles]
enable = live

[server]
enable_pprof = true

After restarting Grafana, capture a memory profile using this command:

go tool pprof http://localhost:3000/debug/pprof/heap

This profiling data will help you understand which parts of Grafana are consuming the most memory.
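Inside the interactive pprof prompt, the top command lists the functions holding the most live memory, which usually points straight at the hungry subsystem:

# At the interactive (pprof) prompt
(pprof) top        # functions holding the most in-use memory
(pprof) top -cum   # sorted by cumulative allocations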

Step 3: Monitor Query Performance

Check which queries are consuming the most resources by examining Grafana's query history. You can do this through the Query Inspector in any panel or by monitoring the grafana_datasource_request_duration_seconds metric.
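If you’re already scraping Grafana’s own metrics, you can also chart average request duration per data source query. A sketch, assuming the metric is exposed as a histogram (with _sum and _count series) in your Grafana version:

# Average data source request duration over the last 5 minutes
rate(grafana_datasource_request_duration_seconds_sum[5m])
  / rate(grafana_datasource_request_duration_seconds_count[5m])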

Conclusion

Memory optimization is a continuous process. It’s not about cutting memory usage to the bone, but finding the right balance between performance, features, and resources.

At Last9, we run hosted Grafana ourselves, so we know these memory challenges up close. Beyond Grafana, we also provide our own logs and traces UI, but you’re free to keep using your own dashboards and alerting tools.

Whether you choose to stick with Last9 or mix and match, we give you the flexibility to build an observability setup that works for you. Get started with us today!

FAQs

What's normal Grafana memory usage for a small team?

For a small team with 5-10 users and basic dashboards, expect 200-500MB of memory usage. This includes the base Grafana process plus typical dashboard caching. If you're seeing much higher usage, check for overly complex queries or too many auto-refreshing panels.

How can I tell which dashboards are using the most memory?

Enable Grafana's metrics endpoint and look for grafana_dashboard_* metrics. You can also use the query inspector in each panel to see query execution times and result sizes. Slow queries often correlate with high memory usage.

Should I use SQLite or PostgreSQL for Grafana's database?

SQLite works fine for development and small deployments, but PostgreSQL is better for production. PostgreSQL handles concurrent connections better and supports features like high availability. The database choice doesn't significantly impact memory usage, but it affects overall performance.

Can I run multiple Grafana instances with the same configuration?

Yes, but you'll need to use an external database (PostgreSQL or MySQL) and configure session storage properly. Each instance will have its own memory footprint, so this is a good way to distribute load while maintaining the same dashboards and data sources.
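The shared database itself is a grafana.ini setting on each instance. A minimal sketch, with placeholder host and credentials:

[database]
type = postgres
host = postgres.example.internal:5432
name = grafana
user = grafana
password = your-password-here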

What happens when Grafana runs out of memory?

Grafana will become unresponsive, queries will time out, and in container environments, the process might get killed by the OOM killer. Users will see error messages, and dashboards won't load. That's why setting up proper monitoring and alerts for memory usage is so important.

How often should I restart Grafana to clear memory?

You shouldn't need to restart Grafana regularly if it's properly configured. If you find yourself restarting frequently due to memory issues, there's likely an underlying problem like a memory leak, poor query optimization, or insufficient resources. Focus on identifying and fixing the root cause rather than working around it with restarts.

Authors
Anjali Udasi

Helping to make the tech a little less intimidating. I love breaking down complex concepts into easy-to-understand terms.
