An Easy Guide to Getting Started with Elastic APM

Code in production will break. Maybe a request takes too long, maybe it fails quietly, or maybe it works fine one minute and falls over the next. Logs can help, sure—but they don’t always show the full picture, especially when performance issues are involved.

Elastic APM gives you a clearer view. It traces what your application is doing from incoming requests to database queries and everything in between. You get latency breakdowns, error details, and performance metrics tied directly to the code paths that caused them.

This blog walks through how it works, what kind of data you get, and how to set it up without adding unnecessary overhead.

What is Elastic APM?

Elastic APM is a performance monitoring system that’s part of the Elastic Stack. It helps you understand how your code behaves in production by collecting performance metrics, traces, and error data from your applications in real time.

Unlike traditional monitoring, which might just tell you CPU usage or uptime, Elastic APM goes deeper. It tracks individual requests, highlights slow database queries, and captures stack traces for errors—so you can see where and why things are going wrong.

It works by adding language-specific agents to your app. These agents gather performance data and send it to the APM Server, which then pushes it into Elasticsearch. From there, you can explore it using Kibana’s dashboards or build your own views.

💡

If you're wondering how Elastic APM fits into the bigger picture of system monitoring, here's a straightforward comparison between APM and observability that clears things up.

Breakdown of Elastic APM Components

APM Agents

APM agents are libraries you add to your app to collect telemetry—no separate daemon or sidecar needed. Elastic supports Java, .NET, Node.js, Python, Go, Ruby, and PHP.

Once integrated, agents automatically capture:

Transactions – like HTTP requests or background jobs
Spans – internal operations like DB queries or external API calls
Errors – uncaught exceptions, stack traces, and related metadata

Most agents work with minimal setup—often just a few lines of code to start collecting data.

APM Server

This sits between your app and Elasticsearch. Agents send raw data to the APM Server, which validates, enriches, and forwards it.

You can run the server standalone, in Docker, Kubernetes, or as part of Elastic Cloud. It’s built to handle large volumes of trace data without introducing extra latency.

Elasticsearch + Kibana

All your APM data ends up in Elasticsearch. You get full-text search, filtering, and aggregation on traces, spans, and errors—so you can dig deep without writing complex queries.

Kibana is the UI layer. Use the built-in APM dashboards or build your own to slice data by service, endpoint, latency, and more. Great for debugging, and even better for showing your team why something is slow.

💡

To understand where Elastic APM stands in relation to OpenTelemetry and what each brings to the table, check out this breakdown of OpenTelemetry and APM.

Key Features That Make Elastic APM Stand Out

Distributed Tracing

Following a single request across multiple services is painful without proper tooling. Elastic APM makes this easier by stitching together the entire trace, from entry point to final response.

Visualizes the full path of a request across services
Breaks down latency by span, so you can isolate bottlenecks
Makes debugging slow or inconsistent endpoints much faster

Real User Monitoring (RUM)

Server metrics don’t tell you how fast the app feels. RUM captures what users experience in the browser.

Tracks page load times, asset performance, and frontend errors
Works across browsers, devices, and network conditions
Helps connect backend behavior to frontend impact

Error Tracking and Stack Traces

When things break, Elastic APM doesn’t just log the error; it adds context.

Captures full stack traces and exception details
Groups recurring errors to cut through the noise
Includes session, environment, and request data to speed up debugging

Database and External Service Monitoring

Dependencies are often where performance issues hide. Elastic APM keeps an eye on them too.

Automatically tracks calls to databases and third-party services
Surfaces slow queries, failed requests, and timeout patterns
Supports out-of-the-box instrumentation for common frameworks

💡

For a deeper look at how APM fits into the broader observability toolkit, this guide on APM and observability connects the dots clearly.

How to Get Started with Elastic APM

Setting up Elastic APM involves a few moving parts, but it’s fairly straightforward once you break it down. You’ll need to:

Start an APM Server
Install and configure an APM agent in your app
Make sure everything connects correctly

Let’s walk through each step.

1. Deploy the APM Server

The APM Server acts as the middleman between your application and Elasticsearch. It receives telemetry data from your app (via the agent), processes it, and forwards it to Elasticsearch for storage and analysis.

You can run the APM Server in a few ways:

Locally (useful for testing or dev environments)
Docker
Kubernetes
Elastic Cloud (fully managed by Elastic)

Here’s a quick way to start it using Docker:

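# Assumes Elasticsearch is reachable from the container at http://elasticsearch:9200.
# Adjust the host (and add credentials) to match your setup.
docker run \
  --name=apm-server \
  -p 8200:8200 \
  docker.elastic.co/apm/apm-server:8.13.0 \
  -E 'output.elasticsearch.hosts=["http://elasticsearch:9200"]'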

You’ll also need Elasticsearch and Kibana running. If you’re testing locally, Elastic provides a docker-compose file that spins everything up at once.

2. Install the APM Agent in Your Application

The APM agent is a small library you add to your codebase. It automatically instruments common frameworks and libraries, captures performance data, and sends it to the APM Server.

For example, in a Node.js app:

Install the agent:

npm install elastic-apm-node

Then, add this as the very first line in your main entry file (before any other imports):

const apm = require('elastic-apm-node').start({
  serviceName: 'my-node-app',
  serverUrl: 'http://localhost:8200',
  environment: 'development'
})

Replace serviceName with something meaningful (like checkout-service) and point serverUrl to wherever your APM Server is running.

💡

For other languages (Java, Python, Go, etc.), the setup is similar but agent-specific. Elastic provides detailed language-specific setup guides.

3. Verify the Data Flow

Once your app is running, the agent should start sending data to the APM Server. To confirm:

Open Kibana
Go to APM > Services
You should see your app listed
Click into it to view traces, transactions, errors, and dependencies

If nothing shows up:

Check APM Server logs for connection issues
Make sure you’ve restarted the app after adding the agent
Double-check that the APM Server URL is accessible

Configuration Best Practices

You’ll get data with the default setup, but here are a few tweaks to make it production-ready:

Sampling rate: For high-traffic apps, set a transaction sampling rate (e.g., 0.2 = 20%) to reduce noise and resource usage.
Consistent naming: Use predictable service names (auth-service, user-api) and environments (staging, prod) to make filtering easier.
Secure your APM Server: In production, always enable HTTPS and authentication on your APM Server.
Custom labels/tags: Add metadata (e.g., feature flags, deployment IDs, region) to traces for better context when debugging.
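
The tweaks above could be collected into a single options object. This is a minimal sketch, assuming the Node.js agent from earlier. The helper name buildApmOptions and the environment variable names (APM_SERVER_URL, APM_SECRET_TOKEN, DEPLOY_ID, REGION) are hypothetical; transactionSampleRate, secretToken, and globalLabels are configuration options of elastic-apm-node.

```javascript
// Hypothetical helper: builds agent options from environment variables.
// The variable names used here are assumptions for this sketch.
function buildApmOptions(env) {
  const isProd = env.NODE_ENV === 'production';
  return {
    serviceName: 'checkout-service',            // predictable, consistent name
    environment: isProd ? 'prod' : 'staging',   // consistent environment labels
    serverUrl: env.APM_SERVER_URL || 'http://localhost:8200',
    secretToken: env.APM_SECRET_TOKEN,          // authentication against the APM Server
    transactionSampleRate: isProd ? 0.2 : 1.0,  // sample 20% of transactions in production
    globalLabels: {                             // metadata attached to every event
      deployment_id: env.DEPLOY_ID || 'unknown',
      region: env.REGION || 'unknown',
    },
  };
}

const options = buildApmOptions(process.env);
console.log(options.environment, options.transactionSampleRate);

// In a real app this would be the first statement of the entry file:
// require('elastic-apm-node').start(options);
```

Keeping the options in one function makes it easier to review what differs between environments.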

Monitoring Multiple Services

If you're running a distributed system or microservices, Elastic APM can trace requests across multiple services using a shared trace context.

Make sure all services use APM agents that support distributed tracing.
Use consistent headers (traceparent) across HTTP calls to propagate trace info.
Use service maps in Kibana to visualize how services are connected and identify slow hops between them.

This is especially useful when you're trying to understand performance issues that span multiple backends or dependencies.
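
To make the traceparent header less abstract, here is a rough sketch of what it carries, based on the W3C Trace Context format (version, trace ID, parent span ID, flags). Agents read and write this header for you; parseTraceparent and childTraceparent are hypothetical helper names used only for illustration, and the sketch handles version 00 only.

```javascript
// W3C Trace Context, version 00: "00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>"
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns the parsed fields, or null if the header is missing or malformed.
function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(header || '');
  if (!match) return null;
  const [, traceId, parentId, flags] = match;
  // All-zero IDs are invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  // The lowest flag bit marks the trace as sampled.
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// A downstream call keeps the trace ID and swaps in the caller's own span ID.
function childTraceparent(parent, newSpanId) {
  return `00-${parent.traceId}-${newSpanId}-${parent.sampled ? '01' : '00'}`;
}

const incoming = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(incoming.traceId, incoming.sampled);
console.log(childTraceparent(incoming, 'b7ad6b7169203331'));
```

Because every hop shares the same trace ID, Kibana can stitch the spans from different services into one trace.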

💡

If you're setting up Elastic APM, you might also want to explore this comparison of top application logging tools to round out your observability stack.

Advanced Elastic APM Features

Once your basic setup is working, you’ll probably want to get more value out of the data you're collecting.

This section covers the kind of advanced features that help you debug faster, track key workflows, and even identify problems before users do.

Frontend Debugging: Using Source Maps

If you're shipping minified JavaScript (which you probably are), then your stack traces in production won't make much sense. Source maps fix that by linking the minified code back to your original files, so instead of a stack trace pointing to bundle.js:2:20493, you’ll see the actual line in your source code that caused the error.

To get this working in Elastic APM:

Start by adding version tracking to your RUM setup:

import { init as initApm } from '@elastic/apm-rum';
import pkg from './package.json';

// serviceVersion lets Elastic APM match an error to the source map for that release
const apm = initApm({
  serviceName: 'my-web-app',
  serviceVersion: pkg.version
});

Configure your build tool (Webpack, Vite, etc.) to generate source maps. In Webpack, for example:

devtool: 'source-map',

Upload the maps to Elastic APM as part of your release process (for example, through the source map upload API), using the same service name and version as above.

Once it’s set up, your frontend errors in Kibana will show clean, readable stack traces that point to your source files, not your minified bundles.

Custom Instrumentation: When You Need More Than Defaults

The built-in agents track most of the common stuff: HTTP routes, database calls, and exceptions. But what if you want to track the duration of your signup flow? Or monitor how long it takes to process an image upload?

You can create your own transactions and spans using the APM agent's API. Here’s an example in Node.js:

const tx = apm.startTransaction('signup-flow');

const validation = tx.startSpan('email-validation');
// ... validate email
validation.end();

const write = tx.startSpan('insert-user-to-db');
// ... write to DB
write.end();

tx.end();

Now you’ve got a custom trace in Kibana showing exactly how long each part of the signup flow takes. It’s useful for spotting slow logic or debugging intermittent slowdowns.
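
One thing the example above leaves open is what happens when a step throws: the span would never be ended. A small wrapper can take care of that. This is a sketch, not part of the agent's API; withSpan is a hypothetical helper that relies only on the startSpan() and end() calls shown above, and it guards against a null span defensively.

```javascript
// Hypothetical helper: runs fn inside a span and always ends the span,
// whether fn returns, throws, or returns a promise.
function withSpan(tx, name, fn) {
  const span = tx.startSpan(name);
  const finish = () => { if (span) span.end(); }; // defensive: tolerate a null span
  let result;
  try {
    result = fn();
  } catch (err) {
    finish();
    throw err;
  }
  if (result && typeof result.then === 'function') {
    // Async work: end the span once the promise settles.
    return result.finally(finish);
  }
  finish();
  return result;
}

// Usage with the agent from earlier (not executed here; validateEmail is hypothetical):
// const tx = apm.startTransaction('signup-flow');
// await withSpan(tx, 'email-validation', () => validateEmail(email));
// tx.end();
```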

💡

Last9 includes full monitoring support with alerting and notifications—and with Alert Studio, it tackles common issues like poor coverage, alert fatigue, and noisy cleanup workflows.

Alerting and Anomaly Detection

Once you’ve got good data flowing into Elastic, setting up alerts is the next step. You don’t want to manually check dashboards to know something’s wrong.

You can configure alerts for:

High error rates
Spikes in response time
Slow database queries
Dropping throughput

Elastic also offers anomaly detection powered by machine learning. That can help catch things like gradually increasing response times that might not trigger a static threshold, but are still a problem.

These alerts can be routed to Slack, PagerDuty, or wherever your team tracks issues.

CI/CD Integration: Catching Regressions Before Users Do

Performance issues often show up right after a deploy. With Elastic APM, you can bake performance checks into your release pipeline.

Some teams do things like:

Compare key metrics before and after a deploy
Fail a build if latency jumps or error rates go up
Use APM data to trigger automatic rollbacks

You can query Elasticsearch via scripts or APIs to pull the data you care about, and wire that into whatever CI/CD system you use.
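
As a rough illustration of the "fail a build if latency jumps" idea, the comparison step might look like the sketch below. Everything here is an assumption: the metric shape ({ p95LatencyMs, errorRate }), the thresholds, and the checkRegression name are hypothetical, and fetching the numbers from Elasticsearch is left to your own query.

```javascript
// Hypothetical comparison step for a deploy pipeline.
// `before` and `after` are metrics you pulled yourself, e.g. for the 30 minutes
// on either side of a deploy. Assumes before.p95LatencyMs is greater than zero.
function checkRegression(before, after, limits = { maxLatencyIncrease: 0.2, maxErrorRateIncrease: 0.01 }) {
  const failures = [];

  // Relative change in p95 latency (0.3 means 30% slower).
  const latencyIncrease = (after.p95LatencyMs - before.p95LatencyMs) / before.p95LatencyMs;
  if (latencyIncrease > limits.maxLatencyIncrease) {
    failures.push(`p95 latency up ${(latencyIncrease * 100).toFixed(1)}%`);
  }

  // Absolute change in error rate (0.01 means one percentage point).
  const errorRateIncrease = after.errorRate - before.errorRate;
  if (errorRateIncrease > limits.maxErrorRateIncrease) {
    failures.push(`error rate up ${(errorRateIncrease * 100).toFixed(2)} points`);
  }

  return failures;
}

const failures = checkRegression(
  { p95LatencyMs: 200, errorRate: 0.01 },
  { p95LatencyMs: 260, errorRate: 0.015 }
);
console.log(failures);

// In CI you might then fail the job:
// if (failures.length > 0) process.exit(1);
```

Returning a list of reasons rather than a boolean makes the pipeline log explain why a build was blocked.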

How Much Does Elastic APM Cost in Resources?

Instrumenting your application for observability always comes with a trade-off—you're collecting useful data, but that data has to be gathered, processed, and shipped somewhere. So, how much overhead does Elastic APM introduce?

Agent Overhead

Elastic APM agents are built to be lightweight. Most setups see less than a 5% increase in CPU usage and only a small memory footprint, though the exact cost depends on the agent, the workload, and the configuration. That’s because:

Agents use non-blocking, asynchronous calls to send data
Sampling is configurable, so you can trace only a fraction of transactions (by default, agents sample every transaction)
Most agents are smart enough to avoid redundant instrumentation

You can tune the configuration based on your traffic patterns. If you're running a high-throughput system, you may want to lower the sample rate or disable some integrations to reduce overhead.

Managing Data Volume

APM data can grow fast, especially in production environments with lots of traffic. If you’re not careful, you’ll end up storing huge volumes of trace and span data that no one is using.

Here’s how to keep it under control:

Sampling: Set a sampling rate that balances visibility with cost. For example, sampling 10–20% of transactions is often enough to catch most issues.
Retention policies: Decide how long to keep full-fidelity data vs. aggregated metrics. You might keep detailed traces for 7 days and roll up metrics beyond that.
Index lifecycle management (ILM): Elasticsearch supports ILM policies to automate data aging—move older data to cheaper storage or delete it entirely.

The idea is to prioritize what’s actionable. If you're only checking detailed traces when an incident happens, keep them short-lived. Long-term trend data can stick around for capacity planning or SLA reviews.
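
To see how sampling and retention interact, a back-of-the-envelope estimate helps. The sketch below is plain arithmetic with made-up inputs: the traffic numbers, the average event size, and the estimateTraceStorageGb name are all assumptions, and it ignores index overhead, compression, and metrics documents.

```javascript
// Rough storage estimate for trace data. All inputs are hypothetical;
// measure your own traffic and document sizes before relying on the result.
function estimateTraceStorageGb({ transactionsPerDay, sampleRate, spansPerTransaction, avgEventBytes, retentionDays }) {
  const sampledTransactions = transactionsPerDay * sampleRate;
  // Each sampled transaction stores one transaction document plus its spans.
  const eventsPerDay = sampledTransactions * (1 + spansPerTransaction);
  const totalBytes = eventsPerDay * avgEventBytes * retentionDays;
  return totalBytes / 1e9;
}

const base = {
  transactionsPerDay: 10000000,
  spansPerTransaction: 10,
  avgEventBytes: 1000,
  retentionDays: 7,
};

console.log(estimateTraceStorageGb({ ...base, sampleRate: 1.0 }).toFixed(0), 'GB at 100% sampling');
console.log(estimateTraceStorageGb({ ...base, sampleRate: 0.2 }).toFixed(0), 'GB at 20% sampling');
```

With these example inputs, dropping the sample rate from 100% to 20% cuts the estimate by the same factor of five, which is why sampling is usually the first lever to pull.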

Fixing Common Elastic APM Setup and Runtime Issues

1. When Agents Can’t Reach the APM Server

One of the first things to check during setup is network connectivity between your application and the APM Server. If the agent can’t send data, nothing else works. Make sure:

The serverUrl in your agent config is correct.
The port (usually 8200) is open and reachable.
If you're using HTTPS, verify your SSL certificates—self-signed certs in dev environments are often a pain point.

Enabling debug logs in the agent can reveal whether it's failing silently or dropping data due to network errors.

2. No Data in Kibana? Here’s What to Check

Once agents are running, you expect to see data in Kibana. If nothing’s there, don’t panic. A few common causes:

Sampling rate is too low: Especially in test environments with light traffic, a low sample rate might mean you’re just not seeing enough transactions.
Service name mismatches: Even small inconsistencies (myService vs my-service) can cause data to appear in unexpected places, or not at all.
Agent misconfiguration: Missing environment variables or typos in the config file can silently break data collection.
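
Service name mismatches in particular are easy to catch before deploying. The sketch below is one possible check, not an Elastic feature: checkServiceName is a hypothetical helper, and the character rule reflects the set the Node.js agent documents for serviceName (a-z, A-Z, 0-9, -, _, and space).

```javascript
// Characters the Node.js agent documents as allowed in serviceName.
const ALLOWED_SERVICE_NAME = /^[a-zA-Z0-9 _-]+$/;

// Hypothetical pre-deploy check: flags invalid names and near-duplicates
// such as "myService" vs "my-service".
function checkServiceName(name, knownNames) {
  const problems = [];
  if (!ALLOWED_SERVICE_NAME.test(name)) {
    problems.push('contains characters outside a-z, A-Z, 0-9, -, _ and space');
  }
  // Compare names with case and separators stripped.
  const canonical = (s) => s.toLowerCase().replace(/[ _-]/g, '');
  const variants = knownNames.filter((k) => k !== name && canonical(k) === canonical(name));
  if (variants.length > 0) {
    problems.push(`looks like a variant of: ${variants.join(', ')}`);
  }
  return problems;
}

console.log(checkServiceName('myService', ['my-service', 'auth-service']));
console.log(checkServiceName('auth-service', ['my-service', 'auth-service']));
```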

3. APM Slowing Down Your App? Here's How to Dial It Back

APM is supposed to help you debug performance, but if you notice latency or CPU spikes after adding an agent, it’s time to adjust your configuration.

Reduce sampling rate for high-throughput services.
Disable modules you don’t need—many agents auto-instrument everything by default.
Filter out noisy spans like health checks or cache lookups that don’t add value.

Also, keep an eye on compatibility if you’re using legacy frameworks. Some older libraries may not play well with default instrumentation settings.
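
For the Node.js agent, the three adjustments above map onto configuration options. This is a sketch under assumptions: the option names (transactionSampleRate, disableInstrumentations, ignoreUrls) are elastic-apm-node settings, but the specific modules and URL paths listed are placeholders to replace with your own.

```javascript
// Example low-overhead settings for elastic-apm-node.
// The module names and paths below are placeholders for this sketch.
const lowOverheadOptions = {
  // Trace 10% of transactions on a high-throughput service.
  transactionSampleRate: 0.1,
  // Skip auto-instrumentation for modules you don't need traced.
  disableInstrumentations: ['graphql', 'redis'],
  // Don't create transactions for health checks and similar noise.
  // Strings and regular expressions are both accepted.
  ignoreUrls: ['/healthz', /^\/internal\/ping/],
};

console.log(Object.keys(lowOverheadOptions));

// Merged into the start call from earlier:
// require('elastic-apm-node').start({ serviceName: 'my-node-app', ...lowOverheadOptions });
```

Change one setting at a time and compare CPU and latency before and after, so you know which adjustment actually helped.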

Elastic APM Integration Strategies

Integration Type	Benefits	Considerations
Container Deployments	Easy scaling, consistent configuration	Requires container orchestration knowledge
Cloud Services	Managed infrastructure, automatic scaling	Vendor lock-in, data transfer costs
Hybrid Environments	Flexibility, gradual migration	Complex networking, security considerations
Development Environments	Early issue detection, performance testing	Resource usage, configuration management

💡

Now, fix production APM issues instantly right from your IDE, with AI and Last9 MCP. Bring logs, metrics, and traces into your local environment to debug and optimize code faster.

Practical Tips to Get More Out of Elastic APM

Elastic APM can be incredibly useful, but only if it’s implemented in a way your team will actually use. The goal isn’t just data collection; it’s better decisions, faster debugging, and fewer surprises in production.

Here’s how to get there.

Begin by instrumenting your core services and critical user journeys, things like authentication, checkout flows, or search endpoints. These areas tend to have the biggest impact on user experience.

Once you’re comfortable with the basics, expand coverage to less critical features. You can gradually enable advanced options like custom spans, user context, or release tracking without overwhelming your team early on.

To make the most of the data you collect:

Create alerting workflows: Knowing something’s wrong is only half the battle. Build lightweight runbooks or Slack playbooks for what to do when response times spike or error rates increase.
Document your tagging conventions: This helps avoid data silos and keeps dashboards clean as your service footprint grows.
Review sampling and retention regularly: What mattered six months ago might be noise now. Keep your config aligned with your app’s current priorities.

And don’t forget to:

Train your team: APM only helps if engineers use it. Host walkthroughs, record short Loom videos, or pair program while reviewing traces.
Use the data during postmortems: Point to spans and transaction traces in retro meetings to build stronger context and more accurate root cause analysis.

Wrapping Up

Elastic APM gives you solid tooling to understand what’s going on inside your application: tracing requests, surfacing slow database queries, and helping you connect the dots when performance dips. It's a good choice if you're already in the Elastic ecosystem and don’t mind managing your own infrastructure.

But if you're scaling fast, dealing with high-cardinality metrics, or just want something that works out of the box without the overhead of managing servers, agents, and storage, Last9 might be a better fit.

We offer a managed observability platform that handles metrics, logs, and traces without you having to wrestle with config files and retention settings. It's built for engineers who’d rather spend time improving systems than debugging their monitoring stack.

Get started with a free trial or book some time with us to learn more about the platform!

FAQs

How much overhead does Elastic APM add to my application?

Elastic APM agents typically add less than 5% CPU overhead and minimal memory usage. The agents use efficient sampling and asynchronous data transmission to avoid impacting your application's performance.

Can I use Elastic APM with microservices?

Yes, Elastic APM excels at monitoring microservices architectures. Its distributed tracing capabilities follow requests across multiple services, helping you identify bottlenecks and understand service dependencies.

What's the difference between Elastic APM and traditional logging?

Traditional logging captures discrete events, while Elastic APM provides structured performance data with automatic instrumentation. APM shows you transaction flows, performance metrics, and user experience data that logs alone can't provide.

How long does it take to set up Elastic APM?

Basic setup takes about 30 minutes for most applications. You'll need to install APM Server, add an agent to your application, and configure data collection. More complex setups with custom instrumentation may take longer.

Do I need to modify my application code to use Elastic APM?

Most APM agents provide automatic instrumentation with minimal code changes. You typically need to add just a few lines of configuration code to start collecting basic performance data.

Can Elastic APM monitor database performance?

Yes, Elastic APM automatically detects and monitors database queries, showing you slow queries, connection issues, and database response times without additional configuration.

How does Elastic APM handle sensitive data?

APM agents can be configured to filter sensitive data before transmission. You can exclude specific fields, sanitize SQL queries, and control which data gets sent to your monitoring infrastructure.
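
In the Node.js agent, one way to do this is a filter function registered with apm.addFilter(), which sees each event before it is sent. The sketch below is illustrative: the header list and the redactSensitiveHeaders name are assumptions, and the agent may already sanitize some of these fields by default.

```javascript
// Headers to scrub; this list is an assumption for the sketch.
const SENSITIVE_HEADERS = ['authorization', 'cookie', 'x-api-key'];

// Filter function: receives the event payload and must return it.
// Returning a falsy value would drop the event instead of sending it.
function redactSensitiveHeaders(payload) {
  const headers =
    payload && payload.context && payload.context.request && payload.context.request.headers;
  if (headers) {
    for (const name of Object.keys(headers)) {
      if (SENSITIVE_HEADERS.includes(name.toLowerCase())) {
        headers[name] = '[REDACTED]';
      }
    }
  }
  return payload;
}

const sample = { context: { request: { headers: { Authorization: 'Bearer abc', accept: 'text/html' } } } };
console.log(redactSensitiveHeaders(sample).context.request.headers);

// Registered once, after the agent has started:
// apm.addFilter(redactSensitiveHeaders);
```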