Your collectors are running as planned - batching telemetry, applying processors, and exporting data with steady performance. Applications are instrumented, and pipelines are in place. The next step is choosing the backend that will store, analyze, and visualize this data.
This decision shapes query speed, cost behavior, team workflows, and the kind of analysis you’ll use to troubleshoot distributed systems. The advantage is that you don’t have to lock yourself in too early.
With OpenTelemetry, the same instrumentation works across different backends, so switching later remains an option.
In this part of the OTel series, we’ll look at the key factors that go into backend selection - from platform design choices and cost models to preparing for growth.
Types of OpenTelemetry Backends
When you’re evaluating OpenTelemetry backends, the key distinction is whether the system was designed for OTel from day one or whether OTel support was added later. Both categories can work, but they offer very different experiences in terms of data fidelity, ecosystem maturity, and operational trade-offs.
1. OpenTelemetry-Native Platforms
These backends are built around OpenTelemetry at their core. They ingest OTLP directly, preserve semantic conventions end to end, and model storage and querying around the same standards you use in instrumentation. If you’ve added custom attributes, resource labels, or followed OpenTelemetry’s semantic conventions carefully, these platforms ensure all of that effort carries forward into your queries. You don’t need to worry about schema translation or attribute loss.
Examples you’ll see in practice: Last9, Uptrace, Dash0
Strengths:
- Semantic fidelity: Attributes, resource labels, and custom fields you emit remain intact and queryable. For example, if you tag spans with `user_id` or `region`, you can filter and correlate on those directly without schema translation.
- Usage-based pricing: Costs scale in proportion to telemetry volume rather than infrastructure size. If you emit more spans or metrics, pricing follows data throughput instead of host count.
- Fast OTel adoption: Because these platforms align closely with the spec, features like exemplars, log signal support, or histogram temporality modes become available quickly without custom extensions.
Considerations:
- Focused ecosystems: These platforms concentrate on observability built around OpenTelemetry. They don’t attempt to cover every adjacent use case (like CI/CD analytics or compliance tooling), but they integrate cleanly with specialized systems when you need those layers.
- Purpose-built scope: By design, the emphasis stays on OpenTelemetry standards, semantic fidelity, and portability. If you want broader enterprise workflows in the same tool, you’ll likely combine these platforms with complementary solutions.
OTel-native backends are ideal when semantic fidelity, vendor independence, and future portability are your primary concerns.
2. OTel-Compatible Platforms
These systems weren’t designed around OpenTelemetry initially, but now ingest OTLP and integrate it into their pipelines. This group includes both open-source backends (like Prometheus, Loki, Tempo, and Elasticsearch) and established commercial vendors (like Datadog, New Relic, and others). These platforms have been battle-tested in production environments for years and offer comprehensive feature sets that extend well beyond basic observability.
For open-source backends, OTel support usually comes in the form of OTLP ingestion endpoints that convert incoming data into system-native formats. For example, Prometheus can accept OTLP metrics (through its native OTLP endpoint or the collector’s remote-write exporter) but stores them in its own time-series schema.
For commercial vendors, OTel compatibility typically means you can send OTLP directly to their SaaS platform. From there, the data may be extended with vendor-specific fields or indexed into proprietary storage systems.
The upside is that you get comprehensive dashboards, anomaly detection, access controls, and workflow automation without building those layers yourself.
Strengths:
- Broader feature bundles: These platforms package observability together with security monitoring, compliance features, and workflow automation — capabilities that grew out of their longer history as general-purpose monitoring suites.
- Extensive integration catalogs: Open-source options like Prometheus have exporters for nearly every database, queue, or cloud service, while vendors like Datadog offer deep integrations with hundreds of enterprise tools and runtimes.
- Proven deployments: Systems in this group have years of operational history and large-scale production use across industries. Their scale is well documented — though it’s tied to proprietary formats and operational models rather than native OTel standards.
- Advanced platform features: Some vendors add layers such as ML-based anomaly detection, deployment correlation, or SLO dashboards. These can be valuable if you want packaged analytics, but they come at the cost of tighter vendor coupling.
Considerations:
- Critical vendor lock-in risk: This is the most important consideration - once your OTel data is ingested, it’s converted into vendor-specific formats and stored in proprietary systems. While your instrumentation code remains portable, your historical data, custom dashboards, alerting rules, and analytical workflows become deeply coupled to that vendor’s proprietary data models, query languages, and APIs. Migrating away often requires rebuilding your entire observability infrastructure and losing years of historical context.
- Schema normalization: OTel data is often reshaped into backend-native formats. For example, Datadog transforms OTLP spans into its proprietary APM data model, which may alter how span relationships and custom attributes are stored and queried compared to the original OpenTelemetry semantic conventions.
- Attribute handling: Some backends truncate long attribute keys or values, or drop rarely used fields. If you rely on detailed span attributes (like `customer_id` or `payment_method`), check that they remain queryable after ingestion.
- Scaling differences: Each backend scales along different axes. Prometheus handles metrics well but requires federation to deal with high cardinality; Loki ingests logs at high volume, but wide queries can be slow.
- Cost model variation: Vendors charge differently: Datadog ties cost to hosts and events, while New Relic charges by ingest size. The same 100k spans/sec workload can generate very different bills depending on the platform.
This category makes sense if you already have these systems in place or want to benefit from their existing ecosystems.
As you compare these two categories, ask yourself: do you need the semantic precision of a native platform, or the ecosystem depth of an OTel-compatible backend?
The answer often depends on how you’ve instrumented your systems, how much infrastructure you’re willing to operate, and what kind of analytical workflows your teams rely on.
Deployment Models: Hosted vs. Self-Hosted
While backend approaches describe how platforms handle OpenTelemetry data, deployment models describe where those platforms run and who manages them. The model you choose determines how much control you have over ingestion, storage, and query layers. Both hosted and self-hosted systems ingest OTLP - the distinction lies in operations.
Hosted Solutions: Managed Infrastructure, Faster Onboarding
With a hosted backend, the provider operates the full stack: ingestion endpoints, storage engines, and query layers. You configure your OpenTelemetry collectors to send data to a remote endpoint, and the vendor takes care of scaling and reliability.
- Pre-configured ingestion: Collectors export OTLP directly; no custom adapters required.
- Vendor-managed scaling: Ingestion nodes and storage clusters expand automatically during traffic spikes.
- Optimized queries: Indexing and caching are tuned centrally, so queries remain responsive even under heavy load.
- Built-in reliability: Replication, snapshots, and disaster recovery are handled for you.
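To make that setup concrete, here is a minimal collector sketch for shipping OTLP to a hosted backend. The ingest endpoint, header name, and token variable below are placeholders - substitute whatever your vendor documents:

```yaml
# Minimal collector config for a hosted backend (sketch).
# Endpoint and auth header are placeholders; use your vendor's documented values.
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:
    send_batch_size: 8192
    timeout: 5s

exporters:
  otlphttp:
    endpoint: https://otlp.example-vendor.com      # placeholder ingest endpoint
    headers:
      Authorization: "Bearer ${env:BACKEND_API_TOKEN}"   # placeholder auth scheme

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```

Everything beyond this config - scaling the ingest tier, storage, and query performance - is the vendor’s responsibility.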
Considerations as data grows:
- Costs scale with data volume - more spans, metrics, or logs usually mean proportionally higher bills.
- Pipeline flexibility is bounded - advanced routing or selective sampling may be limited to what the vendor supports.
- Data residency depends on vendor regions, which matters if you have strict compliance or sovereignty requirements.
Examples: Datadog and New Relic (hosted OTel-compatible), or Last9 (hosted OTel-native).
Self-Hosted Solutions: Custom Pipelines, Full Ownership
In a self-hosted model, you deploy and manage the backend yourself - everything from ingestion and processing to storage and queries runs in your infrastructure. OpenTelemetry collectors forward data into systems you control.
- Pipeline control: Define your own processors for filtering, sampling, and multi-backend routing.
- Storage choices: Use time-series databases like Mimir or VictoriaMetrics for metrics, columnar engines for traces, and hybrid layers for logs.
- Retention tuning: Set different retention policies per signal type or namespace.
- Data sovereignty: Telemetry remains inside your VPC or datacenter, simplifying compliance audits.
Self-Hosted: The Cost Profile
Running your own backend can look cost-efficient at first glance, since infrastructure spend is often predictable and under your control. But the full picture includes more than hardware or cloud resources. Production-grade observability systems typically require dedicated engineering time for scaling ingestion pipelines, managing storage, applying upgrades, and troubleshooting. Depending on team size and setup, this can amount to a fractional or full engineer focused on observability operations.
Economics tend to shift with scale. At smaller data volumes, the additional operational overhead may outweigh the savings compared to hosted subscriptions. Once telemetry pipelines grow into multi-terabyte workloads, self-hosted platforms often become competitive — provided your team has the capacity and expertise to operate them reliably.
Technical Specs That Matter
When evaluating self-hosted options, here are the performance characteristics that determine real-world viability:
- Jaeger: Handles approximately 2,000 spans per second with proper backend optimization. Storage requirements vary significantly based on span attributes and the chosen backend (Elasticsearch vs. Cassandra).
- Prometheus + Grafana: Needs approximately 3KB of memory per active time series, meaning ~3GB of RAM for 1 million active series, and requires federation for high-cardinality workloads.
- Grafana Tempo: Uses object storage (S3/GCS) for cost-efficient long-term retention, supports TraceQL for complex filtering, and scales horizontally through a microservices architecture.
Operational realities to plan for:
- Scaling ingestion clusters: load balancers, sharding, and stream processors as data volume grows
- Managing storage backends: compaction, index tuning, and query optimization for large datasets
- Reliability engineering: HA setups, backups, disaster recovery - for the observability system itself
- Team capacity: Upgrades, patching, and ongoing tuning require dedicated engineering time
Hosted solutions let you move quickly and reduce operational overhead, which is valuable when your focus is on application delivery rather than infrastructure management.
Self-hosted backends give you maximum control over data handling and compliance, but come with engineering responsibilities.
Data Fidelity and Performance at Scale
Most backends today can ingest OTLP, but the meaningful differences show up in how they handle the data once it arrives. A strong backend preserves attributes and semantic conventions end to end, keeps performance predictable under load, and integrates cleanly with the rest of your stack.
The way a platform implements OTLP support is often the best indicator of how much of your instrumentation effort will be reflected in queries and analysis.
Native OTLP
In this model, the backend preserves OpenTelemetry semantic conventions and data models throughout its storage and query pipeline: attributes and resource labels flow through unchanged, so what you instrument is exactly what you can query. If you attach custom fields like `customer_id` to spans or ship metrics with high-cardinality labels, those remain available downstream.
Last9, Grafana Tempo, and Jaeger fall into this category. The advantage is fidelity — your instrumentation effort translates directly into query power. It’s also the cleanest option for supporting both gRPC and HTTP transports. The main consideration is performance at scale: you’ll want to benchmark query latency under your own workload to confirm how it holds up as volume and cardinality grow.
Gateway OTLP
Here, the backend ingests OTLP at the edge but then reshapes data into its own schema for storage. Most common attributes survive fine, but custom labels or less standard metric types can be normalized or transformed along the way.
Prometheus (with OTLP remote-write), Loki, and Elasticsearch often follow this model. Their appeal lies in mature ecosystems — dashboards, exporters, and community support make them easy to integrate into existing setups. The trade-off is fidelity: for example, Prometheus histograms use cumulative buckets, while OpenTelemetry supports both cumulative and delta temporality. That difference doesn’t break ingestion, but it can influence how you interpret or compare data during analysis. Always validate with your own telemetry rather than relying on vendor-provided demos.
Translated OTLP
Some platforms don’t ingest OTLP directly at all. Instead, they depend on proprietary agents or forwarders that translate OpenTelemetry data into the vendor’s native schema. Data fidelity in these systems depends largely on the translation layer that sits between your telemetry and the database.
Datadog Agents and New Relic forwarders are common examples. The draw here is convenience — bundled dashboards, features, and workflows work right out of the box. But if you depend on custom spans, unusual semantic conventions, or less common metric types, you’ll need to validate end-to-end that those attributes remain queryable after translation.
The safest way to evaluate compatibility is to validate with your own telemetry — not just a sample dataset. Pipe in your actual spans, metrics, and logs, then compare what comes out the other side.
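One low-effort way to run that comparison is to tee the same pipeline into the collector’s built-in debug exporter, so you can see exactly which attributes leave the collector before the backend reshapes them. A sketch - the backend exporter endpoint is a placeholder:

```yaml
exporters:
  otlphttp/backend:
    endpoint: https://otlp.example-backend.com   # placeholder backend endpoint
  debug:
    verbosity: detailed   # prints full spans, attributes, and resource labels to stdout

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/backend, debug]   # same data goes to the backend and to stdout
```

Diffing the debug output against what the backend’s query interface returns shows you precisely which attributes, links, or metric types get normalized away.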
Performance at Scale
A backend’s performance depends on the storage format, indexing strategy, and scaling model. These design choices dictate how queries behave as telemetry grows and whether the system holds up under high-cardinality workloads.
Columnar stores such as ClickHouse or Parquet-based systems are built for analytical queries. They can scan massive datasets efficiently and return complex aggregations with sub-second latency, even when queries span dozens of dimensions. This makes columnar stores a natural fit for trace analysis and multi-dimensional correlation queries where breadth matters as much as depth.
Time-series databases like Mimir, VictoriaMetrics, or TimescaleDB take a different approach. Optimized for append-only telemetry, they excel at high ingestion throughput and compact storage through delta encoding and block compression. They also provide time-aligned query functions such as `rate()`, `sum_over_time()`, and `histogram_quantile()`, which make them especially effective for metrics-heavy environments.
VictoriaMetrics, for instance, runs efficiently even with millions of active time series on a single node, while using less memory than Prometheus. For workloads where metrics volume dominates, this class of systems often delivers the best balance of performance and efficiency.
Row-based stores — whether relational databases or legacy log engines — handle structured data reliably, but they struggle with the cardinality and throughput requirements of modern observability. To keep queries fast, they usually need pre-aggregation, sampling, or carefully tuned indexes. As a result, they’re far less common in contemporary stacks, though they still appear in adapted setups where schema rigidity is an advantage.
Beyond storage format, backend performance also depends on ingestion, compression, and scaling characteristics. These are rarely captured in vendor benchmarks and need validation against your own telemetry.
- Ingestion capacity varies: some systems scale ingestion within a single node (e.g., Prometheus), while others rely on sharded tiers or queueing layers such as Kafka (e.g., Elasticsearch, Mimir). Throughput can swing dramatically depending on span sizes, attribute counts, and cardinality spikes.
- Storage efficiency depends on compression design. Time-series databases like Prometheus use delta encoding and variable-length encoding to achieve significant compression, though the actual storage footprint varies based on data patterns and cardinality. Columnar engines like ClickHouse can achieve compression ratios ranging from 2x to over 30x, depending on data characteristics, with typical ratios of 5-15x for mixed datasets. The optimal choice depends on your query patterns: percentile latency queries align better with TSDBs, while wide correlation queries benefit from columnar scans.
- Horizontal scaling describes how well a system takes advantage of additional nodes. Architectures like Mimir and Thanos are designed for near-linear scaling by evenly distributing query and storage load. Elasticsearch, on the other hand, introduces different considerations. Because it relies on shard allocation and background processes like segment merges, adding nodes also brings coordination work — such as moving shards or running compaction tasks. These processes are essential for cluster health, but they can add overhead that affects scaling efficiency.
Published benchmarks give you a rough idea of what’s possible. The only way to know if a backend will hold up is to run it against your own workload — with your span sizes, attribute counts, and cardinality spikes.
Data Model and Semantic Convention Handling
The value of OpenTelemetry is in the structured data model and semantic conventions that give your signals meaning. How a backend treats these conventions determines how much of that meaning you retain once data is stored and queried.
Key aspects to validate include:
- Attribute preservation: Custom attributes and resource labels should survive ingestion exactly as emitted. Some systems normalize or truncate keys (`http.user_agent` becoming `user_agent`), which strips away context you explicitly added during instrumentation. If you tag spans with `customer_id` or `deployment_region`, check that these remain intact and queryable.
- Metric type handling: OpenTelemetry defines clear metric types: `counter`, `gauge`, `histogram`, and `summary`. Backends that flatten histograms into counters reduce latency distributions to averages, erasing the resolution needed for percentile-based SLOs. Similarly, losing support for cumulative vs. delta temporality can affect the correctness of derived metrics.
- Trace relationships: Distributed tracing depends on parent–child links and span references to reconstruct request paths. If a backend drops links, weakens relationships, or only stores sampled root spans, your ability to follow a transaction across services and pinpoint a bottleneck is limited. Testing this with a multi-service workload is the only reliable way to confirm support.
Run a familiar workload and verify if the custom fields, histogram buckets, and span links remain intact in queries.
Backend Pricing Models in Practice
A backend’s pricing model shapes how you design telemetry pipelines, what optimizations you prioritize, and where cost bottlenecks appear.
The spread is significant — analysis of log-volume pricing showed costs ranging from about $0.15/GB/mo to $5.08/GB/mo, a 33× difference depending on the vendor and assumptions around retention and storage.
Most platforms today follow one of four models:
Signal-Based Pricing
Here, costs scale with the number of telemetry events you send — spans, log lines, or metric samples. If your collectors export 100k spans per second and you double service replicas, the bill rises in proportion. This direct mapping makes cost attribution clear, and it encourages teams to think carefully about pipeline design.
You typically manage spend through collector-level policies:
- Sampling: probability or tail-based sampling for traces
- Aggregation: pre-computing summaries for metrics with high cardinality, like per-user counters
- Filtering: dropping debug logs or redundant spans before they reach storage
Vendors meter signals differently — per span, per time series, or per histogram bucket — but the core idea is that data volume and cost move in step.
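As a sketch of what those collector-level policies look like in practice (both processors are from the collector-contrib distribution; the percentages and severity threshold are illustrative):

```yaml
processors:
  # Keep a fixed fraction of traces at the collector
  probabilistic_sampler:
    sampling_percentage: 25
  # Drop debug-level logs before they reach (and get billed by) the backend
  filter/drop_debug_logs:
    error_mode: ignore
    logs:
      log_record:
        - 'severity_number < SEVERITY_NUMBER_INFO'

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler, batch]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [filter/drop_debug_logs, batch]
      exporters: [otlphttp]
```

Because billing tracks signal counts directly, every span or log record dropped here translates into a proportional reduction in spend.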
Host and Resource-Based Pricing
In this model, billing follows infrastructure size: the number of nodes, VMs, or containers under monitoring. A 10-node Kubernetes cluster costs the same whether it runs 100 pods or generates 10,000 spans per second. Autoscaling fleets, though, can cause variation.
Engineering decisions influence cost differently here:
- Deployment strategy: a node-level DaemonSet collector may count as one monitored host, while hundreds of sidecar collectors could raise the count
- Bin-packing: scheduling pods more tightly onto fewer nodes reduces visible hosts
- Resource efficiency: denser instances often mean fewer counted “hosts”
This approach matches traditional infra accounting, but in dynamic environments, the mapping between data volume and price can be less direct.
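For the deployment-strategy point above, the collector Helm chart makes the difference explicit. A sketch of a values file, assuming the upstream opentelemetry-collector chart (field names can shift between chart versions):

```yaml
# values.yaml for the opentelemetry-collector Helm chart (sketch)
mode: daemonset          # one collector pod per node, so monitored-host counts track node count
# mode: deployment       # alternative: a small central pool, decoupled from node count
image:
  repository: otel/opentelemetry-collector-contrib
```

Under host-based pricing, that single `mode` choice can matter more to the bill than any change in telemetry volume.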
User-Based Pricing
Some backends link the cost to the number of users with platform access. Infrastructure and telemetry can grow, but pricing stays flat until more engineers are added.
Typical strategies include:
- Role-based access: reserving full accounts for heavy users, assigning viewer roles more broadly
- API-first usage: routing queries through APIs or shared dashboards instead of per-user logins
- Shared service accounts: reducing seat counts for lightweight usage
This model works well for small teams, though at larger scales it can shape how broadly observability tools are adopted internally.
Data Volume and Retention Pricing
Finally, some platforms bill by how much data you store and how long you keep it. Retention periods often influence cost more than ingestion rate: keeping data for 90 or 180 days multiplies storage requirements several times.
You can manage costs through storage architecture:
- Tiered storage: store recent data in fast local storage (e.g., SSD or a TSDB engine) and configure older data to move via storage policies or TTL rules onto cheaper object storage (like S3 or GCS). Formats like Parquet or ORC are commonly used for archival or cold-storage layers. For example, ClickHouse’s MergeTree engine supports hot/cold volume policies that move data from SSD to object storage.
- Compression: methods like delta encoding, block compression, or Gorilla-style algorithms reduce the footprint significantly.
- Retention policies: apply different windows per signal type, such as a week of full-fidelity traces and three months of aggregated metrics
Trade-offs are straightforward: cold tiers are cost-efficient but slower, and compliance rules (for example, PCI-DSS requires one year of audit logs, with three months immediately available online) may dictate minimum retention regardless of cost.
How to Control Costs Under Different Models
The next step is building pipelines that give you cost control without sacrificing observability. The goal is to manage how telemetry flows and where it lands so that cost matches value.
Streaming Aggregation
Raw metrics can overwhelm storage and query layers, especially when they carry high-cardinality labels. Instead of exporting every raw measurement, you can aggregate on the fly at the collector.
Last9, for example, supports streaming aggregation — computing histograms, percentiles, and rolling windows in motion before data is stored. This approach preserves analytical fidelity for dashboards and SLOs while avoiding the blow-up of exporting billions of raw samples. The key is selecting aggregation windows carefully to balance cost savings with the resolution needed for analysis.
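Last9 performs this aggregation in the platform itself, but you can approximate the idea at the collector with the contrib `metricstransform` processor, rolling up a high-cardinality label before export. A sketch - the metric and label names here are hypothetical:

```yaml
processors:
  # Roll up a per-user counter to per-route/per-status before it leaves the collector.
  metricstransform:
    transforms:
      - include: http_requests_total        # hypothetical metric name
        action: update
        operations:
          - action: aggregate_labels
            label_set: [http_route, http_status_code]   # labels to keep; others (e.g., user_id) are summed away
            aggregation_type: sum
```

The trade-off is the same as with any aggregation: you give up the ability to query the dropped dimension later, so keep only the labels you genuinely never need downstream out of the rollup.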
Intelligent Sampling
Sampling is one of the most commonly used levers for controlling trace costs. With the `tail_sampling` processor, you can define rules that capture all traces for production-facing APIs while reducing rates for background jobs or internal tasks.
This keeps incident-critical data intact but cuts down volume where it’s less useful. The only trade-off here is that too much sampling can hide rare anomalies, so tuning is essential.
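A sketch of what those rules can look like with the contrib `tail_sampling` processor - the route name and percentages are illustrative:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s                 # how long to buffer a trace before deciding
    policies:
      - name: keep-all-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: keep-checkout-api        # hypothetical production-facing route
        type: string_attribute
        string_attribute:
          key: http.route
          values: ["/api/checkout"]
      - name: sample-everything-else
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```

A trace that matches any of the first two policies is kept in full; everything else falls through to the probabilistic rule.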
Multi-Backend Strategies
Not every signal type has the same storage or query requirements. With OpenTelemetry collectors, you can export the same telemetry stream to multiple backends and match each system to the workload: a time-series database for metrics, a tracing backend for distributed traces, and an object store or log platform for high-volume logs.
Many organizations adopt this approach to control costs — keeping logs in cost-efficient storage, for instance, while sending traces and metrics to systems optimized for fast queries. The trade-off is complexity.
Data correlation doesn’t disappear in a multi-backend setup, but it takes more work. As long as your telemetry includes consistent trace/span IDs and resource attributes, you can still link data across systems through a central visualization layer or dashboards. Even then, workflows often take longer, and managing multiple backends can reduce engineering productivity.
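A sketch of that per-signal routing in collector config - the backend names and endpoints below are stand-ins for whatever you actually run:

```yaml
exporters:
  prometheusremotewrite:               # metrics to a TSDB (e.g., Mimir or VictoriaMetrics)
    endpoint: http://mimir.observability:9009/api/v1/push   # placeholder endpoint
  otlp/traces:                         # traces to a tracing backend (e.g., Tempo)
    endpoint: tempo.observability:4317
    tls:
      insecure: true                   # in-cluster traffic assumed in this sketch
  otlphttp/logs:                       # logs to a log store that accepts OTLP
    endpoint: http://loki.observability:3100/otlp            # placeholder endpoint

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/traces]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/logs]
```

Keeping consistent resource attributes and trace IDs across all three pipelines is what makes cross-backend correlation workable later.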
Tiered Storage
Telemetry loses value over time. You need the last week of data to be fast and queryable for incident response, but month-old data is usually for trend analysis or compliance. Tiered storage policies handle this automatically:
- Hot tier: recent data in fast columnar or time-series databases, optimized for low-latency queries
- Cold tier: older data rolled into object storage (S3, GCS, Azure Blob) in formats like Parquet or ORC, still queryable but at lower cost and higher latency
This ensures you keep history without paying premium rates for cold data.
Decision Factors for Backend Selection
When you’re choosing a backend, the decision usually comes down to three things:
- How ready your team is to operate it
- How your costs will grow over time
- How much flexibility you want to keep for the future
Start with Your Operational Maturity
Think about where your team is today — the right backend usually follows from that.
-
Running Kubernetes in production with a seasoned on-call rotation? Self-hosted or hybrid setups give you maximum control over storage, scaling, and cost. You already have the muscle to operate stateful systems, so the extra knobs may be worth it.
-
Some automation in place, but still building observability skills? A managed platform strikes a balance. You get reliable telemetry pipelines while offloading upgrades, scaling, and patching, leaving you room to grow your team’s expertise at your own pace.
-
No dedicated DevOps or SRE team yet? A fully hosted solution is often the fastest and least risky way to get traces, metrics, and logs flowing. It keeps the focus on building features while giving you observability without extra overhead.
Next, think about how your costs will evolve.
It’s not just about today’s spend — you’ll want to forecast your telemetry volume and retention over the next 2–3 years. That means factoring in both infrastructure and engineering time.
If you’re handling smaller pipelines, hosted pricing often works out better because you don’t have to run and maintain the backend yourself. Once you get into multi-terabyte volumes every month, self-hosting can start to make sense — but only if you’re ready to manage scaling, upgrades, and on-call for the system.
At the same time, many enterprises stick with hosted at a massive scale because the trade-off is worth it: you get predictable operations and can keep your engineers focused on product work.
Your best option depends on whether you value simplicity or fine-grained control more.
Finally, protect your flexibility and independence. OpenTelemetry makes it possible to switch later, as long as you keep your instrumentation clean and vendor-neutral. That means:
- Avoid proprietary SDKs or agents that lock you in
- Negotiate export rights in your contracts
- Test migration paths before you commit fully
Also, the landscape is evolving rapidly. eBPF-based Go auto-instrumentation now captures HTTP, gRPC, and database calls with no code changes. OpenTelemetry also added semantic conventions for generative AI workloads, giving you standardized spans and metrics for tracking model behavior. Both will influence how you think about backend compatibility in the near future.
What This Means for Your Observability Stack
Some teams value portability and choose OTel-native platforms. Others have the operational capacity to self-host for tighter cost control.
What OTel gives you is flexibility. As long as your collectors emit OTLP, you can shift telemetry across backends as your scale, compliance needs, or cost profile evolve.
At Last9, we’ve built an OpenTelemetry-native data platform to handle high-cardinality metrics, large trace volumes, and long-term storage. You don’t have to drop attributes or pre-aggregate data — every label you emit stays queryable, and queries remain fast even under heavy load.
Event-based pricing keeps costs predictable: you pay for the telemetry you send, not for hosts or users. With streaming aggregation and tiered storage, you manage scale without losing the fidelity needed for debugging and analysis.
“What I value most about Last9 is that it truly feels worth the price we are paying. Their pay-for-what-you-use model is transparent, and they even worked with us to personalize the experience and bring costs down to a level that suited our needs.”
— Dhruvi, DevOps Engineer, Jiva
Start for free today and see how Last9 can help you keep full fidelity, scale cost-effectively, and stay flexible as your systems grow.
In the next part, we’ll look at how to debug OTel pipelines — common pitfalls like malformed data, missing spans, dropped metrics, and the tooling that helps you fix them.