Processors and Transforms

Reduce telemetry volume and shape data before it reaches Last9 using the filter, transform, batch, and memory_limiter processors in the OpenTelemetry Collector.

Processors run on telemetry data between the receiver and the exporter. They are the primary tool for reducing volume, dropping noise, and enriching data before it reaches Last9.

This guide covers the four processors most commonly used in production deployments:

Processor      | What it does
---------------|------------------------------------------------------------
filter         | Drop spans, logs, or metrics that match a condition
transform      | Rename, enrich, or modify telemetry using OTTL expressions
memory_limiter | Prevent the collector from OOMing under traffic spikes
batch          | Buffer and flush data in efficient batches

All processors must be listed in the service.pipelines section to take effect.
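
For example, a processor defined under processors: only runs once a pipeline references it. A minimal traces pipeline (the component names mirror the examples later in this guide) looks like:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [filter/drop_internal_spans, batch]
      exporters: [otlp/last9]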


filter processor

The filter processor drops telemetry that matches one or more OTTL conditions. Use it to eliminate spans, logs, or metrics that add volume without adding value.

Drop internal spans

Internal spans from frameworks like GraphQL routers, ORM layers, and service meshes can account for the majority of your trace volume. Drop them to reduce ingestion by up to 60–70%:

processors:
  filter/drop_internal_spans:
    error_mode: ignore
    traces:
      span:
        - 'kind == SPAN_KIND_INTERNAL and status.code != STATUS_CODE_ERROR'

Drop internal spans for specific services

When you need finer control — for example, dropping internal spans only from known high-volume services:

processors:
  filter/drop_internal_spans:
    error_mode: ignore
    traces:
      span:
        - 'resource.attributes["service.name"] == "gql-router" and kind == SPAN_KIND_INTERNAL'
        - 'resource.attributes["service.name"] == "kong" and kind == SPAN_KIND_INTERNAL'
        - 'resource.attributes["service.name"] == "api-gateway" and kind == SPAN_KIND_INTERNAL and status.code != STATUS_CODE_ERROR'

Drop database noise spans

Transaction bookkeeping spans (BEGIN, COMMIT, ROLLBACK) inflate trace counts without providing actionable information:

processors:
  filter/drop_db_noise:
    error_mode: ignore
    traces:
      span:
        - 'attributes["db.system"] != "" and IsMatch(name, "^(BEGIN|COMMIT|ROLLBACK)$")'

Drop logs by severity

Drop DEBUG and TRACE logs to reduce log volume in production. Logs at INFO and above pass through:

processors:
  filter/drop_debug_logs:
    error_mode: ignore
    logs:
      log_record:
        - 'severity_number < SEVERITY_NUMBER_INFO'

Drop redundant metric buckets

For Prometheus histograms, you often only need one of _count, _sum, or _bucket depending on your use case. Drop the ones you don’t query:

processors:
  filter/drop_histogram_sum:
    error_mode: ignore
    metrics:
      datapoint:
        - 'IsMatch(metric.name, ".*_sum$")'

transform processor

The transform processor modifies telemetry in place using OTTL statements. Use it to rename spans, normalize operation names, add missing attributes, or fix instrumentation gaps.

Remove unique IDs from span names

Auto-instrumented frameworks sometimes embed request-specific values (UUIDs, user IDs, numeric IDs) in span names, creating unbounded cardinality in APM:

processors:
  transform/normalize_span_names:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # Remove query params: "GET /users?id=abc123" → "GET /users"
          - replace_pattern(name, "\\?.*$", "")
          # Remove numeric path segments: "/api/users/12345/orders" → "/api/users/{id}/orders"
          - replace_pattern(name, "/[0-9]+", "/{id}")
          # Remove UUIDs: "/sessions/550e8400-e29b-41d4-a716-446655440000" → "/sessions/{uuid}"
          - replace_pattern(name, "/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", "/{uuid}")

Fix GraphQL span visibility in APM

Apollo Server and other GraphQL frameworks emit the HTTP transport span and the named operation span both as SERVER kind. This inflates throughput 2× in APM and breaks the Operations tab. The fix demotes the HTTP layer to INTERNAL so the named operation is the single source of truth:

processors:
  transform/fix_graphql_spans:
    error_mode: ignore
    trace_statements:
      # Step 1: Give the HTTP span a meaningful name before demotion
      - context: span
        conditions:
          - instrumentation_scope.name == "@opentelemetry/instrumentation-http" and kind == SPAN_KIND_SERVER
        statements:
          - set(name, Concat([attributes["http.method"], " ", attributes["http.target"]], "")) where IsString(attributes["http.target"])
      # Step 2: Demote HTTP SERVER → INTERNAL when a named GraphQL operation span exists
      - context: span
        conditions:
          - instrumentation_scope.name == "@opentelemetry/instrumentation-http" and kind == SPAN_KIND_SERVER and IsMatch(name, ".*/graphql.*")
        statements:
          - set(kind, SPAN_KIND_INTERNAL)
      # Step 3: Add http.method to GraphQL operation spans so the APM Operations tab shows them
      - context: span
        conditions:
          - IsString(attributes["graphql.operation.type"]) and kind == SPAN_KIND_SERVER and attributes["http.method"] == nil
        statements:
          - set(attributes["http.method"], "POST")
          - set(attributes["http.status_code"], "500") where attributes["http.status_code"] == nil and status.code == STATUS_CODE_ERROR
          - set(attributes["http.status_code"], "200") where attributes["http.status_code"] == nil

Add static labels to CloudWatch metrics

CloudWatch metrics arrive without service_name or environment labels, making service-level alerting impossible. Use a transform processor to add them:

processors:
  transform/enrich_cloudwatch:
    error_mode: ignore
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["service_name"], "payments-service") where resource.attributes["aws.cloudwatch.namespace"] == "AWS/ApplicationELB"
          - set(attributes["deployment_environment"], "production")

Propagate resource attributes to span attributes

Some backends and dashboards expect certain attributes on the span rather than the resource. Copy them across:

processors:
  transform/promote_resource_attrs:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          - set(attributes["k8s.namespace"], resource.attributes["k8s.namespace.name"]) where attributes["k8s.namespace"] == nil
          - set(attributes["host"], resource.attributes["host.name"]) where attributes["host"] == nil

memory_limiter processor

The memory_limiter processor protects the collector from OOMing under sudden traffic spikes. It checks memory usage on an interval and begins dropping data when usage crosses a threshold.

Always place memory_limiter first in every pipeline.

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 25

Field                  | Value | Meaning
-----------------------|-------|------------------------------------------------------------------
check_interval         | 1s    | How often to check memory usage
limit_percentage       | 80    | Hard limit as a percentage of available memory; above this the processor refuses new data and forces garbage collection
spike_limit_percentage | 25    | Expected spike between checks; the soft limit, where backpressure begins, is limit_percentage minus this value (55% here)

Setting the memory limit for the collector process

Set an explicit memory limit for the collector process via the GOMEMLIMIT environment variable (the older --mem-ballast-size-mib flag is deprecated in newer collector releases). If the collector runs in Kubernetes, set a memory limit on the pod and configure memory_limiter to 80% of that limit:

# Kubernetes pod spec
resources:
  limits:
    memory: 4Gi
  requests:
    memory: 2Gi

# Corresponding memory_limiter config
processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80        # 80% of 4Gi = ~3.2Gi
    spike_limit_percentage: 25
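
The Go runtime limit can also be set directly on the collector container. A minimal sketch, assuming the 4Gi pod limit above; the exact value (roughly 80% of the limit) is an illustrative choice:

# Collector container spec (illustrative)
env:
  - name: GOMEMLIMIT
    value: "3276MiB"   # ~80% of the 4Gi pod memory limit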

batch processor

The batch processor buffers spans, logs, and metrics before sending them to the exporter. Batching reduces the number of outbound connections and improves compression ratios.

processors:
  batch:
    timeout: 5s
    send_batch_size: 1000
    send_batch_max_size: 2000

Field               | Value | Meaning
--------------------|-------|------------------------------------------------------------------
timeout             | 5s    | Send the current batch after this interval even if send_batch_size is not reached
send_batch_size     | 1000  | Target number of items per batch
send_batch_max_size | 2000  | Maximum batch size; 0 means no limit

Putting it all together

A complete pipeline configuration using all four processors. Order matters: memory_limiter first, then filtering, then transforms, then batching.

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 25
  filter/drop_internal_spans:
    error_mode: ignore
    traces:
      span:
        - 'kind == SPAN_KIND_INTERNAL and status.code != STATUS_CODE_ERROR'
  filter/drop_debug_logs:
    error_mode: ignore
    logs:
      log_record:
        - 'severity_number < SEVERITY_NUMBER_INFO'
  transform/normalize_span_names:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          - replace_pattern(name, "\\?.*$", "")
          - replace_pattern(name, "/[0-9]+", "/{id}")
  batch:
    timeout: 5s
    send_batch_size: 1000
    send_batch_max_size: 2000

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors:
        - memory_limiter
        - filter/drop_internal_spans
        - transform/normalize_span_names
        - batch
      exporters: [otlp/last9]
    logs:
      receivers: [otlp]
      processors:
        - memory_limiter
        - filter/drop_debug_logs
        - batch
      exporters: [otlp/last9]
    metrics:
      receivers: [otlp, prometheus]
      processors:
        - memory_limiter
        - batch
      exporters: [otlp/last9]

Troubleshooting

  • Spans still appearing after filter

    Check that the processor name exactly matches what’s listed in service.pipelines. A processor defined but not listed in processors: in the pipeline is silently ignored.

  • error_mode: ignore vs error_mode: propagate

    ignore skips items that cause evaluation errors (e.g. missing attributes) and continues processing. Use propagate during development to surface OTTL syntax errors — switch to ignore in production.

  • High CPU from transform processor

    OTTL expressions with regex (IsMatch, replace_pattern) are evaluated per span. For very high-throughput services, move static attribute assignments to the resource or resourcedetection processors, which operate on resource attributes rather than evaluating an expression for every span (see the sketch after this list).
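
As a sketch of that last point, a static attribute can be attached with the resource processor instead of a per-span OTTL statement; the processor name and values below are illustrative:

processors:
  resource/static_labels:
    attributes:
      - key: deployment_environment
        value: production
        action: upsert

Because this runs once per resource in a batch, no regex or where clause is evaluated for individual spans.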

Please get in touch with us on Discord or Email if you have any questions.