A team running ~30 services on AWS ECS pings you on Slack at 7pm:
"No logs coming for any service in the past 30 minutes. Everything is showing up under `service.name = aws_ecs` instead of the actual service name."

If you have ever forwarded ECS telemetry to an OTLP backend, you've probably hit this exact failure mode. It's not a bug in the collector and it's not a backend regression — it's the OpenTelemetry resource detection model meeting the reality of AWS ECS task metadata. When `service.name` isn't set on the resource, several OTLP backends fall back to `cloud.platform` (which on ECS is the literal string `aws_ecs`), and every container in the cluster collapses into a single bucket. ~30 services, one row in your service catalog, one giant useless graph.
This post covers:
- Why ECS containers default to `service.name = aws_ecs`
- The fix for EC2-hosted ECS (instance tags + `resourcedetection/ec2`)
- The fix for ECS Fargate (task metadata + `resourcedetection: [env, ecs]`)
- The two pitfalls that bite even after you set `service.name`:
  - Resource-level vs. log-record-level attributes (filters return nothing)
  - Conflicting attribute remap rules that silently overwrite `service.name`
- A production-tested OTel Collector configuration for both deployment shapes
All configs are from real production deployments — sanitized — running the OpenTelemetry Collector Contrib distribution v0.128+.
## Why `aws_ecs` shows up in the first place
The OpenTelemetry SDK spec says that if `service.name` is unset, the SDK should default to `unknown_service`. In practice, the actual fallback you see depends on what's emitting the signal and what your backend does with a missing `service.name`.
On ECS specifically, two things conspire:
- The OTel Collector's `resourcedetection` processor populates `cloud.platform` with the value `aws_ecs` (per the OTel semantic conventions for cloud resources — `aws_ecs` for ECS, `aws_ec2` for bare EC2, `aws_eks` for EKS).
- Some backends — Last9 included — use `cloud.platform` as the fallback service name when `service.name` is missing on a record. The reasoning is that "ECS workload" is a more useful default than `unknown_service:<random_pid>`, but the side effect is exactly what the Slack message above describes.
Either way, the moment your collector forwards a signal without `service.name`, you lose per-service granularity. Filtering, dashboards, alert routing — all broken until you set `service.name` explicitly per task.
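To make the failure mode concrete, here's a small Python sketch of the fallback logic described above — an illustration, not any backend's actual code:

```python
# Illustrative sketch of the backend-side fallback described above --
# not any vendor's actual implementation.
def effective_service_name(resource_attrs: dict) -> str:
    """Pick the service name a backend would display for a record."""
    if resource_attrs.get("service.name"):
        return resource_attrs["service.name"]
    # Fallback: cloud.platform -- the constant string "aws_ecs" on ECS,
    # so every container collapses into one bucket.
    return resource_attrs.get("cloud.platform", "unknown_service")

# A correctly-labelled task keeps its identity:
print(effective_service_name({"service.name": "order-api",
                              "cloud.platform": "aws_ecs"}))  # order-api

# A task without service.name collapses to the platform constant:
print(effective_service_name({"cloud.platform": "aws_ecs"}))  # aws_ecs
```

Every container in the fleet hits the second branch, which is why ~30 services become one.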
The OTel SDK can set `service.name` from the `OTEL_SERVICE_NAME` or `OTEL_RESOURCE_ATTRIBUTES` environment variable, but that only covers application-emitted signals. For:

- ECS container metrics scraped via `awsecscontainermetrics`
- Stdout logs forwarded by Fluent Bit / Firelens
- Host metrics scraped from the EC2 instance

…the OTel SDK is not in the loop. The collector itself has to derive `service.name` from infrastructure metadata.
## Pattern 1: EC2-hosted ECS (or plain EC2 with services tagged on the instance)
In this layout, you run ECS on the EC2 launch type. The EC2 instance carries an EC2 tag — typically `service_name=order-api` for single-service hosts, or a stack/cluster tag for multi-service hosts where individual containers carry their own labels. The OTel Collector runs as an ECS daemon service or a systemd unit on the host. This pattern also applies to plain EC2 deployments without ECS, as long as services are identifiable from instance tags.
The fix is two processors:

- `resourcedetection/ec2` — reads EC2 instance metadata + tags
- `transform/ec2` — promotes the `ec2.tag.service_name` attribute to `service.name` (this uses the OTel transform processor and OTTL)
```yaml
processors:
  resourcedetection/ec2:
    detectors:
      - "ec2"
    ec2:
      tags:
        - ^Name$
        - ^app$
        - ^service_name$
        - ^component_name$
        - ^env_name$
        - ^environment$
        - ^deployment_stack$
        - ^cluster$

  transform/ec2:
    error_mode: ignore
    log_statements:
      - context: resource
        statements:
          - set(attributes["service.name"], attributes["ec2.tag.service_name"])
    trace_statements:
      - context: resource
        statements:
          - set(attributes["service.name"], attributes["ec2.tag.service_name"])
    metric_statements:
      - context: resource
        statements:
          - set(attributes["service.name"], attributes["ec2.tag.service_name"])
```

A few things worth calling out:
**The `tags:` allowlist is regex.** `resourcedetection/ec2` only pulls tags whose names match these regexes — anything else is dropped. This is a feature, not a bug: pulling every tag from every instance can blow up cardinality. Be explicit.

**The detected tags land under `ec2.tag.<tag-name>`.** So if your tag is `service_name`, the resource attribute becomes `ec2.tag.service_name`. The `transform/ec2` processor rewrites it to the canonical `service.name`.

**You need OTTL statements for each signal type separately.** OTTL contexts are signal-scoped: `log_statements`, `trace_statements`, and `metric_statements` are independent. Forgetting one signal is the most common reason "logs work but metrics still show `aws_ecs`."
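The allowlist-then-prefix behavior can be sketched in a few lines of Python — illustrative helper names, not the collector's internals:

```python
import re

# Illustrative model of the ec2 detector's tag handling -- not the
# collector's actual code. The allowlist mirrors the YAML config above.
ALLOWLIST = [r"^Name$", r"^service_name$", r"^env_name$"]

def detect_tags(instance_tags: dict) -> dict:
    """Keep only allowlisted tags, namespaced under ec2.tag.<name>."""
    return {
        f"ec2.tag.{k}": v
        for k, v in instance_tags.items()
        if any(re.match(p, k) for p in ALLOWLIST)
    }

def promote_service_name(resource_attrs: dict) -> dict:
    """What transform/ec2 does: copy ec2.tag.service_name -> service.name."""
    if "ec2.tag.service_name" in resource_attrs:
        resource_attrs["service.name"] = resource_attrs["ec2.tag.service_name"]
    return resource_attrs

attrs = detect_tags({"service_name": "order-api", "team": "payments"})
print(promote_service_name(attrs))
# {'ec2.tag.service_name': 'order-api', 'service.name': 'order-api'}
```

Note that the unlisted `team` tag never makes it onto the resource — that's the cardinality guard in action.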
For host-level logs (e.g., `/var/log/messages`, `/var/log/syslog`, app log files on disk), pair this with the OTel Filelog receiver — the same OTTL transform applies.
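A minimal `filelog` receiver sketch for those host paths — the `include` globs here are examples, adjust them to your own layout:

```yaml
receivers:
  filelog:
    include:
      - /var/log/messages
      - /var/log/syslog
      - /var/log/app/*.log   # example path -- point at your app's log files
    start_at: end            # don't re-ingest history on collector restart
```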
Wire it into the pipelines:
```yaml
service:
  pipelines:
    logs:
      receivers: [filelog, otlp]
      processors:
        - resourcedetection/ec2
        - transform/ec2
        - batch
      exporters: [otlp/last9]
    traces:
      receivers: [otlp]
      processors:
        - resourcedetection/ec2
        - transform/ec2
        - batch
      exporters: [otlp/last9]
    metrics:
      receivers: [otlp, hostmetrics]
      processors:
        - resourcedetection/ec2
        - transform/ec2
        - batch
      exporters: [otlp/last9]
```

Ordering matters: `resourcedetection` must come before `transform`. The transform reads attributes that resourcedetection writes.
## Pattern 2: ECS Fargate (no EC2 instance, only task metadata)
Fargate has no EC2 instance you can tag. Instead, the task definition itself carries:
- The task family name (`aws.ecs.task.family`)
- The task ARN (`aws.ecs.task.arn`)
- The container name (`container.name`)
- Anything you bake into the task as environment variables (most importantly `OTEL_RESOURCE_ATTRIBUTES`)

The collector picks all of this up via `resourcedetection: [env, ecs]`:

- The `env` detector reads `OTEL_RESOURCE_ATTRIBUTES` from the container's environment
- The `ecs` detector hits the ECS task metadata endpoint v4 and pulls task/container info
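To see what the `ecs` detector has to work with, here's a sketch that pulls the same fields out of a trimmed, hypothetical task metadata v4 payload (in a live task, the detector fetches this from `$ECS_CONTAINER_METADATA_URI_V4/task` for you):

```python
import json

# Trimmed, hypothetical example of an ECS task metadata v4 response --
# the real payload carries many more fields.
payload = json.loads("""{
  "Family": "order-api-task",
  "TaskARN": "arn:aws:ecs:us-east-1:123456789012:task/demo/abc123",
  "Containers": [{"Name": "order-api"}]
}""")

# The resource attributes the ecs detector derives from it:
resource_attrs = {
    "aws.ecs.task.family": payload["Family"],
    "aws.ecs.task.arn": payload["TaskARN"],
    "container.name": payload["Containers"][0]["Name"],
}
print(resource_attrs["aws.ecs.task.family"])  # order-api-task
```

Note there is no `service.name` anywhere in the payload — which is exactly why you still need `OTEL_RESOURCE_ATTRIBUTES` or a transform on top.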
For a sidecar collector running in the same Fargate task as the application:
```yaml
receivers:
  # ECS task metrics
  awsecscontainermetrics:
    collection_interval: 60s
  # Logs from Fluent Bit / Firelens running in the same task
  fluentforward:
    endpoint: 0.0.0.0:8006
  # OTLP from the application container
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  resourcedetection:
    detectors: [env, ecs]

  # Fall back to container_name for logs that arrive without a service.name
  # resource attribute (i.e. Firelens/Fluent Bit forwarding from sibling
  # containers in the same task)
  transform/firelens:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - set(resource.attributes["service.name"], attributes["container_name"])
            where attributes["container_name"] != nil
            and resource.attributes["service.name"] == nil

  batch:
    send_batch_max_size: 1000
    send_batch_size: 1000
    timeout: 10s

service:
  pipelines:
    metrics:
      receivers: [awsecscontainermetrics, otlp]
      processors: [resourcedetection, batch]
      exporters: [otlp/last9]
    logs:
      receivers: [fluentforward, otlp]
      processors: [resourcedetection, transform/firelens, batch]
      exporters: [otlp/last9]
    traces:
      receivers: [otlp]
      processors: [resourcedetection, batch]
      exporters: [otlp/last9]
```

In the task definition, set:
```json
{
  "environment": [
    {
      "name": "OTEL_RESOURCE_ATTRIBUTES",
      "value": "service.name=order-api,deployment.environment=prod"
    }
  ]
}
```

The `env` detector reads this and the resulting resource carries `service.name=order-api`. Combined with the `ecs` detector's `aws.ecs.task.family`, you get a clean per-service dimension plus task-level diagnostics.
The `transform/firelens` block is the safety net for logs that flow through Fluent Bit instead of OTLP. Firelens forwards logs as `fluentforward` records where the originating container's name lands on the log attributes (`container_name`), not the resource. The OTTL statement promotes it to `resource.attributes["service.name"]` only when the `env` detector hasn't already set one — useful when a single sidecar collector is fronting multiple containers in the same task and the task-level env var doesn't disambiguate them.
## Pitfall 1: resource-level vs. log-record-level attributes
You set `service.name` correctly on the resource. Logs flow into the backend. You filter on `service.name = order-api` in the logs UI — and get nothing.
This is the single most common follow-up issue after the initial fix. The reason:
> Resource attributes describe the emitter. Log record attributes describe the event. Some log query engines only index log-record attributes, so a `service.name` set only on the resource is invisible to a record-level filter.

Backends differ here. Some flatten resource attributes onto every record at ingest time, some keep them strictly separate, and some make the distinction queryable but require different syntax for each scope. Last9 belongs to the second group — resource attributes are stored alongside but not merged into the log record by default. If your backend does the same, you need to copy the attributes you care about onto every log record.
This is the same class of problem Prometheus 3.0 fixed for OTel metrics — by promoting resource attributes like `service.name` and `deployment.environment` to first-class metric labels rather than burying them in `target_info`. For logs, the equivalent has to happen in the collector.
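A toy Python model of the scope mismatch — illustrative dictionaries, not the actual OTLP data model:

```python
# Toy model of the two attribute scopes -- illustrative, not the OTLP proto.
record = {
    "resource": {"service.name": "order-api", "cloud.platform": "aws_ecs"},
    "attributes": {"http.status_code": 500},  # log-record scope
}

def record_level_filter(rec: dict, key: str, value: str) -> bool:
    """A query engine that only indexes log-record attributes."""
    return rec["attributes"].get(key) == value

# The resource carries service.name, but a record-scope filter misses it:
print(record_level_filter(record, "service.name", "order-api"))  # False

# After a promote transform copies it onto the record, the filter matches:
record["attributes"]["service.name"] = record["resource"]["service.name"]
print(record_level_filter(record, "service.name", "order-api"))  # True
```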
The fix is an OTTL transform that copies key resource attributes onto every log record:
```yaml
processors:
  transform/promote_to_logrecord:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - set(attributes["service.name"], resource.attributes["service.name"])
            where resource.attributes["service.name"] != nil
          - set(attributes["deployment.environment"], resource.attributes["deployment.environment"])
            where resource.attributes["deployment.environment"] != nil
          - set(attributes["host.name"], resource.attributes["host.name"])
            where resource.attributes["host.name"] != nil
```

Add this after `resourcedetection` and `transform/ec2` (or `transform/firelens`) in the logs pipeline. Now `service.name` is queryable on the log record itself.
A general rule: anything you want to filter, group, or facet on in a log query needs to live on the log record. The resource scope is for grouping at the source level, not for ad-hoc querying.
## Pitfall 2: server-side remap rules that overwrite `service.name`
Many backends expose remapping rules — server-side processing that renames or copies attributes after ingestion. They're convenient: you don't have to redeploy your collector to add a label or backfill a missing one.
But applied to `service.name`, they're a footgun. The classic failure mode:

> A remap rule is configured to do something like "if `service.name` is empty on a record, copy `cloud.platform` into it so we at least have something to group by." The intent is benign — give signals that slipped through without resource detection some kind of identifier.
The trap is in the "is empty" check. Resource attributes and log-record attributes live in different scopes. A `service.name` correctly set on the resource by `transform/ec2` may still look "empty" to a remap rule evaluating the log-record scope. The rule fires, copies `cloud.platform` (which is the constant string `aws_ecs` for every container on ECS), and now every record looks like one giant service. Same outcome as the unconfigured collector — but harder to debug, because the collector pipeline looks correct.
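To see the misfire concretely, here's a toy Python model of a record-scope "is empty" check — hypothetical, not any backend's actual rule engine:

```python
# Hypothetical server-side remap rule that only sees the log-record scope.
record = {
    "resource": {"service.name": "order-api", "cloud.platform": "aws_ecs"},
    "attributes": {},  # service.name lives on the resource, not here
}

def remap_rule(rec: dict) -> None:
    """'If service.name is empty, fall back to cloud.platform' --
    but evaluated against the log-record attributes only."""
    if not rec["attributes"].get("service.name"):
        # The resource-scope service.name is set, but this rule never looks there.
        rec["attributes"]["service.name"] = rec["resource"]["cloud.platform"]

remap_rule(record)
print(record["attributes"]["service.name"])  # aws_ecs -- the real name is hidden
```

The resource still says `order-api`, but anything downstream that reads the record-level attribute sees `aws_ecs`.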
Two takeaways:
- Avoid `cloud.platform` as a fallback for `service.name` at any layer. It's the same value for every container in your fleet — it provides no signal and actively hides correctly-labelled traffic when something accidentally overwrites the real value.
- Set `service.name` once, at the closest layer to the source. If the SDK sets it, don't re-set it in the collector. If the collector sets it from EC2 tags, don't re-set it server-side. Multi-layer `service.name` rules are fragile because the precedence is rarely what you expect.

When in doubt, drop server-side remap rules for `service.name` entirely. Make the collector authoritative.
## Verifying the fix
After deploying, three sanity checks:
**1. Inspect the resource attributes the collector exports.** Add a `debug` exporter with `verbosity: detailed` to a non-prod collector and look at one record:

```yaml
exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      exporters: [debug, otlp/last9]
```

You should see something like:
```text
Resource SchemaURL: https://opentelemetry.io/schemas/1.6.1
Resource attributes:
  -> service.name: Str(order-api)
  -> deployment.environment: Str(prod)
  -> cloud.platform: Str(aws_ecs)
  -> ec2.tag.service_name: Str(order-api)
  -> aws.ecs.task.family: Str(order-api-task)
```

If `service.name` is still `aws_ecs` here, the issue is in the collector pipeline, not the backend.
**2. Check log records carry the promoted attributes.**

```text
LogRecord #0
Body: {...}
Attributes:
  -> service.name: Str(order-api)
  -> deployment.environment: Str(prod)
```

If logs only show `service.name` on the resource and not the record, the `transform/promote_to_logrecord` step is missing or running before `resourcedetection`.
**3. Filter in the backend by `service.name = <one of your services>`.** You should see only that service's logs, not a mix of everything labelled `aws_ecs`.
## When `OTEL_RESOURCE_ATTRIBUTES` alone is enough
If you run only OTLP-instrumented apps on Fargate and don't ingest container metrics or stdout logs through the collector, you can skip most of this and just set `OTEL_RESOURCE_ATTRIBUTES` on the task. The OTel SDK will pick it up at startup and stamp every signal with the right resource. This is also the recommended path for Lambda functions instrumented with OpenTelemetry — the SDK is in process, env vars are the source of truth.
Where this breaks down:
- ECS task metrics (`awsecscontainermetrics` receiver) — the collector emits these, no SDK in the loop
- Stdout logs forwarded by Firelens / Fluent Bit — no SDK in the loop
- Sidecar collectors that wrap multiple containers — task-level env vars don't disambiguate per-container service names
For anything beyond a single OTLP-only app, you need the collector-level resource detection + transform pattern above.
## Summary
| Symptom | Root cause | Fix |
|---|---|---|
| All logs show `service.name = aws_ecs` | No `service.name` resource attribute set; backend falls back to `cloud.platform` | `resourcedetection/ec2` + `transform/ec2` (EC2 ECS), or `resourcedetection: [env, ecs]` + `OTEL_RESOURCE_ATTRIBUTES` on the task (Fargate) |
| `service.name` is set but filters return nothing | Attribute is on the resource; query engine looks at the log record | OTTL transform to copy resource attributes onto each log record |
| `service.name` was working, suddenly all `aws_ecs` again | Server-side remap rule overwrote it | Avoid `cloud.platform`-derived fallbacks; make the collector authoritative for `service.name` |
The OTel Collector gives you the building blocks, but you have to assemble them in the right order: detect → transform → promote → export. Each layer matters, and skipping one of them is what gives you the "all my ECS logs are one service" Slack message at 7pm on a Friday.
## Send your ECS telemetry to Last9
Last9 ingests OTLP for logs, metrics, and traces with no proprietary agent. Once your collector is correctly setting `service.name`, you get a per-service catalog, RED metrics, and full log-trace correlation out of the box — no resource-vs-record promotion needed for `service.name`, since Last9 honours it as a first-class identifier across all three signals.
Start sending ECS telemetry to Last9 →
## References
- The OpenTelemetry Collector Deep Dive — Last9
- Everything You Should Know About OpenTelemetry Collector Contrib — Last9
- OpenTelemetry Processors: Workflows, Configuration Tips, and Best Practices — Last9
- Sidecar or Agent for OpenTelemetry: How to Decide — Last9
- What is AWS Fargate for Amazon ECS? — Last9
- An Easy Guide to OpenTelemetry Environment Variables — Last9
- How Prometheus 3.0 Fixes Resource Attributes for OTel Metrics — Last9
- OpenTelemetry Filelog Receiver: Collecting Kubernetes Logs — Last9
- Instrumenting AWS Lambda Functions with OpenTelemetry — Last9
- OpenTelemetry Collector Contrib `resourcedetection` processor
- OpenTelemetry Collector Contrib `awsecscontainermetrics` receiver
- OTTL Transform processor — context reference
- ECS Task Metadata Endpoint v4
- OpenTelemetry semantic conventions for cloud resources
