May 2nd, 2026

Stop ECS Containers From Collapsing Into One Service in OpenTelemetry

Why ECS containers collapse under service.name = aws_ecs and how to fix it for both EC2 launch type and Fargate, including the resource-vs-log-record pitfall that quietly breaks log filtering.


A team running ~30 services on AWS ECS pings you on Slack at 7pm:

"No logs coming for any service in the past 30 minutes. Everything is showing up under service.name = aws_ecs instead of the actual service name."

If you have ever forwarded ECS telemetry to an OTLP backend, you've probably hit this exact failure mode. It's not a bug in the collector and it's not a backend regression — it's the OpenTelemetry resource detection model meeting the reality of AWS ECS task metadata. When service.name isn't set on the resource, several OTLP backends fall back to cloud.platform (which on ECS is the literal string aws_ecs), and every container in the cluster collapses into a single bucket. ~30 services, one row in your service catalog, one giant useless graph.

This post covers:

  1. Why ECS containers default to service.name = aws_ecs
  2. The fix for EC2-hosted ECS (instance tags + resourcedetection/ec2)
  3. The fix for ECS Fargate (task metadata + resourcedetection: [env, ecs])
  4. The two pitfalls that bite even after you set service.name:
     • Resource-level vs. log-record-level attributes (filters return nothing)
     • Conflicting attribute remap rules that silently overwrite service.name
  5. A production-tested OTel Collector configuration for both deployment shapes

All configs are from real production deployments — sanitized — running the OpenTelemetry Collector Contrib distribution v0.128+.


Why aws_ecs shows up in the first place

The OpenTelemetry SDK spec says that if service.name is unset, the SDK should default to unknown_service. In practice, the actual fallback you see depends on what's emitting the signal and what your backend does with a missing service.name.

On ECS specifically, two things conspire:

  1. The OTel Collector's resourcedetection processor populates cloud.platform with the value aws_ecs (per the OTel semantic conventions for cloud resources — aws_ecs for ECS, aws_ec2 for bare EC2, aws_eks for EKS).
  2. Some backends — Last9 included — use cloud.platform as the fallback service name when service.name is missing on a record. The reasoning is that "ECS workload" is more useful as a default than unknown_service:<executable name>, but the side effect is exactly what the Slack message above describes.

Either way, the moment your collector forwards a signal without service.name, you lose per-service granularity. Filtering, dashboards, alert routing — all broken until you set service.name explicitly per task.

The OTel SDK can set service.name from the OTEL_SERVICE_NAME or OTEL_RESOURCE_ATTRIBUTES environment variable, but that only covers application-emitted signals. For:

  • ECS container metrics scraped via awsecscontainermetrics
  • Stdout logs forwarded by Fluent Bit / Firelens
  • Host metrics scraped from the EC2 instance

…the OTel SDK is not in the loop. The collector itself has to derive service.name from infrastructure metadata.


Pattern 1: EC2-hosted ECS (or plain EC2 with services tagged on the instance)

In this layout, you run ECS on the EC2 launch type. The EC2 instance carries an EC2 tag — typically service_name=order-api for single-service hosts, or a stack/cluster tag for multi-service hosts where individual containers carry their own labels. The OTel Collector runs on each host as an ECS daemon service or a systemd unit. This pattern also applies to plain EC2 deployments without ECS, as long as services are identifiable from instance tags.

The fix is two processors:

  1. resourcedetection/ec2 — reads EC2 instance metadata + tags
  2. transform/ec2 — promotes the ec2.tag.service_name attribute to service.name (this uses the OTel transform processor and OTTL)

processors:
  resourcedetection/ec2:
    detectors:
      - "ec2"
    ec2:
      tags:
        - ^Name$
        - ^app$
        - ^service_name$
        - ^component_name$
        - ^env_name$
        - ^environment$
        - ^deployment_stack$
        - ^cluster$

  transform/ec2:
    error_mode: ignore
    log_statements:
      - context: resource
        statements:
          - set(attributes["service.name"], attributes["ec2.tag.service_name"])
    trace_statements:
      - context: resource
        statements:
          - set(attributes["service.name"], attributes["ec2.tag.service_name"])
    metric_statements:
      - context: resource
        statements:
          - set(attributes["service.name"], attributes["ec2.tag.service_name"])

A few things worth calling out:

The tags: allowlist is a list of regular expressions. resourcedetection/ec2 only pulls tags whose names match — anything else is dropped. This is a feature, not a bug: pulling every tag from every instance can blow up cardinality. Be explicit.

The detected tags land under ec2.tag.<tag-name>. So if your tag is service_name, the resource attribute becomes ec2.tag.service_name. The transform/ec2 processor rewrites it to the canonical service.name.

You need OTTL statements for each signal type separately. OTTL contexts are signal-scoped: log_statements, trace_statements, and metric_statements are independent. Forgetting one signal is the most common reason "logs work but metrics still show aws_ecs."

For host-level logs (e.g., /var/log/messages, /var/log/syslog, app log files on disk), pair this with the OTel Filelog receiver — same OTTL transform applies.
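
A minimal filelog receiver sketch for that case (the include paths are illustrative; point them at whatever your hosts actually write). The same resourcedetection/ec2 and transform/ec2 processors stamp service.name downstream:

receivers:
  filelog:
    # Tail host-level log files; resource attributes come from the processors, not the receiver
    include:
      - /var/log/messages
      - /var/log/syslog
    # Start from the end of each file to avoid replaying history on collector restarts
    start_at: end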

Wire it into the pipelines:

service:
  pipelines:
    logs:
      receivers: [filelog, otlp]
      processors:
        - resourcedetection/ec2
        - transform/ec2
        - batch
      exporters: [otlp/last9]
    traces:
      receivers: [otlp]
      processors:
        - resourcedetection/ec2
        - transform/ec2
        - batch
      exporters: [otlp/last9]
    metrics:
      receivers: [otlp, hostmetrics]
      processors:
        - resourcedetection/ec2
        - transform/ec2
        - batch
      exporters: [otlp/last9]

Ordering matters: resourcedetection must come before transform. The transform reads attributes that resourcedetection writes.


Pattern 2: ECS Fargate (no EC2 instance, only task metadata)

Fargate has no EC2 instance you can tag. Instead, the task definition itself carries:

  • The task family name (aws.ecs.task.family)
  • The task ARN (aws.ecs.task.arn)
  • The container name (container.name)
  • Anything you bake into the task as environment variables (most importantly OTEL_RESOURCE_ATTRIBUTES)

The collector picks all of this up via resourcedetection: [env, ecs]:

  • env detector reads OTEL_RESOURCE_ATTRIBUTES from the container's environment
  • ecs detector hits the ECS task metadata endpoint v4 and pulls task/container info

For a sidecar collector running in the same Fargate task as the application:

receivers:
  # ECS task metrics
  awsecscontainermetrics:
    collection_interval: 60s

  # Logs from Fluent Bit / Firelens running in the same task
  fluentforward:
    endpoint: 0.0.0.0:8006

  # OTLP from the application container
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  resourcedetection:
    detectors: [env, ecs]

  # Fall back to container_name for logs that arrive without a service.name resource attribute
  # (i.e. Firelens/Fluent Bit forwarding from sibling containers in the same task)
  transform/firelens:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - set(resource.attributes["service.name"], attributes["container_name"])
              where attributes["container_name"] != nil
              and resource.attributes["service.name"] == nil

  batch:
    send_batch_max_size: 1000
    send_batch_size: 1000
    timeout: 10s

service:
  pipelines:
    metrics:
      receivers: [awsecscontainermetrics, otlp]
      processors: [resourcedetection, batch]
      exporters: [otlp/last9]
    logs:
      receivers: [fluentforward, otlp]
      processors: [resourcedetection, transform/firelens, batch]
      exporters: [otlp/last9]
    traces:
      receivers: [otlp]
      processors: [resourcedetection, batch]
      exporters: [otlp/last9]

In the task definition, set:

{
  "environment": [
    {
      "name": "OTEL_RESOURCE_ATTRIBUTES",
      "value": "service.name=order-api,deployment.environment=prod"
    }
  ]
}

The env detector reads this and the resulting resource carries service.name=order-api. Combined with the ecs detector's aws.ecs.task.family, you get a clean per-service dimension plus task-level diagnostics.

The transform/firelens block is the safety net for logs that flow through Fluent Bit instead of OTLP. Firelens forwards logs as fluentforward records where the originating container's name lands on the log attributes (container_name), not the resource. The OTTL statement promotes it to resource.attributes["service.name"] only when the env detector hasn't already set one — useful when a single sidecar collector is fronting multiple containers in the same task and the task-level env var doesn't disambiguate them.


Pitfall 1: resource-level vs. log-record-level attributes

You set service.name correctly on the resource. Logs flow into the backend. You filter on service.name = order-api in the logs UI — and get nothing.

This is the single most common follow-up issue after the initial fix. The reason:

Resource attributes describe the emitter. Log record attributes describe the event. Some log query engines only index log-record attributes, so a service.name set only on the resource is invisible to a record-level filter.

Backends differ here. Some flatten resource attributes onto every record at ingest time, some keep them strictly separate, and some make the distinction queryable but require different syntax for each scope. Last9 belongs to the second group — resource attributes are stored alongside but not merged into the log record by default. If your backend does the same, you need to copy the attributes you care about onto every log record.

This is the same class of problem Prometheus 3.0 fixed for OTel metrics — by promoting resource attributes like service.name and deployment.environment to first-class metric labels rather than burying them in target_info. For logs, the equivalent has to happen in the collector.
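
For comparison, the Prometheus side of that fix is a short block in prometheus.yml; a sketch, assuming Prometheus 3.x and that these two attributes are the ones you actually query on:

# prometheus.yml: promote selected OTel resource attributes to metric labels
# instead of leaving them reachable only via target_info joins
otlp:
  promote_resource_attributes:
    - service.name
    - deployment.environment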

For logs, the fix is an OTTL transform that copies key resource attributes onto every log record:

processors:
  transform/promote_to_logrecord:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - set(attributes["service.name"], resource.attributes["service.name"])
              where resource.attributes["service.name"] != nil
          - set(attributes["deployment.environment"], resource.attributes["deployment.environment"])
              where resource.attributes["deployment.environment"] != nil
          - set(attributes["host.name"], resource.attributes["host.name"])
              where resource.attributes["host.name"] != nil

Add this after resourcedetection and transform/ec2 (or transform/firelens) in the logs pipeline. Now service.name is queryable on the log record itself.
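
Wired into the EC2-flavoured logs pipeline from earlier, the ordering looks like this (a sketch; only the logs pipeline is shown):

service:
  pipelines:
    logs:
      receivers: [filelog, otlp]
      processors:
        - resourcedetection/ec2            # detect: EC2 metadata + tags
        - transform/ec2                    # transform: ec2.tag.service_name -> service.name on the resource
        - transform/promote_to_logrecord   # promote: copy onto each log record
        - batch
      exporters: [otlp/last9]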

A general rule: anything you want to filter, group, or facet on in a log query needs to live on the log record. The resource scope is for grouping at the source level, not for ad-hoc querying.


Pitfall 2: server-side remap rules that overwrite service.name

Many backends expose remapping rules — server-side processing that renames or copies attributes after ingestion. They're convenient: you don't have to redeploy your collector to add a label or backfill a missing one.

But applied to service.name, they're a footgun. The classic failure mode:

A remap rule is configured to do something like "if service.name is empty on a record, copy cloud.platform into it so we at least have something to group by." The intent is benign — give signals that slipped through without resource detection some kind of identifier.

The trap is in the "is empty" check. Resource attributes and log-record attributes live in different scopes. A service.name correctly set on the resource by transform/ec2 may still look "empty" to a remap rule evaluating the log-record scope. The rule fires, copies cloud.platform (which is the constant string aws_ecs for every container on ECS), and now every record looks like one giant service. Same outcome as the unconfigured collector — but harder to debug, because the collector pipeline looks correct.

Two takeaways:

  1. Avoid cloud.platform as a fallback for service.name at any layer. It's the same value for every container in your fleet — it provides no signal and actively hides correctly-labelled traffic when something accidentally overwrites the real value.
  2. Set service.name once, at the closest layer to the source. If the SDK sets it, don't re-set it in the collector. If the collector sets it from EC2 tags, don't re-set it server-side. Multi-layer service.name rules are fragile because the precedence is rarely what you expect.

When in doubt, drop server-side remap rules for service.name entirely. Make the collector authoritative.
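
If you still want a last-resort fallback, keep it in the collector and make the gap visible instead of copying cloud.platform. A sketch, with a hypothetical processor name and placeholder value:

processors:
  transform/service_name_fallback:
    error_mode: ignore
    log_statements:
      - context: resource
        statements:
          # Fires only when nothing upstream set service.name; never borrows cloud.platform
          - set(attributes["service.name"], "unknown_service")
              where attributes["service.name"] == nil

As with transform/ec2, repeat the same statements under trace_statements and metric_statements if you want the fallback on all three signals.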


Verifying the fix

After deploying, three sanity checks:

1. Inspect the resource attributes the collector exports. Add a debug exporter with verbosity: detailed to a non-prod collector and look at one record:

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      exporters: [debug, otlp/last9]

You should see something like:

Resource SchemaURL: https://opentelemetry.io/schemas/1.6.1
Resource attributes:
     -> service.name: Str(order-api)
     -> deployment.environment: Str(prod)
     -> cloud.platform: Str(aws_ecs)
     -> ec2.tag.service_name: Str(order-api)
     -> aws.ecs.task.family: Str(order-api-task)

If service.name is still aws_ecs here, the issue is in the collector pipeline, not the backend.

2. Check log records carry the promoted attributes.

LogRecord #0
Body: {...}
Attributes:
     -> service.name: Str(order-api)
     -> deployment.environment: Str(prod)

If logs only show service.name on the resource and not the record, the transform/promote_to_logrecord step is missing or running before resourcedetection.

3. Filter in the backend by service.name = <one of your services>. You should see only that service's logs, not a mix of everything labelled aws_ecs.


When OTEL_RESOURCE_ATTRIBUTES alone is enough

If you run only OTLP-instrumented apps on Fargate and don't ingest container metrics or stdout logs through the collector, you can skip most of this and just set OTEL_RESOURCE_ATTRIBUTES on the task. The OTel SDK will pick it up at startup and stamp every signal with the right resource. This is also the recommended path for Lambda functions instrumented with OpenTelemetry — the SDK is in process, env vars are the source of truth.

Where this breaks down:

  • ECS task metrics (awsecscontainermetrics receiver) — collector emits these, no SDK in the loop
  • Stdout logs forwarded by Firelens / Fluent Bit — no SDK in the loop
  • Sidecar collectors that wrap multiple containers — task-level env vars don't disambiguate per-container service names

For anything beyond a single OTLP-only app, you need the collector-level resource detection + transform pattern above.


Summary

Symptom: All logs show service.name = aws_ecs
Root cause: No service.name resource attribute set; backend falls back to cloud.platform
Fix: resourcedetection/ec2 + transform/ec2 (EC2 ECS), or resourcedetection: [env, ecs] + OTEL_RESOURCE_ATTRIBUTES on the task (Fargate)

Symptom: service.name is set but filters return nothing
Root cause: Attribute is on the resource; the query engine looks at the log record
Fix: OTTL transform to copy resource attributes onto each log record

Symptom: service.name was working, suddenly all aws_ecs again
Root cause: A server-side remap rule overwrote it
Fix: Avoid cloud.platform-derived fallbacks; make the collector authoritative for service.name

The OTel Collector gives you the building blocks, but you have to assemble them in the right order: detect → transform → promote → export. Each layer matters, and skipping one of them is what gives you the "all my ECS logs are one service" Slack message at 7pm on a Friday.


Send your ECS telemetry to Last9

Last9 ingests OTLP for logs, metrics, and traces with no proprietary agent. Once your collector is correctly setting service.name, you get a per-service catalog, RED metrics, and full log-trace correlation out of the box — no resource-vs-record promotion needed for service.name, since Last9 honours it as a first-class identifier across all three signals.

Start sending ECS telemetry to Last9 →


About the authors
Prathamesh Sonpatki

Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.
