
AWS Chalice

Instrument AWS Chalice Lambda functions with OpenTelemetry using ADOT layers for automatic tracing and observability

Use OpenTelemetry to instrument your AWS Chalice Lambda functions and send telemetry data to Last9. Chalice is AWS’s Python serverless framework that manages Lambda deployment, API Gateway, and IAM policies through a single config file. This integration uses the AWS Distro for OpenTelemetry (ADOT) Layer for automatic instrumentation with no code changes required.

Prerequisites

Before setting up AWS Chalice monitoring, ensure you have:

  • AWS Account: With access to Lambda service
  • Python 3.8+: With Chalice installed (pip install chalice)
  • Chalice Project: An existing or new Chalice application
  • Last9 Account: With OpenTelemetry integration credentials

Supported Runtimes

Chalice deploys Python Lambda functions. ADOT Python layers support:

  • Python: 3.8, 3.9, 3.10, 3.11, 3.12
  1. Configure .chalice/config.json

    Add the ADOT Lambda layer and environment variables to your Chalice configuration. The layer provides auto-instrumentation — no application code changes needed.

    {
      "version": "2.0",
      "app_name": "your-app-name",
      "stages": {
        "dev": {
          "api_gateway_stage": "dev",
          "lambda_timeout": 30,
          "lambda_memory_size": 256,
          "xray": true,
          "layers": [
            "arn:aws:lambda:ap-southeast-1:901920570463:layer:aws-otel-python-amd64-ver-1-25-0:1"
          ],
          "environment_variables": {
            "AWS_LAMBDA_EXEC_WRAPPER": "/opt/otel-instrument",
            "OPENTELEMETRY_COLLECTOR_CONFIG_FILE": "/var/task/.chalice/collector-config.yaml",
            "OTEL_SERVICE_NAME": "your-service-name",
            "OTEL_PROPAGATORS": "tracecontext,xray",
            "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
            "OTEL_TRACES_EXPORTER": "otlp",
            "OTEL_TRACES_SAMPLER": "always_on",
            "OTEL_RESOURCE_ATTRIBUTES": "deployment.environment=dev"
          }
        }
      }
    }

    Important Configuration Notes:

    • Replace your-app-name and your-service-name with descriptive names
    • Replace the layer ARN with the correct one for your AWS region (see ADOT Lambda docs)
    • "xray": true is optional — it enables X-Ray alongside ADOT for co-existence
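A quick pre-deploy check can catch a missing setting before it shows up as silently absent traces. The sketch below is one possible way to do that, assuming the stage layout shown in this step. The function names and the list of required keys are illustrative and are not part of Chalice or ADOT.

```python
import json

# Settings the ADOT layer setup in this guide relies on (taken from the config above).
REQUIRED_ENV = (
    "AWS_LAMBDA_EXEC_WRAPPER",
    "OPENTELEMETRY_COLLECTOR_CONFIG_FILE",
    "OTEL_SERVICE_NAME",
)


def missing_adot_settings(config: dict, stage: str) -> list:
    """Return human-readable problems found in one stage of a Chalice config."""
    stage_cfg = config.get("stages", {}).get(stage)
    if stage_cfg is None:
        return [f"stage '{stage}' is not defined"]
    problems = []
    # Heuristic: ADOT Python layer ARNs contain "aws-otel-python".
    if not any("aws-otel-python" in arn for arn in stage_cfg.get("layers", [])):
        problems.append("no ADOT Python layer ARN in 'layers'")
    env = stage_cfg.get("environment_variables", {})
    for key in REQUIRED_ENV:
        if not env.get(key):
            problems.append(f"environment variable {key} is not set")
    return problems


def check_file(path: str = ".chalice/config.json", stage: str = "dev") -> list:
    """Load a config file from disk and run the same checks."""
    with open(path, encoding="utf-8") as fh:
        return missing_adot_settings(json.load(fh), stage)
```

Running `check_file()` from the project root before `chalice deploy` returns an empty list when nothing is missing.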
  2. Create Collector Configuration

    Create collector-config.yaml in your project root, then copy it to .chalice/ so it gets packaged with your Lambda:

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: localhost:4317
          # OTLP/HTTP receiver, needed because OTEL_EXPORTER_OTLP_PROTOCOL is http/protobuf
          http:
            endpoint: localhost:4318
    exporters:
      otlp:
        endpoint: $last9_otlp_endpoint
        headers:
          authorization: $last9_otlp_auth_header
        tls:
          insecure: false
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp]
        metrics:
          receivers: [otlp]
          exporters: [otlp]

    Copy to the .chalice/ directory:

    cp collector-config.yaml .chalice/collector-config.yaml
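If you treat `$last9_otlp_endpoint` and `$last9_otlp_auth_header` as placeholders to fill in before deploying, the substitution can be scripted instead of edited by hand. This is a minimal sketch under that assumption, using Python's `string.Template`, which happens to use the same `$name` syntax. The function name and the sample values are hypothetical.

```python
from string import Template

# Only the exporter section is shown; any other text passes through unchanged.
COLLECTOR_TEMPLATE = """\
exporters:
  otlp:
    endpoint: $last9_otlp_endpoint
    headers:
      authorization: $last9_otlp_auth_header
"""


def render_collector_config(template_text: str, endpoint: str, auth_header: str) -> str:
    """Fill the two Last9 placeholders.

    Raises KeyError if the template contains a placeholder that is not supplied.
    """
    return Template(template_text).substitute(
        last9_otlp_endpoint=endpoint,
        last9_otlp_auth_header=auth_header,
    )


if __name__ == "__main__":
    # Hypothetical values; use the ones from your Last9 integration page.
    print(render_collector_config(COLLECTOR_TEMPLATE, "otlp.example.invalid:443", "Basic abc123"))
```

Writing the rendered text straight to `.chalice/collector-config.yaml` also removes the separate copy step.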
  3. Deploy

    chalice deploy --stage dev

    Chalice packages your app code, .chalice/ directory (including collector-config.yaml), and requirements into a Lambda deployment.

  4. Test and Verify

    # Get your API URL
    chalice url --stage dev
    # Test
    curl $(chalice url --stage dev)/

Understanding the Setup

How Chalice + ADOT Works

  1. Chalice deploys your Python function to Lambda with the ADOT layer attached
  2. AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-instrument wraps the Python process at startup
  3. The ADOT layer injects OpenTelemetry auto-instrumentation before your Chalice app loads
  4. All HTTP handlers, scheduled tasks, and AWS SDK calls are traced automatically
  5. The in-Lambda ADOT Collector sends traces to Last9 via the collector-config.yaml

Environment Variables Explained

| Variable | Purpose | Example |
|---|---|---|
| AWS_LAMBDA_EXEC_WRAPPER | Enables ADOT instrumentation | /opt/otel-instrument |
| OPENTELEMETRY_COLLECTOR_CONFIG_FILE | Path to collector config in Lambda | /var/task/.chalice/collector-config.yaml |
| OTEL_SERVICE_NAME | Service identifier in traces | payment-service |
| OTEL_EXPORTER_OTLP_PROTOCOL | Export protocol | http/protobuf |
| OTEL_TRACES_SAMPLER | Sampling strategy | always_on or traceidratio |
| OTEL_TRACES_SAMPLER_ARG | Sampling rate (if traceidratio) | 0.1 (10%) |
| OTEL_PROPAGATORS | Trace context formats | tracecontext,xray |
| OTEL_RESOURCE_ATTRIBUTES | Additional metadata | deployment.environment=prod |

What Gets Traced

The ADOT layer automatically traces:

  • Chalice Route Handlers: @app.route() decorated functions
  • Scheduled Tasks: @app.schedule() decorated functions
  • AWS SDK Calls: DynamoDB, S3, SQS, SNS, etc.
  • HTTP Requests: Outbound API calls via urllib, requests, boto3
  • Database Calls: RDS, DynamoDB operations

X-Ray Co-existence

If your Chalice app already has "xray": true in config.json, you can keep it alongside ADOT:

  • X-Ray traces continue going to the AWS X-Ray service (existing dashboards keep working)
  • ADOT/OTLP traces go to Last9

Setting OTEL_PROPAGATORS=tracecontext,xray ensures the ADOT layer reads and writes both W3C traceparent and AWS X-Amzn-Trace-Id headers. Trace context propagates correctly regardless of which format upstream services use.

To use ADOT only (no X-Ray), remove "xray": true from config.json and set propagators to tracecontext only.
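The two header formats carry the same 128-bit trace ID in different layouts, which is why one trace can be followed across both. The sketch below illustrates the mapping. It is not the ADOT propagator code, and it assumes the X-Ray header includes a `Parent` field.

```python
def xray_to_traceparent(xray_header: str) -> str:
    """Convert an X-Amzn-Trace-Id value into a W3C traceparent value."""
    fields = dict(
        part.split("=", 1) for part in xray_header.split(";") if "=" in part
    )
    version, epoch, unique = fields["Root"].split("-")
    if version != "1" or len(epoch) != 8 or len(unique) != 24:
        raise ValueError(f"unexpected Root field: {fields['Root']}")
    trace_id = epoch + unique  # 8 + 24 hex characters = 128 bits
    span_id = fields["Parent"]  # raises KeyError if the header has no Parent
    flags = "01" if fields.get("Sampled") == "1" else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def traceparent_to_xray(traceparent: str) -> str:
    """Convert a W3C traceparent value into an X-Amzn-Trace-Id value."""
    _version, trace_id, span_id, flags = traceparent.split("-")
    sampled = "1" if int(flags, 16) & 0x01 else "0"
    return f"Root=1-{trace_id[:8]}-{trace_id[8:]};Parent={span_id};Sampled={sampled}"
```

With both propagators enabled, a request arriving with either header joins the existing trace instead of starting a new one.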

Advanced Configuration

Custom Spans via Chalice Middleware

Auto-instrumentation captures handlers and SDK calls. For custom business logic spans, use the OTel API with Chalice middleware:

from chalice import Chalice
from opentelemetry import trace

app = Chalice(app_name="your-app")
tracer = trace.get_tracer(__name__)


@app.middleware("all")
def add_custom_attributes(event, get_response):
    # Tag the span that auto-instrumentation already opened for this invocation.
    span = trace.get_current_span()
    if span.is_recording():
        span.set_attribute("app.framework", "chalice")
    return get_response(event)


@app.route("/process/{order_id}")
def process_order(order_id):
    # Child span around the business logic for this route.
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        # your business logic
        return {"status": "processed"}

Only opentelemetry-api is needed in requirements.txt. The ADOT layer provides the full SDK.

Per-Function Configuration

Chalice supports per-function overrides for layer, memory, and timeout:

{
  "stages": {
    "dev": {
      "lambda_functions": {
        "periodic_check": {
          "lambda_timeout": 60,
          "lambda_memory_size": 128
        }
      }
    }
  }
}
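To reason about which value a given function ends up with, it helps to model the lookup as "most specific scope wins": the function entry, then the stage, then the top level. The sketch below is a simplified model of that precedence, not Chalice's implementation. It ignores settings that Chalice merges across levels, such as environment variables, and `resolve_setting` is a hypothetical helper.

```python
def resolve_setting(config: dict, stage: str, function_name: str, key: str, default=None):
    """Return the most specific value for `key`: function, then stage, then top level."""
    stage_cfg = config.get("stages", {}).get(stage, {})
    function_cfg = stage_cfg.get("lambda_functions", {}).get(function_name, {})
    for scope in (function_cfg, stage_cfg, config):
        if key in scope:
            return scope[key]
    return default


config = {
    "stages": {
        "dev": {
            "lambda_timeout": 30,
            "lambda_memory_size": 256,
            "lambda_functions": {
                "periodic_check": {"lambda_timeout": 60, "lambda_memory_size": 128}
            },
        }
    }
}

# periodic_check uses its own override; every other function inherits the stage value.
periodic_timeout = resolve_setting(config, "dev", "periodic_check", "lambda_timeout")
default_timeout = resolve_setting(config, "dev", "api_handler", "lambda_timeout")
```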

Sampling Configuration

Control trace sampling to manage costs:

# Development: Sample all traces
OTEL_TRACES_SAMPLER=always_on
# Production: Sample 10% of traces
OTEL_TRACES_SAMPLER=traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1
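Ratio sampling is deterministic per trace ID, so every service that sees the same trace makes the same keep-or-drop decision. The sketch below shows the general idea in plain Python: compare the low 64 bits of the trace ID against a bound derived from the rate. It is a simplified illustration rather than the SDK's code, and the function names are my own.

```python
TRACE_ID_LIMIT = (1 << 64) - 1  # mask for the low 64 bits of a 128-bit trace ID


def bound_for_rate(rate: float) -> int:
    """Translate a sampling rate in [0, 1] into an integer threshold."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return round(rate * (TRACE_ID_LIMIT + 1))


def should_sample(trace_id: int, rate: float) -> bool:
    """Keep a trace when the low 64 bits of its ID fall below the bound."""
    return (trace_id & TRACE_ID_LIMIT) < bound_for_rate(rate)
```

Because trace IDs are effectively random, a rate of `0.1` keeps roughly one trace in ten, and the same trace ID always gets the same answer.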

Troubleshooting

No Traces Appearing

Check CloudWatch Logs:

aws logs tail /aws/lambda/your-app-name-dev --follow

Common Issues:

  • Verify collector-config.yaml is at /var/task/.chalice/collector-config.yaml inside the Lambda
  • Confirm ADOT layer ARN is correct for your region
  • Check that Last9 credentials in collector config are valid
  • Ensure AWS_LAMBDA_EXEC_WRAPPER is set to /opt/otel-instrument

Module Errors

Do NOT add opentelemetry-sdk or opentelemetry-instrumentation-* to requirements.txt. The ADOT layer provides them. Only opentelemetry-api is needed (for custom spans).

Cold Start Latency

ADOT adds ~500ms-1s to cold starts. Mitigate with:

  • Provisioned concurrency for latency-sensitive functions
  • Adequate memory allocation (256MB+ recommended)

Error Messages

| Error | Solution |
|---|---|
| "batch processor not found" | Remove batch processor from collector-config.yaml |
| "parse headers" | Use authorization=Basic ... format (lowercase key, key=value) |
| "Layer not found" | Use correct layer ARN for your region |
| "Recording is off" | Set OTEL_TRACES_SAMPLER=always_on |
| Unable to import module 'otel_wrapper': No module named '<pkg>' | Set "automatic_layer": false in .chalice/config.json (see Pip-only direct SDK) |
| 250 MB layer/zip ceiling exceeded | Drop boto3/botocore from requirements.txt (Lambda runtime ships them), or switch to Pip-only direct SDK |

Pip-only direct SDK (no ADOT layer)

The ADOT layer approach above works for most Chalice apps, but two failure modes are worth handling explicitly:

  1. Unable to import module 'otel_wrapper': No module named '<pkg>' — when "automatic_layer": true is set in .chalice/config.json, Chalice puts your app’s pip dependencies into a separate Lambda layer at /opt/python/lib/python3.X/site-packages/<pkg>/. The ADOT wrapper script (/opt/otel-instrument) exports a PYTHONPATH=/opt/python:... and then execs a new python3 process — that new process’s sys.path does NOT auto-include the nested lib/pythonX/site-packages/ directory, so your deps become invisible.

  2. 250 MB Lambda ceiling — function code + ALL attached layers (unzipped) must stay under 250 MB. ADOT layer ~100 MB unzipped + a Chalice managed-deps layer that includes typical web app dependencies pushes many real apps over the limit.
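To see how close a deployment is to the ceiling, you can sum the uncompressed sizes recorded in each archive. This is a minimal sketch that assumes you have the function zip and any layer zips available locally; `unzipped_size` and `headroom` are illustrative names.

```python
import zipfile

LAMBDA_UNZIPPED_LIMIT = 250 * 1024 * 1024  # function code plus all layers, unzipped


def unzipped_size(zip_path) -> int:
    """Total uncompressed size in bytes of every entry in one archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(info.file_size for info in archive.infolist())


def headroom(zip_paths, limit: int = LAMBDA_UNZIPPED_LIMIT) -> int:
    """Bytes left under the limit; a negative value means the deployment is too large."""
    return limit - sum(unzipped_size(path) for path in zip_paths)
```

Pass the function archive together with every attached layer archive, since the limit applies to their combined unzipped size.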

For both, the cleanest fix is to skip the ADOT layer entirely and install the OpenTelemetry SDK + instrumentations via requirements.txt. Telemetry exports directly from the function process to Last9 over OTLP/HTTP — no in-Lambda collector subprocess, no Lambda layer attached.

Trade-off: requires a small one-time code change in app.py (one init_otel() call + Chalice middleware registration). The change is contained to a chalicelib/ module and can be reused across every Chalice Lambda in your project.

  1. Add OTel dependencies to requirements.txt

    Use opentelemetry-bootstrap to auto-detect which instrumentation packages your app needs:

    python -m venv /tmp/scan-venv
    source /tmp/scan-venv/bin/activate
    pip install -r requirements.txt opentelemetry-distro
    opentelemetry-bootstrap -a requirements
    deactivate
    rm -rf /tmp/scan-venv

    The command scans installed packages (boto3, requests, aiohttp, etc.) and prints matching opentelemetry-instrumentation-* packages. Append them to your requirements.txt. The typical Chalice Lambda needs:

    opentelemetry-api
    opentelemetry-sdk
    opentelemetry-exporter-otlp-proto-http
    opentelemetry-instrumentation-botocore
    opentelemetry-instrumentation-requests
    opentelemetry-instrumentation-urllib3
    opentelemetry-instrumentation-logging
    opentelemetry-instrumentation-aiohttp-client

    Do NOT add opentelemetry-instrumentation-aws-lambda — that instrumentor expects the Lambda handler to be a function, but Chalice’s handler is the app instance (a Chalice class). wrapt will crash with AttributeError: partially initialized module 'app' has no attribute 'app'. The middleware in step 3 provides the equivalent SERVER span.

  2. Add chalicelib/otel_init.py

    Chalice only packages app.py, the chalicelib/ tree, and resolved deps. Put shared OTel init in chalicelib/otel_init.py:

    """OTel SDK init for Lambda. Reads endpoint + auth from env vars."""
    import os


    def init_otel(service_name: str) -> None:
        # Skip locally: `chalice local` and pytest never set this.
        if not os.environ.get("AWS_LAMBDA_FUNCTION_NAME") and not os.environ.get("OTEL_FORCE_INIT"):
            return

        # Imported lazily so local runs do not need the OTel packages loaded.
        from opentelemetry import trace
        from opentelemetry.sdk.resources import Resource
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry.sdk.trace.export import SimpleSpanProcessor
        from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
        from opentelemetry.instrumentation.botocore import BotocoreInstrumentor
        from opentelemetry.instrumentation.requests import RequestsInstrumentor
        from opentelemetry.instrumentation.urllib3 import URLLib3Instrumentor
        from opentelemetry.instrumentation.logging import LoggingInstrumentor
        from opentelemetry.instrumentation.aiohttp_client import AioHttpClientInstrumentor

        resource = Resource.create({
            "service.name": service_name,
            "service.namespace": os.environ.get("OTEL_SERVICE_NAMESPACE", "default"),
            "deployment.environment": os.environ.get("OTEL_DEPLOYMENT_ENV", "dev"),
        })
        provider = TracerProvider(resource=resource)
        # SimpleSpanProcessor flushes synchronously per span, which is safer under Lambda
        # freeze/thaw than BatchSpanProcessor, which can drop its last batch on shutdown.
        provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter()))
        trace.set_tracer_provider(provider)

        BotocoreInstrumentor().instrument()
        RequestsInstrumentor().instrument()
        URLLib3Instrumentor().instrument()
        LoggingInstrumentor().instrument()
        AioHttpClientInstrumentor().instrument()

    The AWS_LAMBDA_FUNCTION_NAME guard ensures chalice local and pytest skip OTel entirely — useful since OTLP endpoint/auth env vars are typically not set in development.
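Because the guard depends only on environment variables, the decision can be expressed as a small predicate and unit tested without the SDK installed. The helper below is a sketch of that idea; `otel_enabled` is a hypothetical name and is not part of the module above. It also shows the role of `OTEL_FORCE_INIT`, which opts a local run back in.

```python
import os
from typing import Mapping


def otel_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """True inside Lambda, or when OTEL_FORCE_INIT is set to opt in locally."""
    return bool(
        environ.get("AWS_LAMBDA_FUNCTION_NAME") or environ.get("OTEL_FORCE_INIT")
    )


# Plain dicts stand in for os.environ, so no real environment is touched.
in_lambda = otel_enabled({"AWS_LAMBDA_FUNCTION_NAME": "your-app-name-dev"})
local_default = otel_enabled({})
local_forced = otel_enabled({"OTEL_FORCE_INIT": "1"})
```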

  3. Add chalicelib/server_span.py (replaces AwsLambdaInstrumentor)

    Chalice middleware that opens a SERVER span per route invocation, emits HTTP + FaaS semantic conventions, and extracts W3C traceparent from incoming headers so traces can join an upstream caller’s tree:

    import os
    from typing import Any, Callable

    from opentelemetry import context as otel_context
    from opentelemetry import trace
    from opentelemetry.propagate import extract
    from opentelemetry.trace import SpanKind, Status, StatusCode

    _tracer = trace.get_tracer("chalicelib.server_span")


    def _carrier_from_event(event: Any) -> dict:
        """Return request headers with lowercased keys, or {} for non-HTTP events."""
        raw = getattr(event, "raw_request", None)
        headers = getattr(raw, "headers", None) if raw is not None else None
        if headers is None:
            # Chalice's Request object exposes headers directly as event.headers.
            headers = getattr(event, "headers", None)
        return {k.lower(): v for k, v in dict(headers or {}).items()}


    def server_span_middleware(event: Any, get_response: Callable) -> Any:
        ctx = getattr(event, "context", None)
        if not isinstance(ctx, dict):
            # Non-HTTP events carry a Lambda context object here, not the API Gateway dict.
            ctx = {}
        method = ctx.get("httpMethod", "GET")
        route = ctx.get("resourcePath") or ctx.get("path") or "/"
        identity = ctx.get("identity", {}) or {}

        # Join the caller's trace if a W3C traceparent header came in.
        parent_ctx = extract(_carrier_from_event(event))
        token = otel_context.attach(parent_ctx)
        try:
            with _tracer.start_as_current_span(f"{method} {route}", kind=SpanKind.SERVER) as span:
                span.set_attribute("http.request.method", method)
                span.set_attribute("http.route", route)
                span.set_attribute("url.path", route)
                span.set_attribute("url.scheme", "https")
                ua = identity.get("userAgent")
                if ua:
                    span.set_attribute("user_agent.original", ua)
                client_ip = identity.get("sourceIp")
                if client_ip:
                    span.set_attribute("client.address", client_ip)

                # FaaS semantic conventions
                span.set_attribute("faas.trigger", "http")
                span.set_attribute("cloud.provider", "aws")
                region = os.environ.get("AWS_REGION") or os.environ.get("AWS_DEFAULT_REGION")
                if region:
                    span.set_attribute("cloud.region", region)
                fn_name = os.environ.get("AWS_LAMBDA_FUNCTION_NAME")
                if fn_name:
                    span.set_attribute("faas.name", fn_name)
                xray = os.environ.get("_X_AMZN_TRACE_ID")
                if xray:
                    span.set_attribute("faas.invocation_id", xray)

                try:
                    response = get_response(event)
                except Exception as exc:
                    span.record_exception(exc)
                    span.set_status(Status(StatusCode.ERROR, str(exc)))
                    raise

                status_code = getattr(response, "status_code", None)
                if status_code is not None:
                    span.set_attribute("http.response.status_code", status_code)
                    if status_code >= 500:
                        span.set_status(Status(StatusCode.ERROR))
                return response
        finally:
            otel_context.detach(token)
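The lowercasing in `_carrier_from_event` matters because the propagator looks up the `traceparent` key by its lowercase name. As an illustration of what that lookup yields, here is a simplified stand-alone parser for version `00` headers. It is not the OpenTelemetry implementation, and `parse_traceparent` is a hypothetical helper.

```python
import re
from typing import Mapping, Optional, Tuple

# version - trace id (32 hex) - parent span id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(headers: Mapping[str, str]) -> Optional[Tuple[str, str, bool]]:
    """Return (trace_id, parent_span_id, sampled), or None if absent or invalid."""
    carrier = {k.lower(): v for k, v in headers.items()}
    match = _TRACEPARENT.match(carrier.get("traceparent", "").strip())
    if not match:
        return None
    version, trace_id, span_id, flags = match.groups()
    # W3C Trace Context treats version ff and all-zero IDs as invalid.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return trace_id, span_id, bool(int(flags, 16) & 0x01)
```

When this returns `None`, the middleware's SERVER span simply becomes the root of a new trace.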
  4. Wire it into app.py

    Call init_otel() BEFORE any other imports (so the instrumentors patch modules at the right time), then register the middleware on the Chalice app:

    from chalicelib.otel_init import init_otel

    # Must run before the imports below so the instrumentors patch modules first.
    init_otel(service_name="your-service-name")

    from chalice import Chalice
    from chalicelib.server_span import server_span_middleware

    app = Chalice(app_name="your-app-name")
    app.register_middleware(server_span_middleware, event_type="all")


    @app.route("/")
    def index():
        return {"ok": True}
  5. Update .chalice/config.json

    Remove all of the ADOT-layer-related fields and replace with direct-export env vars. Drop layers, AWS_LAMBDA_EXEC_WRAPPER, and OPENTELEMETRY_COLLECTOR_CONFIG_FILE:

    {
      "version": "2.0",
      "app_name": "your-app-name",
      "automatic_layer": false,
      "stages": {
        "prod": {
          "api_gateway_stage": "prod",
          "lambda_memory_size": 512,
          "lambda_timeout": 30,
          "environment_variables": {
            "OTEL_SERVICE_NAME": "your-service-name",
            "OTEL_SERVICE_NAMESPACE": "your-namespace",
            "OTEL_DEPLOYMENT_ENV": "prod",
            "OTEL_EXPORTER_OTLP_ENDPOINT": "<your_last9_otlp_endpoint>",
            "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
            "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Basic <your-base64-credentials>"
          }
        }
      }
    }

    Replace <your_last9_otlp_endpoint> with the OTLP endpoint shown in Last9 → Integrations → OpenTelemetry (e.g. https://otlp-<region>.last9.io). Chalice does NOT expand env vars inside config.json, so the Authorization=Basic ... value must be the actual base64 string at deploy time — keep this file out of source control by adding .chalice/config.json to .gitignore and generating it from a template (config.json.in + sed) at deploy time.
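The template step can also be done in Python rather than sed. The sketch below assumes your Last9 credentials are a username and password pair for HTTP Basic auth. If Last9 gives you a ready-made base64 string, use that directly instead. The function names are illustrative.

```python
import base64
import json


def basic_auth_header(username: str, password: str) -> str:
    """Build the OTEL_EXPORTER_OTLP_HEADERS value in the format used above."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization=Basic {token}"


def render_config(template: dict, stage: str, endpoint: str, username: str, password: str) -> str:
    """Return config.json text with endpoint and auth filled in for one stage."""
    config = json.loads(json.dumps(template))  # deep copy so the template stays untouched
    env = config["stages"][stage]["environment_variables"]
    env["OTEL_EXPORTER_OTLP_ENDPOINT"] = endpoint
    env["OTEL_EXPORTER_OTLP_HEADERS"] = basic_auth_header(username, password)
    return json.dumps(config, indent=2)
```

In a deploy script, read the credentials from your secret store or CI environment and write the result to `.chalice/config.json` just before `chalice deploy`.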

  6. Deploy

    chalice deploy --stage prod
  7. Verify

    Hit your function, then in Last9 → Traces filter by service.name=your-service-name. You should see a SERVER span as the root with CLIENT spans (boto3, requests, etc.) as children:

    SERVER GET /your-route (chalicelib.server_span)
    ├── CLIENT DynamoDB.GetItem (opentelemetry.instrumentation.botocore)
    └── CLIENT GET https://api.example.com (opentelemetry.instrumentation.requests)

When to use this path

Use the pip-only direct SDK path when:

  • Your function zip + layers (unzipped) approach or exceed 250 MB
  • You hit Unable to import module 'otel_wrapper': No module named '<pkg>' and prefer a structural fix over keeping automatic_layer: false permanently
  • You want a single deployment artifact (no Lambda layer ARNs to manage across environments)
  • You need explicit control over which OTel instrumentation packages run (security review, dep audit)

Stick with the ADOT layer path when:

  • You want zero application code changes
  • Your function deps are small enough that the 250 MB ceiling is comfortable headroom
  • You rely on AwsLambdaInstrumentor’s built-in trigger detection for non-API-Gateway events (S3, SNS, EventBridge, SQS) — though even there, a small custom span around the handler covers the common cases

Reference example

A complete five-variant working example (including chalicelib/otel_init.py and chalicelib/server_span.py shown above) lives in the opentelemetry-examples repo. Variants A-C demonstrate the automatic_layer failure modes; Variant E is the working pattern recommended here.

Best Practices

  • Service Naming: Use descriptive, consistent names across Chalice stages
  • Sampling: Start with always_on in dev, use traceidratio in production
  • X-Ray Transition: Keep X-Ray enabled initially, disable once Last9 dashboards are confirmed
  • Memory: Allocate 256MB+ to account for ADOT layer overhead
  • Collector Config: Always copy collector-config.yaml to .chalice/ before deploying

Need Help?

If you encounter any issues or have questions: