AWS Chalice
Instrument AWS Chalice Lambda functions with OpenTelemetry using ADOT layers for automatic tracing and observability
Use OpenTelemetry to instrument your AWS Chalice Lambda functions and send telemetry data to Last9. Chalice is AWS’s Python serverless framework that manages Lambda deployment, API Gateway, and IAM policies through a single config file. This integration uses the AWS Distro for OpenTelemetry (ADOT) Layer for automatic instrumentation with no code changes required.
Prerequisites
Before setting up AWS Chalice monitoring, ensure you have:
- AWS Account: With access to Lambda service
- Python 3.8+: With Chalice installed (pip install chalice)
- Chalice Project: An existing or new Chalice application
- Last9 Account: With OpenTelemetry integration credentials
Supported Runtimes
Chalice deploys Python Lambda functions. ADOT Python layers support:
- Python: 3.8, 3.9, 3.10, 3.11, 3.12
Step 1: Configure .chalice/config.json

Add the ADOT Lambda layer and environment variables to your Chalice configuration. The layer provides auto-instrumentation, so no application code changes are needed.
Development stage:

    {
      "version": "2.0",
      "app_name": "your-app-name",
      "stages": {
        "dev": {
          "api_gateway_stage": "dev",
          "lambda_timeout": 30,
          "lambda_memory_size": 256,
          "xray": true,
          "layers": [
            "arn:aws:lambda:ap-southeast-1:901920570463:layer:aws-otel-python-amd64-ver-1-25-0:1"
          ],
          "environment_variables": {
            "AWS_LAMBDA_EXEC_WRAPPER": "/opt/otel-instrument",
            "OPENTELEMETRY_COLLECTOR_CONFIG_FILE": "/var/task/.chalice/collector-config.yaml",
            "OTEL_SERVICE_NAME": "your-service-name",
            "OTEL_PROPAGATORS": "tracecontext,xray",
            "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
            "OTEL_TRACES_EXPORTER": "otlp",
            "OTEL_TRACES_SAMPLER": "always_on",
            "OTEL_RESOURCE_ATTRIBUTES": "deployment.environment=dev"
          }
        }
      }
    }

Production stage:

    {
      "version": "2.0",
      "app_name": "your-app-name",
      "stages": {
        "prod": {
          "api_gateway_stage": "prod",
          "lambda_timeout": 30,
          "lambda_memory_size": 512,
          "xray": true,
          "layers": [
            "arn:aws:lambda:ap-southeast-1:901920570463:layer:aws-otel-python-amd64-ver-1-25-0:1"
          ],
          "environment_variables": {
            "AWS_LAMBDA_EXEC_WRAPPER": "/opt/otel-instrument",
            "OPENTELEMETRY_COLLECTOR_CONFIG_FILE": "/var/task/.chalice/collector-config.yaml",
            "OTEL_SERVICE_NAME": "your-service-name",
            "OTEL_PROPAGATORS": "tracecontext,xray",
            "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
            "OTEL_TRACES_EXPORTER": "otlp",
            "OTEL_TRACES_SAMPLER": "traceidratio",
            "OTEL_TRACES_SAMPLER_ARG": "0.1",
            "OTEL_RESOURCE_ATTRIBUTES": "deployment.environment=prod"
          }
        }
      }
    }

Production uses the traceidratio sampler at 10% to control costs. Adjust OTEL_TRACES_SAMPLER_ARG as needed.

Important Configuration Notes:

- Replace your-app-name and your-service-name with descriptive names
- Replace the layer ARN with the correct one for your AWS region (see ADOT Lambda docs)
- "xray": true is optional; it enables X-Ray alongside ADOT for co-existence
Step 2: Create Collector Configuration

Create collector-config.yaml in your project root, then copy it to .chalice/ so it gets packaged with your Lambda. The http receiver on localhost:4318 matches the http/protobuf protocol set in config.json:

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: localhost:4317
          http:
            endpoint: localhost:4318

    exporters:
      otlp:
        endpoint: $last9_otlp_endpoint
        headers:
          authorization: $last9_otlp_auth_header
        tls:
          insecure: false

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp]
        metrics:
          receivers: [otlp]
          exporters: [otlp]

Copy to the .chalice/ directory:

    cp collector-config.yaml .chalice/collector-config.yaml
Step 3: Deploy

    chalice deploy --stage dev
    chalice deploy --stage prod

Chalice packages your app code, the .chalice/ directory (including collector-config.yaml), and requirements into a Lambda deployment.
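Step 4: Test and Verify

Call the API endpoint:

    # Get your API URL
    chalice url --stage dev

    # Test
    curl $(chalice url --stage dev)/

Or invoke the function directly:

    # AWS CLI v2 needs --cli-binary-format to accept a raw JSON payload
    aws lambda invoke \
      --function-name your-app-name-dev \
      --cli-binary-format raw-in-base64-out \
      --payload '{"test": "event"}' \
      response.json
    cat response.json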
Understanding the Setup
How Chalice + ADOT Works
- Chalice deploys your Python function to Lambda with the ADOT layer attached
- AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-instrument wraps the Python process at startup
- The ADOT layer injects OpenTelemetry auto-instrumentation before your Chalice app loads
- All HTTP handlers, scheduled tasks, and AWS SDK calls are traced automatically
- The in-Lambda ADOT Collector sends traces to Last9 via the collector-config.yaml
Environment Variables Explained
| Variable | Purpose | Example |
|---|---|---|
| AWS_LAMBDA_EXEC_WRAPPER | Enables ADOT instrumentation | /opt/otel-instrument |
| OPENTELEMETRY_COLLECTOR_CONFIG_FILE | Path to collector config in Lambda | /var/task/.chalice/collector-config.yaml |
| OTEL_SERVICE_NAME | Service identifier in traces | payment-service |
| OTEL_EXPORTER_OTLP_PROTOCOL | Export protocol | http/protobuf |
| OTEL_TRACES_SAMPLER | Sampling strategy | always_on or traceidratio |
| OTEL_TRACES_SAMPLER_ARG | Sampling rate (if traceidratio) | 0.1 (10%) |
| OTEL_PROPAGATORS | Trace context formats | tracecontext,xray |
| OTEL_RESOURCE_ATTRIBUTES | Additional metadata | deployment.environment=prod |
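To make the two sampler variables more concrete, here is a minimal, stdlib-only sketch of the idea behind a trace-ID ratio sampler. It is an illustration, not the SDK's code: the function names are hypothetical, and comparing the low 64 bits of the trace ID against a bound is an assumption modeled on how such samplers are commonly implemented. Because the decision depends only on the trace ID, every service that sees the same trace reaches the same verdict.

```python
import secrets

TRACE_ID_LOW_64_MASK = (1 << 64) - 1


def should_sample(trace_id: int, ratio: float) -> bool:
    """Return True if this trace ID falls inside the sampled fraction."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # Map the ratio onto the 64-bit range and compare the low 64 bits of the ID.
    bound = round(ratio * (1 << 64))
    return (trace_id & TRACE_ID_LOW_64_MASK) < bound


def sampled_fraction(ratio: float, trials: int = 20_000) -> float:
    """Estimate the kept fraction by drawing random 128-bit trace IDs."""
    hits = sum(should_sample(secrets.randbits(128), ratio) for _ in range(trials))
    return hits / trials
```

With OTEL_TRACES_SAMPLER_ARG=0.1, sampled_fraction(0.1) should land close to 0.1, which is the sense in which the production config keeps roughly 10% of traces.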
What Gets Traced
The ADOT layer automatically traces:
- Chalice Route Handlers: @app.route() decorated functions
- Scheduled Tasks: @app.schedule() decorated functions
- AWS SDK Calls: DynamoDB, S3, SQS, SNS, etc.
- HTTP Requests: Outbound API calls via urllib, requests, boto3
- Database Calls: RDS, DynamoDB operations
X-Ray Co-existence
If your Chalice app already has "xray": true in config.json, you can keep it alongside ADOT:
- X-Ray traces continue going to the AWS X-Ray service (existing dashboards keep working)
- ADOT/OTLP traces go to Last9
Setting OTEL_PROPAGATORS=tracecontext,xray ensures the ADOT layer reads and writes both W3C traceparent and AWS X-Amzn-Trace-Id headers. Trace context propagates correctly regardless of which format upstream services use.
To use ADOT only (no X-Ray), remove "xray": true from config.json and set propagators to tracecontext only.
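The two header formats carry the same identifiers in different layouts. The sketch below is a stdlib-only illustration of that relationship, not the propagator code shipped in the layer; the function names are hypothetical, and it assumes the usual layouts (traceparent as version-traceid-spanid-flags, and X-Amzn-Trace-Id as Root=1-<8 hex>-<24 hex>;Parent=<16 hex>;Sampled=<0|1>).

```python
import re

TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(value: str) -> dict:
    """Parse a W3C traceparent header into trace ID, span ID and sampled flag."""
    match = TRACEPARENT_RE.match(value.strip().lower())
    if match is None:
        raise ValueError("not a valid traceparent header")
    _version, trace_id, span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 1),
    }


def parse_xray_header(value: str) -> dict:
    """Parse an X-Amzn-Trace-Id header into the same shape."""
    fields = dict(
        part.strip().split("=", 1) for part in value.split(";") if "=" in part
    )
    # Root is "1-<epoch hex>-<random hex>"; joining the last two parts
    # gives the 32-hex-character trace ID used by traceparent.
    _version, epoch, unique = fields["Root"].split("-")
    return {
        "trace_id": (epoch + unique).lower(),
        "span_id": fields.get("Parent", "").lower(),
        "sampled": fields.get("Sampled") == "1",
    }
```

Feeding both parsers headers that describe the same trace yields identical dictionaries, which is why listing both propagators lets a trace continue regardless of which header the upstream service sent.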
Advanced Configuration
Custom Spans via Chalice Middleware
Auto-instrumentation captures handlers and SDK calls. For custom business logic spans, use the OTel API with Chalice middleware:
    from chalice import Chalice
    from opentelemetry import trace

    app = Chalice(app_name="your-app")
    tracer = trace.get_tracer(__name__)


    @app.middleware("all")
    def add_custom_attributes(event, get_response):
        # Tag whichever span is currently active, if it is being recorded.
        span = trace.get_current_span()
        if span.is_recording():
            span.set_attribute("app.framework", "chalice")
        return get_response(event)


    @app.route("/process/{order_id}")
    def process_order(order_id):
        with tracer.start_as_current_span("process_order") as span:
            span.set_attribute("order.id", order_id)
            # your business logic
            return {"status": "processed"}

Only opentelemetry-api is needed in requirements.txt. The ADOT layer provides the full SDK.
Per-Function Configuration
Chalice supports per-function overrides for layer, memory, and timeout:
    {
      "stages": {
        "dev": {
          "lambda_functions": {
            "periodic_check": {
              "lambda_timeout": 60,
              "lambda_memory_size": 128
            }
          }
        }
      }
    }

Sampling Configuration
Control trace sampling to manage costs:
    # Development: Sample all traces
    OTEL_TRACES_SAMPLER=always_on

    # Production: Sample 10% of traces
    OTEL_TRACES_SAMPLER=traceidratio
    OTEL_TRACES_SAMPLER_ARG=0.1

Troubleshooting
No Traces Appearing
Check CloudWatch Logs:
    aws logs tail /aws/lambda/your-app-name-dev --follow

Common Issues:

- Verify collector-config.yaml is at /var/task/.chalice/collector-config.yaml inside the Lambda
- Confirm ADOT layer ARN is correct for your region
- Check that Last9 credentials in collector config are valid
- Ensure AWS_LAMBDA_EXEC_WRAPPER is set to /opt/otel-instrument
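Several of the checks above can be run against .chalice/config.json before deploying. The following is a minimal sketch of such a pre-deploy check, not part of Chalice or ADOT: the function names are hypothetical, matching the layer by the substring aws-otel-python is an assumption based on the ARN used in this guide, and stage-level values are assumed to override top-level ones.

```python
import json
from typing import Dict, List


def check_stage(config: Dict, stage: str) -> List[str]:
    """Return human-readable problems found for one Chalice stage."""
    stage_cfg = config.get("stages", {}).get(stage)
    if stage_cfg is None:
        return [f"stage '{stage}' is not defined"]

    # Assumption: stage-level settings take precedence over top-level ones.
    layers = stage_cfg.get("layers") or config.get("layers") or []
    env = {
        **config.get("environment_variables", {}),
        **stage_cfg.get("environment_variables", {}),
    }

    problems = []
    if not any("aws-otel-python" in arn for arn in layers):
        problems.append("no ADOT Python layer ARN in 'layers'")
    if env.get("AWS_LAMBDA_EXEC_WRAPPER") != "/opt/otel-instrument":
        problems.append("AWS_LAMBDA_EXEC_WRAPPER is not /opt/otel-instrument")
    if not env.get("OPENTELEMETRY_COLLECTOR_CONFIG_FILE"):
        problems.append("OPENTELEMETRY_COLLECTOR_CONFIG_FILE is not set")
    if (env.get("OTEL_TRACES_SAMPLER") == "traceidratio"
            and "OTEL_TRACES_SAMPLER_ARG" not in env):
        problems.append("traceidratio sampler has no OTEL_TRACES_SAMPLER_ARG")
    return problems


def check_file(path: str, stage: str) -> List[str]:
    """Load a Chalice config file and check one stage."""
    with open(path, encoding="utf-8") as handle:
        return check_stage(json.load(handle), stage)
```

For example, check_file(".chalice/config.json", "dev") returns an empty list when the stage looks complete. It cannot confirm that the layer ARN exists in your region or that the Last9 credentials are valid; those still need the CloudWatch logs.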
Module Errors
Do NOT add opentelemetry-sdk or opentelemetry-instrumentation-* to requirements.txt. The ADOT layer provides them. Only opentelemetry-api is needed (for custom spans).
Cold Start Latency
ADOT adds ~500ms-1s to cold starts. Mitigate with:
- Provisioned concurrency for latency-sensitive functions
- Adequate memory allocation (256MB+ recommended)
Error Messages
| Error | Solution |
|---|---|
| “batch processor not found” | Remove batch processor from collector-config.yaml |
| “parse headers” | Use authorization=Basic ... format (lowercase key, key=value) |
| “Layer not found” | Use correct layer ARN for your region |
| “Recording is off” | Set OTEL_TRACES_SAMPLER=always_on |
| Unable to import module 'otel_wrapper': No module named '<pkg>' | Set "automatic_layer": false in .chalice/config.json (see Pip-only direct SDK) |
| 250 MB layer/zip ceiling exceeded | Drop boto3/botocore from requirements.txt (Lambda runtime ships them), or switch to Pip-only direct SDK |
Pip-only direct SDK (no ADOT layer)
The ADOT layer approach above works for most Chalice apps, but two failure modes are worth handling explicitly:
- Unable to import module 'otel_wrapper': No module named '<pkg>'. When "automatic_layer": true is set in .chalice/config.json, Chalice puts your app’s pip dependencies into a separate Lambda layer at /opt/python/lib/python3.X/site-packages/<pkg>/. The ADOT wrapper script (/opt/otel-instrument) exports a PYTHONPATH=/opt/python:... and then execs a new python3 process. That new process’s sys.path does NOT auto-include the nested lib/pythonX/site-packages/ directory, so your deps become invisible.
- 250 MB Lambda ceiling. Function code + ALL attached layers (unzipped) must stay under 250 MB. The ADOT layer (~100 MB unzipped) plus a Chalice managed-deps layer that includes typical web app dependencies pushes many real apps over the limit.
For both, the cleanest fix is to skip the ADOT layer entirely and install the OpenTelemetry SDK + instrumentations via requirements.txt. Telemetry exports directly from the function process to Last9 over OTLP/HTTP — no in-Lambda collector subprocess, no Lambda layer attached.
Trade-off: requires a small one-time code change in app.py (one init_otel() call + Chalice middleware registration). The change is contained to a chalicelib/ module and can be reused across every Chalice Lambda in your project.
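To judge how close a deployment is to the 250 MB ceiling described above, you can measure the unzipped size of each extracted artifact. The sketch below is a rough local estimate under stated assumptions: the function names are hypothetical, the limit is treated as 250 * 1024 * 1024 bytes, and you are expected to unzip the function package and each layer into directories yourself.

```python
import os
from typing import Iterable

# Assumption: the 250 MB unzipped limit is counted in units of 1024 * 1024 bytes.
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def unzipped_size(path: str) -> int:
    """Sum the sizes of all regular files under a directory."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def headroom_bytes(function_bytes: int, layer_bytes: Iterable[int]) -> int:
    """Bytes left under the limit; a negative value means the limit is exceeded."""
    return LAMBDA_UNZIPPED_LIMIT_BYTES - (function_bytes + sum(layer_bytes))
```

A negative result from headroom_bytes() for the function directory plus the ADOT and managed-deps layers is a signal to trim dependencies or move to the pip-only path.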
Step 1: Add OTel dependencies to requirements.txt

Use opentelemetry-bootstrap to auto-detect which instrumentation packages your app needs:

    python -m venv /tmp/scan-venv
    source /tmp/scan-venv/bin/activate
    pip install -r requirements.txt opentelemetry-distro
    opentelemetry-bootstrap -a requirements
    deactivate
    rm -rf /tmp/scan-venv

The command scans installed packages (boto3, requests, aiohttp, etc.) and prints matching opentelemetry-instrumentation-* packages. Append them to your requirements.txt. The typical Chalice Lambda needs:

    opentelemetry-api
    opentelemetry-sdk
    opentelemetry-exporter-otlp-proto-http
    opentelemetry-instrumentation-botocore
    opentelemetry-instrumentation-requests
    opentelemetry-instrumentation-urllib3
    opentelemetry-instrumentation-logging
    opentelemetry-instrumentation-aiohttp-client

Do NOT add opentelemetry-instrumentation-aws-lambda. That instrumentor expects the Lambda handler to be a function, but Chalice’s handler is the app instance (a Chalice class). wrapt will crash with AttributeError: partially initialized module 'app' has no attribute 'app'. The middleware in step 3 provides the equivalent SERVER span.
Step 2: Add chalicelib/otel_init.py

Chalice only packages app.py, the chalicelib/ tree, and resolved deps. Put shared OTel init in chalicelib/otel_init.py:

    """OTel SDK init for Lambda. Reads endpoint + auth from env vars."""
    import os


    def init_otel(service_name: str) -> None:
        # Skip locally: `chalice local` and pytest never set this.
        if not os.environ.get("AWS_LAMBDA_FUNCTION_NAME") and not os.environ.get("OTEL_FORCE_INIT"):
            return

        from opentelemetry import trace
        from opentelemetry.sdk.resources import Resource
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry.sdk.trace.export import SimpleSpanProcessor
        from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
        from opentelemetry.instrumentation.botocore import BotocoreInstrumentor
        from opentelemetry.instrumentation.requests import RequestsInstrumentor
        from opentelemetry.instrumentation.urllib3 import URLLib3Instrumentor
        from opentelemetry.instrumentation.logging import LoggingInstrumentor
        from opentelemetry.instrumentation.aiohttp_client import AioHttpClientInstrumentor

        resource = Resource.create({
            "service.name": service_name,
            "service.namespace": os.environ.get("OTEL_SERVICE_NAMESPACE", "default"),
            "deployment.environment": os.environ.get("OTEL_DEPLOYMENT_ENV", "dev"),
        })
        provider = TracerProvider(resource=resource)
        # SimpleSpanProcessor flushes synchronously per span, which is safer under
        # Lambda freeze/thaw than BatchSpanProcessor, which can drop its last
        # batch on shutdown.
        provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter()))
        trace.set_tracer_provider(provider)

        BotocoreInstrumentor().instrument()
        RequestsInstrumentor().instrument()
        URLLib3Instrumentor().instrument()
        LoggingInstrumentor().instrument()
        AioHttpClientInstrumentor().instrument()

The AWS_LAMBDA_FUNCTION_NAME guard ensures chalice local and pytest skip OTel entirely, which is useful since OTLP endpoint/auth env vars are typically not set in development.
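The guard's behaviour is easy to pin down in isolation. The helper below is a hypothetical, stdlib-only restatement of the condition used in init_otel() (it is not part of the module above), written so the rule can be checked without installing the OTel packages: initialization runs when either AWS_LAMBDA_FUNCTION_NAME or OTEL_FORCE_INIT is set to a non-empty value.

```python
from typing import Mapping


def should_init_otel(env: Mapping[str, str]) -> bool:
    """Mirror the init_otel() guard: run inside Lambda, or when explicitly forced."""
    return bool(env.get("AWS_LAMBDA_FUNCTION_NAME") or env.get("OTEL_FORCE_INIT"))
```

Setting OTEL_FORCE_INIT=1 in a local shell is therefore the way to exercise the exporter outside Lambda, provided the OTLP endpoint and header variables are also set.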
Step 3: Add chalicelib/server_span.py (replaces AwsLambdaInstrumentor)

Chalice middleware that opens a SERVER span per route invocation, emits HTTP + FaaS semantic conventions, and extracts W3C traceparent from incoming headers so traces can join an upstream caller’s tree:

    import os
    from typing import Any, Callable

    from opentelemetry import context as otel_context
    from opentelemetry import trace
    from opentelemetry.propagate import extract
    from opentelemetry.trace import SpanKind, Status, StatusCode

    _tracer = trace.get_tracer("chalicelib.server_span")


    def _carrier_from_event(event: Any) -> dict:
        raw = getattr(event, "raw_request", None)
        if raw is not None and hasattr(raw, "headers"):
            return {k.lower(): v for k, v in dict(raw.headers or {}).items()}
        return {}


    def server_span_middleware(event: Any, get_response: Callable) -> Any:
        ctx = getattr(event, "context", {}) or {}
        method = ctx.get("httpMethod", "GET")
        route = ctx.get("resourcePath") or ctx.get("path") or "/"
        identity = ctx.get("identity", {}) or {}

        parent_ctx = extract(_carrier_from_event(event))
        token = otel_context.attach(parent_ctx)
        try:
            with _tracer.start_as_current_span(f"{method} {route}", kind=SpanKind.SERVER) as span:
                span.set_attribute("http.request.method", method)
                span.set_attribute("http.route", route)
                span.set_attribute("url.path", route)
                span.set_attribute("url.scheme", "https")
                ua = identity.get("userAgent")
                if ua:
                    span.set_attribute("user_agent.original", ua)
                client_ip = identity.get("sourceIp")
                if client_ip:
                    span.set_attribute("client.address", client_ip)

                # FaaS semantic conventions
                span.set_attribute("faas.trigger", "http")
                span.set_attribute("cloud.provider", "aws")
                region = os.environ.get("AWS_REGION") or os.environ.get("AWS_DEFAULT_REGION")
                if region:
                    span.set_attribute("cloud.region", region)
                fn_name = os.environ.get("AWS_LAMBDA_FUNCTION_NAME")
                if fn_name:
                    span.set_attribute("faas.name", fn_name)
                xray = os.environ.get("_X_AMZN_TRACE_ID")
                if xray:
                    span.set_attribute("faas.invocation_id", xray)

                try:
                    response = get_response(event)
                except Exception as exc:
                    span.record_exception(exc)
                    span.set_status(Status(StatusCode.ERROR, str(exc)))
                    raise

                status_code = getattr(response, "status_code", None)
                if status_code is not None:
                    span.set_attribute("http.response.status_code", status_code)
                    if status_code >= 500:
                        span.set_status(Status(StatusCode.ERROR))
                return response
        finally:
            otel_context.detach(token)
Step 4: Wire it into app.py

Call init_otel() BEFORE any other imports (so the instrumentors patch modules at the right time), then register the middleware on the Chalice app:

    from chalicelib.otel_init import init_otel

    init_otel(service_name="your-service-name")

    from chalice import Chalice
    from chalicelib.server_span import server_span_middleware

    app = Chalice(app_name="your-app-name")
    app.register_middleware(server_span_middleware, event_type="all")


    @app.route("/")
    def index():
        return {"ok": True}
Step 5: Update .chalice/config.json

Remove all of the ADOT-layer-related fields and replace them with direct-export env vars. Drop layers, AWS_LAMBDA_EXEC_WRAPPER, and OPENTELEMETRY_COLLECTOR_CONFIG_FILE:

    {
      "version": "2.0",
      "app_name": "your-app-name",
      "automatic_layer": false,
      "stages": {
        "prod": {
          "api_gateway_stage": "prod",
          "lambda_memory_size": 512,
          "lambda_timeout": 30,
          "environment_variables": {
            "OTEL_SERVICE_NAME": "your-service-name",
            "OTEL_SERVICE_NAMESPACE": "your-namespace",
            "OTEL_DEPLOYMENT_ENV": "prod",
            "OTEL_EXPORTER_OTLP_ENDPOINT": "<your_last9_otlp_endpoint>",
            "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
            "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Basic <your-base64-credentials>"
          }
        }
      }
    }

Replace <your_last9_otlp_endpoint> with the OTLP endpoint shown in Last9 → Integrations → OpenTelemetry (e.g. https://otlp-<region>.last9.io). Chalice does NOT expand env vars inside config.json, so the Authorization=Basic ... value must be the actual base64 string at deploy time. Keep this file out of source control by adding .chalice/config.json to .gitignore and generating it from a template (config.json.in + sed) at deploy time.
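The template step can also be done in Python instead of sed. The following is a minimal sketch of that alternative, not a Chalice or Last9 tool: the function names and the @@OTLP_ENDPOINT@@ / @@OTLP_HEADERS@@ placeholders are hypothetical, and building the credential as base64 of username:password assumes standard HTTP Basic auth. If Last9 already shows you a ready-made header value, pass that string in directly and skip basic_auth_header().

```python
import base64
import json


def basic_auth_header(username: str, password: str) -> str:
    """Build the key=value form expected by OTEL_EXPORTER_OTLP_HEADERS."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization=Basic {token}"


def render_config(template_text: str, endpoint: str, auth_header: str) -> str:
    """Fill the two placeholders and confirm the result is still valid JSON."""
    rendered = (
        template_text
        .replace("@@OTLP_ENDPOINT@@", endpoint)
        .replace("@@OTLP_HEADERS@@", auth_header)
    )
    json.loads(rendered)  # raises ValueError if substitution broke the JSON
    return rendered
```

A deploy script would read config.json.in, call render_config() with values taken from a secret store or CI variables, and write the result to .chalice/config.json just before chalice deploy.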
Step 6: Deploy

    chalice deploy --stage prod
Step 7: Verify

Hit your function, then in Last9 → Traces filter by service.name=your-service-name. You should see a SERVER span as the root with CLIENT spans (boto3, requests, etc.) as children:

    SERVER GET /your-route (chalicelib.server_span)
    ├── CLIENT DynamoDB.GetItem (opentelemetry.instrumentation.botocore)
    └── CLIENT GET https://api.example.com (opentelemetry.instrumentation.requests)
When to use this path
Use the pip-only direct SDK path when:
- Your function zip + layers (unzipped) approach or exceed 250 MB
- You hit Unable to import module 'otel_wrapper': No module named '<pkg>' and prefer a structural fix over keeping automatic_layer: false permanently
- You want a single deployment artifact (no Lambda layer ARNs to manage across environments)
- You need explicit control over which OTel instrumentation packages run (security review, dep audit)
Stick with the ADOT layer path when:
- You want zero application code changes
- Your function deps are small enough that the 250 MB ceiling is comfortable headroom
- You rely on AwsLambdaInstrumentor’s built-in trigger detection for non-API-Gateway events (S3, SNS, EventBridge, SQS), though even there, a small custom span around the handler covers the common cases
Reference example
A complete five-variant working example (including chalicelib/otel_init.py and chalicelib/server_span.py shown above) lives in the opentelemetry-examples repo. Variants A-C demonstrate the automatic_layer failure modes; Variant E is the working pattern recommended here.
Best Practices
- Service Naming: Use descriptive, consistent names across Chalice stages
- Sampling: Start with always_on in dev, use traceidratio in production
- X-Ray Transition: Keep X-Ray enabled initially, disable once Last9 dashboards are confirmed
- Memory: Allocate 256MB+ to account for ADOT layer overhead
- Collector Config: Always copy collector-config.yaml to .chalice/ before deploying
Need Help?
If you encounter any issues or have questions:
- Join our Discord community for real-time support
- Contact our support team at support@last9.io