Cerebrium

Send resource, GPU, and execution metrics from your Cerebrium serverless GPU applications to Last9. Cerebrium pushes metrics natively over OTLP/HTTP every 60 seconds — no SDK code, no sidecar, just dashboard configuration.

What is Cerebrium?

Cerebrium is a serverless GPU infrastructure platform for real-time AI workloads. It autoscales containers in 1–3 seconds and bills per second of compute. Cerebrium has built-in OpenTelemetry metric export to any OTLP/HTTP backend.

Prerequisites

Last9 Account — Sign up at app.last9.io
Cerebrium Account — With at least one deployed application

Integration Setup

Get Your Last9 OTLP Endpoint and Auth Header

Navigate to Integrations → OpenTelemetry in your Last9 dashboard. Copy the OTLP Endpoint (the base URL, for example https://otlp-aps1.last9.io) and the Auth Header value (a string starting with Basic).

Open Cerebrium’s Metrics Export Settings

In the Cerebrium dashboard, go to Integrations → Metrics Export, then select Custom OTLP.

Configure the Endpoint and Auth Header

Enter the following values:

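| Field | Value |
| --- | --- |
| OTLP Endpoint | `https://otlp-aps1.last9.io` (example; use the base URL copied from Last9 and do not append `/v1/metrics`, because Cerebrium appends it automatically) |
| Auth Header Name | `Authorization` |
| Auth Header Value | `Basic <token>` (paste the full Auth Header from Last9, including the `Basic` prefix followed by a literal space) |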
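
The rules for these three fields can be checked before pasting them into the dashboard. The following is a minimal sketch, not part of Cerebrium or Last9: the function name `check_otlp_settings` is hypothetical, and the checks only encode the constraints described on this page (base URL with no path, header name `Authorization`, value of the form `Basic <token>`).

```python
from urllib.parse import urlsplit


def check_otlp_settings(endpoint: str, header_name: str, header_value: str) -> list[str]:
    """Return a list of problems found in the three dashboard fields.

    Hypothetical helper: it only mirrors the rules stated in this guide.
    """
    problems = []

    parts = urlsplit(endpoint)
    if parts.scheme != "https" or not parts.netloc:
        problems.append("endpoint should be an https:// base URL")
    if parts.path:
        # Covers both a trailing slash and a suffix such as /v1/metrics.
        problems.append("endpoint must not have a path or trailing slash")

    if header_name != "Authorization":
        problems.append("header name should be exactly 'Authorization'")

    # Split on the first space: 'Basic%20abc' or 'Basic=abc' will not match.
    scheme, sep, token = header_value.partition(" ")
    if scheme != "Basic" or not sep or not token.strip():
        problems.append("header value should be 'Basic <token>' with a literal space")

    return problems


if __name__ == "__main__":
    # Placeholder token for illustration only.
    for problem in check_otlp_settings(
        "https://otlp-aps1.last9.io/v1/metrics", "Authorization", "Basic abc123"
    ):
        print("problem:", problem)
```

An empty list means the values match the shape this guide expects; it does not prove the token itself is valid, which only Test Connection can confirm.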

Test the Connection

Click Test Connection in the Cerebrium dashboard. Once it passes, metrics begin flowing within ~60 seconds.

What Gets Exported

Cerebrium emits the following metrics every 60 seconds, labelled with project_id, app_id, app_name, and region.

Resource Metrics

| Metric | Type | Description |
| --- | --- | --- |
| `cerebrium_cpu_utilization_cores` | Gauge | CPU cores actively in use per app |
| `cerebrium_memory_usage_bytes` | Gauge | Memory actively in use per app |
| `cerebrium_gpu_memory_usage_bytes` | Gauge | GPU VRAM in use per app |
| `cerebrium_gpu_compute_utilization_percent` | Gauge | GPU compute utilization (0–100) per app |
| `cerebrium_containers_running_count` | Gauge | Number of running containers per app |
| `cerebrium_containers_ready_count` | Gauge | Number of ready containers per app |

Execution Metrics

| Metric | Type | Description |
| --- | --- | --- |
| `cerebrium_run_execution_time_ms` | Histogram | Time spent executing user code |
| `cerebrium_run_queue_time_ms` | Histogram | Time spent waiting in queue |
| `cerebrium_run_coldstart_time_ms` | Histogram | Time for container cold start |
| `cerebrium_run_response_time_ms` | Histogram | Total end-to-end response time |
| `cerebrium_run_total` | Counter | Total run count |
| `cerebrium_run_successes_total` | Counter | Successful run count |
| `cerebrium_run_errors_total` | Counter | Failed run count |

Viewing Metrics in Last9

Open Metrics Explorer and filter by app_name or project_id. Useful starter queries:

```
# GPU utilization per app
avg by (app_name) (cerebrium_gpu_compute_utilization_percent)

# p95 execution latency
histogram_quantile(0.95, sum by (le, app_name) (rate(cerebrium_run_execution_time_ms_milliseconds_bucket[5m])))

# Error rate per app
sum by (app_name) (rate(cerebrium_run_errors_total[5m]))
  / sum by (app_name) (rate(cerebrium_run_total[5m]))

# Running container count
sum by (app_name) (cerebrium_containers_running_count)
```
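
To build intuition for what the p95 query returns, here is a simplified Python re-implementation of the linear interpolation that a Prometheus-style `histogram_quantile` performs over cumulative buckets. It is an illustrative sketch only: the bucket bounds and counts in `EXAMPLE_BUCKETS` are invented, and they are not the bucket layout Cerebrium actually emits.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound,
    with the last bound being +Inf (the classic `le` bucket layout).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total  # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the
                # highest finite bound instead of infinity.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Invented example: 100 runs, bounds in milliseconds.
EXAMPLE_BUCKETS = [(100.0, 50), (250.0, 90), (500.0, 98), (math.inf, 100)]

if __name__ == "__main__":
    print("p50:", histogram_quantile(0.50, EXAMPLE_BUCKETS))
    print("p95:", histogram_quantile(0.95, EXAMPLE_BUCKETS))
```

With these invented buckets the p95 lands at about 406 ms, inside the 250–500 ms bucket. The estimate is only as precise as the bucket boundaries, which is why quantiles from histograms should be read as approximations.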

Sending Traces

Cerebrium’s platform-side OTLP push covers metrics only. To get application traces (per-request spans, downstream HTTP calls, LLM provider latency) into Last9, run opentelemetry-instrument in your Cerebrium entrypoint — no code changes required in your application.

Add the OTel packages to [cerebrium.dependencies.pip] in your cerebrium.toml.
```
[cerebrium.dependencies.pip]
opentelemetry-distro = ">=0.51b0"
opentelemetry-exporter-otlp = ">=1.30.0"
opentelemetry-semantic-conventions = ">=0.51b0"
```
Pin all three together. Without consistent versions, the newer exporter imports semconv._incubating against an older semantic-conventions wheel and startup fails with ModuleNotFoundError.

Install instrumentors during build via shell_commands.
```
[cerebrium.deployment]
shell_commands = ["opentelemetry-bootstrap --action=install"]
```
This detects every supported library in your pip deps (FastAPI, requests, httpx, OpenAI, Anthropic, SQLAlchemy, etc.) and installs the matching opentelemetry-instrumentation-* packages.
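
One way to confirm what ended up installed is to list the OpenTelemetry distributions visible to the interpreter, using the standard library's `importlib.metadata`. This is a sketch under assumptions: the helper names are hypothetical, and where you run it (locally, or inside the built container) depends on your own setup.

```python
from importlib import metadata


def installed_pairs():
    """Return (name, version) for every distribution this interpreter can see."""
    pairs = []
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:
            pairs.append((name, dist.version))
    return pairs


def select_otel(pairs):
    """Keep only opentelemetry-* packages, keyed by normalized name."""
    found = {}
    for name, version in pairs:
        normalized = name.lower().replace("_", "-")
        if normalized.startswith("opentelemetry-"):
            found[normalized] = version
    return dict(sorted(found.items()))


def instrumentors(packages):
    """Names of the per-library instrumentation packages."""
    return [n for n in packages if n.startswith("opentelemetry-instrumentation-")]


if __name__ == "__main__":
    packages = select_otel(installed_pairs())
    for name, version in packages.items():
        print(f"{name}=={version}")
    print(f"{len(instrumentors(packages))} instrumentation packages found")
```

If the output shows no `opentelemetry-instrumentation-*` entries for the libraries your app uses, the bootstrap step probably did not run or did not detect them.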

Wrap your entrypoint with opentelemetry-instrument.

```
[cerebrium.runtime.custom]
port = 8000
entrypoint = [
  "opentelemetry-instrument",
  "uvicorn", "main:app",
  "--host", "0.0.0.0", "--port", "8000",
]
```

Set OTel env vars as Cerebrium secrets.

```
cerebrium secrets add OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-aps1.last9.io
cerebrium secrets add OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
cerebrium secrets add OTEL_SERVICE_NAME=my-cerebrium-app
cerebrium secrets add 'OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <your-token>'
```

Note the inner = in OTEL_EXPORTER_OTLP_HEADERS — that is the OTel env-var convention for <header-name>=<header-value>, not a typo.
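
The sketch below illustrates why the inner `=` is harmless: a `key=value` list is split on commas, and each entry is split only on its first `=`, so any later `=` (including base64 padding at the end of a token) stays in the value. This is a simplified illustration with a hypothetical function name, not the parser the OpenTelemetry SDK ships; real SDK parsers may handle more cases, such as percent-encoded values.

```python
def parse_otlp_headers(raw: str) -> dict[str, str]:
    """Parse 'name1=value1,name2=value2' into a dict (simplified sketch)."""
    headers = {}
    for item in raw.split(","):
        if not item.strip():
            continue  # tolerate empty entries such as a trailing comma
        # partition() splits on the FIRST '=' only.
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in header entry: {item!r}")
        headers[name.strip()] = value.strip()
    return headers


if __name__ == "__main__":
    # Placeholder token; note the '==' base64 padding survives intact.
    print(parse_otlp_headers("Authorization=Basic YWJjZA=="))
```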

Deploy. Traces appear in Traces Explorer within seconds, filtered by service.name.

A runnable, verified example is available at last9/opentelemetry-examples — python/cerebrium/.

Troubleshooting

401 / 403 Unauthorized — The Auth Header value is malformed. Use a literal space between Basic and the token (not %20, not =). Confirm that Name is `Authorization` and that Value starts with `Basic` followed by a single space.
404 Endpoint not found — The endpoint URL has /v1/metrics appended. Remove it; Cerebrium appends it automatically.
Unknown export error — Often produced when the endpoint resolves but the path does not match Last9’s /v1/metrics route. Confirm you are using the base URL from your Last9 OpenTelemetry integration page and that no extra suffix or trailing slash is present.
No metrics arriving after Test Connection passes — Metrics push every 60 seconds. Wait ~2 minutes, then filter by app_name in Metrics Explorer.
ModuleNotFoundError: No module named 'opentelemetry.semconv._incubating' at container startup — Version skew between opentelemetry-exporter-otlp-proto-common and opentelemetry-semantic-conventions. Pin all three OTel packages together as shown in the first step of Sending Traces above.
Failed to auto initialize opentelemetry in container logs — Auto-instrumentation crashed at sitecustomize. Check Cerebrium build logs for the opentelemetry-bootstrap output and confirm instrumentors installed cleanly.
App starts and serves traffic, but no traces in Last9 — Either the OTel env-var secrets are unset on this app, or the auth header is malformed. Verify with cerebrium secrets list that all four OTEL_* secrets are present, and that OTEL_EXPORTER_OTLP_HEADERS uses the Authorization=Basic <token> form (not just the token).
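
For the last item, a quick check from inside the running app can confirm that the four variables are present and that the headers value has the expected shape. This is a minimal sketch, assuming Cerebrium secrets are exposed to your code as environment variables; the function name `otel_env_problems` is hypothetical.

```python
import os

# The four variables set in the Sending Traces section.
REQUIRED = (
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "OTEL_EXPORTER_OTLP_PROTOCOL",
    "OTEL_SERVICE_NAME",
    "OTEL_EXPORTER_OTLP_HEADERS",
)


def otel_env_problems(env=None):
    """Return human-readable problems with the OTel environment variables."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]

    headers = env.get("OTEL_EXPORTER_OTLP_HEADERS", "")
    if headers and not headers.startswith("Authorization=Basic "):
        problems.append(
            "OTEL_EXPORTER_OTLP_HEADERS should look like 'Authorization=Basic <token>'"
        )
    return problems


if __name__ == "__main__":
    for problem in otel_env_problems():
        print("problem:", problem)
```

Logging the result once at startup (without printing the token itself) makes a missing or malformed secret visible in the container logs.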

Please get in touch with us on Discord or Email if you have any questions.