Braintrust
Send the same OpenTelemetry traces to both Braintrust (for LLM eval scores) and Last9 (for full trace and APM observability), with matching trace IDs across both backends.
What is Braintrust?
Braintrust is an LLM evaluation platform — datasets, scorers, prompt experiments, and a UI for inspecting LLM call traces. It accepts OTel traces over OTLP/HTTP and recognizes a braintrust.* attribute namespace for evaluation-specific data (scorers, ground truth, metadata).
Dual-export buys you a single trace tree with two readers: Braintrust for eval-quality investigations (which scorer fired, what was the expected output, what was the input dataset row), Last9 for end-to-end app observability (latency, errors, infra correlation, alerting). Identical trace IDs let you jump between the two views on the same request.
There are two architecturally distinct ways to ship the same span twice:
- Direct mode — two `SpanProcessor` instances on a single `TracerProvider`, no Collector required.
- Collector mode — the app emits OTLP/HTTP to a local OpenTelemetry Collector, which fans out to both backends via two named `otlphttp` exporters.
Both patterns are demonstrated below in Python and Node.
Prerequisites
- Last9 account — Sign up at app.last9.io and grab your OTLP endpoint and auth header from Integrations → OpenTelemetry.
- Braintrust account — Get an API key from braintrust.dev/app/settings/api-keys and create a project (the example uses `last9-otel-example`).
- OpenAI API key — Used by the demo workload in the runnable examples.
- Python 3.10+ or Node.js 18+.
The complete runnable examples live in the last9/opentelemetry-examples repository:
- `python/braintrust-direct`
- `python/braintrust-collector`
- `javascript/braintrust-direct`
- `javascript/braintrust-collector`
Direct mode — two SpanProcessors, no Collector
In this mode the app owns the OTel TracerProvider and attaches two SpanProcessor instances:
- `BraintrustSpanProcessor` (from `braintrust[otel]` / `@braintrust/otel`) routes spans to Braintrust and stamps the required `x-bt-parent` routing header.
- `BatchSpanProcessor(OTLPSpanExporter)` ships the same spans to Last9 over OTLP/HTTP.
Both backends receive identical trace IDs because both processors observe the same span objects.
1. Install dependencies

   Python:

   ```bash
   pip install "braintrust[otel]" openai \
     opentelemetry-api opentelemetry-sdk \
     opentelemetry-exporter-otlp-proto-http
   ```

   Node:

   ```bash
   npm install @braintrust/otel openai \
     @opentelemetry/api @opentelemetry/sdk-trace-node \
     @opentelemetry/sdk-trace-base @opentelemetry/resources \
     @opentelemetry/exporter-trace-otlp-http \
     @opentelemetry/semantic-conventions dotenv
   ```
2. Set environment variables

   ```bash
   # Service identity
   export OTEL_SERVICE_NAME=braintrust-direct-example
   export DEPLOYMENT_ENV=local

   # Last9 — base URL only, no /v1/traces path
   export LAST9_OTLP_ENDPOINT=<your-last9-otlp-endpoint>
   export LAST9_OTLP_AUTH="Basic <your-last9-credentials>"

   # Braintrust
   export BRAINTRUST_API_KEY=<your-braintrust-api-key>
   export BRAINTRUST_PARENT="project_name:last9-otel-example"

   # Workload
   export OPENAI_API_KEY=<your-openai-api-key>
   ```
3. Initialize the TracerProvider with two processors

   Python:

   ```python
   import os

   from braintrust.otel import BraintrustSpanProcessor
   from opentelemetry import trace
   from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
   from opentelemetry.sdk.resources import SERVICE_NAME, Resource
   from opentelemetry.sdk.trace import TracerProvider
   from opentelemetry.sdk.trace.export import BatchSpanProcessor

   resource = Resource.create({
       SERVICE_NAME: os.environ["OTEL_SERVICE_NAME"],
       "deployment.environment": os.environ.get("DEPLOYMENT_ENV", "local"),
   })
   provider = TracerProvider(resource=resource)

   # Braintrust: reads BRAINTRUST_API_KEY, BRAINTRUST_PARENT, BRAINTRUST_API_URL.
   provider.add_span_processor(BraintrustSpanProcessor())

   # Last9: explicit endpoint + headers.
   last9_endpoint = os.environ["LAST9_OTLP_ENDPOINT"].rstrip("/")
   provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(
       endpoint=f"{last9_endpoint}/v1/traces",
       headers={"Authorization": os.environ["LAST9_OTLP_AUTH"]},
   )))

   trace.set_tracer_provider(provider)
   ```

   Node:

   ```javascript
   require('dotenv').config();
   const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
   const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
   const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
   const { resourceFromAttributes } = require('@opentelemetry/resources');
   const { ATTR_SERVICE_NAME } = require('@opentelemetry/semantic-conventions');
   const { BraintrustSpanProcessor } = require('@braintrust/otel');

   const provider = new NodeTracerProvider({
     resource: resourceFromAttributes({
       [ATTR_SERVICE_NAME]: process.env.OTEL_SERVICE_NAME,
       'deployment.environment': process.env.DEPLOYMENT_ENV || 'local',
     }),
     spanProcessors: [
       new BraintrustSpanProcessor(),
       new BatchSpanProcessor(new OTLPTraceExporter({
         url: `${process.env.LAST9_OTLP_ENDPOINT.replace(/\/$/, '')}/v1/traces`,
         headers: { Authorization: process.env.LAST9_OTLP_AUTH },
       })),
     ],
   });
   provider.register();
   ```
4. Flush before exit

   `BatchSpanProcessor` buffers spans for up to five seconds. A short-lived script that exits before the flush timer drops every span silently. Call `force_flush` and `shutdown` in your script's exit path:

   Python:

   ```python
   provider.force_flush()
   provider.shutdown()
   ```

   Node:

   ```javascript
   await provider.forceFlush().catch((err) => console.error('Flush failed:', err));
   await provider.shutdown().catch((err) => console.error('Shutdown failed:', err));
   ```
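As a quick sanity check that both backends really receive the same trace ID, you can emit one test span and search for its printed ID in both UIs. This is a sketch using the `provider` from step 3, not part of the shipped examples:

```python
from opentelemetry import trace

tracer = trace.get_tracer("sanity-check")

# Print the trace ID in the 32-hex-character form both UIs display.
with tracer.start_as_current_span("dual-export-check") as span:
    trace_id = format(span.get_span_context().trace_id, "032x")
    print(f"search both backends for trace_id={trace_id}")

provider.force_flush()
```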
Collector mode — fan-out at the OpenTelemetry Collector
In this mode the app emits OTLP/HTTP to a single local Collector. The Collector's trace pipeline declares two `otlphttp` exporters and routes every span to both.
This pattern keeps app code vendor-agnostic — adding or removing backends only requires a Collector restart, not an app redeploy. Centralized policy (redaction, filtering, attribute renaming) lives in one YAML.
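For instance, a redaction policy is a few lines in the same file. The sketch below uses the Collector's standard `attributes` processor; the attribute keys are illustrative, not taken from the examples:

```yaml
# Illustrative only — reference "attributes/policy" in the traces
# pipeline's processors list alongside "batch".
processors:
  attributes/policy:
    actions:
      - key: user.id              # pseudonymize user identifiers before export
        action: hash
      - key: internal.debug_note  # hypothetical attribute to strip entirely
        action: delete
```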
1. Configure environment variables for the Collector

   ```bash
   # App → local collector
   OTEL_SERVICE_NAME=braintrust-collector-example
   OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
   OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
   DEPLOYMENT_ENV=local

   # Last9 — used by the Collector exporter
   LAST9_OTLP_ENDPOINT=<your-last9-otlp-endpoint>
   LAST9_OTLP_AUTH="Basic <your-last9-credentials>"

   # Braintrust — used by the Collector exporter
   BRAINTRUST_API_KEY=<your-braintrust-api-key>
   BRAINTRUST_PROJECT=last9-otel-example

   # Workload
   OPENAI_API_KEY=<your-openai-api-key>
   ```
2. Write the Collector config

   ```yaml
   # otel-collector-config.yaml
   receivers:
     otlp:
       protocols:
         http:
           endpoint: 0.0.0.0:4318
         grpc:
           endpoint: 0.0.0.0:4317

   processors:
     batch:
       timeout: 5s
       send_batch_size: 512

   exporters:
     otlphttp/braintrust:
       endpoint: https://api.braintrust.dev/otel
       headers:
         Authorization: "Bearer ${env:BRAINTRUST_API_KEY}"
         x-bt-parent: "project_name:${env:BRAINTRUST_PROJECT}"
     otlphttp/last9:
       endpoint: "${env:LAST9_OTLP_ENDPOINT}"
       headers:
         Authorization: "${env:LAST9_OTLP_AUTH}"

   service:
     pipelines:
       traces:
         receivers: [otlp]
         processors: [batch]
         exporters: [otlphttp/braintrust, otlphttp/last9]
   ```
3. Run the Collector

   ```yaml
   # docker-compose.yaml
   services:
     otel-collector:
       image: otel/opentelemetry-collector-contrib:0.144.0
       command: ["--config", "/etc/otelcol-contrib/config.yaml"]
       volumes:
         - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
       ports:
         - "4317:4317"
         - "4318:4318"
       env_file:
         - .env
   ```

   Start it: `docker compose up otel-collector`.
4. Point the app at the Collector

   The app uses a single `OTLPSpanExporter` (Python) or `OTLPTraceExporter` (Node) with no constructor arguments — it reads `OTEL_EXPORTER_OTLP_ENDPOINT` from the environment.

   Python:

   ```python
   provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
   ```

   Node:

   ```javascript
   spanProcessors: [
     new BatchSpanProcessor(new OTLPTraceExporter()),
   ],
   ```
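Put together, a minimal Collector-mode init in Python might look like the sketch below, assuming the environment variables from step 1 (the repo examples are the authoritative versions):

```python
import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider(resource=Resource.create({
    SERVICE_NAME: os.environ["OTEL_SERVICE_NAME"],
    "deployment.environment": os.environ.get("DEPLOYMENT_ENV", "local"),
}))

# No endpoint or headers here: the exporter reads OTEL_EXPORTER_OTLP_ENDPOINT
# and ships everything to the local Collector, which fans out to both backends.
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)
```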
Span attributes
LLM call spans
The example workload sets the OTel GenAI semantic-convention attributes Last9 and Braintrust both understand:
| Attribute | Meaning |
|---|---|
| `gen_ai.system` | LLM provider (e.g. `openai`) |
| `gen_ai.request.model` | Model requested (e.g. `gpt-4o-mini`) |
| `gen_ai.operation.name` | Operation type (`chat`, `embedding`, etc.) |
| `gen_ai.usage.input_tokens` | Prompt tokens billed |
| `gen_ai.usage.output_tokens` | Completion tokens billed |
| `gen_ai.response.id` | Provider-assigned response ID |
| `gen_ai.response.model` | Model that actually served the request (can differ from the requested model during version rollouts) |
| `gen_ai.response.finish_reasons` | Array of finish reasons |
Prompts and completions are recorded as span events with the prompt/completion JSON payload as an attribute on the event:
| Span event | Attribute |
|---|---|
| `gen_ai.content.prompt` | `gen_ai.prompt` (JSON-stringified messages) |
| `gen_ai.content.completion` | `gen_ai.completion` (JSON-stringified message) |
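As a reference for the two tables above, here is a sketch of the manual instrumentation a vanilla setup needs per LLM call. The attribute and event names are the conventions just listed; the `traced_chat` helper and its shape are illustrative, not lifted from the repo:

```python
import json

from openai import OpenAI
from opentelemetry import trace

tracer = trace.get_tracer("llm-workload")
client = OpenAI()

def traced_chat(messages, model="gpt-4o-mini"):
    with tracer.start_as_current_span("gen_ai.chat") as span:
        # Request-side attributes, set before the call.
        span.set_attribute("gen_ai.system", "openai")
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.request.model", model)
        span.add_event("gen_ai.content.prompt",
                       {"gen_ai.prompt": json.dumps(messages)})

        resp = client.chat.completions.create(model=model, messages=messages)

        # Response-side attributes, copied from the provider response.
        span.set_attribute("gen_ai.response.id", resp.id)
        span.set_attribute("gen_ai.response.model", resp.model)
        span.set_attribute("gen_ai.response.finish_reasons",
                           [c.finish_reason for c in resp.choices])
        span.set_attribute("gen_ai.usage.input_tokens", resp.usage.prompt_tokens)
        span.set_attribute("gen_ai.usage.output_tokens", resp.usage.completion_tokens)
        span.add_event("gen_ai.content.completion",
                       {"gen_ai.completion": json.dumps(resp.choices[0].message.content)})
        return resp
```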
Eval and score spans (braintrust.*)
Braintrust models eval runs and scorer outputs as dedicated span types. When you use the OTLP path (no Braintrust SDK on the data plane), the braintrust.span_attributes discriminator turns a regular OTel span into an eval, task, llm, tool, function, or score span on ingest:
| Attribute | Type | Purpose |
|---|---|---|
| `braintrust.span_attributes` | JSON-stringified object `{"name": ..., "type": "score"\|"eval"\|"task"\|"llm"\|"tool"\|"function"}` | Span-type discriminator |
| `braintrust.scores` | JSON-stringified `{"scorer_name": 0.0–1.0}` | Scorer outputs (one or more) |
| `braintrust.input` | Raw string or JSON | The input the scorer evaluated |
| `braintrust.output` | Raw string or JSON | The actual output |
| `braintrust.expected` | Raw string or JSON | Ground truth (optional) |
| `braintrust.metadata` | JSON-stringified object | Free-form context (dataset name, case count, etc.) |
| `braintrust.tags` | Array of strings | Run-level tags (optional) |
A minimal score span (Python):
```python
import json

from opentelemetry import trace
from opentelemetry.trace import SpanKind

tracer = trace.get_tracer("eval-scorers")

with tracer.start_as_current_span("Levenshtein", kind=SpanKind.INTERNAL) as span:
    span.set_attribute("braintrust.span_attributes",
                       json.dumps({"name": "Levenshtein", "type": "score"}))
    span.set_attribute("braintrust.scores", json.dumps({"levenshtein": 0.92}))
    span.set_attribute("braintrust.input", "Foo")
    span.set_attribute("braintrust.output", "Hi Foo")
    span.set_attribute("braintrust.expected", "Hi Foo")
```

In Last9 this appears as a regular OTel span with the score attributes attached — queryable, filterable, alertable. In Braintrust it appears as a score span attached to the parent eval, with the score value rendered in the eval results UI.
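The eval root span referenced in the verification section below uses the same discriminator with type `eval`. A minimal sketch, reusing the `tracer` above (the span name format matches the examples; the metadata keys are illustrative):

```python
import json
import time

with tracer.start_as_current_span(f"say-hi-eval-{int(time.time())}") as root:
    # Mark this span as the eval root on Braintrust ingest.
    root.set_attribute("braintrust.span_attributes",
                       json.dumps({"name": "say-hi-eval", "type": "eval"}))
    # Illustrative metadata keys — any JSON-stringified object is accepted.
    root.set_attribute("braintrust.metadata",
                       json.dumps({"dataset": "say-hi", "case_count": 2}))
    # LLM call spans and score spans opened inside this block become its
    # children, so all five spans share one trace ID.
```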
Verification
Run any of the four examples with valid credentials. Each emits an eval root span containing two gen_ai.chat LLM call spans and two Levenshtein score spans — five spans total per trace, all sharing the same trace ID.
In Last9
- Open Traces Explorer.
- Filter by `service.name = braintrust-direct-example` (or `braintrust-collector-example`).
- Open the latest trace. The span tree should look like:

  ```
  say-hi-eval-<timestamp> (eval root)
  ├── gen_ai.chat (LLM call 1)
  ├── Levenshtein (score 1)
  ├── gen_ai.chat (LLM call 2)
  └── Levenshtein (score 2)
  ```

- Open a `gen_ai.chat` span — `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.response.model`, and the `gen_ai.content.prompt` / `gen_ai.content.completion` events should be present.
- Open a `Levenshtein` span — `braintrust.scores` and the rest of the `braintrust.*` attributes should be set.
In Braintrust
- Open the project named in `BRAINTRUST_PARENT` / `BRAINTRUST_PROJECT` (e.g. `last9-otel-example`).
- Under Logs, the latest run shows up at the top.
- Each `Levenshtein` score span surfaces in the eval row as a scorer with the numeric value attached.
If a trace is missing on either side, check that the OpenAI call returned without raising, that `force_flush` was invoked before the script exited, and, in Collector mode, that the Collector logs show spans being sent by both exporters.
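A quick way to tell instrumentation problems from export problems (an illustrative debugging aid, not part of the shipped examples) is to tee spans to stdout before they leave the process:

```python
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# If spans print here but never reach a backend, the failure is in export
# (credentials, endpoint, flush timing) rather than in instrumentation.
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
```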
Optional — enhance with the Last9 GenAI SDK (Python)
The dual-export shape above works with vanilla OpenTelemetry — no Last9-specific dependency. If you also want conversation-level grouping, agent identity tracking, automatic prompt/completion capture, and per-call USD cost on gen_ai.chat spans, layer the Last9 GenAI SDK on top.
The SDK is a third SpanProcessor that runs before Braintrust and Last9 see the spans, plus an OpenAI auto-instrumentor that emits gen_ai.chat spans for every OpenAI client call without any per-call code.
1. Add the SDK + OpenAI v2 auto-instrumentation to dependencies

   ```bash
   pip install last9-genai opentelemetry-instrumentation-openai-v2 "wrapt<2"
   ```
2. Call `install()` before importing the OpenAI client

   Auto-instrumentation wraps OpenAI's HTTP client at import time. A late `install()` instruments nothing.

   ```python
   from last9_genai import install, conversation_context, agent_context, workflow_context

   handle = install()  # wires TracerProvider + Last9SpanProcessor + OpenAI v2 auto-instr

   # Only now is it safe to import OpenAI:
   from braintrust.otel import BraintrustSpanProcessor
   from openai import OpenAI
   from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
   from opentelemetry.sdk.trace.export import BatchSpanProcessor

   handle.tracer_provider.add_span_processor(BraintrustSpanProcessor())
   handle.tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(
       endpoint=f"{LAST9_ENDPOINT}/v1/traces",
       headers={"Authorization": LAST9_AUTH},
   )))
   ```
3. Wrap eval runs with `conversation_context`

   Every span emitted under the block carries `gen_ai.conversation.id = <eval_run_id>` and `user.id`, so Last9 can group all turns and score spans of one eval run under a single filter.

   ```python
   with conversation_context(conversation_id=eval_run_id, user_id="eval-runner"):
       with workflow_context(workflow_id=eval_run_id, workflow_type="llm_eval"):
           for case in cases:
               output = client.chat.completions.create(...)  # auto-emits gen_ai.chat
               ...
   ```
4. Tag scorer spans with `agent_context`

   Each scorer gets its own agent identity. In a multi-scorer eval, this makes filtering and breakdowns by scorer trivial in Last9.

   ```python
   with agent_context(
       agent_name="Levenshtein Scorer",
       agent_id="scorer.levenshtein.v1",
       agent_description="Normalized Levenshtein similarity 0..1",
       agent_version="1.0",
   ):
       emit_score_span("Levenshtein", {"levenshtein": 0.92}, ...)
   ```
What the SDK adds, beyond vanilla OTel
| Capability | Vanilla example | With Last9 GenAI SDK |
|---|---|---|
| `gen_ai.chat` span on each LLM call | Manual `tracer.start_as_current_span(...)` per call (~20 LOC) | Auto-emitted by `opentelemetry-instrumentation-openai-v2` |
| Prompt + completion capture | Manual `span.add_event("gen_ai.content.prompt", ...)` | Auto, gated by `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true` |
| Token usage attributes | Manual, from `response.usage` | Auto |
| Per-call USD cost | Not set | `gen_ai.usage.cost_usd` auto-set for 20+ models |
| Conversation grouping | Not provided | `gen_ai.conversation.id` via `conversation_context` |
| Agent identity per scorer | Not provided | `gen_ai.agent.{id,name,description,version}` via `agent_context` |
| Workflow grouping | Not provided | `workflow.id`, `workflow.type` via `workflow_context` |
The SDK is Python-only today. Node.js apps stay on the vanilla pattern shown above.
The runnable version of this enhanced setup is at `python/braintrust-direct-l9genai` in the examples repo.
Where to go next
- See the full runnable examples in last9/opentelemetry-examples — the Python and Node sources include the eval-loop scaffolding, error-recording on LLM failures, and a Levenshtein scorer.
- For more on the Last9 GenAI SDK (multi-turn conversation tracing, agent identity, automatic cost tracking), see the Python GenAI SDK integration.
- The OTel GenAI semantic conventions used here are documented at opentelemetry.io.
Troubleshooting
Please get in touch with us on Discord or Email if you have any questions.