
Braintrust


Send the same OpenTelemetry traces to both Braintrust (for LLM eval scores) and Last9 (for full trace and APM observability), with matching trace IDs across both backends.

What is Braintrust?

Braintrust is an LLM evaluation platform — datasets, scorers, prompt experiments, and a UI for inspecting LLM call traces. It accepts OTel traces over OTLP/HTTP and recognizes a braintrust.* attribute namespace for evaluation-specific data (scorers, ground truth, metadata).

Dual-export buys you a single trace tree with two readers: Braintrust for eval-quality investigations (which scorer fired, what was the expected output, what was the input dataset row), Last9 for end-to-end app observability (latency, errors, infra correlation, alerting). Identical trace IDs let you jump between the two views on the same request.

There are two architecturally distinct ways to ship the same span twice:

  • Direct mode — two SpanProcessor instances on a single TracerProvider, no Collector required.
  • Collector mode — the app emits OTLP/HTTP to a local OpenTelemetry Collector, which fans out to both backends via two named otlphttp exporters.

Both patterns are demonstrated below in Python and Node.

Prerequisites

  1. Last9 account — Sign up at app.last9.io and grab your OTLP endpoint and auth header from Integrations → OpenTelemetry.
  2. Braintrust account — Get an API key from braintrust.dev/app/settings/api-keys and create a project (the example uses last9-otel-example).
  3. OpenAI API key — Used by the demo workload in the runnable examples.
  4. Python 3.10+ or Node.js 18+.

The complete runnable examples live in the last9/opentelemetry-examples repository.

Direct mode — two SpanProcessors, no Collector

In this mode the app owns the OTel TracerProvider and attaches two SpanProcessor instances:

  • BraintrustSpanProcessor (from braintrust[otel] / @braintrust/otel) routes spans to Braintrust and stamps the required x-bt-parent routing header.
  • BatchSpanProcessor(OTLPSpanExporter) ships the same spans to Last9 over OTLP/HTTP.

Both backends receive identical trace IDs because both processors observe the same span objects.

  1. Install dependencies

    pip install "braintrust[otel]" openai \
    opentelemetry-api opentelemetry-sdk \
    opentelemetry-exporter-otlp-proto-http
  2. Set environment variables

    # Service identity
    export OTEL_SERVICE_NAME=braintrust-direct-example
    export DEPLOYMENT_ENV=local
    # Last9 — base URL only, no /v1/traces path
    export LAST9_OTLP_ENDPOINT=<your-last9-otlp-endpoint>
    export LAST9_OTLP_AUTH="Basic <your-last9-credentials>"
    # Braintrust
    export BRAINTRUST_API_KEY=<your-braintrust-api-key>
    export BRAINTRUST_PARENT="project_name:last9-otel-example"
    # Workload
    export OPENAI_API_KEY=<your-openai-api-key>
  3. Initialize the TracerProvider with two processors

    import os

    from braintrust.otel import BraintrustSpanProcessor
    from opentelemetry import trace
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.resources import SERVICE_NAME, Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    resource = Resource.create({
        SERVICE_NAME: os.environ["OTEL_SERVICE_NAME"],
        "deployment.environment": os.environ.get("DEPLOYMENT_ENV", "local"),
    })
    provider = TracerProvider(resource=resource)

    # Braintrust: reads BRAINTRUST_API_KEY, BRAINTRUST_PARENT, BRAINTRUST_API_URL.
    provider.add_span_processor(BraintrustSpanProcessor())

    # Last9: explicit endpoint + headers.
    last9_endpoint = os.environ["LAST9_OTLP_ENDPOINT"].rstrip("/")
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(
        endpoint=f"{last9_endpoint}/v1/traces",
        headers={"Authorization": os.environ["LAST9_OTLP_AUTH"]},
    )))

    trace.set_tracer_provider(provider)
  4. Flush before exit

    BatchSpanProcessor buffers spans for up to five seconds; a short-lived script that exits before the flush timer fires silently drops every span. Call force_flush and shutdown in your script’s exit path (a full end-to-end sketch follows this list):

    provider.force_flush()
    provider.shutdown()
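Putting the pieces together, a minimal direct-mode run might look like this. This is a sketch, not the repo's exact workload: the span name and attributes are illustrative, and provider is the TracerProvider from step 3.

    from opentelemetry import trace

    tracer = trace.get_tracer("braintrust-direct-example")

    try:
        # Spans created here pass through both processors and arrive in
        # Braintrust and Last9 carrying the same trace ID.
        with tracer.start_as_current_span("say-hi-eval") as span:
            span.set_attribute("gen_ai.system", "openai")
            span.set_attribute("gen_ai.request.model", "gpt-4o-mini")
    finally:
        # Flush inside finally so spans survive early exits too.
        provider.force_flush()
        provider.shutdown()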

Collector mode — fan-out at the OpenTelemetry Collector

In this mode the app emits OTLP/HTTP to a single local Collector. The Collector’s trace pipeline declares two otlphttp exporters and routes every span to both.

This pattern keeps app code vendor-agnostic — adding or removing backends only requires a Collector restart, not an app redeploy. Centralized policy (redaction, filtering, attribute renaming) lives in one YAML.

  1. Configure environment variables for the Collector

    # App → local collector
    OTEL_SERVICE_NAME=braintrust-collector-example
    OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
    OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
    DEPLOYMENT_ENV=local
    # Last9 — used by the Collector exporter
    LAST9_OTLP_ENDPOINT=<your-last9-otlp-endpoint>
    LAST9_OTLP_AUTH="Basic <your-last9-credentials>"
    # Braintrust — used by the Collector exporter
    BRAINTRUST_API_KEY=<your-braintrust-api-key>
    BRAINTRUST_PROJECT=last9-otel-example
    # Workload
    OPENAI_API_KEY=<your-openai-api-key>
  2. Write the Collector config

    # otel-collector-config.yaml
    receivers:
      otlp:
        protocols:
          http:
            endpoint: 0.0.0.0:4318
          grpc:
            endpoint: 0.0.0.0:4317

    processors:
      batch:
        timeout: 5s
        send_batch_size: 512

    exporters:
      otlphttp/braintrust:
        endpoint: https://api.braintrust.dev/otel
        headers:
          Authorization: "Bearer ${env:BRAINTRUST_API_KEY}"
          x-bt-parent: "project_name:${env:BRAINTRUST_PROJECT}"
      otlphttp/last9:
        endpoint: "${env:LAST9_OTLP_ENDPOINT}"
        headers:
          Authorization: "${env:LAST9_OTLP_AUTH}"

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlphttp/braintrust, otlphttp/last9]
  3. Run the Collector

    # docker-compose.yaml
    services:
      otel-collector:
        image: otel/opentelemetry-collector-contrib:0.144.0
        command: ["--config", "/etc/otelcol-contrib/config.yaml"]
        volumes:
          - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
        ports:
          - "4317:4317"
          - "4318:4318"
        env_file:
          - .env

    Start it: docker compose up otel-collector.

  4. Point the app at the Collector

    The app uses a single OTLPSpanExporter (Python) or OTLPTraceExporter (Node) with no constructor arguments — it reads OTEL_EXPORTER_OTLP_ENDPOINT from the environment.

    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
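Fleshing out that one-liner, a minimal app-side setup for collector mode might look like this in Python (a sketch; the resource attributes mirror direct mode):

    import os

    from opentelemetry import trace
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.resources import SERVICE_NAME, Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    provider = TracerProvider(resource=Resource.create({
        SERVICE_NAME: os.environ["OTEL_SERVICE_NAME"],
        "deployment.environment": os.environ.get("DEPLOYMENT_ENV", "local"),
    }))
    # No endpoint or headers here: the exporter picks up
    # OTEL_EXPORTER_OTLP_ENDPOINT (http://otel-collector:4318) from the
    # environment and appends /v1/traces itself.
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    trace.set_tracer_provider(provider)

The app never sees Braintrust or Last9 credentials; fan-out and auth live entirely in the Collector config.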

Span attributes

LLM call spans

The example workload sets the OTel GenAI semantic-convention attributes Last9 and Braintrust both understand:

| Attribute | Description |
| --- | --- |
| gen_ai.system | LLM provider (e.g. openai) |
| gen_ai.request.model | Model requested (e.g. gpt-4o-mini) |
| gen_ai.operation.name | Operation type (chat, embedding, etc.) |
| gen_ai.usage.input_tokens | Prompt tokens billed |
| gen_ai.usage.output_tokens | Completion tokens billed |
| gen_ai.response.id | Provider-assigned response ID |
| gen_ai.response.model | Model that actually served the request (can differ from the requested model during version rollouts) |
| gen_ai.response.finish_reasons | Array of finish reasons |

Prompts and completions are recorded as span events with the prompt/completion JSON payload as an attribute on the event:

| Span event | Attribute |
| --- | --- |
| gen_ai.content.prompt | gen_ai.prompt (JSON-stringified messages) |
| gen_ai.content.completion | gen_ai.completion (JSON-stringified message) |
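For a hand-rolled workload, recording these events is two add_event calls. A sketch, assuming messages is the chat payload sent to OpenAI and response is the returned completion object (both names are placeholders, not from the examples repo):

    import json

    span.add_event("gen_ai.content.prompt",
                   {"gen_ai.prompt": json.dumps(messages)})
    span.add_event("gen_ai.content.completion",
                   {"gen_ai.completion": json.dumps(
                       response.choices[0].message.model_dump())})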

Eval and score spans (braintrust.*)

Braintrust models eval runs and scorer outputs as dedicated span types. When you use the OTLP path (no Braintrust SDK on the data plane), the braintrust.span_attributes discriminator turns a regular OTel span into an eval, task, llm, tool, function, or score span on ingest:

| Attribute | Type | Purpose |
| --- | --- | --- |
| braintrust.span_attributes | JSON-stringified object {"name": ..., "type": "score"\|"eval"\|"task"\|"llm"\|"tool"\|"function"} | Span-type discriminator |
| braintrust.scores | JSON-stringified {"scorer_name": 0.0–1.0} | Scorer outputs (one or more) |
| braintrust.input | Raw string or JSON | The input the scorer evaluated |
| braintrust.output | Raw string or JSON | The actual output |
| braintrust.expected | Raw string or JSON | Ground truth (optional) |
| braintrust.metadata | JSON-stringified object | Free-form context (dataset name, case count, etc.) |
| braintrust.tags | Array of strings | Run-level tags (optional) |

A minimal score span (Python):

import json

from opentelemetry.trace import SpanKind

with tracer.start_as_current_span("Levenshtein", kind=SpanKind.INTERNAL) as span:
    span.set_attribute("braintrust.span_attributes",
                       json.dumps({"name": "Levenshtein", "type": "score"}))
    span.set_attribute("braintrust.scores",
                       json.dumps({"levenshtein": 0.92}))
    span.set_attribute("braintrust.input", "Foo")
    span.set_attribute("braintrust.output", "Hi Foo")
    span.set_attribute("braintrust.expected", "Hi Foo")

In Last9 this appears as a regular OTel span with the score attributes attached — queryable, filterable, alertable. In Braintrust it appears as a Braintrust score span attached to the parent eval, with the score value rendered in the eval results UI.

Verification

Run any of the four examples with valid credentials. Each emits an eval root span containing two gen_ai.chat LLM call spans and two Levenshtein score spans — five spans total per trace, all sharing the same trace ID.

In Last9

  1. Open Traces Explorer.
  2. Filter by service.name = braintrust-direct-example (or braintrust-collector-example).
  3. Open the latest trace. The span tree should look like:
    say-hi-eval-<timestamp> (eval root)
    ├── gen_ai.chat (LLM call 1)
    ├── Levenshtein (score 1)
    ├── gen_ai.chat (LLM call 2)
    └── Levenshtein (score 2)
  4. Open a gen_ai.chat span — gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.response.model, and the gen_ai.content.prompt / gen_ai.content.completion events should be present.
  5. Open a Levenshtein span — braintrust.scores and the rest of the braintrust.* attributes should be set.

In Braintrust

  1. Open the project named in BRAINTRUST_PARENT / BRAINTRUST_PROJECT (e.g. last9-otel-example).
  2. Under Logs, the latest run shows up at the top.
  3. Each Levenshtein score span surfaces in the eval row as a scorer with the numeric value attached.

If a trace is missing on either side, check that the OpenAI call returned without raising, that force_flush was invoked before the script exited, and (in Collector mode) that the Collector logs show Sent events for both pipelines.

Optional — enhance with the Last9 GenAI SDK (Python)

The dual-export shape above works with vanilla OpenTelemetry — no Last9-specific dependency. If you also want conversation-level grouping, agent identity tracking, automatic prompt/completion capture, and per-call USD cost on gen_ai.chat spans, layer the Last9 GenAI SDK on top.

The SDK adds a third SpanProcessor that runs before Braintrust and Last9 see the spans, plus an OpenAI auto-instrumentor that emits gen_ai.chat spans for every OpenAI client call with no per-call code.

  1. Add the SDK + OpenAI v2 auto-instrumentation to dependencies

    pip install last9-genai opentelemetry-instrumentation-openai-v2 "wrapt<2"
  2. Call install() before importing the OpenAI client

    Auto-instrumentation wraps OpenAI’s HTTP client at import time. A late install() instruments nothing.

    from last9_genai import install, conversation_context, agent_context, workflow_context

    handle = install()  # wires TracerProvider + Last9SpanProcessor + OpenAI v2 auto-instr

    # Only now is it safe to import OpenAI:
    from braintrust.otel import BraintrustSpanProcessor
    from openai import OpenAI
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    handle.tracer_provider.add_span_processor(BraintrustSpanProcessor())
    handle.tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(
        endpoint=f"{LAST9_ENDPOINT}/v1/traces",
        headers={"Authorization": LAST9_AUTH},
    )))
  3. Wrap eval runs with conversation_context

    Every span emitted under the block carries gen_ai.conversation.id = <eval_run_id> and user.id, so Last9 can group all turns + score spans of one eval run under a single filter.

    with conversation_context(conversation_id=eval_run_id, user_id="eval-runner"):
        with workflow_context(workflow_id=eval_run_id, workflow_type="llm_eval"):
            for case in cases:
                output = client.chat.completions.create(...)  # auto-emits gen_ai.chat
                ...
  4. Tag scorer spans with agent_context

    Each scorer gets its own agent identity. In a multi-scorer eval, this makes filtering and breakdowns by scorer trivial in Last9.

    with agent_context(
        agent_name="Levenshtein Scorer",
        agent_id="scorer.levenshtein.v1",
        agent_description="Normalized Levenshtein similarity 0..1",
        agent_version="1.0",
    ):
        emit_score_span("Levenshtein", {"levenshtein": 0.92}, ...)
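emit_score_span above is a small helper in the example source. A hypothetical minimal version, reusing the score-span shape shown earlier:

    import json

    from opentelemetry import trace
    from opentelemetry.trace import SpanKind

    def emit_score_span(name: str, scores: dict, input_=None, output=None, expected=None):
        """Hypothetical helper: emit one Braintrust-typed score span."""
        tracer = trace.get_tracer("scorer")
        with tracer.start_as_current_span(name, kind=SpanKind.INTERNAL) as span:
            span.set_attribute("braintrust.span_attributes",
                               json.dumps({"name": name, "type": "score"}))
            span.set_attribute("braintrust.scores", json.dumps(scores))
            if input_ is not None:
                span.set_attribute("braintrust.input", input_)
            if output is not None:
                span.set_attribute("braintrust.output", output)
            if expected is not None:
                span.set_attribute("braintrust.expected", expected)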

What the SDK adds, beyond vanilla OTel

| Capability | Vanilla example | With Last9 GenAI SDK |
| --- | --- | --- |
| gen_ai.chat span on each LLM call | Manual tracer.start_as_current_span(...) per call (~20 LOC) | Auto-emitted by opentelemetry-instrumentation-openai-v2 |
| Prompt + completion capture | Manual span.add_event("gen_ai.content.prompt", ...) | Auto, gated by OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true |
| Token usage attributes | Manual, from response.usage | Auto |
| Per-call USD cost | Not set | gen_ai.usage.cost_usd auto-set for 20+ models |
| Conversation grouping | Not provided | gen_ai.conversation.id via conversation_context |
| Agent identity per scorer | Not provided | gen_ai.agent.{id,name,description,version} via agent_context |
| Workflow grouping | Not provided | workflow.id, workflow.type via workflow_context |
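One practical note on the second row: message content capture in opentelemetry-instrumentation-openai-v2 is off by default, so the gating variable must be set in the environment before the workload runs:

    export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true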

The SDK is Python-only today. Node.js apps stay on the vanilla pattern shown above.

The runnable version of this enhanced setup is at python/braintrust-direct-l9genai in the examples repo.

Where to go next

  • See the full runnable examples in last9/opentelemetry-examples — the Python and Node sources include the eval-loop scaffolding, error-recording on LLM failures, and a Levenshtein scorer.
  • For more on the Last9 GenAI SDK (multi-turn conversation tracing, agent identity, automatic cost tracking), see the Python GenAI SDK integration.
  • The OTel GenAI semantic conventions used here are documented at opentelemetry.io.

Troubleshooting

Please get in touch with us on Discord or Email if you have any questions.