
Braintrust


Send the same OpenTelemetry traces to both Braintrust (for LLM eval scores) and Last9 (for full trace and APM observability), with matching trace IDs across both backends.

What is Braintrust?

Braintrust is an LLM evaluation platform — datasets, scorers, prompt experiments, and a UI for inspecting LLM call traces. It accepts OTel traces over OTLP/HTTP and recognizes a braintrust.* attribute namespace for evaluation-specific data (scorers, ground truth, metadata).

Dual-export buys you a single trace tree with two readers: Braintrust for eval-quality investigations (which scorer fired, what was the expected output, what was the input dataset row), Last9 for end-to-end app observability (latency, errors, infra correlation, alerting). Identical trace IDs let you jump between the two views on the same request.

There are two architecturally distinct ways to ship the same span twice:

  • Direct mode — two SpanProcessor instances on a single TracerProvider, no Collector required.
  • Collector mode — the app emits OTLP/HTTP to a local OpenTelemetry Collector, which fans out to both backends via two named otlphttp exporters.

Both patterns are demonstrated below in Python and Node.

Prerequisites

  1. Last9 account — Sign up at app.last9.io and grab your OTLP endpoint and auth header from Integrations → OpenTelemetry.
  2. Braintrust account — Get an API key from braintrust.dev/app/settings/api-keys and create a project (the example uses last9-otel-example).
  3. OpenAI API key — Used by the demo workload in the runnable examples.
  4. Python 3.10+ or Node.js 18+.

The complete runnable examples live in the last9/opentelemetry-examples repository.

Direct mode — two SpanProcessors, no Collector

In this mode the app owns the OTel TracerProvider and attaches two SpanProcessor instances:

  • BraintrustSpanProcessor (from braintrust[otel] / @braintrust/otel) routes spans to Braintrust and stamps the required x-bt-parent routing header.
  • BatchSpanProcessor(OTLPSpanExporter) ships the same spans to Last9 over OTLP/HTTP.

Both backends receive identical trace IDs because both processors observe the same span objects.

  1. Install dependencies

    pip install "braintrust[otel]" openai \
    opentelemetry-api opentelemetry-sdk \
    opentelemetry-exporter-otlp-proto-http
  2. Set environment variables

    # Service identity
    export OTEL_SERVICE_NAME=braintrust-direct-example
    export DEPLOYMENT_ENV=local
    # Last9 — base URL only, no /v1/traces path
    export LAST9_OTLP_ENDPOINT=<your-last9-otlp-endpoint>
    export LAST9_OTLP_AUTH="Basic <your-last9-credentials>"
    # Braintrust
    export BRAINTRUST_API_KEY=<your-braintrust-api-key>
    export BRAINTRUST_PARENT="project_name:last9-otel-example"
    # Workload
    export OPENAI_API_KEY=<your-openai-api-key>
  3. Initialize the TracerProvider with two processors

    import os

    from braintrust.otel import BraintrustSpanProcessor
    from opentelemetry import trace
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.resources import SERVICE_NAME, Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    resource = Resource.create({
        SERVICE_NAME: os.environ["OTEL_SERVICE_NAME"],
        "deployment.environment": os.environ.get("DEPLOYMENT_ENV", "local"),
    })
    provider = TracerProvider(resource=resource)

    # Braintrust: reads BRAINTRUST_API_KEY, BRAINTRUST_PARENT, BRAINTRUST_API_URL.
    provider.add_span_processor(BraintrustSpanProcessor())

    # Last9: explicit endpoint + headers.
    last9_endpoint = os.environ["LAST9_OTLP_ENDPOINT"].rstrip("/")
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(
        endpoint=f"{last9_endpoint}/v1/traces",
        headers={"Authorization": os.environ["LAST9_OTLP_AUTH"]},
    )))

    trace.set_tracer_provider(provider)
  4. Flush before exit

    BatchSpanProcessor buffers spans for up to five seconds; a short-lived script that exits before the flush timer fires silently drops every span. Call force_flush and shutdown in your script’s exit path (a full end-to-end sketch follows this list):

    provider.force_flush()
    provider.shutdown()
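Putting the pieces together, a minimal direct-mode run might look like this. This is a sketch, not the repo's exact workload: the span name and attributes are illustrative, and provider is the TracerProvider from step 3.

    from opentelemetry import trace

    tracer = trace.get_tracer("braintrust-direct-example")

    try:
        # Spans created here pass through both processors and arrive in
        # Braintrust and Last9 carrying the same trace ID.
        with tracer.start_as_current_span("say-hi-eval") as span:
            span.set_attribute("gen_ai.system", "openai")
            span.set_attribute("gen_ai.request.model", "gpt-4o-mini")
    finally:
        # Flush inside finally so spans survive early exits too.
        provider.force_flush()
        provider.shutdown()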

Collector mode — fan-out at the OpenTelemetry Collector

In this mode the app emits OTLP/HTTP to a single local Collector. The Collector’s trace pipeline declares two otlphttp exporters and routes every span to both.

This pattern keeps app code vendor-agnostic — adding or removing backends only requires a Collector restart, not an app redeploy. Centralized policy (redaction, filtering, attribute renaming) lives in one YAML.

  1. Configure environment variables for the Collector

    # App → local collector
    OTEL_SERVICE_NAME=braintrust-collector-example
    OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
    OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
    DEPLOYMENT_ENV=local
    # Last9 — used by the Collector exporter
    LAST9_OTLP_ENDPOINT=<your-last9-otlp-endpoint>
    LAST9_OTLP_AUTH="Basic <your-last9-credentials>"
    # Braintrust — used by the Collector exporter
    BRAINTRUST_API_KEY=<your-braintrust-api-key>
    BRAINTRUST_PROJECT=last9-otel-example
    # Workload
    OPENAI_API_KEY=<your-openai-api-key>
  2. Write the Collector config

    # otel-collector-config.yaml
    receivers:
      otlp:
        protocols:
          http:
            endpoint: 0.0.0.0:4318
          grpc:
            endpoint: 0.0.0.0:4317

    processors:
      batch:
        timeout: 5s
        send_batch_size: 512

    exporters:
      otlphttp/braintrust:
        endpoint: https://api.braintrust.dev/otel
        headers:
          Authorization: "Bearer ${env:BRAINTRUST_API_KEY}"
          x-bt-parent: "project_name:${env:BRAINTRUST_PROJECT}"
      otlphttp/last9:
        endpoint: "${env:LAST9_OTLP_ENDPOINT}"
        headers:
          Authorization: "${env:LAST9_OTLP_AUTH}"

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlphttp/braintrust, otlphttp/last9]
  3. Run the Collector

    # docker-compose.yaml
    services:
      otel-collector:
        image: otel/opentelemetry-collector-contrib:0.144.0
        command: ["--config", "/etc/otelcol-contrib/config.yaml"]
        volumes:
          - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
        ports:
          - "4317:4317"
          - "4318:4318"
        env_file:
          - .env

    Start it: docker compose up otel-collector.

  4. Point the app at the Collector

    The app uses a single OTLPSpanExporter (Python) or OTLPTraceExporter (Node) with no constructor arguments — it reads OTEL_EXPORTER_OTLP_ENDPOINT from the environment.

    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
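Fleshing out that one-liner, a minimal app-side setup for collector mode might look like this in Python (a sketch; the resource attributes mirror direct mode):

    import os

    from opentelemetry import trace
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.resources import SERVICE_NAME, Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    provider = TracerProvider(resource=Resource.create({
        SERVICE_NAME: os.environ["OTEL_SERVICE_NAME"],
        "deployment.environment": os.environ.get("DEPLOYMENT_ENV", "local"),
    }))
    # No endpoint or headers here: the exporter picks up
    # OTEL_EXPORTER_OTLP_ENDPOINT (http://otel-collector:4318) from the
    # environment and appends /v1/traces itself.
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
    trace.set_tracer_provider(provider)

The app never sees Braintrust or Last9 credentials; fan-out and auth live entirely in the Collector config.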

Span attributes

LLM call spans

The example workload sets the OTel GenAI semantic-convention attributes Last9 and Braintrust both understand:

| Attribute | Description |
| --- | --- |
| gen_ai.system | LLM provider (e.g. openai) |
| gen_ai.request.model | Model requested (e.g. gpt-4o-mini) |
| gen_ai.operation.name | Operation type (chat, embedding, etc.) |
| gen_ai.usage.input_tokens | Prompt tokens billed |
| gen_ai.usage.output_tokens | Completion tokens billed |
| gen_ai.response.id | Provider-assigned response ID |
| gen_ai.response.model | Model that actually served the request (can differ from the requested model during version rollouts) |
| gen_ai.response.finish_reasons | Array of finish reasons |

Prompts and completions are recorded as span events with the prompt/completion JSON payload as an attribute on the event:

| Span event | Attribute |
| --- | --- |
| gen_ai.content.prompt | gen_ai.prompt (JSON-stringified messages) |
| gen_ai.content.completion | gen_ai.completion (JSON-stringified message) |
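For a hand-rolled workload, recording these events is two add_event calls. A sketch, assuming messages is the chat payload sent to OpenAI and response is the returned completion object (both names are placeholders, not from the examples repo):

    import json

    span.add_event("gen_ai.content.prompt",
                   {"gen_ai.prompt": json.dumps(messages)})
    span.add_event("gen_ai.content.completion",
                   {"gen_ai.completion": json.dumps(
                       response.choices[0].message.model_dump())})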

Eval and score spans (braintrust.*)

Braintrust models eval runs and scorer outputs as dedicated span types. When you use the OTLP path (no Braintrust SDK on the data plane), the braintrust.span_attributes discriminator turns a regular OTel span into an eval, task, llm, tool, function, or score span on ingest:

| Attribute | Type | Purpose |
| --- | --- | --- |
| braintrust.span_attributes | JSON-stringified object {"name": ..., "type": "score"\|"eval"\|"task"\|"llm"\|"tool"\|"function"} | Span-type discriminator |
| braintrust.scores | JSON-stringified {"scorer_name": 0.0–1.0} | Scorer outputs (one or more) |
| braintrust.input | Raw string or JSON | The input the scorer evaluated |
| braintrust.output | Raw string or JSON | The actual output |
| braintrust.expected | Raw string or JSON | Ground truth (optional) |
| braintrust.metadata | JSON-stringified object | Free-form context (dataset name, case count, etc.) |
| braintrust.tags | Array of strings | Run-level tags (optional) |

A minimal score span (Python):

import json

from opentelemetry.trace import SpanKind

with tracer.start_as_current_span("Levenshtein", kind=SpanKind.INTERNAL) as span:
    span.set_attribute("braintrust.span_attributes",
                       json.dumps({"name": "Levenshtein", "type": "score"}))
    span.set_attribute("braintrust.scores",
                       json.dumps({"levenshtein": 0.92}))
    span.set_attribute("braintrust.input", "Foo")
    span.set_attribute("braintrust.output", "Hi Foo")
    span.set_attribute("braintrust.expected", "Hi Foo")

In Last9 this appears as a regular OTel span with the score attributes attached — queryable, filterable, alertable. In Braintrust it appears as a Braintrust score span attached to the parent eval, with the score value rendered in the eval results UI.

Verification

Run any of the four examples with valid credentials. Each emits an eval root span containing two gen_ai.chat LLM call spans and two Levenshtein score spans — five spans total per trace, all sharing the same trace ID.

In Last9

  1. Open Traces Explorer.
  2. Filter by service.name = braintrust-direct-example (or braintrust-collector-example).
  3. Open the latest trace. The span tree should look like:
    say-hi-eval-<timestamp> (eval root)
    ├── gen_ai.chat (LLM call 1)
    ├── Levenshtein (score 1)
    ├── gen_ai.chat (LLM call 2)
    └── Levenshtein (score 2)
  4. Open a gen_ai.chat span — gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.response.model, and the gen_ai.content.prompt / gen_ai.content.completion events should be present.
  5. Open a Levenshtein span — braintrust.scores and the rest of the braintrust.* attributes should be set.

In Braintrust

  1. Open the project named in BRAINTRUST_PARENT / BRAINTRUST_PROJECT (e.g. last9-otel-example).
  2. Under Logs, the latest run shows up at the top.
  3. Each Levenshtein score span surfaces in the eval row as a scorer with the numeric value attached.

If a trace is missing on either side, check that the OpenAI call returned without raising, that force_flush was invoked before the script exited, and (in Collector mode) that the Collector logs show Sent events for both pipelines.

Optional — enhance with the Last9 GenAI SDK (Python)

The dual-export shape above works with vanilla OpenTelemetry — no Last9-specific dependency. If you also want conversation-level grouping, agent identity tracking, automatic prompt/completion capture, and per-call USD cost on gen_ai.chat spans, layer the Last9 GenAI SDK on top.

The SDK adds a third SpanProcessor that runs before Braintrust and Last9 see the spans, plus an OpenAI auto-instrumentor that emits gen_ai.chat spans for every OpenAI client call with no per-call code.

  1. Add the SDK + OpenAI v2 auto-instrumentation to dependencies

    pip install last9-genai opentelemetry-instrumentation-openai-v2 "wrapt<2"
  2. Call install() before importing the OpenAI client

    Auto-instrumentation wraps OpenAI’s HTTP client at import time. A late install() instruments nothing.

    from last9_genai import install, conversation_context, agent_context, workflow_context

    handle = install()  # wires TracerProvider + Last9SpanProcessor + OpenAI v2 auto-instr

    # Only now is it safe to import OpenAI:
    from braintrust.otel import BraintrustSpanProcessor
    from openai import OpenAI
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    handle.tracer_provider.add_span_processor(BraintrustSpanProcessor())
    handle.tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(
        endpoint=f"{LAST9_ENDPOINT}/v1/traces",
        headers={"Authorization": LAST9_AUTH},
    )))
  3. Wrap eval runs with conversation_context

    Every span emitted under the block carries gen_ai.conversation.id = <eval_run_id> and user.id, so Last9 can group all turns + score spans of one eval run under a single filter.

    with conversation_context(conversation_id=eval_run_id, user_id="eval-runner"):
        with workflow_context(workflow_id=eval_run_id, workflow_type="llm_eval"):
            for case in cases:
                output = client.chat.completions.create(...)  # auto-emits gen_ai.chat
                ...
  4. Tag scorer spans with agent_context

    Each scorer gets its own agent identity. In a multi-scorer eval, this makes filtering and breakdowns by scorer trivial in Last9.

    with agent_context(
        agent_name="Levenshtein Scorer",
        agent_id="scorer.levenshtein.v1",
        agent_description="Normalized Levenshtein similarity 0..1",
        agent_version="1.0",
    ):
        emit_score_span("Levenshtein", {"levenshtein": 0.92}, ...)
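emit_score_span above is a small helper in the example source. A hypothetical minimal version, reusing the score-span shape shown earlier:

    import json

    from opentelemetry import trace
    from opentelemetry.trace import SpanKind

    def emit_score_span(name: str, scores: dict, input_=None, output=None, expected=None):
        """Hypothetical helper: emit one Braintrust-typed score span."""
        tracer = trace.get_tracer("scorer")
        with tracer.start_as_current_span(name, kind=SpanKind.INTERNAL) as span:
            span.set_attribute("braintrust.span_attributes",
                               json.dumps({"name": name, "type": "score"}))
            span.set_attribute("braintrust.scores", json.dumps(scores))
            if input_ is not None:
                span.set_attribute("braintrust.input", input_)
            if output is not None:
                span.set_attribute("braintrust.output", output)
            if expected is not None:
                span.set_attribute("braintrust.expected", expected)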

What the SDK adds, beyond vanilla OTel

| Capability | Vanilla example | With Last9 GenAI SDK |
| --- | --- | --- |
| gen_ai.chat span on each LLM call | Manual tracer.start_as_current_span(...) per call (~20 LOC) | Auto-emitted by opentelemetry-instrumentation-openai-v2 |
| Prompt + completion capture | Manual span.add_event("gen_ai.content.prompt", ...) | Auto, gated by OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true |
| Token usage attributes | Manual, from response.usage | Auto |
| Per-call USD cost | Not set | gen_ai.usage.cost_usd auto-set for 20+ models |
| Conversation grouping | Not provided | gen_ai.conversation.id via conversation_context |
| Agent identity per scorer | Not provided | gen_ai.agent.{id,name,description,version} via agent_context |
| Workflow grouping | Not provided | workflow.id, workflow.type via workflow_context |
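One practical note on the second row: message content capture in opentelemetry-instrumentation-openai-v2 is off by default, so the gating variable must be set in the environment before the workload runs:

    export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true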

The SDK is Python-only today. Node.js apps stay on the vanilla pattern shown above.

The runnable version of this enhanced setup is at python/braintrust-direct-l9genai in the examples repo.

Where to go next

  • See the full runnable examples in last9/opentelemetry-examples — the Python and Node sources include the eval-loop scaffolding, error-recording on LLM failures, and a Levenshtein scorer.
  • For more on the Last9 GenAI SDK (multi-turn conversation tracing, agent identity, automatic cost tracking), see the Python GenAI SDK integration.
  • The OTel GenAI semantic conventions used here are documented at opentelemetry.io.

Troubleshooting

Please get in touch with us on Discord or Email if you have any questions.