
Python GenAI SDK

Track LLM conversations, tool calls, and token usage from Python AI applications using the Last9 GenAI SDK

Track multi-turn LLM conversations, tool executions, and token usage from Python AI applications. The Last9 GenAI SDK extends OpenTelemetry with conversation grouping, workflow tracking, and prompt/completion capture — so you can trace an entire user session from first message to final response.

What is the Last9 GenAI SDK?

The Last9 GenAI SDK is an OpenTelemetry span processor that enriches traces with AI-specific context. It works alongside your existing OTel setup — no separate tracing pipeline needed.

Key capabilities:

  • Conversation tracking — Group multi-turn interactions under a single conversation_id (e.g., a Slack thread or chat session)
  • Workflow tracking — Group multi-step operations like RAG pipelines or tool-use loops
  • Provider-agnostic — Works with OpenAI, Anthropic, Google, Cohere, or any LLM provider
  • Thread-safe — Uses Python contextvars for safe concurrent execution
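The thread-safety claim rests on Python's contextvars, which give each task or thread its own view of the active conversation. The sketch below does not use the SDK at all; it only illustrates the isolation mechanism, with names of our own choosing:

```python
import asyncio
import contextvars

# Each asyncio task gets its own copy of the context, so concurrent
# conversations never leak IDs into each other.
conversation_id = contextvars.ContextVar("conversation_id", default=None)

async def handle_turn(cid: str) -> str:
    conversation_id.set(cid)
    await asyncio.sleep(0)  # yield so the other task runs interleaved
    return conversation_id.get()  # still this task's own id

async def main() -> list[str]:
    return await asyncio.gather(
        handle_turn("session_a"),
        handle_turn("session_b"),
    )

print(asyncio.run(main()))  # ['session_a', 'session_b']
```

Because `asyncio.gather` runs each coroutine in its own context copy, the two sessions keep their IDs even though they execute concurrently.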

Prerequisites

  1. Last9 Account — Sign up at app.last9.io
  2. Python 3.10+ with an existing LLM application
  3. OTel credentials — Get your endpoint and auth header from Integrations → OpenTelemetry

Integration Setup

  1. Install the SDK

    pip install last9-genai[otlp]

    This installs the Last9 GenAI SDK along with the OpenTelemetry OTLP exporter.

  2. Set environment variables

    export OTEL_SERVICE_NAME=<your_service_name>
    export OTEL_EXPORTER_OTLP_ENDPOINT=<your_last9_otlp_endpoint>
    export OTEL_EXPORTER_OTLP_HEADERS="Authorization=<your_auth_header>"

    Find these values in your Last9 dashboard under Integrations → OpenTelemetry.
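The `OTEL_EXPORTER_OTLP_HEADERS` value is a comma-separated list of `key=value` pairs. If you need to parse it yourself, split on the first `=` only, since auth header values can themselves contain `=`. A minimal helper (the function name is ours, not part of the SDK):

```python
def parse_otlp_headers(raw: str) -> dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a dict, splitting each pair on the
    first '=' only so values containing '=' (e.g. base64 padding) survive."""
    return dict(item.split("=", 1) for item in raw.split(",") if item)

headers = parse_otlp_headers("Authorization=Basic dXNlcjpwYXNz")
# {'Authorization': 'Basic dXNlcjpwYXNz'}
```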

  3. Initialize the tracer

    Add this to your application startup:

    import os

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.sdk.resources import SERVICE_NAME, Resource
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    from last9_genai import Last9SpanProcessor

    resource = Resource.create({SERVICE_NAME: os.environ["OTEL_SERVICE_NAME"]})
    provider = TracerProvider(resource=resource)

    # Last9 span processor — enriches spans with conversation/workflow context
    provider.add_span_processor(Last9SpanProcessor())

    # OTLP exporter — sends traces to Last9
    endpoint = os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"]
    headers = dict(
        item.split("=", 1)
        for item in os.environ["OTEL_EXPORTER_OTLP_HEADERS"].split(",")
    )
    provider.add_span_processor(BatchSpanProcessor(
        OTLPSpanExporter(endpoint=f"{endpoint}/v1/traces", headers=headers)
    ))

    trace.set_tracer_provider(provider)
  4. Wrap LLM calls with conversation context

    from openai import OpenAI
    from last9_genai import conversation_context

    client = OpenAI()

    with conversation_context(conversation_id="session_123", user_id="user_456"):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello!"}],
        )

    All spans inside the conversation_context block are tagged with gen_ai.conversation.id and user.id.

Multi-Turn Conversation Example

Each turn in a conversation uses the same conversation_id. In Last9, you can filter by this ID to see the full conversation timeline:

from last9_genai import conversation_context

THREAD_ID = "slack-thread-abc123"

# Turn 1
with conversation_context(conversation_id=THREAD_ID, user_id="user_1"):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What pods are failing?"}],
    )

# Turn 2 — same conversation_id links the turns
with conversation_context(conversation_id=THREAD_ID, user_id="user_1"):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": "What pods are failing?"},
            {"role": "assistant", "content": "api-gateway is in CrashLoopBackOff."},
            {"role": "user", "content": "Check the logs for that pod"},
        ],
    )
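Rather than re-typing the history on every turn, you can accumulate it in a list and send the running history with each call. This is plain Python bookkeeping; the helper name below is illustrative, not part of the SDK:

```python
history: list[dict] = []

def add_turn(role: str, content: str) -> list[dict]:
    """Append a message and return the running history to send to the model."""
    history.append({"role": role, "content": content})
    return history

add_turn("user", "What pods are failing?")
add_turn("assistant", "api-gateway is in CrashLoopBackOff.")
messages = add_turn("user", "Check the logs for that pod")
# messages now holds all three turns, ready for the next completions call
```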

Workflow Tracking

Group multi-step operations — RAG pipelines, tool-use loops, agent chains — as a named workflow:

from last9_genai import conversation_context, workflow_context

with conversation_context(conversation_id="session_123", user_id="user_1"):
    # Workflows nest inside conversations
    with workflow_context(workflow_id="rag_pipeline_001", workflow_type="retrieval"):
        docs = retrieve_documents(query)
        context = rerank_documents(docs)
        response = generate_answer(context)

Workflow spans carry workflow.id and workflow.type attributes, making it easy to filter and compare pipeline performance in Last9.

Recording Prompts and Completions

Capture LLM inputs and outputs as span events for debugging failed or slow responses:

import json

from opentelemetry import trace
from opentelemetry.trace import SpanKind

tracer = trace.get_tracer("my-app")

with tracer.start_as_current_span("gen_ai.chat", kind=SpanKind.CLIENT) as span:
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.request.model", "gpt-4o")

    # Record the prompt
    span.add_event("gen_ai.content.prompt", attributes={
        "gen_ai.prompt": json.dumps(messages),
    })

    response = client.chat.completions.create(model="gpt-4o", messages=messages)

    # Record the completion
    span.add_event("gen_ai.content.completion", attributes={
        "gen_ai.completion": response.choices[0].message.content,
    })

    # Record token usage
    span.set_attribute("gen_ai.usage.input_tokens", response.usage.prompt_tokens)
    span.set_attribute("gen_ai.usage.output_tokens", response.usage.completion_tokens)
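With token counts on the span, per-call cost is a simple multiplication. The prices below are placeholders for illustration only; look up your provider's current rates before relying on the numbers:

```python
# Hypothetical per-million-token prices in USD; substitute your provider's real rates.
PRICES = {"gpt-4o": {"input": 2.50, "output": 10.00}}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost of one call from its token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = estimate_cost("gpt-4o", 1200, 300)  # 0.006 under the placeholder rates
```

A value like this could be attached to the span as an additional attribute if you want cost visible alongside the token counts.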

Viewing Traces in Last9

After sending LLM requests, navigate to LLM Monitoring in your Last9 dashboard. The Conversations tab shows all tracked conversations with cost, token usage, and duration:

LLM Monitoring — Conversations list showing active conversations, total cost, and token usage

Click on a conversation to see the full Conversation Flow — each interaction shows the prompt, response, token counts, cost, and trace ID:

Conversation Details — multi-turn flow with prompts, responses, token usage, and trace links

From the conversation detail view, you can:

  1. See all interactions in a conversation grouped by gen_ai.conversation.id
  2. View full prompts and responses for each LLM call
  3. Track token usage and cost per interaction
  4. Click View Details to jump to the full trace with span-level timing

Use Cases

  • Conversation Debugging — Trace a user’s full session across multiple turns to find where responses degraded or tools failed
  • Latency Analysis — Compare LLM call latencies across models, prompt sizes, and tool-use patterns
  • Token Cost Tracking — Monitor input/output token counts per conversation to identify expensive interactions
  • Agent Observability — Track tool-use loops in AI agents: which tools were called, whether they were approved, and how they affected the final response

Troubleshooting

  • Verify the Auth Header includes the Basic prefix
  • Confirm the OTLP endpoint URL is correct (the initialization code appends /v1/traces, so the environment variable should hold the base endpoint)
  • Check that opentelemetry-sdk version is >= 1.20.0
  • Set OTEL_LOG_LEVEL=debug to see export diagnostics

Please get in touch with us on Discord or Email if you have any questions.