
Python GenAI SDK

Track LLM conversations, tool calls, and token usage from Python AI applications using the Last9 GenAI SDK

Track multi-turn LLM conversations, tool executions, and token usage from Python AI applications. The Last9 GenAI SDK extends OpenTelemetry with conversation grouping, workflow tracking, and prompt/completion capture — so you can trace an entire user session from first message to final response.

What is the Last9 GenAI SDK?

The Last9 GenAI SDK is an OpenTelemetry span processor that enriches traces with AI-specific context. It works alongside your existing OTel setup — no separate tracing pipeline needed.

Key capabilities:

  • Conversation tracking — Group multi-turn interactions under a single conversation_id (e.g., a Slack thread or chat session)
  • Workflow tracking — Group multi-step operations like RAG pipelines or tool-use loops
  • Provider-agnostic — Works with OpenAI, Anthropic, Google, Cohere, or any LLM provider
  • Thread-safe — Uses Python contextvars for safe concurrent execution
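The thread-safety claim rests on Python's contextvars, which give each task or thread its own view of the active conversation. The sketch below does not use the SDK at all; it only illustrates the isolation mechanism, with names of our own choosing:

```python
import asyncio
import contextvars

# Each asyncio task gets its own copy of the context, so concurrent
# conversations never leak IDs into each other.
conversation_id = contextvars.ContextVar("conversation_id", default=None)

async def handle_turn(cid: str) -> str:
    conversation_id.set(cid)
    await asyncio.sleep(0)  # yield so the other task runs interleaved
    return conversation_id.get()  # still this task's own id

async def main() -> list[str]:
    return await asyncio.gather(
        handle_turn("session_a"),
        handle_turn("session_b"),
    )

print(asyncio.run(main()))  # ['session_a', 'session_b']
```

Because `asyncio.gather` runs each coroutine in its own context copy, the two sessions keep their IDs even though they execute concurrently.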

Prerequisites

  1. Last9 Account — Sign up at app.last9.io
  2. Python 3.10+ with an existing LLM application
  3. OTel credentials — Get your endpoint and auth header from Integrations → OpenTelemetry

Integration Setup

  1. Install the SDK

    pip install last9-genai[otlp]

    This installs the Last9 GenAI SDK along with the OpenTelemetry OTLP exporter.

  2. Set environment variables

    export OTEL_SERVICE_NAME=<your_service_name>
    export OTEL_EXPORTER_OTLP_ENDPOINT=<your_last9_otlp_endpoint>
    export OTEL_EXPORTER_OTLP_HEADERS="Authorization=<your_auth_header>"

    Find these values in your Last9 dashboard under Integrations → OpenTelemetry.
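The `OTEL_EXPORTER_OTLP_HEADERS` value is a comma-separated list of `key=value` pairs. If you need to parse it yourself, split on the first `=` only, since auth header values can themselves contain `=`. A minimal helper (the function name is ours, not part of the SDK):

```python
def parse_otlp_headers(raw: str) -> dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a dict, splitting each pair on the
    first '=' only so values containing '=' (e.g. base64 padding) survive."""
    return dict(item.split("=", 1) for item in raw.split(",") if item)

headers = parse_otlp_headers("Authorization=Basic dXNlcjpwYXNz")
# {'Authorization': 'Basic dXNlcjpwYXNz'}
```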

  3. Initialize the tracer

    Add this to your application startup:

    import os

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.sdk.resources import SERVICE_NAME, Resource
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    from last9_genai import Last9SpanProcessor

    resource = Resource.create({SERVICE_NAME: os.environ["OTEL_SERVICE_NAME"]})
    provider = TracerProvider(resource=resource)

    # Last9 span processor — enriches spans with conversation/workflow context
    provider.add_span_processor(Last9SpanProcessor())

    # OTLP exporter — sends traces to Last9
    endpoint = os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"]
    headers = dict(
        item.split("=", 1)
        for item in os.environ["OTEL_EXPORTER_OTLP_HEADERS"].split(",")
    )
    provider.add_span_processor(BatchSpanProcessor(
        OTLPSpanExporter(endpoint=f"{endpoint}/v1/traces", headers=headers)
    ))

    trace.set_tracer_provider(provider)
  4. Wrap LLM calls with conversation context

    from openai import OpenAI
    from last9_genai import conversation_context

    client = OpenAI()

    with conversation_context(conversation_id="session_123", user_id="user_456"):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello!"}],
        )

    All spans inside the conversation_context block are tagged with gen_ai.conversation.id and user.id.

Multi-Turn Conversation Example

Each turn in a conversation uses the same conversation_id. In Last9, you can filter by this ID to see the full conversation timeline:

from last9_genai import conversation_context

THREAD_ID = "slack-thread-abc123"

# Turn 1
with conversation_context(conversation_id=THREAD_ID, user_id="user_1"):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What pods are failing?"}],
    )

# Turn 2 — same conversation_id links the turns
with conversation_context(conversation_id=THREAD_ID, user_id="user_1"):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": "What pods are failing?"},
            {"role": "assistant", "content": "api-gateway is in CrashLoopBackOff."},
            {"role": "user", "content": "Check the logs for that pod"},
        ],
    )
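Rather than re-typing the history on every turn, you can accumulate it in a list and send the running history with each call. This is plain Python bookkeeping; the helper name below is illustrative, not part of the SDK:

```python
history: list[dict] = []

def add_turn(role: str, content: str) -> list[dict]:
    """Append a message and return the running history to send to the model."""
    history.append({"role": role, "content": content})
    return history

add_turn("user", "What pods are failing?")
add_turn("assistant", "api-gateway is in CrashLoopBackOff.")
messages = add_turn("user", "Check the logs for that pod")
# messages now holds all three turns, ready for the next completions call
```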

Workflow Tracking

Group multi-step operations — RAG pipelines, tool-use loops, agent chains — as a named workflow:

from last9_genai import conversation_context, workflow_context

with conversation_context(conversation_id="session_123", user_id="user_1"):
    # Workflows nest inside conversations
    with workflow_context(workflow_id="rag_pipeline_001", workflow_type="retrieval"):
        docs = retrieve_documents(query)
        context = rerank_documents(docs)
        response = generate_answer(context)

Workflow spans carry workflow.id and workflow.type attributes, making it easy to filter and compare pipeline performance in Last9.

Recording Prompts and Completions

Capture LLM inputs and outputs as span events for debugging failed or slow responses:

import json

from opentelemetry import trace
from opentelemetry.trace import SpanKind

tracer = trace.get_tracer("my-app")

with tracer.start_as_current_span("gen_ai.chat", kind=SpanKind.CLIENT) as span:
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.request.model", "gpt-4o")

    # Record the prompt
    span.add_event("gen_ai.content.prompt", attributes={
        "gen_ai.prompt": json.dumps(messages),
    })

    response = client.chat.completions.create(model="gpt-4o", messages=messages)

    # Record the completion
    span.add_event("gen_ai.content.completion", attributes={
        "gen_ai.completion": response.choices[0].message.content,
    })

    # Record token usage
    span.set_attribute("gen_ai.usage.input_tokens", response.usage.prompt_tokens)
    span.set_attribute("gen_ai.usage.output_tokens", response.usage.completion_tokens)
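With token counts on the span, per-call cost is a simple multiplication. The prices below are placeholders for illustration only; look up your provider's current rates before relying on the numbers:

```python
# Hypothetical per-million-token prices in USD; substitute your provider's real rates.
PRICES = {"gpt-4o": {"input": 2.50, "output": 10.00}}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost of one call from its token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = estimate_cost("gpt-4o", 1200, 300)  # 0.006 under the placeholder rates
```

A value like this could be attached to the span as an additional attribute if you want cost visible alongside the token counts.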

Viewing Traces in Last9

After sending LLM requests, navigate to LLM Monitoring in your Last9 dashboard. The Conversations tab shows all tracked conversations with cost, token usage, and duration:

LLM Monitoring — Conversations list showing active conversations, total cost, and token usage

Click on a conversation to see the full Conversation Flow — each interaction shows the prompt, response, token counts, cost, and trace ID:

Conversation Details — multi-turn flow with prompts, responses, token usage, and trace links

From the conversation detail view, you can:

  1. See all interactions in a conversation grouped by gen_ai.conversation.id
  2. View full prompts and responses for each LLM call
  3. Track token usage and cost per interaction
  4. Click View Details to jump to the full trace with span-level timing

Use Cases

  • Conversation Debugging — Trace a user’s full session across multiple turns to find where responses degraded or tools failed
  • Latency Analysis — Compare LLM call latencies across models, prompt sizes, and tool-use patterns
  • Token Cost Tracking — Monitor input/output token counts per conversation to identify expensive interactions
  • Agent Observability — Track tool-use loops in AI agents: which tools were called, whether they were approved, and how they affected the final response

Troubleshooting

  • Verify the Auth Header includes the Basic prefix
  • Confirm the OTLP endpoint URL is correct (the initialization code appends /v1/traces, so the environment variable should hold the base endpoint)
  • Check that opentelemetry-sdk version is >= 1.20.0
  • Set OTEL_LOG_LEVEL=debug to see export diagnostics

Please get in touch with us on Discord or Email if you have any questions.