# Python GenAI SDK

Track LLM conversations, tool calls, and token usage from Python AI applications using the Last9 GenAI SDK.
Track multi-turn LLM conversations, tool executions, and token usage from Python AI applications. The Last9 GenAI SDK extends OpenTelemetry with conversation grouping, workflow tracking, and prompt/completion capture — so you can trace an entire user session from first message to final response.
## What is the Last9 GenAI SDK?
The Last9 GenAI SDK is an OpenTelemetry span processor that enriches traces with AI-specific context. It works alongside your existing OTel setup — no separate tracing pipeline needed.
Key capabilities:
- Conversation tracking — Group multi-turn interactions under a single `conversation_id` (e.g., a Slack thread or chat session)
- Workflow tracking — Group multi-step operations like RAG pipelines or tool-use loops
- Provider-agnostic — Works with OpenAI, Anthropic, Google, Cohere, or any LLM provider
- Thread-safe — Uses Python `contextvars` for safe concurrent execution (see the sketch below)
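Because conversation and workflow state live in `contextvars`, concurrent requests can't leak IDs into each other. A minimal sketch of what that enables, assuming `conversation_context` works as an ordinary context manager inside coroutines (the LLM calls themselves are elided):

```python
import asyncio

from last9_genai import conversation_context

async def handle_session(session_id: str, user_id: str) -> None:
    # contextvars are task-local, so each task sees only its own IDs
    with conversation_context(conversation_id=session_id, user_id=user_id):
        ...  # make LLM calls here; spans are tagged with this session's IDs

async def main() -> None:
    # Two sessions running concurrently, each with isolated context
    await asyncio.gather(
        handle_session("session_a", "user_1"),
        handle_session("session_b", "user_2"),
    )

asyncio.run(main())
```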
## Prerequisites
- Last9 Account — Sign up at app.last9.io
- Python 3.10+ with an existing LLM application
- OTel credentials — Get your endpoint and auth header from Integrations → OpenTelemetry
## Integration Setup
1. **Install the SDK**

   ```bash
   pip install "last9-genai[otlp]"
   ```

   This installs the Last9 GenAI SDK along with the OpenTelemetry OTLP exporter. (The quotes keep `[otlp]` from being expanded as a shell glob in zsh.)
2. **Set environment variables**

   ```bash
   export OTEL_SERVICE_NAME=<your_service_name>
   export OTEL_EXPORTER_OTLP_ENDPOINT=<your_last9_otlp_endpoint>
   export OTEL_EXPORTER_OTLP_HEADERS="Authorization=<your_auth_header>"
   ```

   Find these values in your Last9 dashboard under Integrations → OpenTelemetry.
3. **Initialize the tracer**

   Add this to your application startup:

   ```python
   import os

   from opentelemetry import trace
   from opentelemetry.sdk.trace import TracerProvider
   from opentelemetry.sdk.trace.export import BatchSpanProcessor
   from opentelemetry.sdk.resources import SERVICE_NAME, Resource
   from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
   from last9_genai import Last9SpanProcessor

   resource = Resource.create({SERVICE_NAME: os.environ["OTEL_SERVICE_NAME"]})
   provider = TracerProvider(resource=resource)

   # Last9 span processor — enriches spans with conversation/workflow context
   provider.add_span_processor(Last9SpanProcessor())

   # OTLP exporter — sends traces to Last9
   endpoint = os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"]
   headers = dict(
       item.split("=", 1)
       for item in os.environ["OTEL_EXPORTER_OTLP_HEADERS"].split(",")
   )
   provider.add_span_processor(
       BatchSpanProcessor(
           OTLPSpanExporter(endpoint=f"{endpoint}/v1/traces", headers=headers)
       )
   )

   trace.set_tracer_provider(provider)
   ```
4. **Wrap LLM calls with conversation context**

   ```python
   from openai import OpenAI

   from last9_genai import conversation_context

   client = OpenAI()

   with conversation_context(conversation_id="session_123", user_id="user_456"):
       response = client.chat.completions.create(
           model="gpt-4o",
           messages=[{"role": "user", "content": "Hello!"}],
       )
   ```

   All spans inside the `conversation_context` block are tagged with `gen_ai.conversation.id` and `user.id`. To verify this locally, see the console-exporter sketch just after these steps.
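To confirm spans are being enriched before they leave your app, you can temporarily add a console exporter alongside the OTLP one (using the `provider` from step 3). This is a debugging aid, not part of the Last9 setup; remove it once you've verified things work:

```python
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Debugging aid only: prints every finished span to stdout so you can check
# that gen_ai.conversation.id and user.id are attached before traces reach Last9.
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
```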
## Multi-Turn Conversation Example
Each turn in a conversation uses the same `conversation_id`. In Last9, you can filter by this ID to see the full conversation timeline:
```python
from last9_genai import conversation_context

THREAD_ID = "slack-thread-abc123"

# Turn 1
with conversation_context(conversation_id=THREAD_ID, user_id="user_1"):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What pods are failing?"}],
    )

# Turn 2 — same conversation_id links the turns
with conversation_context(conversation_id=THREAD_ID, user_id="user_1"):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": "What pods are failing?"},
            {"role": "assistant", "content": "api-gateway is in CrashLoopBackOff."},
            {"role": "user", "content": "Check the logs for that pod"},
        ],
    )
```
## Workflow Tracking

Group multi-step operations — RAG pipelines, tool-use loops, agent chains — as a named workflow:
```python
from last9_genai import conversation_context, workflow_context

with conversation_context(conversation_id="session_123", user_id="user_1"):
    # Workflows nest inside conversations
    with workflow_context(workflow_id="rag_pipeline_001", workflow_type="retrieval"):
        docs = retrieve_documents(query)
        context = rerank_documents(docs)
        response = generate_answer(context)
```

Workflow spans carry `workflow.id` and `workflow.type` attributes, making it easy to filter and compare pipeline performance in Last9.
## Recording Prompts and Completions
Capture LLM inputs and outputs as span events for debugging failed or slow responses:
```python
import json

from opentelemetry import trace
from opentelemetry.trace import SpanKind

# `client` and `messages` are the OpenAI client and message list
# from the earlier examples.
tracer = trace.get_tracer("my-app")

with tracer.start_as_current_span("gen_ai.chat", kind=SpanKind.CLIENT) as span:
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.request.model", "gpt-4o")

    # Record the prompt
    span.add_event("gen_ai.content.prompt", attributes={
        "gen_ai.prompt": json.dumps(messages),
    })

    response = client.chat.completions.create(model="gpt-4o", messages=messages)

    # Record the completion
    span.add_event("gen_ai.content.completion", attributes={
        "gen_ai.completion": response.choices[0].message.content,
    })

    # Record token usage
    span.set_attribute("gen_ai.usage.input_tokens", response.usage.prompt_tokens)
    span.set_attribute("gen_ai.usage.output_tokens", response.usage.completion_tokens)
```
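Recorded prompts can contain user data, so it's worth trimming or redacting message bodies before attaching them to span events. A minimal sketch; `MAX_CHARS` and `safe_prompt` are illustrative names, not SDK features:

```python
import json

MAX_CHARS = 2000  # illustrative cap on recorded message bodies

def safe_prompt(messages: list[dict]) -> str:
    """Truncate long message contents before recording them on a span event."""
    trimmed = [
        {**m, "content": str(m.get("content", ""))[:MAX_CHARS]}
        for m in messages
    ]
    return json.dumps(trimmed)

# Use in place of json.dumps(messages) in the example above:
# span.add_event("gen_ai.content.prompt",
#                attributes={"gen_ai.prompt": safe_prompt(messages)})
```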
## Viewing Traces in Last9

After sending LLM requests, navigate to LLM Monitoring in your Last9 dashboard. The Conversations tab shows all tracked conversations with cost, token usage, and duration:

Click on a conversation to see the full Conversation Flow — each interaction shows the prompt, response, token counts, cost, and trace ID:

From the conversation detail view, you can:
- See all interactions in a conversation grouped by `gen_ai.conversation.id`
- View full prompts and responses for each LLM call
- Track token usage and cost per interaction
- Click View Details to jump to the full trace with span-level timing
## Use Cases
- Conversation Debugging — Trace a user’s full session across multiple turns to find where responses degraded or tools failed
- Latency Analysis — Compare LLM call latencies across models, prompt sizes, and tool-use patterns
- Token Cost Tracking — Monitor input/output token counts per conversation to identify expensive interactions
- Agent Observability — Track tool-use loops in AI agents: which tools were called, whether they were approved, and how they affected the final response
## Troubleshooting
- Verify the Auth Header includes the `Basic` prefix
- Confirm the OTLP endpoint URL is correct (the SDK appends `/v1/traces`)
- Check that the `opentelemetry-sdk` version is >= 1.20.0
- Set `OTEL_LOG_LEVEL=debug` to see export diagnostics (see the logging sketch below)
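If you need more detail than the environment variable gives you, the OpenTelemetry Python exporters also report export failures through the standard `logging` module, so raising your app's log level surfaces endpoint and auth errors:

```python
import logging

# OpenTelemetry exporters log export errors (bad endpoint, auth failures,
# dropped batches) via the standard logging module.
logging.basicConfig(level=logging.DEBUG)
```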
Please get in touch with us on Discord or Email if you have any questions.