If you're using TrueFoundry to manage your LLM traffic, you can now send those traces directly to Last9 and view them alongside your existing infrastructure telemetry.
Why This Integration
Last9 already supports LLM observability through OpenTelemetry—you can send traces from any LLM implementation today. But if you're using TrueFoundry AI Gateway for features like unified access control, intelligent routing, and cost management across 250+ model providers, this integration makes that data immediately available in Last9 with zero additional instrumentation.
The value: TrueFoundry's detailed LLM telemetry (token usage, costs, model versions, routing decisions) appears in Last9 alongside your existing infrastructure data. When debugging, you see the full picture—application traces, database queries, and LLM calls—in one view.
What TrueFoundry Does
TrueFoundry AI Gateway sits between your applications and LLM providers (OpenAI, Anthropic, Google, etc.). Instead of each service calling model APIs directly, requests route through the Gateway.
This gives you:
- Unified access control and rate limiting across all model providers
- Cost tracking and quota management
- Intelligent routing and load balancing
- Centralized observability for all LLM traffic
The Gateway emits OpenTelemetry traces for every LLM request, capturing latencies, token usage, costs, and errors.
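In practice, routing through the Gateway usually means pointing an OpenAI-style client at the Gateway's base URL instead of the provider's. A minimal sketch, assuming the Gateway exposes an OpenAI-compatible endpoint; the URL and key below are placeholders, not real values:

```python
from openai import OpenAI

# Placeholder endpoint and credential: the real values come from your
# TrueFoundry Gateway configuration.
client = OpenAI(
    base_url="https://your-gateway.example.com/api/llm/v1",
    api_key="tfy-virtual-key",
)

# The Gateway resolves the model name to a configured provider and emits
# an OpenTelemetry trace for the request.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Suggest an upsell for this cart"}],
)
print(response.choices[0].message.content)
```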
What Last9 Brings
Last9 is built to handle high-cardinality observability data efficiently. Every LLM request generates telemetry tagged with user IDs, tenant IDs, model versions, prompt versions, regions—easily 8+ dimensions per trace.
Most observability systems either slow down with this cardinality or charge per unique time series. Last9 is designed for it. You can filter by tenant_id=customer-x AND model=gpt-4 AND region=us-east without query performance degrading or costs spiking.
More importantly: Last9 is where your non-LLM telemetry already lives. Logs, metrics, and traces from your entire stack are in one place. Adding LLM traces means you can correlate LLM behavior with the rest of your infrastructure during incidents.
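That correlation works best when your own application spans carry the same identifiers. A minimal sketch using the standard OpenTelemetry Python API; the attribute names (tenant_id, prompt_version) are illustrative rather than a required schema:

```python
from opentelemetry import trace

tracer = trace.get_tracer("checkout-service")

def recommend_upsell(tenant_id: str, cart_id: str) -> None:
    # Tag the application span with the same identifiers the Gateway
    # attaches to its LLM spans, so both are filterable together in Last9.
    with tracer.start_as_current_span("recommend-upsell") as span:
        span.set_attribute("tenant_id", tenant_id)          # illustrative names
        span.set_attribute("prompt_version", "upsell-v3")
        span.set_attribute("cart.id", cart_id)
        # ... the Gateway call goes here; its spans nest under this one
```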
What the Integration Enables
Once configured, every LLM request through TrueFoundry automatically appears in Last9 as a trace. You see:
- Gateway operations (routing, authentication, rate limiting)
- Model provider calls (which API was hit, response time, status)
- Token metrics (input/output tokens per request)
- Cost data (calculated per-request costs)
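The exact attribute names depend on TrueFoundry's instrumentation, but a Gateway span looks roughly like this (loosely following the OpenTelemetry GenAI semantic conventions; the cost and tenant fields are illustrative):

```python
# Illustrative span attributes for one LLM request through the Gateway.
# Names are an approximation, not TrueFoundry's exact schema.
llm_span_attributes = {
    "service.name": "tfy-llm-gateway",
    "gen_ai.request.model": "gpt-4",
    "gen_ai.usage.input_tokens": 512,
    "gen_ai.usage.output_tokens": 184,
    "llm.cost_usd": 0.021,       # illustrative cost attribute
    "tenant_id": "customer-x",   # illustrative tenant tag
    "http.status_code": 200,
}
```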
These traces show up in your service map as tfy-llm-gateway. When debugging, you can:
Filter traces by service and see LLM dependencies:
service.name=checkout shows your checkout service traces, including child spans for LLM calls made via the Gateway.
Correlate LLM errors with infrastructure issues:
Is the checkout failure from a database timeout or the LLM recommendation engine? The trace shows both.
Track LLM costs by tenant:
Filter by tenant_id to see per-customer LLM usage and costs, correlated with their overall API traffic.
Compare latency across models:
Group traces by model_version to see if switching from GPT-4 to Claude 3.5 affected performance.
Example: Unified Debugging
Your P95 latency SLO breaches. You open Last9, filter by your service, and see a heatmap showing the spike started at 14:23 UTC.
You click into a slow trace. The flame graph shows most time is in a child span: tfy-llm-gateway. You drill into that span and see it's calling model=gpt-4 for a specific tenant_id.
You filter all traces by that tenant + model combination and discover they're sending 10x longer prompts than usual, hitting rate limits.
Total time: ~2 minutes. No switching tools, no correlating timestamps across systems.
High-Cardinality Data by Default
LLM telemetry is inherently high-cardinality. You're tracking:
- User/tenant IDs
- Model providers and versions
- Prompt template versions
- Routes and endpoints
- Regions
- Token counts and costs
That's easily 100,000+ unique tag combinations per day at moderate scale. Last9 handles this without performance or cost penalties. You can query by any combination of these dimensions and get results in <1 second.
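For a sense of scale, the 100,000+ figure follows from simple multiplication. With assumed (illustrative) sizes for each dimension:

```python
# Assumed dimension sizes for a moderately sized deployment.
tenants = 200
models = 6
prompt_versions = 12
routes = 8
regions = 3

unique_combinations = tenants * models * prompt_versions * routes * regions
print(unique_combinations)  # 345600 potential tag combinations, before token/cost buckets
```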
Getting Started
If you're already running TrueFoundry AI Gateway and Last9, the integration takes less than 10 minutes to configure. It uses standard OpenTelemetry OTLP over HTTP—no code changes to your applications.
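Under the hood it's the same OTLP-over-HTTP export path any OpenTelemetry SDK uses. For illustration only, here's what that path looks like in Python; the endpoint and credential are placeholders, and the integration guide has the actual Last9 values:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Standard OTLP-over-HTTP export. Endpoint and auth header are placeholders;
# in this integration, TrueFoundry's Gateway does the exporting, not your apps.
exporter = OTLPSpanExporter(
    endpoint="https://<last9-otlp-endpoint>/v1/traces",
    headers={"Authorization": "Basic <last9-token>"},
)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
```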
View the integration guide for setup instructions.
New to Last9? Sign up to see how unified telemetry simplifies observability across your entire stack.