Reliable Observability for 25+ million concurrent live-streaming viewers

  • Video Streaming
  • 300+ engineers
  • APAC
  • Amazon Web Services

Last9 works with some of the world’s largest streaming companies. One of our customers streams movies, TV shows, and large-scale sporting events to its millions of subscribers. At cricket scale, hundreds, not dozens, of microservices work together to deliver user experiences that keep viewers glued to their devices.

No dropped frames, never miss a moment

A cricket match can have over 25 million concurrent viewers. Games last about 3-4 hours, and systems are warmed up hours in advance in anticipation of the sudden surge in traffic. Significant ephemeral resources come online for the game’s duration and are torn down soon after. Hundreds of engineers work on backend services and a robust infrastructure to enable the live-streaming of such high-profile events.

When something goes wrong, the war room team needs to isolate the issue immediately. The next step is to draft a Root Cause Analysis (RCA) and route it to the appropriate team for further investigation. Every additional second spent diagnosing a problem adversely impacts advertising revenue. Above all, it causes massive viewer dissatisfaction, since viewers risk missing a major sporting moment. Teams often find themselves investigating leading indicators of failure. These problems surface on social media platforms and need to be triaged and fixed before they spread.

Growing Pains

Scale

Scaling in-house metrics with business growth

Uptime

Maintaining uptime and query guarantees

Toil

Managing a TSDB instead of focusing on Product

Standardization

Standard Telemetry across teams

The Multifaceted Infrastructure Platform

A diverse and complex infrastructure platform powers our customer’s scale. Hundreds of microservices and a variety of data stores handle persistent data storage; some are fully managed by cloud hyperscalers, and the in-house team manages others. The scale-up required for such games brings many ephemeral resources online. Consistent and uniform Observability across these disparate sources is incredibly challenging: all of them continuously emit large volumes of metrics so that their health can be observed. Over time, the team’s engineers noticed a sprawl of metrics and monitoring techniques, making it hard to standardize telemetry and monitoring.

The team was using an in-house setup based on VictoriaMetrics (a popular open-source time series database) and InfluxDB for metrics management. For visualizing data and managing alerts, Grafana was used.

Growing scale and concurrent access woes

Multiple teams had created thousands of dashboards, which presented unique challenges. Grafana dashboards and alerts were concurrently accessing the same underlying metrics storage. These underlying databases could not keep up with massive ingestion and simultaneous queries. Inevitably, the storage would go down, leaving the teams oblivious to the health of their infrastructure. Instead of focusing on features and innovating on the product, the engineering team spent countless hours keeping the Observability platform up.

To reliably support its infrastructure growth, the team needed a next-generation Observability platform. Given their unique challenges and scale, they needed a product that could withstand cricket scale, sustain uptime, be globally available, and keep costs under control.

The Last9 Advantage

Open Standards

Zero integration efforts

Superfast Ingestion

50% reduction in write latency

Data Tiering

Solves concurrent access woes and powers long term retention

Last9’s Levitate is a globally available time series & events warehouse designed for scale, high cardinality and long term retention.

Open Standards

Levitate ingests data from multiple open standards, such as Prometheus exposition, OpenTelemetry Metrics, OpenMetrics, and InfluxDB. This meant no migration effort was needed, even at our customer’s scale of hundreds of microservices. Thanks to this interoperability and ease of integration, hundreds of engineers were onboarded to existing and new workloads on Levitate within weeks. Since Levitate is fully compatible with open standards on the output layer, the team could keep using their existing dashboards and alerting workflows.
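To make the open-standards point concrete, here is a minimal sketch of what a single sample looks like in the Prometheus text exposition format, one of the input formats named above. The metric name, labels, and helper functions are hypothetical and are not taken from the customer’s setup or from Levitate’s API.

```python
def escape_label_value(value: str) -> str:
    # The text format escapes backslash, double quote, and line feed
    # inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: dict[str, str], value: float) -> str:
    # Render one sample as: metric_name{label="value",...} sample_value
    if labels:
        pairs = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        return f"{name}{{{pairs}}} {value}"
    return f"{name} {value}"


# Hypothetical metric and labels, for illustration only.
line = format_sample(
    "stream_concurrent_viewers",
    {"region": "apac", "service": "playback"},
    25000000,
)
print(line)
```

Because services already emit lines like this for existing scrapers, a backend that accepts the same format can be swapped in without touching application code.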

Within a month, Levitate was the source of truth for all metrics workloads across our customer’s teams.

SLA Guarantees

Levitate is a managed service with Service Level Agreement (SLA) guarantees and clawbacks for both Read and Write workloads. This eliminated the toil and upkeep to manage and scale our customer’s in-house metrics setup.

Levitate’s Availability SLA Guarantees
  • 99.99% write
  • 99.95% read
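As a rough illustration of what these percentages mean in practice, the sketch below converts an availability target into a downtime budget. The 30-day window is an assumption made for the arithmetic; the source does not state the measurement period used in the actual SLA.

```python
def downtime_budget_minutes(availability_pct: float, window_days: int = 30) -> float:
    # Minutes of allowed unavailability within the window.
    # The 30-day default is an assumption, not a term of the SLA.
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - availability_pct) / 100.0


for workload, pct in (("write", 99.99), ("read", 99.95)):
    budget = downtime_budget_minutes(pct)
    print(f"{workload}: {budget:.2f} minutes per 30 days")
```

Under that assumption, 99.99% leaves roughly 4.3 minutes of write unavailability per 30 days, and 99.95% leaves roughly 21.6 minutes for reads.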

Long Term Retention

With the previous in-house metrics setup, teams could not retain data beyond a month for critical analysis. Imagine having billions of data points of consumer behavior, but being unable to use them for growing business needs.

Levitate’s automatic data tiering and retention policies paved the way for long-term time series storage. This helped the team with capacity planning and business insights year after year. By default, the latest data is available in all tiers, but their retention policies vary.

Levitate’s Default Tier Retention
  • Blaze tier: 3 hours
  • Hot tier: 6 months
  • Cold tier: 1 year

Levitate’s Data Tiering capability is also used on the query layer, with policies that restrict alerting queries to the Blaze tier. The other tiers can then be used for deeper exploration and analysis. This resolved the concurrent access issue the team faced with the in-house metrics setup.
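The idea of separating alerting reads from exploratory reads can be sketched as a small routing function. This is a hypothetical illustration, not Levitate’s API or policy syntax: the function name, the purpose strings, and the approximation of 6 months as 180 days are all assumptions.

```python
from datetime import timedelta

# Default retention per tier, as listed above.
# Assumption: "6 months" is approximated as 180 days.
RETENTION = {
    "blaze": timedelta(hours=3),
    "hot": timedelta(days=180),
    "cold": timedelta(days=365),
}


def choose_tier(purpose: str, lookback: timedelta) -> str:
    # Alerting queries only ever read from the Blaze tier.
    if purpose == "alerting":
        if lookback > RETENTION["blaze"]:
            raise ValueError("alerting lookback exceeds Blaze tier retention")
        return "blaze"
    # Everything else goes to the first non-Blaze tier that retains
    # data far enough back.
    for tier in ("hot", "cold"):
        if lookback <= RETENTION[tier]:
            return tier
    raise ValueError("lookback exceeds the longest retention")


print(choose_tier("alerting", timedelta(minutes=5)))
print(choose_tier("dashboard", timedelta(days=7)))
print(choose_tier("capacity-planning", timedelta(days=300)))
```

Keeping alert evaluation on its own tier means heavy dashboard or analysis queries cannot starve the alerting path, which is the concurrent access problem described earlier.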

Observability is a foundational building block and can unlock much goodness. However, it’s deviously complex to get right. The founders at Last9, aptly named, have been amazing partners in trying to make inroads on what a solid observability platform should be and hit most, if not all, of the building blocks.

Key Results

Single Source of Truth

Single data source for all metrics workloads

Zero Toil, Better Performance

No toil of managing an in-house TSDB

Reduced TCO

Total Cost of Ownership reduced by 50%

Levitate has improved query speeds, reduced the Total Cost of Ownership (TCO) by 50%, and is currently the bedrock for the customer’s entire infrastructure.

Bring Your Own Cloud Model

Levitate comes with a Bring Your Own Cloud (BYOC) model: we can deploy directly in our customer’s cloud while offering all of Levitate’s features.


With optimized auto-tiered storage, warehousing control levers, and availability guarantees, we’ve reduced the toil of managing a time series database and the engineering overheads that come with it — something seldom factored in while calculating the cost of running your own Observability team.

Talk to the Last9 team to understand how Levitate can unlock value for you as well. Get a demo or get started today.
