Video Streaming | 300+ engineers | APAC | Amazon Web Services

Reliable Observability for 50+ million concurrent live-streaming viewers


Key Points

Single source for all metric workloads

50% reduced TCO

Zero toil, with better performance


Last9 works with some of the world's largest streaming companies. One of our customers streams movies, TV shows, and large-scale sporting events to its millions of subscribers. At cricket scale, hundreds, not dozens, of microservices work together to create compelling experiences that keep viewers glued to their devices.

No dropped frames, never miss a moment

A cricket match can have over 50 million concurrent viewers. Games last about 3-4 hours, and systems are warmed up hours in advance to absorb the sudden surge in traffic. Significant ephemeral resources come online for the duration of the game and are torn down soon after. Hundreds of engineers work on the backend services and the robust infrastructure that make live-streaming such high-profile events possible.

When something goes wrong, the war room team needs to isolate the issue immediately. The next step is to draft a Root Cause Analysis (RCA) and route it to the appropriate team for further investigation. Every additional second spent diagnosing a problem costs advertising revenue. Above all, it causes massive viewer dissatisfaction, because fans risk missing a memorable moment in a major sporting event. Teams often find themselves chasing leading indicators of failure: problems surface on social media platforms and must be triaged and fixed before they spread.

Growing Pains

Scale: Scaling in-house metrics with business growth

Uptime: Maintaining uptime and query guarantees

Toil: Managing a TSDB instead of focusing on the product

Standardization: Standard telemetry across teams

The Multifaceted Infrastructure Platform

A diverse and complex infrastructure platform powers our customer's scale. Hundreds of microservices rely on a variety of data stores for persistent storage; some are fully managed by cloud hyperscalers, while the in-house team manages others. The scale-up these games require brings many ephemeral resources online. Achieving consistent, uniform Observability across these disparate sources is incredibly challenging: to report their health, all of them continuously emit metrics, and in large volumes. Over time, the team's engineers noticed a sprawl of metrics and monitoring techniques, which made it hard to standardize telemetry and monitoring.

The team was using an in-house setup based on VictoriaMetrics (a popular open-source time series database) and InfluxDB for metrics management, with Grafana for visualization and alerting.

Growing scale and concurrent access woes

Multiple teams had created thousands of dashboards, which presented unique challenges. Grafana dashboards and alerts were concurrently querying the same underlying metrics storage, and these databases could not keep up with massive ingestion alongside simultaneous queries. Inevitably, the storage would go down, leaving the teams blind to the health of their infrastructure. Instead of building features and innovating on the product, the engineering team spent countless hours keeping the Observability platform up.

To reliably support their rapid infrastructure growth, the team needed a next-generation Observability platform. Given their unique challenges and scale, they needed a product that could withstand cricket scale, sustain uptime, be globally available, and keep costs from exploding.

The Last9 Advantage

Open Standards: Zero integration effort

Superfast Ingestion: 50% reduction in write latency

Data Tiering: Solves concurrent access woes and powers long-term retention

Last9 is a globally available time series and events warehouse designed for scale, high cardinality, and long-term retention.

Open Standards

Last9 ingests data in multiple open standards, such as the Prometheus exposition format, OpenTelemetry Metrics, OpenMetrics, and InfluxDB. As a result, no migration effort was needed across our customer's hundreds of microservices. Thanks to this interoperability and ease of integration, hundreds of engineers onboarded existing and new workloads to Last9 within weeks. Since Last9 is also fully compatible with open standards on the output layer, the team could keep using their existing dashboards and alerting workflows.

Within a month, Last9 became the source of truth for all metrics workloads across our customer's teams.
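To make "multiple open standards" concrete, here is a minimal sketch that renders one hypothetical data point in two of the text formats mentioned in this section: the Prometheus text exposition format and InfluxDB line protocol. The metric name, labels, and the helper names `to_prometheus` and `to_influx_line` are illustrative assumptions, not part of Last9's API; the point is only that the same sample can arrive over different wire formats without changing the service that emits it.

```python
# Illustrative only: metric names, labels, and helper names are hypothetical.


def _escape_prom_label(value: str) -> str:
    # Prometheus text format escapes backslash, double quote, and newline in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def _escape_influx_tag(value: str) -> str:
    # InfluxDB line protocol escapes commas, equals signs, and spaces in tag keys and values.
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_prometheus(name: str, labels: dict[str, str], value: float, ts_ms: int) -> str:
    """One sample in Prometheus text exposition format (timestamp in milliseconds)."""
    pairs = ",".join(f'{k}="{_escape_prom_label(v)}"' for k, v in sorted(labels.items()))
    return f"{name}{{{pairs}}} {value} {ts_ms}"


def to_influx_line(measurement: str, tags: dict[str, str], field: str, value: float, ts_ns: int) -> str:
    """One point in InfluxDB line protocol (timestamp in nanoseconds, the default precision)."""
    pairs = ",".join(
        f"{_escape_influx_tag(k)}={_escape_influx_tag(v)}" for k, v in sorted(tags.items())
    )
    return f"{measurement},{pairs} {field}={value} {ts_ns}"


if __name__ == "__main__":
    labels = {"service": "playback", "region": "ap-south-1"}
    print(to_prometheus("http_requests_total", labels, 1027.0, 1_700_000_000_000))
    print(to_influx_line("http_requests", labels, "total", 1027.0, 1_700_000_000_000_000_000))
```

Running the module prints the same sample in both formats. In practice, services would typically use the official client libraries or an OpenTelemetry SDK rather than hand-rolled formatting; the sketch only shows how the formats differ.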

SLA Guarantees

Last9 is a managed service with Service Level Agreement (SLA) guarantees and clawbacks for both read and write workloads. This eliminated the toil and upkeep of managing and scaling our customer's in-house metrics setup.

Write SLA: 99.99%

Read SLA: 99.95%
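The SLA percentages translate into a concrete downtime budget. This sketch assumes the percentages describe time-based availability measured over a 30-day month; the case study does not state Last9's actual measurement window or how availability is defined, so the numbers are illustrative only. The function name `downtime_budget_minutes` is hypothetical.

```python
# Assumption: availability is measured as a fraction of time over a 30-day month.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes


def downtime_budget_minutes(sla_percent: float, period_minutes: int = MINUTES_PER_MONTH) -> float:
    """Maximum minutes of unavailability allowed in the period for a given SLA percentage."""
    return period_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    for name, sla in (("write", 99.99), ("read", 99.95)):
        print(f"{name}: {sla}% -> {downtime_budget_minutes(sla):.2f} minutes/month")
```

Under these assumptions, a 99.99% write SLA leaves roughly 4.3 minutes of unavailability per month, and a 99.95% read SLA roughly 21.6 minutes.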

Long Term Retention

With the previous in-house metrics setup, teams could not retain data beyond a month for critical analysis. Imagine having billions of data points of consumer behavior, but being unable to use them for growing business needs.

Last9's automatic data tiering and retention policies paved the way for long-term time series storage. This helped the team with capacity planning and business insights year after year. By default, the latest data is available in all tiers, but their retention policies vary.

Blaze tier: 3 hours

Hot tier: 6 months

Cold tier: 1 year

The team also applied Last9's data tiering at the query layer, creating policies that restrict the Blaze tier to alerting, while the other tiers serve deeper exploration and analysis. Because alerts and dashboards no longer compete for the same storage, this resolved the concurrent access issue the team had faced with the in-house metrics setup.
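The query-layer policy can be sketched as a small routing function. This is a hypothetical illustration, not Last9's configuration syntax: the `Tier` enum, the `pick_tier` function, and the rule of sending non-alert queries to the smallest tier that covers the lookback window are all assumptions. Only the retention windows (3 hours, 6 months, 1 year) and the Blaze-for-alerting rule come from this case study.

```python
from datetime import timedelta
from enum import Enum


class Tier(Enum):
    BLAZE = "blaze"
    HOT = "hot"
    COLD = "cold"


# Retention windows from the case study; "6 months" is approximated as 183 days.
RETENTION = {
    Tier.BLAZE: timedelta(hours=3),
    Tier.HOT: timedelta(days=183),
    Tier.COLD: timedelta(days=365),
}


def pick_tier(lookback: timedelta, is_alert: bool) -> Tier:
    """Hypothetical policy: alerts read only from Blaze; other queries use Hot or Cold."""
    if is_alert:
        if lookback > RETENTION[Tier.BLAZE]:
            raise ValueError("alert queries are limited to the Blaze tier's 3-hour window")
        return Tier.BLAZE
    # Exploration and analysis never touch Blaze, so they cannot contend with alerting.
    for tier in (Tier.HOT, Tier.COLD):
        if lookback <= RETENTION[tier]:
            return tier
    raise ValueError("lookback exceeds the longest retention window")


if __name__ == "__main__":
    print(pick_tier(timedelta(minutes=5), is_alert=True))
    print(pick_tier(timedelta(days=30), is_alert=False))
    print(pick_tier(timedelta(days=300), is_alert=False))
```

The idea this illustrates is workload isolation: alert evaluation stays on a small, recent tier, so a heavy long-range dashboard query is served elsewhere and cannot starve alerts of storage capacity.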

Observability is a foundational building block and can unlock much goodness — however, it's deviously complex to get right. The founders at Last9, aptly named, have been amazing partners in trying to make inroads on what a solid observability platform should be and hit most, if not all, of the building blocks.

Akash Saxena

Key results

Single Source of Truth: Single data source for all metrics workloads

Zero Toil, Better Performance: No toil of managing an in-house TSDB

Reduced TCO: Total Cost of Ownership reduced by 50%

Last9 has improved query speeds, reduced the Total Cost of Ownership (TCO) by 50%, and is currently the bedrock of the customer's entire infrastructure.

Bring Your Own Cloud Model

Last9 offers a Bring Your Own Cloud (BYOC) model: we can deploy directly in a customer's own cloud, with all of Last9's features available.


With optimized auto-tiered storage, warehousing control levers, and availability guarantees, we've reduced the toil of managing a time series database and the engineering overhead that comes with it, a cost seldom factored in when calculating what it takes to run your own Observability team.

Understand how engineering teams at Quickwork, Clevertap, Replit, and more are using Last9 to enable SaaS monitoring.


