Mar 23rd, ‘23 / 5 min read

Understanding “Cricket Scale”

How does a DevOps/Site Reliability Engineer plan for "Cricket Scale"? How do you warm up systems that are about to witness 30+ million concurrent users?

One of the most fascinating stories of ‘infrastructure’ engineering is from India. It isn't as globally recognized, or spoken about in the same breath, as some of the more popular events. I call this the “Cricket Scale”.

Only a select few companies have witnessed this unprecedented scale, and far fewer understand the many technical challenges of orchestrating such large, high-stakes sporting events. Most of the peers I talk to outside of Asia are stunned by this engineering extravaganza. And there are plenty of untold stories on this front.

What in the world is “Cricket Scale”? 🏏

Few events in the world attract a global audience. Football takes the cake: nearly half the world watched the 2022 World Cup. 113 million people watched the Super Bowl.

And then, there’s cricket.

~450 million people watched the last big cricketing event, the Indian Premier League. The number that matters more, though, is 30 million concurrent viewers: 30 million people watching a sporting event together on one application. Concurrency is the real killer here. You have to orchestrate your infra to handle traffic that shoots up dramatically and then drops just as dramatically.

Then there are the fascinating engineering edge cases. Here is one that I personally like. 👇

Fascinating engineering edge cases

What does all this mean? 💥

For a Site Reliability Engineer (SRE), this is a BIG deal. The team has to warm up its internal infrastructure to handle data at this scale. A lot can go wrong, and a lot of thinking goes into orchestrating an event this big.

Disney Hotstar on Last9

How crazy do things get? 🚦

At a bare minimum, it’s 600+ million metrics a minute at 200 ms of ingestion. And this is only half the story. War rooms get chaotic with multiple dashboards, static alerting rules go out the window, and read workloads are even more staggering than writes.
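To put that number in perspective, here is a rough back-of-the-envelope sketch. It only converts the 600 million metrics a minute quoted above into a per-second rate. Treating 200 ms as a window in which samples arrive is my assumption for illustration, and the function names are made up for this example.

```python
# Back-of-the-envelope arithmetic for the ingest volume quoted above.
METRICS_PER_MINUTE = 600_000_000  # "600+ million metrics a minute", from the text


def samples_per_second(per_minute: int) -> float:
    """Average ingest rate per second for a given per-minute volume."""
    return per_minute / 60


def samples_in_window(per_minute: int, window_ms: int) -> float:
    """Average number of samples arriving within `window_ms` milliseconds.

    Assumption: samples arrive evenly, which real match traffic does not.
    """
    return per_minute * window_ms / 60_000


if __name__ == "__main__":
    print(f"{samples_per_second(METRICS_PER_MINUTE):,.0f} samples/second")
    print(f"{samples_in_window(METRICS_PER_MINUTE, 200):,.0f} samples per 200 ms")
```

That works out to roughly 10 million samples every second on average, before accounting for bursts.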

There are few playbooks on how to serve that scale. Not only do engineering teams have to create a playbook, but they also have to stay vigilant about service degradations and customer patterns.

Our time series data warehouse, Levitate, does this at an incredible scale.

Introducing Levitate: ‘uplifting’ your metrics woes because self-management sucks like gravity | Last9
Managing your own time series database is painful. We’ve moved from servers to services, and yet, monitoring metrics data is primitive. Our managed time series database powers mission-critical workloads for monitoring, at a fraction of the cost.

Customer patterns? 🤔

Everyone has their favorites. When a star such as Indian cricketer Virat Kohli takes the batting crease, traffic spikes. An SRE has to provision more servers, understand traffic, and ‘observe’ service degradations.
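As a rough illustration of what "provision more servers" means in numbers, here is a minimal capacity sketch. The per-server capacity, the headroom percentage, and the 10 million baseline are all hypothetical values picked for the example; only the 30 million concurrent viewers figure comes from this post.

```python
# A minimal capacity-planning sketch. All inputs except the 30 million
# concurrency figure are hypothetical and exist only for illustration.


def servers_needed(concurrent_users: int, users_per_server: int,
                   headroom_pct: int = 30) -> int:
    """Servers required to hold `concurrent_users` with spare headroom.

    Uses integer ceiling division so a partially filled server still counts.
    """
    if users_per_server <= 0:
        raise ValueError("users_per_server must be positive")
    padded_users = concurrent_users * (100 + headroom_pct)
    capacity = users_per_server * 100
    return -(-padded_users // capacity)  # ceiling division on integers


if __name__ == "__main__":
    baseline = servers_needed(10_000_000, users_per_server=50_000)
    spike = servers_needed(30_000_000, users_per_server=50_000)
    print(f"baseline: {baseline} servers, spike: {spike} servers")
    print(f"extra servers to warm up before the spike: {spike - baseline}")
```

The point of the exercise is the gap between the two numbers: those extra servers have to be warm before the star walks out, not after.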

Engineers need a precise vocabulary for a system to tell them something is not right, and where things are not right. Because systems are complex, understanding what is happening is a complicated problem to solve.

Dukaan on Last9

How do SREs manage Cricket Scale?

The ability to map out your entire infrastructure is critical. After all, you can’t measure what you can’t see. Here’s a simple checklist of where to start. We’ve stripped it down to the absolute basics:

  • Step 1: Get your instrumentation right. Declare the entities that need to be ‘Observed’, and only the immediately critical ones. Measure what matters.
  • Step 2: Map out your CDN configurations and the third-party tools powering your platform.
  • Step 3: Identify the critical infrastructure metrics that need immediate monitoring.
  • Step 4: Understand latency and availability baselines so you can write SLOs that matter.
  • Step 5: Create actionable alerting protocols for SLO degradations. This means understanding the “If This Then That” of outage restoration.

These are the absolute basics that need to come together days before the match starts, so you can load test and prepare for Cricket Scale. I’ve simplified this for folks who want to understand what it takes to manage this scale. There’s A LOT more that goes on behind the scenes.

Want to know how we build this step-by-step? Chat with us. 👇

Schedule Time with Us | Last9
Talk to us about how Levitate can help you manage your monitoring infrastructure and reduce TCO by up to 50%

What makes Last9 special compared to others?

Last9 has a time series data warehouse that helps you store, manage, and efficiently query data. We call it Levitate.

Levitate has powerful features to help you save costs, manage scale and rein in cardinality.

Levitate tiers data into different categories, so querying is fast and doesn’t crash the system. We have Policies & Governance to structure and trim your data. Levitate’s alerting tools give you a simple vocabulary to understand critical infrastructure. And there’s a lot more.

Check out our Whitepaper to understand Levitate

Levitate can crunch 5 trillion data samples across 30 days at a maximum latency of 100 ms. On game days, it becomes the single most crucial and trusted pane for driving business goals.

For example, say you want to change the payment provider during a live match because the current one has degraded and is failing customers. Being aware of the degradation is the first part of the puzzle. Being able to act on such unpredictable outcomes is what makes Levitate a trustworthy tool for driving business goals.
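Here is a small sketch of the decision in that payment example, assuming you already have per-provider counts of successful and total payment attempts from your metrics. The provider names, the 95% threshold, and the minimum sample count are hypothetical, and this is not how Levitate itself makes the call.

```python
from typing import Dict, Tuple


def success_rate(ok: int, total: int) -> float:
    """Share of payment attempts that succeeded (0.0 when there is no data)."""
    return ok / total if total else 0.0


def pick_provider(stats: Dict[str, Tuple[int, int]], current: str,
                  min_success_rate: float = 0.95,
                  min_samples: int = 100) -> str:
    """Return the provider to route payments to.

    `stats` maps provider name -> (successful attempts, total attempts).
    Stay on `current` unless it is measurably degraded and a better,
    well-sampled alternative exists.
    """
    ok, total = stats[current]
    current_rate = success_rate(ok, total)
    if total < min_samples or current_rate >= min_success_rate:
        return current  # too little data to judge, or still healthy

    candidates = {
        name: success_rate(o, t)
        for name, (o, t) in stats.items()
        if name != current and t >= min_samples
    }
    if not candidates:
        return current
    best = max(candidates, key=candidates.get)
    return best if candidates[best] > current_rate else current


if __name__ == "__main__":
    window = {"provider_a": (800, 1000), "provider_b": (990, 1000)}
    print(pick_provider(window, current="provider_a"))
```

The minimum sample count is there so that a handful of failed payments early in a window does not trigger a switch in the middle of a live match.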

Levitate is your war-time buddy during large-ticket events such as these.

It's difficult to grok such large amounts of data and understand when and where things can go wrong. Here’s an anecdotal example of how we're able to do this at scale: Shannon Limits and engineering reliability


Want to know more about Last9 and our products? Check out last9.io; we're building reliability tools to make running systems at scale fun and embarrassingly easy. 🟢


Authors
Aniket Rao

http://1x.engineer @last9io 💻 Programmer | 🌌 Astrophile | 🎮 FIFA Player |🏌️‍♂️Amateur Golfer
