If you’re an SRE—or know and love one—then you probably already know SRECon is the annual meetup for site reliability engineers.
So, What’s SRECon All About?
Hosted by USENIX, SRECon brings together everyone from newbies to industry legends, all eager to talk about what works, what fails spectacularly, and how we can keep pushing for more reliable, scalable tech. It’s a community-driven, solutions-oriented conference for anyone looking to up their reliability game.
New for 2024: The Discussion Track
This year, SRECon introduced a fresh concept: the Discussion Track.
It’s a space where attendees can go beyond presentations and have interactive discussions, led by experienced hosts who shape each session into whatever the group needs: an AMA, casual brainstorming, or an unconference vibe.
Highlights from Day 1
Here are some of the talks I enjoyed at SRECon Dublin 2024:
The title alone brought people in. This session highlighted the risks of "fire and forget" control planes that lack real-time feedback, which can lead to outages. Laura walked through ways to design control planes that actively report on actions and their impacts, making systems more reliable and reducing operational errors.
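To make the "fire and forget" contrast concrete, here is a minimal sketch of a control-plane action that reads back the state after applying a change. It is not taken from the talk: `ActionResult`, `apply_with_feedback`, and the `apply`/`observe` callbacks are hypothetical names I made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ActionResult:
    applied: bool   # did the apply step finish without raising?
    verified: bool  # does the observed state match the desired state?
    detail: str     # human-readable report of what happened


def apply_with_feedback(
    desired: dict,
    apply: Callable[[dict], None],
    observe: Callable[[], dict],
) -> ActionResult:
    """Push a change, then read back the observed state instead of assuming success."""
    try:
        apply(desired)
    except Exception as exc:
        # Report the failure to the caller rather than silently dropping it.
        return ActionResult(False, False, f"apply failed: {exc}")

    observed = observe()
    # Collect every key whose observed value differs from what was requested.
    mismatched = [k for k, v in desired.items() if observed.get(k) != v]
    if mismatched:
        return ActionResult(True, False, f"state mismatch on keys: {sorted(mismatched)}")
    return ActionResult(True, True, "observed state matches desired state")


if __name__ == "__main__":
    # Toy "data plane": a dict standing in for real infrastructure state.
    state = {"replicas": 2}
    result = apply_with_feedback({"replicas": 3}, state.update, lambda: dict(state))
    print(result)
```

A fire-and-forget version would stop after `apply(desired)`. The read-back step is what lets the caller tell "request sent" apart from "change took effect".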
This talk shared some great practical examples to help SRE teams build resilience and work better together when facing challenges. It also explored fun ways to tap into that "superhero" energy within the team, encouraging talent development while keeping everyone on the same page and accountable.
The discussion focused on three key frontiers they actively invested in: Data Operations and Monitoring Event-Based Systems, Mobile Observability, and Effective Management Practices for Reliability.
Heinrich broke down how hitting top reliability means having lots of active feedback loops, and he even shared a handy diagram to show how it’s done.
The team tackled the tricky problem of managing secrets in an open-source CI/CD pipeline by moving from static secrets to OIDC-based access, which improved security and gave engineers more autonomy.
Lerna Ekmekcioglu from Clockwork Systems discussed the crucial role of clock synchronization in addressing latency issues in distributed systems.
She explained how it can be tough to pinpoint slowdowns, especially in complex environments like on-premises and cloud setups. The talk demonstrated how network contention impacts tail latencies and shared insights on various clock synchronization protocols, their pros and cons, and best practices for managing clock discipline. It was definitely one of the most interesting talks of the day!
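For readers new to the topic, here is a small sketch of the classic four-timestamp offset and delay estimate used by NTP-style protocols. It is my own illustration, not the speaker's material, and the function name and sample timestamps are made up.

```python
from typing import Tuple


def estimate_offset_and_delay(
    t0: float, t1: float, t2: float, t3: float
) -> Tuple[float, float]:
    """Estimate clock offset and round-trip delay from one request/response pair.

    t0: request sent      (client clock)
    t1: request received  (server clock)
    t2: response sent     (server clock)
    t3: response received (client clock)
    """
    # Positive offset means the server clock is ahead of the client clock.
    offset = ((t1 - t0) + (t2 - t3)) / 2
    # Round-trip time on the wire, excluding the server's processing time.
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


if __name__ == "__main__":
    # Hypothetical exchange: the client is 0.5 s behind the server, each
    # direction takes 10 ms, and the server spends 2 ms on the request.
    offset, delay = estimate_offset_and_delay(100.0, 100.51, 100.512, 100.022)
    print(f"offset ~ {offset:.3f} s, delay ~ {delay:.3f} s")
```

The offset formula assumes the outbound and return paths take equally long. When contention queues packets in only one direction, that assumption breaks, and the estimate can be off by up to half the measured delay.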
This session brought everyone together to discuss cost management, facilitated by knowledgeable guides. Rather than a prepared talk, it was an informal gathering for questions and conversation among everyone interested in managing costs.
The speaker shared valuable insights and real-world examples on topics like Monitoring Distributed Systems, Eliminating Toil, and Postmortem Culture. We walked away with practical ideas and guidelines to help us better understand and operate our database systems, including tips on selecting the right SLIs and SLOs.
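As a quick refresher on how an SLI relates to an SLO, here is a minimal sketch of an availability SLI and the error budget left against a target. It is a generic illustration, not from the talk: the function names, the request counts, and the 99.9% target are all assumptions.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """Fraction of events that met the success criterion (the SLI)."""
    if total_events == 0:
        return 1.0  # no traffic, so nothing failed
    return good_events / total_events


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Share of the error budget still unspent: 1.0 is untouched, below 0 is overspent."""
    allowed_failure = 1.0 - slo_target
    if allowed_failure <= 0:
        raise ValueError("an SLO target of 100% leaves no error budget")
    actual_failure = 1.0 - sli
    return 1.0 - actual_failure / allowed_failure


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 queries, 500 of them failed or were too slow.
    sli = availability_sli(999_500, 1_000_000)
    remaining = error_budget_remaining(sli, slo_target=0.999)
    print(f"SLI = {sli:.4%}, error budget remaining = {remaining:.1%}")
```

With a 99.9% target, 1,000 failures per million are allowed, so 500 failures use up about half the budget.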
The speakers looked at some common sources of confusion in system design and data modeling, and also raised bigger questions about truth, which sources we trust, and why those uncertainties really matter.
One of the highlights of the event was the panel discussion titled "Is Reliability a Luxury Good?" featuring insights from industry experts Andrew Ellam, Niall Murphy from Stanza, Joan O'Callaghan from Udemy, and Avleen Vig.
Their diverse perspectives sparked thought-provoking discussions on the importance of building reliable systems and the trade-offs companies must consider when investing in reliability.
This session offered an open space for attendees to discuss SLOs with a few experts. It wasn’t a structured talk or workshop but a relaxed, interactive discussion where people could ask questions and connect with others interested in SLOs.
Our team’s got some awesome merch with them, so don’t miss out—track us down and grab yours! 😎
I am already looking forward to Day 2 of SRECon Dublin 2024.
Prathamesh works as an evangelist at Last9, runs SRE stories (where SRE and DevOps folks share their stories), and maintains o11y.wiki (a glossary of terms related to observability).