USENIX SREcon is a conference series organized by the USENIX Association, primarily focused on Site Reliability Engineering and related fields in large-scale, high-reliability, high-availability systems. It’s a gathering point for professionals working in SRE, systems engineering, software engineering, and infrastructure operations. SREcon brings together experts and practitioners to share knowledge, best practices, and insights into maintaining the reliability and performance of complex systems.
Many talks cover case studies from major tech companies, and this real-world insight is valuable for professionals facing similar issues. The organizers prioritize diversity and inclusion, offering scholarships, mentoring, and programming aimed at underrepresented groups in tech to encourage broad participation.
SREcon serves as a critical venue for networking, professional development, and sharing innovative solutions.

My Takeaways
Companies aim to leverage mature tools and extract actionable insights from them to maximize value and reduce costs, particularly within observability and incident response. Those insights feed back into how systems are built and operated, creating a loop that increases both quality and velocity.
Data governance is a complex topic, whether you’re working with S3, a data warehouse, a data store, a database, big data or something else. In all cases, load balancing, horizontal scaling and distributed consensus are crucial to success.
Developer portals are becoming essential as the field evolves rapidly. Commercial solutions are racing to catch up with Backstage’s lead; however, they are still not extensible or customizable enough to meet all needs.
eBPF has reached maturity, and its application is expanding into use cases I would not have anticipated.
Some engineers seem to have an unusual fascination with Slack, which I don’t share due to its poor UX.

My Event
Dude, You Forgot the Feedback: How Your Open Loop Control Planes Are Causing Outages (Laura de Vesine - Datadog)
You Depend on Time, This Is How It Works and You Won’t Believe It (Philip Rowlands - Jane Street)
Workshop: Loadshedding and Isolation Using Envoy Proxy (Laura Nolan & Niall Murphy - Stanza)
Achieving Excellence: SLO Thresholds That Transform Service Quality (Thiara Ortiz - Netflix)
Selective Reliability Engineering: There Is No Single Source of Truth (Elise Burke - Datadog)
Why You’re (Probably) Doing Service Catalogs Wrong (Lisa Karlin Curtis - incident.io)
Exploring the Unintended Consequences of Automation in Software (Courtney Nash - The VOID)
Rock around the Clock (Synchronization): Improve Performance with High Precision Time! (Lerna Ekmekcioglu - Clockwork Systems)
Mnemonic Rules for Eponymous Laws or: There’s a Law for That! (Peter Burkholder - U.S. Government)
Lessons from Unix History (Diomidis Spinellis - AUEB & TU Delft)
Treat Your Code as a Crime Scene (Adam Tornhill - CodeScene)
From PIDs to Pods: The Life Cycle of an eBPF-Autoinstrumented Application (Marc Tudurí - Grafana Labs)
Scheduling at Scale: eBPF Schedulers with Sched_ext (Daniel Hodges - Meta)
Noisy Neighbors, through Networking (René Treffer and Ben Kochie - Reddit)
Taming Noisy Benchmark Results Using Change Point Detection (Matt Fleming - Cloudflare)
How a Single API Endpoint Saved Us 3000 CPU (Lasse Hels - Maersk)
Managing the Risk of Software Supply Chain Attacks (Mark Hahn - Qualys)
Synthetic Monitoring and E2E Testing: 2 Sides of the Same Coin (Carly Richmond - Elastic)
Re-Building Envoy in Rust (Dawid Nowak - Huawei Ireland Research Lab)
What About the Engineer's MTTR? (Ian Duffy - Cloudsmith)
How to SRE Anything to Work Smarter and Live Better (Jennifer Petoff - Google)
How SRE Can Help With Cost & Efficiency (John Looney - Crusoe Energy)
SRE for LLMs: What We Learned While Launching (John Lunney - Google)
Breaking Out of Our Hybrid Cloud Datastore EOL Chains (Konstantinos Fardelas - Skroutz SA)
The Voyager Spacecraft—These Are the Only Engineers on Earth Who Want To Maximize Latency (Robert Barron - IBM)
Rollout Monitoring at Scale: Reflections on Adopting Canarying in GCE (Roberto Frenna - Google)
9 SLIs; OH MY! (Sal Furino - Bloomberg CRE)
Opening the Box: Diagnosing Operating-System Task-Scheduler Behavior on Highly Multicore Machines (Julia Lawall - Inria-Paris)
Granular CPU Capacity Management at Scale with eBPF (George Brighton and Cameron Howes - Goldman Sachs)
Riot Games: Evolution of Observability at the Gaming Company (Erick Moreira and Kirill Mikhailov - Riot Games)
A Powerful Logs Management Solution We All Have and Use but We Underestimate: systemd-journal (Costa Tsaousis - Netdata)
Blast Radius Reduction for Large-Scale Distributed Systems (Linhua Tang - Huawei Ireland Research Centre)
Get Your Non-SREs Oncall Ready! (JC van Winkel and Brad Lipinski - Google)
Transforming Production Readiness (Panagiotis Moustafellos - Elastic)
Energy Consumption of Datacenters (Thomas Fricke)
Are We Really Engineers? (Hillel Wayne)
