Here you’ll find a collection of video presentations I’ve given at various conferences and meetups.
August 21, 2025
YouTube
Online
This video is the uncut developer keynote from Google Cloud Next 2024.
September 28, 2024
SRE NEXT 2024
Infrastructure breaks, but systems can persist! In this talk, Steve presents concepts such as spanning failure domains and applying generic mitigations through Platform Engineering, and introduces a lab environment where teams can experiment with these capabilities directly.
August 22, 2024
This talk provides a comprehensive introduction to Infrastructure as Code (IaC), covering fundamental concepts, best practices, and practical approaches to getting started with managing infrastructure programmatically.
June 22, 2024
SREday San Francisco 2024
San Francisco, California, United States
This video discusses achieving DORA (DevOps Research and Assessment) outcomes by building reliability.
May 15, 2024
GDG Cloud Southlake
In this presentation for GDG Cloud Southlake, Steve McGhee explores how organizations can move beyond the 'myth' of 100% uptime toward a more controlled, engineering-based approach to availability. Key themes include reliability as a choice, risk and probability, control over maximization, and practical strategies such as canary releases and symptom-based alerting.
May 2, 2024
In this presentation, Steve McGhee and Ameer Abbas introduce a comprehensive model for designing and operating reliable cloud services, exploring various system archetypes and their implications for reliability.
November 3, 2023
This video is a panel discussion about Modern Monitoring and Observability, featuring Jason Hand of Datadog, Ernest Mueller of Accenture, Steve McGhee of Google, and Peco Karayanev of PagerDuty. The panel is hosted by PagerDuty DevOps Advocate Mandi Walls.
October 26, 2023
This video discusses achieving DORA (DevOps Research and Assessment) outcomes through the application of Reliability Engineering principles within your platform.
This presentation covers best practices and strategies for implementing safe deployment patterns in serverless environments using Cloud Run, ensuring reliability and minimizing risk during deployments.
SLOconf 2023
Steve from Google and Sal from Nobl9 present two independently developed methods for teaching SRE topics to customers, which turned out to be quite similar. Steve presents the 'reliability map', and Sal shows Nobl9's SLODLC.
October 5, 2023
Steve McGhee and Ameer Abbas explore the intersection of reliability engineering with Kubernetes and Istio, demonstrating how these technologies can be leveraged to build and maintain reliable applications at scale.
August 15, 2023
This talk provides a comprehensive introduction to Service Level Indicators (SLIs) and Service Level Objectives (SLOs), drawing from Steve McGhee's experience implementing these practices at Google.
April 10, 2023
In this segment with Rootly, Steve McGhee shares insights about his role in reliability engineering and discusses the importance of human factors in building and maintaining reliable systems.
January 30, 2023
DevOpsDays Boston 2022
Boston, Massachusetts, United States
This presentation provides a detailed roadmap for implementing Site Reliability Engineering (SRE) practices in enterprise environments, covering strategies, challenges, and best practices for successful adoption.
November 16, 2022
Is It Observable
This talk examines non-standard Service Level Objectives (SLOs), moving beyond traditional availability metrics to more nuanced ways of measuring and ensuring service reliability.
In this lightning talk, Steve McGhee covers the essential concepts of observability and the importance of open standards in building observable systems.
October 1, 2022
SREcon22 EMEA
Steve McGhee and James Brookbank present on the 'Enterprise Roadmap for SRE' O'Reilly report, addressing the challenges enterprises face in adopting Site Reliability Engineering (SRE) and strategies to overcome them.
January 29, 2022
This video explores a cloud-native future, discussing how to determine the reliability of engineering patterns and identify which parts of your system require upgrades.
July 1, 2020
This talk explores how to effectively apply Site Reliability Engineering (SRE) principles to platform engineering. It covers practical approaches to building reliable platforms and how SRE methodologies can improve platform operations and development.
October 26, 2012
This video explains how you can use Anthos and GitLab to build a multi-team platform that streamlines CI/CD, policy management, and developer onboarding across your organization and teams.