Talks and Videos from Steve

Here you’ll find a collection of video presentations I’ve given at various conferences and meetups.

Google Cloud Next 2024 - Developer Keynote (UNCUT)

August 21, 2025

YouTube

Online

This video is the uncut developer keynote from Google Cloud Next 2024.

Reliable Systems through Platform Engineering

September 28, 2024

SRE NEXT 2024

Online

Infrastructure breaks, but systems can persist! In this talk, Steve presents concepts like spanning failure domains and applying generic mitigations through Platform Engineering, and introduces a lab environment where teams can experiment with these capabilities directly.

Infrastructure as code

August 22, 2024

YouTube

Online

This talk provides a comprehensive introduction to Infrastructure as Code (IaC), covering fundamental concepts, best practices, and practical approaches to getting started with managing infrastructure programmatically.

Achieving DORA outcomes by building reliability

June 22, 2024

SREday San Francisco 2024

San Francisco, California, United States

This video discusses achieving DORA (DevOps Research and Assessment) outcomes by building reliability.

Reliability Theory and Practice

May 15, 2024

GDG Cloud Southlake

Online

In this presentation for GDG Cloud Southlake, Steve McGhee explores how organizations can move beyond the 'myth' of 100% uptime to a more controlled, engineering-based approach to availability. Key themes include reliability as a choice, risk and probability, control over maximization, and practical strategies such as canary releases and symptom-based alerting.

Archetypes for Reliable Systems

May 2, 2024

YouTube

Online

In this presentation, Steve McGhee and Ameer Abbas introduce a comprehensive model for designing and operating reliable cloud services, exploring various system archetypes and their implications for reliability.

Panel Discussion: Modern Monitoring and Observability

November 3, 2023

YouTube

Online

This video is a panel discussion about Modern Monitoring and Observability, featuring Jason Hand of Datadog, Ernest Mueller of Accenture, Steve McGhee of Google, and Peco Karayanev of PagerDuty. Hosted by PagerDuty DevOps Advocate Mandi Walls.

Achieving DORA outcomes via Reliability Engineering in your Platform

October 26, 2023

YouTube

Online

This video discusses achieving DORA (DevOps Research and Assessment) outcomes through the application of Reliability Engineering principles within your platform.

Safe serverless deployments with Cloud Run

October 26, 2023

YouTube

Online

This presentation covers best practices and strategies for implementing safe deployment patterns in serverless environments using Cloud Run, ensuring reliability and minimizing risk during deployments.

Two Paths in the Woods

October 26, 2023

SLOconf 2023

Online

Steve from Google and Sal from Nobl9 present two independently developed methods for teaching SRE topics to customers, which turned out to be quite similar. Steve presents the 'reliability map' and Sal shows Nobl9's SLODLC.

Build Reliable Applications with Kubernetes and Istio

October 5, 2023

YouTube

Online

Steve McGhee and Ameer Abbas explore the intersection of reliability engineering with Kubernetes and Istio, demonstrating how these technologies can be leveraged to build and maintain reliable applications at scale.

How to get started with SLI/SLO

August 15, 2023

YouTube

Online

This talk provides a comprehensive introduction to Service Level Indicators (SLIs) and Service Level Objectives (SLOs), drawing from Steve McGhee's experience implementing these practices at Google.

Rootly Humans of Reliability

April 10, 2023

YouTube

Online

In this segment with Rootly, Steve McGhee shares insights about his role in reliability engineering and discusses the importance of human factors in building and maintaining reliable systems.

Enterprise Roadmap for SRE

January 30, 2023

DevOpsDays Boston 2022

Boston, MA

This presentation provides a detailed roadmap for implementing Site Reliability Engineering (SRE) practices in enterprise environments, covering strategies, challenges, and best practices for successful adoption.

Non-standard SLOs: beyond availability

November 16, 2022

Is It Observable

Online

This talk delves into the world of non-standard Service Level Objectives (SLOs), moving beyond traditional availability metrics to explore more nuanced ways of measuring and ensuring service reliability.

Observability Fundamentals: Open Standards

November 16, 2022

Is It Observable

Online

In this lightning talk, Steve McGhee covers the essential concepts of observability and the importance of open standards in building observable systems.

SRE in Enterprise

October 1, 2022

SREcon22 EMEA

Unknown

Steve McGhee and James Brookbank present on the 'Enterprise Roadmap for SRE' O'Reilly report, addressing the challenges enterprises face in adopting Site Reliability Engineering (SRE) and strategies to overcome them.

How VMs are the Matryoshka doll of compute

January 29, 2022

Unknown

Online

This video explores a cloud-native future, discussing how to determine the reliability of engineering patterns and identify which parts of your system require upgrades.

Putting SRE Principles in Platform Engineering

July 1, 2020

YouTube

Online

This talk explores how to apply Site Reliability Engineering (SRE) principles to platform engineering. It covers practical approaches to building reliable platforms and how SRE methodologies can enhance platform operations and development.

Build a Software Delivery Platform with Anthos

October 26, 2012

YouTube

Online

This video explains how to use Anthos and GitLab to build a multi-team platform that streamlines CI/CD, policy management, and developer onboarding across your organization.
