Talks and Videos from Steve
Here you’ll find a collection of video presentations I’ve given at various conferences and meetups.

How to get started with SLI/SLO
Steve McGhee shares his expertise on SLI/SLO implementation at Google

Infrastructure as code
An introduction to managing infrastructure through code.
Abstract: This talk provides a comprehensive introduction to Infrastructure as Code (IaC), covering fundamental concepts, best practices, and practical approaches to getting started with managing infrastructure programmatically.
Resources: Video Recording

Safe serverless deployments with Cloud Run
Learn how to implement safe deployment practices in serverless environments.
Abstract: This presentation covers best practices and strategies for implementing safe deployment patterns in serverless environments using Cloud Run, ensuring reliability and minimizing risk during deployments.
Resources: Video Recording

Build Reliable Applications with Kubernetes and Istio
A deep dive into building reliable applications using Kubernetes and Istio.
Abstract: Steve McGhee and Ameer Abbas explore the intersection of reliability engineering with Kubernetes and Istio, demonstrating how these technologies can be leveraged to build and maintain reliable applications at scale. ...

Archetypes for Reliable Systems
Understanding common patterns and models for building reliable cloud services.
Abstract: In this presentation, Steve McGhee and Ameer Abbas introduce a comprehensive model for designing and operating reliable cloud services, exploring various system archetypes and their implications for reliability.
Resources: Video Recording

How to get started with SLI/SLO
A practical guide to implementing Service Level Indicators and Objectives.
Abstract: This talk provides a comprehensive introduction to Service Level Indicators (SLIs) and Service Level Objectives (SLOs), drawing from Steve McGhee’s experience implementing these practices at Google.
Resources: Video Recording

Putting SRE Principles in Platform Engineering
Applying SRE principles to platform engineering practices.
Abstract: This talk explores how to effectively apply Site Reliability Engineering (SRE) principles to platform engineering. We discuss practical approaches to building reliable platforms and how SRE methodologies can enhance platform operations and development. ...

Observability Fundamentals: Open Standards
A lightning talk on the fundamentals of observability and open standards.
Abstract: In this lightning talk, Steve McGhee covers the essential concepts of observability and the importance of open standards in building observable systems.
Resources: Video Recording

Non-standard SLOs: beyond availability
Exploring service level objectives beyond traditional availability metrics.
Abstract: This talk delves into the world of non-standard Service Level Objectives (SLOs), moving beyond traditional availability metrics to explore more nuanced ways of measuring and ensuring service reliability.
Resources: Video Recording

Rootly Humans of Reliability
A discussion about reliability engineering and its impact on modern systems.
Abstract: In this segment with Rootly, Steve McGhee shares insights about his role in reliability engineering and discusses the importance of human factors in building and maintaining reliable systems.
Resources: Video Recording

Two Paths in the Woods
Exploring different approaches to teaching and implementing SRE practices.
Abstract: While the direction and intent of SRE have been established and are becoming better understood, the details of how to achieve “SRE” are still an exercise left for the reader. Steve from Google and Sal from Nobl9 present two independently developed methods for teaching SRE topics to customers, which we have discovered are actually quite similar. Huzzah! ...