Reporting on Reliability - Improving Stakeholder Conversations

Abstract

Reporting on Reliability - Improving Stakeholder Conversations is a best-practices presentation on how to communicate about reliability during incidents, immediately afterward, and in periodic planning sessions. “Phew! That incident is resolved. Now let’s never speak of it again.” Tempting, right?

After a production outage, rehashing it is often the last thing we want to do, especially since the conversation is likely to be a tense “face the music” situation full of finger-pointing and apologies. As an alternative, reliability engineers have long practiced blameless retrospectives, having discovered that this is the best way to learn from the past, lest we repeat it. Unfortunately, stakeholders outside the operations team may not be so forgiving. But as organizations increasingly realize the importance of reliability to their users and their bottom line, we find an opportunity to engage our colleagues in developing a more mature, sustainable approach to discussing reliability, and the occasional lack thereof. By normalizing incidents, communicating systematically, and aligning on goals and investments, we mature from an avoidant, reactive approach toward a collaborative, strategic one.

Details

  • Speaker: Ramon Medrano Llamas
  • Slides authored by: Steve McGhee
  • Event: DevOpsDays Zurich 2023
  • Location: Zurich, Switzerland
  • Link: DevOpsDays Zurich 2023 Program