PagerDuty's Nomadic Journey

Adopting a container scheduler is a daunting task: it comes with as-yet-unknown failure modes, quirky preferences, and an opinionated UX. How will we know if it stops behaving? What even is normal behaviour?

How will developers interact with it? How will it integrate with our existing monitoring tools and network, and how long will it take to figure these things out? These uncertainties spell risk for the critical production services that will be running on it and for the people who will be interacting with it day-to-day.

In this talk I will tell PagerDuty's story of adopting Nomad: what we did, what went well, what went poorly, how long it took us, and where Nomad's usage story could improve. My aim is to give you a better idea of what is involved in adopting Nomad as your container scheduler.