SLOs with Prometheus done wrong, wrong, wrong, right
11-05, 16:30–17:00 (US/Central), ROOM 1

This session is a practical example of using recording rules to handle high volumes of data and to abstract complex queries. It shows the power of the feature and also some common pitfalls. The focus is on getting the most value out of a Prometheus installation. By being honest about our failures, I hope to give people the confidence to try new things.


First things first: yes, it really did take us 4 tries to implement our SLOs with Prometheus. That may seem embarrassing, but we are happy to share our SLO journey in the hope that it helps you avoid the same mistakes.

So why did it take us 4 tries? In a word: scale. We needed to handle 28 days' worth of data for over 400 microservices and still keep our dashboards and alerts responsive. Luckily, Prometheus provides some powerful features for dealing with large or slow queries. Unfortunately, many of our first attempts failed badly because we misunderstood and misused those features.

This session walks you through all 4 phases of our SLO rollout. By the end, we hope you will see how to get the most value out of Prometheus, along with some common pitfalls and how to avoid them.

DevX-O - Weave

Carson is a co-organizer of the Utah Gophers meetup and a fixture at the Utah Kubernetes meetup. He may be best known for his "Kubernetes Deconstructed" presentation, but he has been speaking at conferences regularly ever since. He currently works at Weave on the Developer Experience team as the DevX-O (a made-up title). As part of this team, he continually works to improve the quality of life for developers at Weave.