Joe Salisbury
Joe works at Giant Swarm, where he is responsible for helping the team build tools to help the developers who build the tools to help developers.
Sessions
Observing Kubernetes clusters at scale can be challenging. While most companies operate a small number of Kubernetes clusters, Giant Swarm is responsible for hundreds. This scale makes maintaining a responsible level of observability harder.
We aim to present our observability journey, particularly with Prometheus.
This will cover our past architectural choices, such as building tooling to manage Prometheus for on-demand Kubernetes clusters; our current usage and the drawbacks we’d like to address; and our plans for the future, such as horizontal scaling and Cortex.
We will also cover our continuous improvement process, which uses postmortems and continuous delivery to evolve our metrics, exporters, and alerting as we discover blind spots.
This talk presents what we have learned about handling observability at scale, with in-depth examples from our infrastructure.