Joe Salisbury Cloud Native Rejekts

Joe Salisbury

Joe works at Giant Swarm, where he is responsible for helping the team build tools to help the developers that build the tools to help developers.

Session

05-18

15:40

30min

Observing Enterprise Kubernetes Clusters At Scale

Joe Salisbury

Observing Kubernetes clusters at scale can be challenging. While most companies operate a small number of Kubernetes clusters, Giant Swarm is responsible for hundreds. This scale makes maintaining a responsible level of observability harder.

We aim to present our observability journey, particularly with Prometheus.

This will cover our past architectural choices, such as building tooling to manage Prometheus for on-demand Kubernetes clusters; our current usage and the drawbacks we’d like to address; and our plans for the future, such as horizontal scaling and Cortex.

We will also cover our continuous improvement process, built on post mortems and continuous delivery, which allows us to evolve our metrics, add new exporters, and refine our alerting as we discover blind spots.

This talk presents what we have learned about handling observability at scale, with in-depth examples from our infrastructure.

Cloud Native

Main Hall
