Zero-downtime upgrades of Kubernetes
05-19, 17:40–17:45 (UTC), Main Hall

The Kubernetes project releases a new version every three months, as well as several bug-fix releases in between. You need, and want, to upgrade your clusters. How do you do that with zero downtime and no impact on your production workloads? In this lightning talk I will show the procedure my team has developed to upgrade a cluster and to monitor the upgrade itself, in particular to avoid impact from nodes becoming "Not Ready".


The team at Meltwater develops and operates Kubernetes as a Service internally, in a multi-tenant setup serving 40+ development teams. The base cluster is deployed with kops and then enhanced with add-ons by the team. During our first "kops rolling-update" runs we experienced nodes going "Not Ready", mainly due to bugs #48638 and #41916 and the way nodes reach the masters through DNS round-robin. Though our upgrade process was "triggered" by kops, we think its steps, in particular the way we monitor the upgrade itself and the API's functionality during the upgrade, could be of interest and applied to any Kubernetes cluster.
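As a rough illustration of the kind of monitoring the talk describes (a minimal sketch, not the team's actual tooling; every name and interval here is an assumption), a small loop built on client-go can watch both signals at once: a failed List call indicates the API server is unreachable, and a node whose Ready condition is not True is exactly the "Not Ready" case above.

    package main

    import (
        "context"
        "fmt"
        "time"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    // Hypothetical monitor to run alongside "kops rolling-update":
    // polls the API server and reports degraded API access and
    // NotReady nodes while the upgrade is in flight.
    func main() {
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            panic(err)
        }
        client, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            panic(err)
        }
        for {
            nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
            if err != nil {
                // The List itself doubles as an API health check.
                fmt.Println("API check failed:", err)
            } else {
                for _, n := range nodes.Items {
                    for _, c := range n.Status.Conditions {
                        if c.Type == corev1.NodeReady && c.Status != corev1.ConditionTrue {
                            fmt.Printf("node %s is NotReady: %s\n", n.Name, c.Reason)
                        }
                    }
                }
            }
            time.Sleep(10 * time.Second) // polling interval is arbitrary
        }
    }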

I have worked with large-scale distributed systems for the last 10+ years, from online gaming to data-intensive applications. For the last couple of years I have been focusing on building a Kubernetes platform to accelerate the development teams at Meltwater. In my spare time, when not riding my Ducati on a race track, I practice the fine art of tsundoku.