Using Kubernetes to Scale Multi-Class Clusters
2020-03-29, 11:20–11:50, Room 2

As K8s adoption continues to grow, use cases that require more than one cluster instance are emerging. Drivers include the scalability limits of a single cluster, the need for different cluster configurations (specific capabilities for various purposes), and multi-cloud presence. In general, multiple cluster classes can be envisioned for creating instances, where each class represents a specific cluster configuration on a specific cloud provider.

We discuss how we use an open source project to extend the K8s control plane so that we can provision and scale such multi-class clusters. We first express each class as a set of K8s resource manifests, then use a claim-based provisioning model to create an instance of an arbitrary class. Finally, we show how a custom K8s controller can leverage this claim-based model to scale clusters for each cluster class, adopting an approach similar to the one used in K8s pod auto-scaling.
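
The claim-based model described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's API: the names `ClusterClass`, `ClusterClaim`, `Provisioner`, and `bind`, as well as the example class name `small`, are all hypothetical, and a real implementation would be a K8s controller reconciling custom resources rather than an in-memory object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterClass:
    """Hypothetical: one cluster configuration on one cloud provider."""
    name: str
    provider: str
    manifests: tuple  # names of the resource manifests that define the class


@dataclass
class ClusterClaim:
    """Hypothetical: a request for one instance of a named class."""
    name: str
    class_name: str
    bound_instance: str = ""  # set once an instance has been provisioned


class Provisioner:
    """Toy stand-in for a controller that satisfies claims."""

    def __init__(self, classes):
        self._classes = {c.name: c for c in classes}
        self._counters = {}   # class name -> number of instances created
        self.instances = {}   # instance name -> class name

    def bind(self, claim):
        # Binding is idempotent: an already-bound claim keeps its instance.
        if claim.bound_instance:
            return claim.bound_instance
        if claim.class_name not in self._classes:
            raise KeyError(f"unknown cluster class: {claim.class_name}")
        count = self._counters.get(claim.class_name, 0) + 1
        self._counters[claim.class_name] = count
        instance = f"{claim.class_name}-{count}"
        self.instances[instance] = claim.class_name
        claim.bound_instance = instance
        return instance
```

The point of the sketch is the separation of concerns borrowed from storage provisioning: the class captures how a cluster is built, while the claim only names the class it wants, so adding a class does not change how claims are written.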

Benefits to the ecosystem:
- While K8s has a good story around pod auto-scaling, little has been done to auto-scale cluster instances when the workload is distributed across multiple clusters. We use the same horizontal pod auto-scaling concepts to scale cluster instances. This allows the same pod auto-scaling best practices to be applied to clusters, without introducing new tools or patterns.
- Creating clusters is often done through cloud provider UIs, custom scripts, or third-party tools. Instead, by expressing cluster classes as K8s resource manifests, users can manage the instances with the same K8s tools they already use (e.g. kubectl, kustomize), without requiring extra tools.
- Cluster classes are inspired by the K8s StorageClass and PersistentVolumeClaim (PVC) pattern. This model is already familiar to the community, and adding a new class will be as easy as applying a new set of resource manifests.
- Workloads that require an extraordinary number of nodes can be scheduled with no cluster tweaks. Unlike existing approaches, where the node count per cluster is increased beyond normal limits to handle high load (Alibaba), no cluster needs an abnormal node count.
- Cluster instances can be provisioned on a variety of cloud vendors.
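
To make the auto-scaling point concrete, the horizontal pod auto-scaling rule documented by K8s computes the desired replica count as ceil(currentReplicas * currentMetricValue / desiredMetricValue) and skips scaling when the ratio is within a tolerance (0.1 by default). The sketch below applies that same rule to the number of cluster instances in one class. Treating clusters as the "replicas" follows the description above, but the function name, the min/max bounds, and the choice of metric are assumptions made for illustration, not details of the project.

```python
import math


def desired_cluster_count(current, current_metric, target_metric,
                          min_count=1, max_count=10, tolerance=0.1):
    """HPA-style rule applied to cluster instances of a single class.

    current        -- number of cluster instances now (must be >= 1)
    current_metric -- observed average load per cluster (hypothetical metric)
    target_metric  -- load per cluster the scaler aims for
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        desired = current
    else:
        desired = math.ceil(current * ratio)

    # Clamp to the allowed range, as min/max replicas do for pods.
    return max(min_count, min(max_count, desired))
```

For example, with 2 clusters at an average load of 90 against a target of 60, the function returns 3. In a claim-based model, a controller could act on the result by creating or deleting claims for that class until the instance count matches.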

Attendees will walk away with a K8s control plane for building a scalable multi-cluster framework with different cluster classes. This talk also opens the door to future work on more complex cluster scheduling, such as using cloud vendor metrics for cost and latency optimizations.