Beyond the Default Scheduler: Navigating GPU Multitenancy in the AI Era
2025-11-08 , Crystal Dining Room

GPU multitenancy in Kubernetes faces significant security challenges when deploying AI workloads on shared infrastructure. Time slicing enables GPU sharing but lacks hardware isolation, risking exposure of sensitive data. NVIDIA Multi-Instance GPU (MIG) provides true hardware isolation with dedicated compute cores, memory slices, and L2 cache partitions, ensuring consistent performance and strict QoS guarantees.

Since the default Kubernetes scheduler cannot partition GPU resources like CPUs for workloads, advanced schedulers—KAI, Volcano, and Kueue can serve as the scheduler for your workloads. They improve GPU sharing through hierarchical queues for secure multi-tenant environments. This talk demonstrates how combining isolation in multi-tenant setups with intelligent scheduling results in optimal utilization, fair resource distribution, and robust security boundaries, guiding the transition from default to GPU-aware scheduling solutions for scalable AI infrastructure.

Shivay Lamba is a software developer specializing in DevOps, Machine Learning and Full Stack Development.

He is an Open Source Enthusiast and has been part of various programs like Google Code In and Google Summer of Code as a Mentor and has also been a MLH Fellow.
He is actively involved in community work as well. He is a TensorflowJS SIG member, Mentor in OpenMined and CNCF Service Mesh Community, SODA Foundation and has given talks at various conferences like Github Satellite, Voice Global, Fossasia Tech Summit, TensorflowJS Show & Tell.

Hrittik is a Platform Advocate at Loft Labs and a CNCF Ambassador, with expertise in cloud native technologies and open source communities. He has contributed extensively to developer advocacy, technical writing, and community engagement. Hrittik has been a featured speaker at events such as Kubernetes Community Days, Open Source Summits, and more, and has served as a Program Committee member for several KubeCons and CloudNativeCons.

This speaker also appears in:

Saiyam is working as Head of DevRel at Loft Labs. He is the founder of Kubesimplify that focuses on simplifying cloud-native and Kubernetes technologies. Previously at Civo, Walmart Labs, Oracle, and HP, Saiyam has worked on many facets of Kubernetes, including machine learning platforms, scaling, multi-cloud, and managed Kubernetes services. He has implemented Kubernetes solutions in various organizations. When not coding, Saiyam contributes to the community by writing blogs and organizing local meetups for Kubernetes and CNCF. He is a Kubestronaut, CNCF TAG Operational Resilience co-chair, runs a YouTube channel, and can be reached on Twitter @saiyampathak