2025-11-08 – Crystal Dining Room
GPU multitenancy in Kubernetes faces significant security challenges when AI workloads are deployed on shared infrastructure. Time slicing enables GPU sharing but provides no hardware isolation, risking cross-tenant exposure of sensitive data. NVIDIA Multi-Instance GPU (MIG) provides true hardware isolation with dedicated compute cores, memory slices, and L2 cache partitions, ensuring consistent performance and strict QoS guarantees.
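As an illustration, once MIG is enabled (for example via the NVIDIA GPU Operator with the mixed strategy on an A100 or H100 node), a tenant pod can request a hardware-isolated slice by name. The pod name, image, and chosen profile below are hypothetical:

```yaml
# Minimal sketch: a pod requesting one MIG instance. Assumes the NVIDIA
# device plugin advertises per-profile resources (mixed MIG strategy);
# names and the selected profile are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: tenant-a-inference
spec:
  containers:
  - name: inference
    image: nvcr.io/nvidia/pytorch:24.04-py3
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1  # one 1-compute-slice / 5 GB isolated instance
```

Because each MIG instance has its own memory and cache partition, a noisy neighbor in one slice cannot degrade or inspect another tenant's workload.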
Because the default Kubernetes scheduler cannot partition GPU resources for workloads the way it handles CPU, advanced schedulers such as KAI, Volcano, and Kueue can manage scheduling and queueing for your workloads. They improve GPU sharing through hierarchical queues suited to secure multi-tenant environments. This talk demonstrates how combining isolation in multi-tenant setups with intelligent scheduling yields optimal utilization, fair resource distribution, and robust security boundaries, and it guides the transition from the default scheduler to GPU-aware scheduling solutions for scalable AI infrastructure.
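To make the queueing idea concrete, here is a minimal sketch using Kueue's v1beta1 API (KAI and Volcano expose similar concepts through their own CRDs). The queue names, namespace, cohort, and quota are hypothetical:

```yaml
# Hypothetical Kueue setup: a cluster-level GPU quota for one team,
# joined to a cohort so unused quota can be borrowed by sibling queues.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  namespaceSelector: {}        # admit workloads from any namespace
  cohort: gpu-pool             # share spare capacity within the cohort
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 8        # team A's guaranteed GPU share
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team-a
  name: team-a-lq
spec:
  clusterQueue: team-a-cq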
Creating multi-tenant environments while using GPUs optimally is the direction modern infrastructure is moving, making scheduler transitions a necessary step for competitive AI platform deployment. As part of the cloud native university track, this talk provides foundational knowledge on an often overlooked subject: how scheduling is evolving for workloads everywhere.
Key Learning Outcomes
Hardware-Scheduler Alignment: Master a decision matrix for selecting scheduling solutions based on hardware. Learn when to use time slicing vs MIG (a minimal time-slicing config is sketched after this list), and how GPU architectures influence scheduler selection.
Scheduling Strategy Selection: Evaluate the Kubernetes default scheduler, KAI, Volcano, and Kueue against workload patterns, security requirements, and resource constraints for optimal performance.
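For the time-slicing side of that decision, a sketch of the NVIDIA GPU Operator's sharing config follows; the ConfigMap name, namespace, and replica count are hypothetical, and the operator's cluster policy must reference this config for it to take effect:

```yaml
# Hypothetical time-slicing config for the NVIDIA device plugin:
# each physical GPU is advertised as 4 schedulable replicas.
# Note: replicas share the GPU with no memory or fault isolation.
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4
```

This trades isolation for density: a good fit for trusted, bursty workloads such as notebooks, and a poor fit where tenants must not share GPU memory.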
What Makes This Session Unique
Implementation-Focused: Goes beyond theoretical concepts to demonstrate practical multi-tenant GPU deployment patterns, configurations, and troubleshooting for production AI platforms.
Compliance & Cost: Addresses the intersection of regulatory compliance and cost management, showing how scheduler selection achieves security isolation while maximizing hardware ROI.
Inference Best Practices: Covers the complete AI lifecycle, including pre-training, fine-tuning, and inference workloads, showing how scheduling impacts latency and throughput.
Speaker Expertise
Hrittik and Shivay bring extensive hands-on experience running diverse AI workloads across multiple schedulers, from interactive notebooks to large-scale distributed training jobs.
Target Audience
AI Researchers: Data scientists who need infrastructure knowledge for model development workflows and for communicating resource access patterns to platform teams.
Infrastructure Engineers: Platform engineers designing GPU clusters that support diverse AI workloads, who need an understanding of scheduler capabilities and multi-tenant security.
Platform Engineers: DevOps/SRE professionals managing Kubernetes GPU environments, who need knowledge of scheduler deployment, configuration, monitoring, and troubleshooting.
Shivay Lamba is a software developer specializing in DevOps, Machine Learning and Full Stack Development.
He is an open source enthusiast who has mentored in programs such as Google Code-in and Google Summer of Code, and has also been an MLH Fellow.
He is actively involved in community work as well. He is a TensorFlow.js SIG member and a mentor in OpenMined, the CNCF Service Mesh community, and the SODA Foundation, and he has given talks at conferences such as GitHub Satellite, Voice Global, FOSSASIA Tech Summit, and TensorFlow.js Show & Tell.
Hrittik is a Platform Advocate at Loft Labs and a CNCF Ambassador, with expertise in cloud native technologies and open source communities. He has contributed extensively to developer advocacy, technical writing, and community engagement. Hrittik has been a featured speaker at events such as Kubernetes Community Days, Open Source Summits, and more, and has served as a Program Committee member for several KubeCons and CloudNativeCons.