Scaling Private LLM Services with KServe and Modelcar OCI: A Real-World Implementation
11-11, 15:10–15:40 (MST), Flex Space

Deploying large language models (LLMs) is complex, operationally challenging, and expensive. This case study demonstrates how Kubernetes, specifically KServe with the Modelcar OCI storage backend, simplifies the deployment and management of private LLM services.
First, we explore how KServe enables efficient, scalable model serving in a Kubernetes environment, with straightforward integration into existing clusters and improved GPU utilization. Second, we examine how Modelcar OCI artifacts extend registry-based delivery beyond container images to the models themselves, reducing duplicate storage, speeding up downloads, and lowering governance overhead.
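To make the pattern concrete, here is a minimal sketch of what such a deployment can look like. It assumes KServe's Modelcars feature is enabled (via enableModelcar in the storage-initializer configuration) and uses placeholder names for the registry path, model, and serving runtime; it illustrates the general approach rather than the exact manifest from this talk.

```yaml
# Hypothetical example: serving a private LLM whose weights are packaged
# as an OCI image (a "modelcar") instead of being baked into the serving
# image or fetched from object storage. Requires KServe with Modelcars
# enabled in the inferenceservice configuration.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: private-llm                # placeholder service name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface          # assumes the Hugging Face serving runtime
      # oci:// tells KServe to pull the model as an OCI artifact;
      # registry and tag below are placeholders.
      storageUri: oci://registry.example.com/models/private-llm:v1
      resources:
        limits:
          nvidia.com/gpu: "1"      # one GPU per replica; size for your model
```

Because the weights are pulled as an OCI image, registry-level deduplication and the node's image cache do the heavy lifting, which is where the storage and download-speed gains described above come from.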
The session will cover implementation details, benefits, best practices, and lessons learned.
You will walk away knowing how to leverage Kubernetes, KServe, and OCI artifacts to advance your MLOps practice, achieve measurable efficiency gains, and overcome common challenges in deploying and scaling private LLM services.


This session provides a comprehensive case study on using KServe and the Modelcar OCI storage backend to address real-world challenges in scaling AI inference services. Attendees will gain actionable insights into advanced AI inference deployments, encouraging broader adoption and better implementations within the cloud-native community. The presentation equips practitioners to improve their AI platform operations and contribute to the growth and evolution of the ecosystem.

Mayuresh Krishna is the CTO and Co-Founder of initializ.ai, where he leads product engineering, building AI models and private AI services. He previously worked at VMware Tanzu as a Solution Engineering Leader and at Pivotal Software as a Senior Platform Architect.