Building an SLM Platform with Karpenter, Ray Serve and Ollama
2025-03-30, The Waterloo

In today's enterprise landscape, organizations struggle to deploy AI infrastructure at scale, facing challenges in resource optimization and cost management. This presentation introduces a Small Language Model (SLM) platform that combines Karpenter, Ray Serve and Ollama on Kubernetes to address these challenges. We'll showcase how to achieve up to 20% cost reduction in GPU utilization through dynamic resource allocation and efficient workload distribution. The unified management layer simplifies model versioning and monitoring while handling concurrent model deployments and demand spikes, ensuring consistent performance with built-in audit capabilities for compliance.
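To make the serving pattern concrete, here is a minimal sketch of one way such a stack could fit together: a Ray Serve deployment that fronts an Ollama HTTP endpoint, with GPU-backed replicas whose pending pods give Karpenter a signal to provision nodes. This is an illustrative assumption, not the presenters' implementation; the SLMServer class, the OLLAMA_URL address, the llama3.2 default model and the replica counts are all hypothetical.

```python
# Illustrative sketch only: a Ray Serve deployment proxying to an Ollama API.
# Assumes an Ollama server reachable at OLLAMA_URL (default port 11434) and a
# Ray cluster running on Kubernetes nodes provisioned by Karpenter.
import requests
from ray import serve

OLLAMA_URL = "http://localhost:11434"  # hypothetical in-cluster address

@serve.deployment(num_replicas=2, ray_actor_options={"num_gpus": 1})
class SLMServer:
    async def __call__(self, request):
        body = await request.json()
        # Forward the prompt to Ollama's /api/generate endpoint.
        resp = requests.post(
            f"{OLLAMA_URL}/api/generate",
            json={
                "model": body.get("model", "llama3.2"),  # illustrative default
                "prompt": body["prompt"],
                "stream": False,
            },
            timeout=60,
        )
        return resp.json()

app = SLMServer.bind()
# serve.run(app)  # deploy onto a running Ray cluster
```

In this arrangement Ray Serve handles request routing and replica scaling, while Karpenter provisions GPU nodes whenever new replicas create unschedulable pods, which is the dynamic resource allocation the abstract describes.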

Pedro Henrique is an ISV Solutions Architect at AWS serving clients on their containers and open source journey. Pedro has experience working with distributed systems, modernization, cloud native applications, GitOps and platform engineering. He also contributes to open source projects such as the Twelve-Factor App and has spoken about CNCF projects at CNCF Kubernetes Community Days (KCDs) and third-party community events.
