David Morrison
drmorr is the founder of ACRL and a computer scientist, researcher, and software engineer focused on problems in optimization, scheduling, and distributed systems. He received his PhD from the University of Illinois Urbana-Champaign in 2014, and has over a decade of industry experience (at companies like Airbnb and Yelp) as well as a strong background in academic research. In his spare time he builds Legos, plays board games, and writes fiction.
Session
AI agents for Kubernetes automation fail because they're trained on unrealistic, simplified scenarios. Unfortunately, there is a dearth of realistic training data available, as most companies are reluctant to publicly share cluster operations data. Moreover, even the data that has been published by, e.g., Google or Alibaba is not representative of usage patterns seen in smaller organizations. In this talk, we will demonstrate how to use a small “seed” of real production data from existing Kubernetes clusters to generate a large set of representative, synthetic training data for Kubernetes AI agents. We use graph-theoretic and statistical methods to generate a diverse set of training data covering failure modes, scaling events, resource contention problems, and other common scenarios found in production systems. These techniques, based on research from a team at Harvey Mudd College, allow Kubernetes AI agents to be trained on high-quality data that is tailored to your company’s production infrastructure.