Anne Holler
Anne has an ongoing interest in the intersection of resource efficiency and artificial intelligence.
She worked on Uber's Michelangelo Machine Learning platform, on the management stack for VeloCloud's
SD-WAN product, on VMware's Distributed Resource Schedulers for server and storage infrastructure,
on performance analysis for VMware's hypervisor and hosted products, on Omnishift's transparent
delivery of applications and data over the web to the desktop, on the performance and power efficiency
of Transmeta's Crusoe processor, and on Hewlett-Packard's low-level compiler optimizer. She received
bachelor's and master's degrees from Duke University, and a doctorate from the University of Virginia,
all in Computer Science.
Sessions
Deep Learning (DL) has been successfully applied to many fields, including computer vision, natural language, business, and science. The open-source platforms Ray and Ludwig make DL accessible to diverse users by reducing the complexity barriers to training, scaling, deploying, and serving DL models. However, DL's cost and operational overhead present significant challenges. The DL model dev/test/tuning cycle requires intermittent use of substantial GPU resources, which cloud vendors are well-positioned to provide, though at non-trivial prices. Given the expense, managing GPU resources judiciously is critical to the practical use of DL. Nodeless Kubernetes commoditizes compute for Kubernetes clusters. It provisions just-in-time, right-sized, cost-effective compute for a Kubernetes application when the application starts, and terminates the compute when the application terminates. There are no autoscaling knobs to configure and maintain, and no compute shape decisions (e.g., on-demand/spot/CaaS) to be made.
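To make the provisioning model concrete, here is a minimal sketch of a Ray task that declares its GPU requirement; on a cluster managed by a Nodeless-style provisioner, it is this pending GPU demand that triggers just-in-time creation of a right-sized node, released again when the work finishes. The cluster address, task body, and config values are illustrative placeholders, not part of any platform's API.

```python
import ray

# Connect to the Ray cluster. On a Kubernetes deployment,
# address="auto" attaches to an existing cluster; calling
# ray.init() with no arguments would start a local one instead.
ray.init(address="auto")

@ray.remote(num_gpus=1)
def train_trial(config: dict) -> float:
    # Placeholder for one training trial; a real trial would build
    # and fit a model here. Declaring num_gpus=1 surfaces the GPU
    # demand that a just-in-time provisioner acts on.
    return 0.0  # e.g., a validation metric

# The task stays pending until a GPU node is available; with a
# Nodeless-style provisioner, that pending demand is what drives
# provisioning, and node teardown follows when the work completes.
metric = ray.get(train_trial.remote({"learning_rate": 1e-3}))
```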
This talk describes running Ray and Ludwig on cloud Kubernetes clusters, using Nodeless K8s as a smart cluster provisioner to add right-sized GPU resources to the K8s cluster when they are needed and to remove them when they are not. Experiments comparing the cost and operational overhead of Nodeless K8s versus fixed-size Ray clusters running directly on EC2 show sizable improvements in efficiency and usability, reducing elapsed time by 61%, compute cost by 54%, and idle Ray cluster cost by 66%, while retaining the performance quality of the AutoML results and reducing operational complexity.
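For reference, the sketch below illustrates the kind of Ludwig AutoML run such experiments exercise, assuming Ludwig's documented `auto_train` AutoML entry point (which runs hyperparameter search trials on Ray); the dataset file, target column, and time budget are placeholder values, and the exact signature may vary across Ludwig versions.

```python
import pandas as pd
from ludwig.automl import auto_train

# Placeholder tabular dataset; the experiments' actual datasets
# and target columns are not specified here.
df = pd.read_csv("train.csv")

# Run AutoML under a fixed time budget. On a Ray-on-Kubernetes
# cluster, the trials' GPU demands drive node provisioning, and
# the nodes are released once the search budget is exhausted.
results = auto_train(
    dataset=df,
    target="label",
    time_limit_s=3600,  # one-hour search budget (placeholder)
)

# Inspect the top-ranked trained model from the search.
print(results.best_model)
```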