Leveraging ML to Improve the Design of Large-Scale Cloud Systems

Abstract: Cloud services are increasingly adopting new programming models, such as microservices and serverless compute. While these models offer several advantages, such as better modularity and easier maintenance and deployment, they also introduce new hardware and software challenges.

In this talk, I will briefly describe the challenges that these new cloud models introduce in hardware and software, and discuss how applying ML to cluster management and performance debugging can improve the cloud’s performance predictability and resource efficiency. I will first discuss Seer, a performance debugging system that identifies root causes of unpredictable performance in multi-tier interactive microservices, and then Sage, which improves on Seer by taking a completely unsupervised learning approach to data-driven performance debugging, making it both practical and scalable.

Bio: Christina is an assistant professor in the Electrical and Computer Engineering Department at Cornell, where she leads the SAIL group. She is also a member of the Computer Systems Laboratory (CSL) and the John and Norma Balen Sesquicentennial Faculty Fellow. Her main interests are in computer architecture and computer systems. Specifically, she works on improving the resource efficiency of large-scale datacenters through QoS-aware scheduling and resource management techniques. Christina is also interested in designing efficient server architectures, distributed performance debugging, and cloud security. Before joining Cornell, Christina earned a Ph.D. in Electrical Engineering at Stanford University, where she worked with Christos Kozyrakis. She previously earned an M.S. in Electrical Engineering from Stanford (2011) and a Diploma in Electrical and Computer Engineering from the National Technical University of Athens (2009).