Title: A survey of methods for synthesizing collective communication algorithms

Speaker: Shouxu Lin and Ricky Shapley

Abstract: Exploiting parallelism to train machine learning models requires GPUs to collaborate through collective communication, which transfers data between GPUs and has become a significant bottleneck when training large models. It is therefore important to design efficient collective communication algorithms that reduce end-to-end latency. Designing optimal algorithms is challenging, however: the best algorithm depends on both the communication pattern and the underlying physical topology, and finding it requires navigating a large multi-dimensional design space spanning virtual topologies, mappings of virtual to physical topologies, and intricate communication schedules. The community has been exploring a variety of approaches to synthesizing collective communication algorithms. This talk examines key design considerations, evaluates existing synthesis approaches, discusses their advantages and limitations, and outlines unresolved challenges for future research.

Bio: Shouxu Lin is a PhD student working with Prof. Rachit Agarwal; his research interests are in systems and networking for machine learning. Ricky Shapley is a PhD student working with Prof. David Shmoys; his research interests are in designing efficient algorithms for NP-hard and other computationally intractable problems, leveraging tools from discrete optimization.