Distributed machine learning (ML) systems based on parameter servers are widely used in industry. With the rapid advances in GPUs, training performance is now often bottlenecked by the network communication needed to exchange gradients and parameters.
In this talk, I will share our work on alleviating this communication bottleneck to speed up distributed ML training. First, I will motivate the problem with measurements on GPU clusters in Azure and EC2. Then I will present the design and implementation of our solution, a system called Stanza that separates the training of different layers in ML models by exploiting their distinct characteristics. A prototype of Stanza is implemented on PyTorch. Our evaluation on Azure and EC2 shows that Stanza provides 1.25x to 13x speedups over the parameter server architecture when training common CNNs on ImageNet with Nvidia V100 GPUs and a 10GbE network.
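To give a feel for why separating layers by their characteristics can help, the sketch below tallies the (well-known) weight counts of AlexNet, a common CNN of the kind evaluated in the talk. Convolutional layers carry most of the compute but few parameters, while fully connected layers carry most of the parameters, and hence most of the gradient traffic in a vanilla parameter server setup. The numbers are standard AlexNet layer shapes (biases omitted); this is an illustrative back-of-the-envelope calculation, not Stanza's actual implementation.

```python
# Illustrative sketch: parameter (and hence gradient) counts per layer
# type in AlexNet, ignoring biases. Layer shapes are the standard ones:
# in_channels * kH * kW * out_channels for conv, in * out for FC.
conv_params = [
    3 * 11 * 11 * 96,     # conv1
    96 * 5 * 5 * 256,     # conv2
    256 * 3 * 3 * 384,    # conv3
    384 * 3 * 3 * 384,    # conv4
    384 * 3 * 3 * 256,    # conv5
]
fc_params = [
    256 * 6 * 6 * 4096,   # fc6
    4096 * 4096,          # fc7
    4096 * 1000,          # fc8
]

total = sum(conv_params) + sum(fc_params)
fc_share = sum(fc_params) / total
print(f"FC layers hold {fc_share:.1%} of all parameters")
# -> FC layers hold 94.0% of all parameters
```

A plain parameter server synchronizes gradients for every parameter across all workers each iteration; since the fully connected layers dominate that volume, treating them differently from the convolutional layers (as Stanza does) can shrink the bulk of the traffic.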
Hong Xu is an assistant professor in the Department of Computer Science at City University of Hong Kong. His research area is computer networking, particularly data center networks and big data systems. He received the B.Eng. degree from The Chinese University of Hong Kong in 2007, and the M.A.Sc. and Ph.D. degrees from the University of Toronto in 2009 and 2013, respectively. He was the recipient of an Early Career Scheme Grant from the Hong Kong Research Grants Council in 2014. He has received several best paper awards, including at ACM TURC 2017 (SIGCOMM China) and IEEE ICNP 2015. He is a member of ACM and IEEE.