Publications
Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance,”
International Conference for High Performance Computing Networking, Storage, and Analysis (SC20): ACM, November 2020.
(644.92 KB)
“Hierarchical DAG scheduling for Hybrid Distributed Systems,”
29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.
(1.11 MB)
“HAN: A Hierarchical AutotuNed Collective Communication Framework,”
IEEE Cluster Conference, Kobe, Japan, Best Paper Award, IEEE Computer Society Press, September 2020.
(764.05 KB)
“GPU-Aware Non-contiguous Data Movement In Open MPI,”
25th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'16), Kyoto, Japan, ACM, June 2016.
DOI: http://dx.doi.org/10.1145/2907294.2907317 (482.32 KB)
“Flexible Data Redistribution in a Task-Based Runtime System,”
IEEE International Conference on Cluster Computing (Cluster 2020), Kobe, Japan, IEEE, September 2020.
DOI: 10.1109/CLUSTER49012.2020.00032 (354.8 KB)
“FFT-Based Gradient Sparsification for the Distributed Training of Deep Neural Networks,”
9th International Symposium on High-Performance Parallel and Distributed Computing (HPDC 20), Stockholm, Sweden, ACM, June 2020.
DOI: 10.1145/3369583.3392681 (4.72 MB)
“Efficient Communications in Training Large Scale Neural Networks,”
ACM MultiMedia Workshop 2017, Mountain View, CA, ACM, October 2017.
(1.41 MB)
“ADAPT: An Event-Based Adaptive Collective Communication Framework,”
The 27th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '18), Tempe, Arizona, ACM Press, June 2018.
DOI: 10.1145/3208040.3208054 (493.65 KB)
“