Publications

Export 8 results:
Filters: Author is Yu Pei  [Clear All Filters]
Conference Paper
Gates, M., J. Kurzak, P. Luszczek, Y. Pei, and J. Dongarra, Autotuning Batch Cholesky Factorization in CUDA with Interleaved Layout of Matrices,” Parallel and Distributed Processing Symposium Workshops (IPDPSW), Orlando, FL, IEEE, June 2017. DOI: 10.1109/IPDPSW.2017.18
Pei, Y., Q. Cao, G. Bosilca, P. Luszczek, V. Eijkhout, and J. Dongarra, Communication Avoiding 2D Stencil Implementations over PaRSEC Task-Based Runtime,” 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), New Orleans, LA, IEEE, May 2020. DOI: 10.1109/IPDPSW50202.2020.00127  (1.33 MB)
Pei, Y., G. Bosilca, I. Yamazaki, A. Ida, and J. Dongarra, Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization,” PAW-ATM Workshop at SC19, Denver, CO, ACM, November 2019.  (4.51 MB)
Cao, Q., Y. Pei, K. Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications,” Platform for Advanced Scientific Computing Conference (PASC20), Geneva, Switzerland, ACM, June 2020. DOI: 10.1145/3394277.3401846  (2.71 MB)
Cao, Q., R. Alomairy, Y. Pei, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, A Framework to Exploit Data Sparsity in Tile Low-Rank Cholesky Factorization,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), July 2022. DOI: 10.1109/IPDPS53621.2022.00047  (1.03 MB)
Luo, X., W. Wu, G. Bosilca, Y. Pei, Q. Cao, T. Patinyasakdikul, D. Zhong, and J. Dongarra, HAN: A Hierarchical AutotuNed Collective Communication Framework,” IEEE Cluster Conference, Kobe, Japan, Best Paper Award, IEEE Computer Society Press, September 2020.  (764.05 KB)
Cao, Q., Y. Pei, K. Akbudak, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems,” 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), Portland, OR, IEEE, May 2021.  (1.08 MB)
Cao, Q., Y. Pei, T. Herault, K. Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools,” Workshop on Programming and Performance Visualization Tools (ProTools 19) at SC19, Denver, CO, ACM, November 2019.  (429.55 KB)