Publications

2017
Fayad, D., J. Kurzak, P. Luszczek, P. Wu, and J. Dongarra, “The Case for Directive Programming for Accelerator Autotuner Optimization,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-07: University of Tennessee, October 2017.
Yamazaki, I., M. Hoemmen, P. Luszczek, and J. Dongarra, Comparing performance of s-step and pipelined GMRES on distributed-memory multicore CPUs, Pittsburgh, Pennsylvania, SIAM Annual Meeting, July 2017.
Kurzak, J., P. Luszczek, I. Yamazaki, Y. Robert, and J. Dongarra, “Design and Implementation of the PULSAR Programming System for Large Scale Computing,” Supercomputing Frontiers and Innovations, vol. 4, issue 1, 2017. DOI: 10.14529/jsfi170101
Kurzak, J., P. Wu, M. Gates, I. Yamazaki, P. Luszczek, G. Ragghianti, and J. Dongarra, “Designing SLATE: Software for Linear Algebra Targeting Exascale,” SLATE Working Notes, no. 03, ICL-UT-17-06: Innovative Computing Laboratory, University of Tennessee, October 2017.
Yamazaki, I., M. Hoemmen, P. Luszczek, and J. Dongarra, “Improving Performance of GMRES by Reducing Communication and Pipelining Global Collectives,” Proceedings of The 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2017), Best Paper Award, Orlando, FL, June 2017. DOI: 10.1109/IPDPSW.2017.65
Anzt, H., E. Boman, J. Dongarra, G. Flegar, M. Gates, M. Heroux, M. Hoemmen, J. Kurzak, P. Luszczek, S. Rajamanickam, et al., “MAGMA-sparse Interface Design Whitepaper,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-05, September 2017.
Abalenkovs, M., N. Bagherpour, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Relton, J. Sistek, D. Stevens, et al., “PLASMA 17 Performance Report,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-11: University of Tennessee, June 2017.
Abalenkovs, M., N. Bagherpour, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Relton, J. Sistek, D. Stevens, et al., “PLASMA 17.1 Functionality Report,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-10: University of Tennessee, June 2017.
Abdelfattah, A., H. Anzt, A. Bouteiller, A. Danalis, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, et al., “Roadmap for the Development of a Linear Algebra Library for Exascale Computing: SLATE: Software for Linear Algebra Targeting Exascale,” SLATE Working Notes, no. 01, ICL-UT-17-02: Innovative Computing Laboratory, University of Tennessee, June 2017.
Luszczek, P., J. Kurzak, I. Yamazaki, D. Keffer, and J. Dongarra, “Scaling Point Set Registration in 3D Across Thread Counts on Multicore and Hardware Accelerator Platforms through Autotuning for Large Scale Analysis of Scientific Point Clouds,” IEEE International Workshop on Benchmarking, Performance Tuning and Optimization for Big Data Applications (BPOD 2017), Boston, MA, IEEE, December 2017. DOI: 10.1109/BigData.2017.8258258
Luszczek, P., J. Kurzak, I. Yamazaki, and J. Dongarra, “Towards Numerical Benchmark for Half-Precision Floating Point Arithmetic,” 2017 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, IEEE, September 2017. DOI: 10.1109/HPEC.2017.8091031
Dongarra, J., S. Tomov, P. Luszczek, J. Kurzak, M. Gates, I. Yamazaki, H. Anzt, A. Haidar, and A. Abdelfattah, “With Extreme Computing, the Rules Have Changed,” Computing in Science & Engineering, vol. 19, issue 3, pp. 52-62, May 2017. DOI: 10.1109/MCSE.2017.48
2016
Jia, Y., P. Luszczek, and J. Dongarra, “Hessenberg Reduction with Transient Error Resilience on GPU-Based Hybrid Architectures,” 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.
Newburn, C. J., G. Bansal, M. Wood, L. Crivelli, J. Planas, A. Duran, P. Souza, L. Borges, P. Luszczek, S. Tomov, et al., “Heterogeneous Streaming,” The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2016, Chicago, IL, IEEE, May 2016.
Dongarra, J., M. A. Heroux, and P. Luszczek, “High Performance Conjugate Gradient Benchmark: A new Metric for Ranking High Performance Computing Systems,” International Journal of High Performance Computing Applications, vol. 30, issue 1, pp. 3-10, February 2016. DOI: 10.1177/1094342015593158
Abdelfattah, A., H. Anzt, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, and A. YarKhan, “Linear Algebra Software for Large-Scale Accelerated Multicore Computing,” Acta Numerica, vol. 25, pp. 1-160, May 2016. DOI: 10.1017/S0962492916000015
Dong, T., A. Haidar, P. Luszczek, S. Tomov, A. Abdelfattah, and J. Dongarra, “MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs,” Innovative Computing Laboratory Technical Report, no. ICL-UT-16-02: University of Tennessee, August 2016.
Dongarra, J., M. A. Heroux, and P. Luszczek, “A New Metric for Ranking High-Performance Computing Systems,” National Science Review, vol. 3, issue 1, pp. 30-35, January 2016. DOI: 10.1093/nsr/nwv084
YarKhan, A., J. Kurzak, P. Luszczek, and J. Dongarra, “Porting the PLASMA Numerical Library to the OpenMP Standard,” International Journal of Parallel Programming, June 2016. DOI: 10.1007/s10766-016-0441-6
Luszczek, P., M. Gates, J. Kurzak, A. Danalis, and J. Dongarra, “Search Space Generation and Pruning System for Autotuners,” 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.
2015
Anzt, H., W. Sawyer, S. Tomov, P. Luszczek, and J. Dongarra, “Acceleration of GPU-based Krylov solvers via Data Transfer Reduction,” International Journal of High Performance Computing Applications, 2015.
Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Batched Matrix Computations on Hardware Accelerators,” EuroMPI/Asia 2015 Workshop, Bordeaux, France, September 2015.
Haidar, A., T. Dong, P. Luszczek, S. Tomov, and J. Dongarra, “Batched matrix computations on hardware accelerators based on GPUs,” International Journal of High Performance Computing Applications, February 2015. DOI: 10.1177/1094342014567546
YarKhan, A., A. Haidar, C. Cao, P. Luszczek, S. Tomov, and J. Dongarra, “Cholesky Across Accelerators,” 17th IEEE International Conference on High Performance Computing and Communications (HPCC 2015), Elizabeth, NJ, IEEE, August 2015.
Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Efficient Eigensolver Algorithms on Accelerator Based Architectures,” 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.
Anzt, H., B. Haugen, J. Kurzak, P. Luszczek, and J. Dongarra, “Experiences in autotuning matrix multiplication for energy minimization on GPUs,” Concurrency and Computation: Practice and Experience, vol. 27, issue 17, pp. 5096-5113, December 2015. DOI: 10.1002/cpe.3516
Anzt, H., B. Haugen, J. Kurzak, P. Luszczek, and J. Dongarra, “Experiences in Autotuning Matrix Multiplication for Energy Minimization on GPUs,” Concurrency and Computation: Practice and Experience, vol. 27, issue 17, pp. 5096-5113, Oct 12, 2015. DOI: 10.1002/cpe.3516
Haidar, A., A. YarKhan, C. Cao, P. Luszczek, S. Tomov, and J. Dongarra, “Flexible Linear Algebra Development and Scheduling with Cholesky Factorization,” 17th IEEE International Conference on High Performance Computing and Communications, Newark, NJ, August 2015.
Haidar, A., T. Dong, S. Tomov, P. Luszczek, and J. Dongarra, “Framework for Batched and GPU-resident Factorization Algorithms to Block Householder Transformations,” ISC High Performance, Frankfurt, Germany, Springer, July 2015.
Dongarra, J., M. A. Heroux, and P. Luszczek, “High-Performance Conjugate-Gradient Benchmark: A New Metric for Ranking High-Performance Computing Systems,” The International Journal of High Performance Computing Applications, 2015. DOI: 10.1177/1094342015593158
Haidar, A., J. Dongarra, K. Kabir, M. Gates, P. Luszczek, S. Tomov, and Y. Jia, “HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi,” Scientific Programming, vol. 23, issue 1, January 2015. DOI: 10.3233/SPR-140404
Dongarra, J., M. A. Heroux, and P. Luszczek, “HPCG Benchmark: a New Metric for Ranking High Performance Computing Systems,” University of Tennessee Computer Science Technical Report, no. ut-eecs-15-736: University of Tennessee, January 2015.
Haidar, A., S. Tomov, P. Luszczek, and J. Dongarra, “MAGMA Embedded: Towards a Dense Linear Algebra Library for Energy Efficient Extreme Computing,” 2015 IEEE High Performance Extreme Computing Conference (HPEC ’15), (Best Paper Award), Waltham, MA, IEEE, September 2015.
Anzt, H., J. Dongarra, M. Gates, A. Haidar, K. Kabir, P. Luszczek, S. Tomov, and I. Yamazaki, MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi, Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.
Haidar, A., T. Dong, P. Luszczek, S. Tomov, and J. Dongarra, “Optimization for Performance and Energy for Batched Matrix Computations on GPUs,” 8th Workshop on General Purpose Processing Using GPUs (GPGPU 8), San Francisco, CA, ACM, February 2015. DOI: 10.1145/2716282.2716288
Abalenkovs, M., A. Abdelfattah, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki, and A. YarKhan, “Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems,” Supercomputing Frontiers and Innovations, vol. 2, no. 4, October 2015. DOI: 10.14529/jsfi1504
Mary, T., I. Yamazaki, J. Kurzak, P. Luszczek, S. Tomov, and J. Dongarra, “Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.
Yamazaki, I., J. Kurzak, P. Luszczek, and J. Dongarra, “Randomized Algorithms to Update Partial Singular Value Decomposition on a Hybrid CPU/GPU Cluster,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.
Donfack, S., J. Dongarra, M. Faverge, M. Gates, J. Kurzak, P. Luszczek, and I. Yamazaki, “A Survey of Recent Developments in Parallel Implementations of Gaussian Elimination,” Concurrency and Computation: Practice and Experience, vol. 27, issue 5, pp. 1292-1309, April 2015. DOI: 10.1002/cpe.3306
Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Towards Batched Linear Solvers on Accelerated Hardware Platforms,” 8th Workshop on General Purpose Processing Using GPUs (GPGPU 8) co-located with PPOPP 2015, San Francisco, CA, ACM, February 2015.
Haidar, A., Y. Jia, P. Luszczek, S. Tomov, A. YarKhan, and J. Dongarra, “Weighted Dynamic Scheduling with Many Parallelism Grains for Offloading of Numerical Workloads to Multiple Varied Accelerators,” Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA'15), vol. No. 5, Austin, TX, ACM, November 2015.
2014
Dongarra, J., M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, and I. Yamazaki, “Accelerating Numerical Dense Linear Algebra Calculations with GPUs,” Numerical Computations with GPUs: Springer International Publishing, pp. 3-28, 2014. DOI: 10.1007/978-3-319-06548-9_1
Dongarra, J., M. Faverge, H. Ltaief, and P. Luszczek, “Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting,” Concurrency and Computation: Practice and Experience, vol. 26, issue 7, pp. 1408-1431, May 2014. DOI: 10.1002/cpe.3110
Cao, C., J. Dongarra, P. Du, M. Gates, P. Luszczek, and S. Tomov, “clMAGMA: High Performance Dense Linear Algebra with OpenCL,” International Workshop on OpenCL, Bristol University, England, May 2014.
Yamazaki, I., J. Kurzak, P. Luszczek, and J. Dongarra, “Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime,” Workshop on Large-Scale Parallel Processing, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Heterogeneous Acceleration for Linear Algebra in Multi-Coprocessor Environments,” VECPAR 2014, Eugene, OR, June 2014.
Luszczek, P., J. Kurzak, and J. Dongarra, “Looking Back at Dense Linear Algebra Software,” Journal of Parallel and Distributed Computing, vol. 74, issue 7, pp. 2548–2560, July 2014. DOI: 10.1016/j.jpdc.2013.10.005
Dong, T., A. Haidar, P. Luszczek, J. Harris, S. Tomov, and J. Dongarra, “LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU,” 16th IEEE International Conference on High Performance Computing and Communications (HPCC), Paris, France, IEEE, August 2014.
Dongarra, J., A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, and A. YarKhan, “Model-Driven One-Sided Factorizations on Multicore, Accelerated Systems,” Supercomputing Frontiers and Innovations, vol. 1, issue 1, 2014. DOI: http://dx.doi.org/10.14529/jsfi1401
Haidar, A., P. Luszczek, and J. Dongarra, “New Algorithm for Computing Eigenvectors of the Symmetric Eigenvalue Problem,” Workshop on Parallel and Distributed Scientific and Engineering Computing, IPDPS 2014 (Best Paper), Phoenix, AZ, IEEE, May 2014. DOI: 10.1109/IPDPSW.2014.130
