“GPUDirect MPI Communications and Optimizations to Accelerate FFTs on Exascale Systems,” EuroMPI'19 Posters, Zurich, Switzerland, no. icl-ut-19-06: ICL, September 2019.
Ginkgo: A Node-Level Sparse Linear Algebra Library for HPC (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
Ginkgo: A Sparse Linear Algebra Library for HPC : 2021 ECP Annual Meeting, April 2021.
FFT-ECP Fast Fourier Transform, Houston, TX, 2019 ECP Annual Meeting (Research Poster), January 2019.
Extending MAGMA Portability with OneAPI, Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), ACM Student Research Competition, November 2022.
Exa-PAPI: The Exascale Performance API with Modern C++, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
Enhancing the Performance of Dense Linear Algebra Solvers on GPUs (in the MAGMA Project), Austin, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC08), November 2008.
DTE: PaRSEC Systems and Interfaces (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
DTE: PaRSEC Enabled Libraries and Applications (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
DTE: PaRSEC Enabled Libraries and Applications : 2021 Exascale Computing Project Annual Meeting, April 2021.
Clover: Computational Libraries Optimized via Exascale Research, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
Cholesky Factorization on Batches of Matrices with Fixed and Variable Sizes, San Jose, CA, GPU Technology Conference (GTC16), Poster, April 2016.
“Acceleration of the BLAST Hydro Code on GPU,” Supercomputing '12 (poster), Salt Lake City, Utah, SC12, November 2012.
Accelerating Tensor Contractions for High-Order FEM on CPUs, GPUs, and KNLs, Gatlinburg, TN, Smoky Mountains Computational Sciences and Engineering Conference (SMC16), Poster, September 2016.
Accelerating FFT towards Exascale Computing : NVIDIA GPU Technology Conference (GTC2021), 2021.
Accelerating 2D FFT: Exploit GPU Tensor Cores through Mixed-Precision, Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), ACM Student Research Poster, November 2018.
The HPL Benchmark: Past, Present & Future, ISC High Performance, Frankfurt, Germany, July 2016.
hipMAGMA v2.0 : Zenodo, July 2020.
hipMAGMA v1.0 : Zenodo, March 2020.
“Exascale Computing and Big Data,” Communications of the ACM, vol. 58, no. 7: ACM, pp. 56-68, July 2015.
Earth Virtualization Engines - A Technical Perspective, September 2023.
“With Extreme Computing, the Rules Have Changed,” Computing in Science & Engineering, vol. 19, issue 3, pp. 52-62, May 2017.
“Weighted Block-Asynchronous Relaxation for GPU-Accelerated Systems,” SIAM Journal on Computing (submitted), March 2012.
“VisPerf: Monitoring Tool for Grid Computing,” Lecture Notes in Computer Science, vol. 2659: Springer Verlag, Heidelberg, pp. 233-243, 2003.
“The Virtual Instrument: Support for Grid-enabled Scientific Simulations,” Journal of Parallel and Distributed Computing (submitted), October 2002.
“The Virtual Instrument: Support for Grid-enabled Scientific Simulations,” International Journal of High Performance Computing Applications, vol. 18, no. 1, pp. 3-17, January 2004.
“Variable-Size Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioning on Graphics Processors,” Parallel Computing, vol. 81, pp. 131-146, January 2019.
“Using multiple levels of parallelism to enhance the performance of domain decomposition solvers,” Parallel Computing, vol. 36, no. 5-6: Elsevier journals, pp. 285-296, 2010.
“Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy,” ACM Transactions on Mathematical Software, vol. 34, no. 4, pp. 17-22, 2008.
“Using MAGMA with PGI Fortran,” PGI Insider, November 2010.
“Using long vector extensions for MPI reductions,” Parallel Computing, vol. 109, pp. 102871, March 2022.
“Using Jacobi Iterations and Blocking for Solving Sparse Triangular Systems in Incomplete Factorization Preconditioning,” Journal of Parallel and Distributed Computing, vol. 119, pp. 219–230, November 2018.
“Using Ginkgo's memory accessor for improving the accuracy of memory‐bound low precision BLAS,” Software: Practice and Experience, vol. 532, issue 1, pp. 81-98, January.
“User-Defined Events for Hardware Performance Monitoring,” Procedia Computer Science, vol. 4: Elsevier, pp. 2096-2104, May 2011.
“The use of bulk states to accelerate the band edge state calculation of a semiconductor quantum dot,” Journal of Computational Physics (submitted), January 2006.
“The Use of Bulk States to Accelerate the Band Edge State Calculation of a Semiconductor Quantum Dot,” Journal of Computational Physics, vol. 223, pp. 774-782, 2007.
“Updating Incomplete Factorization Preconditioners for Model Order Reduction,” Numerical Algorithms, vol. 73, issue 3, pp. 611–630, February 2016.
“An Updated Set of Basic Linear Algebra Subprograms (BLAS),” ACM Transactions on Mathematical Software, vol. 28, no. 2, pp. 135-151, December 2002.
“Unveiling the Performance-energy Trade-off in Iterative Linear System Solvers for Multithreaded Processors,” Concurrency and Computation: Practice and Experience, vol. 27, issue 4, pp. 885-904, September 2014.
“Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,” Concurrency and Computation: Practice and Experience, November 2013.
“Tuning Principal Component Analysis for GRASS GIS on Multi-core and GPU Architectures,” FOSS4G 2010, Barcelona, Spain, September 2010.
“Truss Structural Optimization Using NetSolve System,” Meeting of the Japan Society of Mechanical Engineers, Kyoto University, Kyoto, Japan, October 2002.
“Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems,” Concurrency and Computation: Practice and Experience, October 2013.
“A Tribute to Gene Golub,” Computing in Science and Engineering: IEEE, pp. 5, January 2008.
“Trends in High Performance Computing,” The Computer Journal, vol. 47, no. 4: The British Computer Society, pp. 399-403, 2004.
“Transparent Cross-Platform Access to Software Services using GridSolve and GridRPC,” in Cloud Computing and Software Services: Theory and Techniques (to appear): CRC Press, 2009.
“Translational Process: Mathematical Software Perspective,” Journal of Computational Science, September 2020.
“Translational process: Mathematical software perspective,” Journal of Computational Science, vol. 52, pp. 101216, 2021.
“Trace-based Performance Analysis for the Petascale Simulation Code FLASH,” International Journal of High Performance Computing Applications (to appear), 2010.
“Towards Optimal Multi-Level Checkpointing,” IEEE Transactions on Computers, vol. 66, issue 7, pp. 1212–1226, July 2017.