Publications
DTE: PaRSEC Enabled Libraries and Applications (Poster)
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(979.27 KB)
DTE: PaRSEC Enabled Libraries and Applications
: 2021 Exascale Computing Project Annual Meeting, April 2021.
(3.24 MB)
Clover: Computational Libraries Optimized via Exascale Research
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(872 KB)
Cholesky Factorization on Batches of Matrices with Fixed and Variable Sizes
, San Jose, CA, GPU Technology Conference (GTC16), Poster, April 2016.
(480.51 KB)
Acceleration of the BLAST Hydro Code on GPU,”
Supercomputing '12 (poster), Salt Lake City, Utah, SC12, November 2012.
“Accelerating Tensor Contractions for High-Order FEM on CPUs, GPUs, and KNLs
, Gatlinburg, TN, moky Mountains Computational Sciences and Engineering Conference (SMC16), Poster, September 2016.
(4.29 MB)
Accelerating FFT towards Exascale Computing
: NVIDIA GPU Technology Conference (GTC2021), 2021.
(27.23 MB)
With Extreme Computing, the Rules Have Changed,”
Computing in Science & Engineering, vol. 19, issue 3, pp. 52-62, May 2017.
DOI: 10.1109/MCSE.2017.48 (485.34 KB)
“Weighted Block-Asynchronous Relaxation for GPU-Accelerated Systems,”
SIAM Journal on Computing (submitted), March 2012.
(811.01 KB)
“VisPerf: Monitoring Tool for Grid Computing,”
Lecture Notes in Computer Science, vol. 2659: Springer Verlag, Heidelberg, pp. 233-243, 00 2003.
(835.09 KB)
“The Virtual Instrument: Support for Grid-enabled Scientific Simulations,”
International Journal of High Performance Computing Applications, vol. 18, no. 1, pp. 3-17, January 2004.
(282.16 KB)
“The Virtual Instrument: Support for Grid-enabled Scientific Simulations,”
Journal of Parallel and Distributed Computing (submitted), October 2002.
(282.16 KB)
“Variable-Size Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioning on Graphics Processors,”
Parallel Computing, vol. 81, pp. 131-146, January 2019.
DOI: 10.1016/j.parco.2017.12.006 (1.9 MB)
“Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy,”
ACM Transactions on Mathematical Software, vol. 34, no. 4, pp. 17-22, 00 2008.
(364.48 KB)
“Using MAGMA with PGI Fortran,”
PGI Insider, November 2010.
(176.67 KB)
“Using Jacobi Iterations and Blocking for Solving Sparse Triangular Systems in Incomplete Factorization Preconditioning,”
Journal of Parallel and Distributed Computing, vol. 119, pp. 219–230, November 2018.
DOI: 10.1016/j.jpdc.2018.04.017 (273.53 KB)
“The Use of Bulk States to Accelerate the Band Edge State Calculation of a Semiconductor Quantum Dot,”
Journal of Computational Physics, vol. 223, pp. 774-782, 00 2007.
(452.6 KB)
“The use of bulk states to accelerate the band edge state calculation of a semiconductor quantum dot,”
Journal of Computational Physics (submitted), January 2006.
(337.08 KB)
“Updating Incomplete Factorization Preconditioners for Model Order Reduction,”
Numerical Algorithms, vol. 73, issue 3, no. 3, pp. 611–630, February 2016.
DOI: 10.1007/s11075-016-0110-2 (565.34 KB)
“An Updated Set of Basic Linear Algebra Subprograms (BLAS),”
ACM Transactions on Mathematical Software, vol. 28, no. 2, pp. 135-151, December 2002.
DOI: 10.1145/567806.567807 (228.33 KB)
“Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,”
Concurrency and Computation: Practice and Experience, November 2013.
DOI: 10.1002/cpe.3173 (894.61 KB)
“Tuning Principal Component Analysis for GRASS GIS on Multi-core and GPU Architectures,”
FOSS4G 2010, Barcelona, Spain, September 2010.
(1.57 MB)
“Truss Structural Optimization Using NetSolve System,”
Meeting of the Japan Society of Mechanical Engineers, Kyoto University, Kyoto, Japan, October 2002.
(450.65 KB)
“Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems,”
Concurrency and Computation: Practice and Experience, October 2013.
(1.71 MB)
“A Tribute to Gene Golub,”
Computing in Science and Engineering: IEEE, pp. 5, January 2008.
“Trends in High Performance Computing,”
The Computer Journal, vol. 47, no. 4: The British Computer Society, pp. 399-403, 00 2004.
(455.96 KB)
“Transparent Cross-Platform Access to Software Services using GridSolve and GridRPC,”
in Cloud Computing and Software Services: Theory and Techniques (to appear): CRC Press, 00 2009.
“Translational Process: Mathematical Software Perspective,”
Journal of Computational Science, September 2020.
DOI: 10.1016/j.jocs.2020.101216 (752.59 KB)
“Translational process: Mathematical software perspective,”
Journal of Computational Science, vol. 52, pp. 101216, 2021.
DOI: 10.1016/j.jocs.2020.101216
“Trace-based Performance Analysis for the Petascale Simulation Code FLASH,”
International Journal of High Performance Computing Applications (to appear), 00 2010.
(887.54 KB)
“Towards Efficient MapReduce Using MPI,”
Lecture Notes in Computer Science, Recent Advances in Parallel Virtual Machine and Message Passing Interface - 16th European PVM/MPI Users' Group Meeting, vol. 5759, Espoo, Finland, Springer Berlin / Heidelberg, pp. 240-249, 00 2009.
“Towards Efficient MapReduce Using MPI,”
Lecture Notes in Computer Science, Recent Advances in Parallel Virtual Machine and Message Passing Interface - 16th European PVM/MPI Users' Group Meeting, vol. 5759, Espoo, Finland, Springer Berlin / Heidelberg, pp. 240-249, 00 2009.
“Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems,”
Parallel Computing, vol. 36, no. 5-6, pp. 232-240, 00 2010.
(606.41 KB)
“Towards an Accurate Model for Collective Communications,”
International Journal of High Performance Applications, Special Issue: Automatic Performance Tuning, vol. 18, no. 1, pp. 159-167, January 2004.
(250.73 KB)
“Toward High Performance Divide and Conquer Eigensolver for Dense Symmetric Matrices,”
SIAM Journal on Scientific Computing (Accepted), July 2012.
“Toward High Performance Divide and Conquer Eigensolver for Dense Symmetric Matrices.,”
Submitted to SIAM Journal on Scientific Computing (SISC), 00 2011.
“The TOP500 List and Progress in High-Performance Computing,”
IEEE Computer, vol. 48, issue 11, pp. 42-49, November 2015.
DOI: doi:10.1109/MC.2015.338
“Tiling on Systems with Communication/Computation Overlap,”
Concurrency: Practice and Experience, vol. 11, no. 3, pp. 139-153, January 1999.
(286.14 KB)
“Telescoping Languages: A Strategy for Automatic Generation of Scientific Problem-Solving Systems from Annotated Libraries,”
Journal of Parallel and Distributed Computing, vol. 61, no. 12, pp. 1803-1826, December 2001.
(386.37 KB)
“Task Based Cholesky Decomposition on Xeon Phi Architectures using OpenMP,”
International Journal of Computational Science and Engineering (IJCSE), vol. 17, no. 3, October 2018.
DOI: http://dx.doi.org/10.1504/IJCSE.2018.095851
“Symmetric Indefinite Linear Solver using OpenMP Task on Multicore Architectures,”
IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 8, pp. 1879–1892, August 2018.
DOI: 10.1109/TPDS.2018.2808964 (2.88 MB)
“A Survey of Recent Developments in Parallel Implementations of Gaussian Elimination,”
Concurrency and Computation: Practice and Experience, vol. 27, issue 5, pp. 1292-1309, April 2015.
DOI: 10.1002/cpe.3306 (783.45 KB)
“A survey of numerical linear algebra methods utilizing mixed-precision arithmetic,”
The International Journal of High Performance Computing Applications, vol. 35, no. 4, pp. 344–369, 2021.
DOI: 10.1177/10943420211003313
“Sunway TaihuLight Supercomputer Makes Its Appearance,”
National Science Review, vol. 3, issue 3, pp. 256-266, September 2016.
DOI: 10.1093/nsr/nww044 (292.11 KB)
“Structure-aware Linear Solver for Realtime Convex Optimization for Embedded Systems,”
IEEE Embedded Systems Letters, vol. 9, issue 3, pp. 61–64, May 2017.
DOI: 10.1109/LES.2017.2700401 (339.11 KB)
“Stochastic Performance Prediction for Iterative Algorithms in Distributed Environments,”
Journal of Parallel and Distributed Computing, vol. 98, no. 1, pp. 68-91, October 2002.
(266.82 KB)
“Stochastic Performance Prediction for Iterative Algorithms in Distributed Environments,”
Journal of Parallel and Distributed Computing, vol. 98, no. 1, pp. 68-91, January 1999.
(257.5 KB)
“Static Tiling for Heterogeneous Computing Platforms,”
Parallel Computing, vol. 25, no. 5, pp. 547-568, January 1999.
(301.17 KB)
“State-of-the-Art Eigensolvers for Electronic Structure Calculations of Large Scale Nano-Systems,”
Journal of Computational Physics, vol. 227, no. 15, pp. 7113-7124, January 2008.
“Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU,”
ACM Transactions on Mathematical Software (TOMS), vol. 43, issue 2, October 2016.
“