Publications
“Sparse approximations of the Schur complement for parallel algebraic hybrid solvers in 3D,”
Numerical Mathematics: Theory, Methods and Applications, vol. 3, no. 3, Beijing, Global Science Press, pp. 64-82, 2010.
“Sparse matrix-vector and matrix-multivector products for the truncated SVD on graphics processors,”
Concurrency and Computation: Practice and Experience, August 2023.
“Specification and detection of performance problems with ASL,”
Concurrency and Computation: Practice and Experience, vol. 19, no. 11: John Wiley and Sons Ltd., pp. 1451-1464, January 2007.
“SRS - A Framework for Developing Malleable and Migratable Parallel Software,”
Parallel Processing Letters, vol. 13, no. 2, pp. 291-312, June 2003.
“Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU,”
ACM Transactions on Mathematical Software (TOMS), vol. 43, issue 2, October 2016.
“State-of-the-Art Eigensolvers for Electronic Structure Calculations of Large Scale Nano-Systems,”
Journal of Computational Physics, vol. 227, no. 15, pp. 7113-7124, January 2008.
“Static Tiling for Heterogeneous Computing Platforms,”
Parallel Computing, vol. 25, no. 5, pp. 547-568, January 1999.
“Stochastic Performance Prediction for Iterative Algorithms in Distributed Environments,”
Journal of Parallel and Distributed Computing, vol. 98, no. 1, pp. 68-91, October 2002.
“Stochastic Performance Prediction for Iterative Algorithms in Distributed Environments,”
Journal of Parallel and Distributed Computing, vol. 98, no. 1, pp. 68-91, January 1999.
“Structure-aware Linear Solver for Realtime Convex Optimization for Embedded Systems,”
IEEE Embedded Systems Letters, vol. 9, issue 3, pp. 61–64, May 2017.
“Sunway TaihuLight Supercomputer Makes Its Appearance,”
National Science Review, vol. 3, issue 3, pp. 256-266, September 2016.
“A Survey of MPI Usage in the US Exascale Computing Project,”
Concurrency and Computation: Practice and Experience, September 2018.
“A survey of numerical linear algebra methods utilizing mixed-precision arithmetic,”
The International Journal of High Performance Computing Applications, vol. 35, no. 4, pp. 344–369, 2021.
“A Survey of Recent Developments in Parallel Implementations of Gaussian Elimination,”
Concurrency and Computation: Practice and Experience, vol. 27, issue 5, pp. 1292-1309, April 2015.
“A survey on checkpointing strategies: Should we always checkpoint à la Young/Daly?,”
Future Generation Computer Systems, July 2024.
“Symmetric Indefinite Linear Solver using OpenMP Task on Multicore Architectures,”
IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 8, pp. 1879–1892, August 2018.
“Task Based Cholesky Decomposition on Xeon Phi Architectures using OpenMP,”
International Journal of Computational Science and Engineering (IJCSE), vol. 17, no. 3, October 2018.
“Telescoping Languages: A Strategy for Automatic Generation of Scientific Problem-Solving Systems from Annotated Libraries,”
Journal of Parallel and Distributed Computing, vol. 61, no. 12, pp. 1803-1826, December 2001.
“Then and Now: Improving Software Portability, Productivity, and 100× Performance,”
Computing in Science & Engineering, pp. 1-10, April 2024.
“Three-dimensional parallel frequency-domain visco-acoustic wave modelling based on a hybrid direct/iterative solver,”
To appear in Geophysical Prospecting, 2011.
“Three-precision algebraic multigrid on GPUs,”
Future Generation Computer Systems, July 2023.
“Tiling on Systems with Communication/Computation Overlap,”
Concurrency: Practice and Experience, vol. 11, no. 3, pp. 139-153, January 1999.
“The TOP500 List and Progress in High-Performance Computing,”
IEEE Computer, vol. 48, issue 11, pp. 42-49, November 2015.
“Toward a Modular Precision Ecosystem for High-Performance Computing,”
The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1069-1078, November 2019.
“Toward High Performance Divide and Conquer Eigensolver for Dense Symmetric Matrices,”
Submitted to SIAM Journal on Scientific Computing (SISC), 2011.
“Toward High Performance Divide and Conquer Eigensolver for Dense Symmetric Matrices,”
SIAM Journal on Scientific Computing (Accepted), July 2012.
“Towards a Complexity Analysis of Sparse Hybrid Linear Solvers,”
PARA 2010, Reykjavik, Iceland, June 2010.
“Towards a New Peer Review Concept for Scientific Computing ensuring Technical Quality, Software Sustainability, and Result Reproducibility,”
Proceedings in Applied Mathematics and Mechanics, vol. 19, issue 1, November 2019.
“Towards an Accurate Model for Collective Communications,”
International Journal of High Performance Computing Applications, Special Issue: Automatic Performance Tuning, vol. 18, no. 1, pp. 159-167, January 2004.
“Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems,”
Parallel Computing, vol. 36, no. 5-6, pp. 232-240, 2010.
“Towards Efficient MapReduce Using MPI,”
Lecture Notes in Computer Science, Recent Advances in Parallel Virtual Machine and Message Passing Interface - 16th European PVM/MPI Users' Group Meeting, vol. 5759, Espoo, Finland, Springer Berlin / Heidelberg, pp. 240-249, 2009.
“Towards Optimal Multi-Level Checkpointing,”
IEEE Transactions on Computers, vol. 66, issue 7, pp. 1212–1226, July 2017.
“Trace-based Performance Analysis for the Petascale Simulation Code FLASH,”
International Journal of High Performance Computing Applications (to appear), 2010.
“Translational Process: Mathematical Software Perspective,”
Journal of Computational Science, September 2020.
“Translational process: Mathematical software perspective,”
Journal of Computational Science, vol. 52, pp. 101216, 2021.
“Transparent Cross-Platform Access to Software Services using GridSolve and GridRPC,”
in Cloud Computing and Software Services: Theory and Techniques (to appear): CRC Press, 2009.
“Trends in High Performance Computing,”
The Computer Journal, vol. 47, no. 4: The British Computer Society, pp. 399-403, 2004.
“A Tribute to Gene Golub,”
Computing in Science and Engineering: IEEE, pp. 5, January 2008.
“Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems,”
Concurrency and Computation: Practice and Experience, October 2013.
“Truss Structural Optimization Using NetSolve System,”
Meeting of the Japan Society of Mechanical Engineers, Kyoto University, Kyoto, Japan, October 2002.
“Tuning Principal Component Analysis for GRASS GIS on Multi-core and GPU Architectures,”
FOSS4G 2010, Barcelona, Spain, September 2010.
“Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,”
Concurrency and Computation: Practice and Experience, November 2013.
“Unveiling the Performance-energy Trade-off in Iterative Linear System Solvers for Multithreaded Processors,”
Concurrency and Computation: Practice and Experience, vol. 27, issue 4, pp. 885-904, September 2014.
“An Updated Set of Basic Linear Algebra Subprograms (BLAS),”
ACM Transactions on Mathematical Software, vol. 28, no. 2, pp. 135-151, December 2002.
“Updating Incomplete Factorization Preconditioners for Model Order Reduction,”
Numerical Algorithms, vol. 73, issue 3, pp. 611–630, February 2016.
“The use of bulk states to accelerate the band edge state calculation of a semiconductor quantum dot,”
Journal of Computational Physics (submitted), January 2006.
“The Use of Bulk States to Accelerate the Band Edge State Calculation of a Semiconductor Quantum Dot,”
Journal of Computational Physics, vol. 223, pp. 774-782, 2007.
“User-Defined Events for Hardware Performance Monitoring,”
Procedia Computer Science, vol. 4: Elsevier, pp. 2096-2104, May 2011.
“Using Ginkgo's memory accessor for improving the accuracy of memory‐bound low precision BLAS,”
Software: Practice and Experience, vol. 532, issue 1, pp. 81-98, January.
“Using Jacobi Iterations and Blocking for Solving Sparse Triangular Systems in Incomplete Factorization Preconditioning,”
Journal of Parallel and Distributed Computing, vol. 119, pp. 219–230, November 2018.
“