Publications
“Stochastic Performance Prediction for Iterative Algorithms in Distributed Environments,”
Journal of Parallel and Distributed Computing, vol. 98, no. 1, pp. 68-91, October 2002.
(266.82 KB)
“Structure-aware Linear Solver for Realtime Convex Optimization for Embedded Systems,”
IEEE Embedded Systems Letters, vol. 9, issue 3, pp. 61–64, May 2017.
DOI: 10.1109/LES.2017.2700401 (339.11 KB)
“Sunway TaihuLight Supercomputer Makes Its Appearance,”
National Science Review, vol. 3, issue 3, pp. 256-266, September 2016.
DOI: 10.1093/nsr/nww044 (292.11 KB)
“Surrogate ML/AI Model Benchmarking for FAIR Principles' Conformance,”
2022 IEEE High Performance Extreme Computing Conference (HPEC): IEEE, September 2022.
DOI: 10.1109/HPEC55821.2022.9926401
“A Survey of MPI Usage in the US Exascale Computing Project,”
Concurrency and Computation: Practice and Experience, September 2018.
DOI: 10.1002/cpe.4851 (359.54 KB)
“A survey of numerical linear algebra methods utilizing mixed-precision arithmetic,”
The International Journal of High Performance Computing Applications, vol. 35, no. 4, pp. 344–369, 2021.
DOI: 10.1177/10943420211003313
“A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic,”
SLATE Working Notes, no. 15, ICL-UT-20-08: University of Tennessee, July 2020.
(3.98 MB)
“A Survey of Recent Developments in Parallel Implementations of Gaussian Elimination,”
Concurrency and Computation: Practice and Experience, vol. 27, issue 5, pp. 1292-1309, April 2015.
DOI: 10.1002/cpe.3306 (783.45 KB)
“A survey on checkpointing strategies: Should we always checkpoint à la Young/Daly?,”
Future Generation Computer Systems, July 2024.
DOI: 10.1016/j.future.2024.07.022
“Surviving Errors with OpenSHMEM,”
OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, Baltimore, MD, USA, Springer International Publishing, pp. 66–81, 2016.
“Symmetric Indefinite Linear Solver using OpenMP Task on Multicore Architectures,”
IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 8, pp. 1879–1892, August 2018.
DOI: 10.1109/TPDS.2018.2808964 (2.88 MB)
“Synchronizing MPI Processes in Space and Time,”
EUROMPI '23: 30th European MPI Users' Group Meeting, Bristol, United Kingdom, ACM, September 2023.
DOI: 10.1145/3615318.3615325
“System Software for Many-Core and Multi-Core Architectures,”
Advanced Software Technologies for Post-Peta Scale Computing: The Japanese Post-Peta CREST Research Project, Singapore, Springer Singapore, pp. 59–75, 2019.
DOI: 10.1007/978-981-13-1924-2_4
“A Systematic Multi-step Methodology for Performance Analysis of Communication Traces of Distributed Applications based on Hierarchical Clustering,”
Proc. of the 5th International Workshop on Performance Modeling, Evaluation, and Organization of Parallel and Distributed Systems (PMEO-PDS 2006), no. ICL-UT-05-06, Rhodes Island, Greece, IEEE Computer Society, April 2006.
(1.02 MB)
“Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes,”
23rd International Heterogeneity in Computing Workshop, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
(807.33 KB)
“Taking the MPI standard and the open MPI library to exascale,”
The International Journal of High Performance Computing Applications, July 2024.
DOI: 10.1177/10943420241265936
“Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures,”
Innovative Computing Laboratory Technical Report (also LAPACK Working Note 222 and CS Tech Report UT-CS-09-645), no. ICL-UT-09-03, September 2009.
(464.23 KB)
“Task Based Cholesky Decomposition on Xeon Phi Architectures using OpenMP,”
International Journal of Computational Science and Engineering (IJCSE), vol. 17, no. 3, October 2018.
DOI: 10.1504/IJCSE.2018.095851
“Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance,”
International Conference for High Performance Computing Networking, Storage, and Analysis (SC20): ACM, November 2020.
(644.92 KB)
“Task placement of parallel multi-dimensional FFTs on a mesh communication network,”
University of Tennessee Computer Science Technical Report, no. UT-CS-08-613, January 2008.
(2.33 MB)
“Task-Based Polar Decomposition Using SLATE on Massively Parallel Systems with Hardware Accelerators,”
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.
DOI: 10.1145/3624062.3624248
“Task-Based Programming for Seismic Imaging: Preliminary Results,”
2014 IEEE International Conference on High Performance Computing and Communications (HPCC), Paris, France, IEEE, August 2014.
(625.86 KB)
“Task-graph scheduling extensions for efficient synchronization and communication,”
Proceedings of the ACM International Conference on Supercomputing, pp. 88–101, 2021.
DOI: 10.1145/3447818.3461616
“Technical Comparison between several representative checkpoint/rollback solutions for MPI programs,”
ICL Technical Report, no. ICL-UT-06-09, January 2006.
(84.67 KB)
“Telescoping Languages: A Strategy for Automatic Generation of Scientific Problem-Solving Systems from Annotated Libraries,”
Journal of Parallel and Distributed Computing, vol. 61, no. 12, pp. 1803-1826, December 2001.
(386.37 KB)
“The Template Task Graph (TTG) - An Emerging Practical Dataflow Programming Paradigm for Scientific Simulation at Extreme Scale,”
2020 IEEE/ACM 5th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2): IEEE, November 2020.
DOI: 10.1109/ESPM251964.2020.00011 (139.6 KB)
“Tensor Contraction on Distributed Hybrid Architectures using a Task-Based Runtime System,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-13: University of Tennessee, December 2018.
(326.11 KB)
“Tensor Contractions using Optimized Batch GEMM Routines,”
San Jose, CA, GPU Technology Conference (GTC), Poster, March 2018.
(1.64 MB)
“Then and Now: Improving Software Portability, Productivity, and 100× Performance,”
Computing in Science & Engineering, pp. 1–10, April 2024.
DOI: 10.1109/MCSE.2024.3387302
“Three-dimensional parallel frequency-domain visco-acoustic wave modelling based on a hybrid direct/iterative solver,”
To appear in Geophysical Prospecting, 2011.
(1.04 MB)
“Three-precision algebraic multigrid on GPUs,”
Future Generation Computer Systems, July 2023.
DOI: 10.1016/j.future.2023.07.024
“Threshold Pivoting for Dense LU Factorization,”
ScalAH22: 13th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems, Dallas, Texas, IEEE, November 2022.
DOI: 10.1109/ScalAH56622.2022.00010 (721.77 KB)
“Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,”
accepted in 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, December 2009.
“Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,”
24th IEEE International Parallel and Distributed Processing Symposium (submitted), 2010.
(313.98 KB)
“Tiling on Systems with Communication/Computation Overlap,”
Concurrency: Practice and Experience, vol. 11, no. 3, pp. 139-153, January 1999.
(286.14 KB)
“The TOP500 List and Progress in High-Performance Computing,”
IEEE Computer, vol. 48, issue 11, pp. 42-49, November 2015.
DOI: 10.1109/MC.2015.338
“Top500 Supercomputer Sites (13th edition),”
University of Tennessee Computer Science Department Technical Report, no. UT-CS-99-425, June 1999.
(278.51 KB)
“Top500 Supercomputer Sites (14th edition),”
University of Tennessee Computer Science Department Technical Report, no. UT-CS-99-434, November 1999.
(281.81 KB)
“Top500 Supercomputer Sites (15th edition),”
University of Tennessee Computer Science Department Technical Report, no. UT-CS-00-442, June 2000.
(278.88 KB)
“Toward a Framework for Preparing and Executing Adaptive Grid Programs,”
International Parallel and Distributed Processing Symposium: IPDPS 2002 Workshops, Fort Lauderdale, FL, pp. 0171, April 2002.
(64.5 KB)
“Toward a Modular Precision Ecosystem for High-Performance Computing,”
The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1069-1078, November 2019.
DOI: 10.1177/1094342019846547 (1.93 MB)
“Toward a New Metric for Ranking High Performance Computing Systems,”
SAND2013-4744, June 2013.
(225.32 KB)
“Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication,”
Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013.
DOI: 10.1145/2464996.2465438 (1.27 MB)
“Toward High Performance Divide and Conquer Eigensolver for Dense Symmetric Matrices,”
SIAM Journal on Scientific Computing (Accepted), July 2012.
“Toward High Performance Divide and Conquer Eigensolver for Dense Symmetric Matrices,”
Submitted to SIAM Journal on Scientific Computing (SISC), 2011.
“Towards a Complexity Analysis of Sparse Hybrid Linear Solvers,”
PARA 2010, Reykjavik, Iceland, June 2010.
“Towards a High-Performance Tensor Algebra Package for Accelerators,”
Gatlinburg, TN, Smoky Mountains Computational Sciences and Engineering Conference (SMC15), September 2015.
(1.76 MB)
“Towards a New Peer Review Concept for Scientific Computing ensuring Technical Quality, Software Sustainability, and Result Reproducibility,”
Proceedings in Applied Mathematics and Mechanics, vol. 19, issue 1, November 2019.
DOI: 10.1002/pamm.201900490
“Towards a Parallel Tile LDL Factorization for Multicore Architectures,”
ICL Technical Report, no. ICL-UT-11-03, Seattle, WA, April 2011.
(425.45 KB)
“Towards Achieving Performance Portability Using Directives for Accelerators,”
The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Third Workshop on Accelerator Programming Using Directives (WACCPD), Salt Lake City, Utah, Innovative Computing Laboratory, University of Tennessee, November 2016.
(567.02 KB)
“