Publications
“Towards an Accurate Model for Collective Communications,”
International Journal of High Performance Computing Applications, Special Issue: Automatic Performance Tuning, vol. 18, no. 1, pp. 159-167, January 2004.
(250.73 KB)
“Towards an Accurate Model for Collective Communications,”
ICL Technical Report, no. ICL-UT-05-03, January 2005.
(250.73 KB)
“Towards An Efficient, Scalable Replication Mechanism for the I2-DSI Project,”
University of North Carolina School of Library and Information Science Technical Report, no. TR-1999-01, January 1999.
“Towards Batched Linear Solvers on Accelerated Hardware Platforms,”
8th Workshop on General Purpose Processing Using GPUs (GPGPU 8) co-located with PPOPP 2015, San Francisco, CA, ACM, February 2015.
(403.74 KB)
“Towards bulk based preconditioning for quantum dot computations,”
IEEE/ACM Proceedings of HPCNano SC06 (to appear), January 2006.
(172.46 KB)
“Towards Continuous Benchmarking,”
Platform for Advanced Scientific Computing Conference (PASC 2019), Zurich, Switzerland, ACM Press, June 2019.
DOI: 10.1145/3324989.3325719 (1.51 MB)
“Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems,”
Parallel Computing, vol. 36, no. 5-6, pp. 232-240, 2010.
(606.41 KB)
“Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems,”
University of Tennessee Computer Science Technical Report, UT-CS-08-632 (also LAPACK Working Note 210), January 2008.
(606.41 KB)
“Towards Efficient MapReduce Using MPI,”
Lecture Notes in Computer Science, Recent Advances in Parallel Virtual Machine and Message Passing Interface - 16th European PVM/MPI Users' Group Meeting, vol. 5759, Espoo, Finland, Springer Berlin / Heidelberg, pp. 240-249, 2009.
“Towards Half-Precision Computation for Complex Matrices: A Case Study for Mixed Precision Solvers on GPUs,”
ScalA19: 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, IEEE, November 2019.
(523.87 KB) (3.42 MB)
“Towards Numerical Benchmark for Half-Precision Floating Point Arithmetic,”
2017 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, IEEE, September 2017.
DOI: 10.1109/HPEC.2017.8091031 (1.67 MB)
“Towards Optimal Multi-Level Checkpointing,”
IEEE Transactions on Computers, vol. 66, issue 7, pp. 1212–1226, July 2017.
DOI: 10.1109/TC.2016.2643660 (1.39 MB)
“Towards Portable Online Prediction of Network Utilization Using MPI-Level Monitoring,”
2019 European Conference on Parallel Processing (Euro-Par 2019), Göttingen, Germany, Springer, August 2019.
DOI: 10.1007/978-3-030-29400-7_4 (1.07 MB)
“Trace-Based Parallel Performance Overhead Compensation,”
In Proc. of the International Conference on High Performance Computing and Communications (HPCC), Sorrento (Naples), Italy, September 2005.
(306.88 KB)
“Trace-based Performance Analysis for the Petascale Simulation Code FLASH,”
International Journal of High Performance Computing Applications (to appear), 2010.
(887.54 KB)
“Trace-based Performance Analysis for the Petascale Simulation Code FLASH,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-09-01, April 2009.
(887.54 KB)
“Transient Error Resilient Hessenberg Reduction on GPU-based Hybrid Architectures,”
UT-CS-13-712: University of Tennessee Computer Science Technical Report, June 2013.
(206.42 KB)
“Translational process: Mathematical software perspective,”
Journal of Computational Science, vol. 52, pp. 101216, 2021.
DOI: 10.1016/j.jocs.2020.101216
“Translational Process: Mathematical Software Perspective,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-11, August 2020.
(752.59 KB)
“Translational Process: Mathematical Software Perspective,”
Journal of Computational Science, September 2020.
DOI: 10.1016/j.jocs.2020.101216 (752.59 KB)
“Transparent Cross-Platform Access to Software Services using GridSolve and GridRPC,”
in Cloud Computing and Software Services: Theory and Techniques (to appear): CRC Press, 2009.
“Trends in High Performance Computing,”
The Computer Journal, vol. 47, no. 4: The British Computer Society, pp. 399-403, 2004.
(455.96 KB)
“A Tribute to Gene Golub,”
Computing in Science and Engineering: IEEE, pp. 5, January 2008.
“Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems,”
Concurrency and Computation: Practice and Experience, October 2013.
(1.71 MB)
“Tridiagonalization of a Symmetric Dense Matrix on a GPU Cluster,”
The Third International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), May 2013.
“Truss Structural Optimization Using NetSolve System,”
Meeting of the Japan Society of Mechanical Engineers, Kyoto University, Kyoto, Japan, October 2002.
(450.65 KB)
“Tuning Principal Component Analysis for GRASS GIS on Multi-core and GPU Architectures,”
FOSS4G 2010, Barcelona, Spain, September 2010.
(1.57 MB)
“Tuning Stationary Iterative Solvers for Fault Resilience,”
6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA15), Austin, TX, ACM, November 2015.
(1.28 MB)
“Twenty Years of Computational Science,”
International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, June 2020.
(149.66 KB)
“Twenty-Plus Years of Netlib and NA-Net,”
University of Tennessee Computer Science Department Technical Report, UT-CS-04-526, 2006.
(62.79 KB)
“Two-stage Tridiagonal Reduction for Dense Symmetric Matrices using Tile Algorithms on Multicore Architectures,”
IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.
“UCX: An Open Source Framework for HPC Network APIs and Beyond,”
2015 IEEE 23rd Annual Symposium on High-Performance Interconnects, Santa Clara, CA, USA, IEEE, pp. 40-43, 2015.
DOI: 10.1109/HOTI.2015.13
“Understanding Native Event Semantics,”
9th JLESC Workshop, Knoxville, TN, April 2019.
(2.33 MB)
“Understanding Scalability and Fine-Grain Parallelism of Synchronous Data Parallel Training,”
2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), Denver, CO, IEEE, November 2019.
DOI: 10.1109/MLHPC49564.2019.00006 (696.89 KB)
“Unified Development for Mixed Multi-GPU and Multi-Coprocessor Environments using a Lightweight Runtime Environment,”
IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
(1.51 MB)
“A Unified HPC Environment for Hybrid Manycore/GPU Distributed Systems,”
IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.
“Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,”
Concurrency and Computation: Practice and Experience, November 2013.
DOI: 10.1002/cpe.3173 (894.61 KB)
“Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,”
University of Tennessee Computer Science Technical Report (also LAWN 269), no. UT-CS-12-697, June 2012.
(2.76 MB)
“Unveiling the Performance-energy Trade-off in Iterative Linear System Solvers for Multithreaded Processors,”
Concurrency and Computation: Practice and Experience, vol. 27, issue 4, pp. 885-904, September 2014.
DOI: 10.1002/cpe.3341 (1.83 MB)
“An Updated Set of Basic Linear Algebra Subprograms (BLAS),”
ACM Transactions on Mathematical Software, vol. 28, no. 2, pp. 135-151, December 2002.
DOI: 10.1145/567806.567807 (228.33 KB)
“Updating Incomplete Factorization Preconditioners for Model Order Reduction,”
Numerical Algorithms, vol. 73, issue 3, pp. 611–630, February 2016.
DOI: 10.1007/s11075-016-0110-2 (565.34 KB)
“Usage of the Scalasca Toolset for Scalable Performance Analysis of Large-scale Parallel Applications,”
Proceedings of the 2nd International Workshop on Tools for High Performance Computing, Stuttgart, Germany, Springer, pp. 157-167, January 2008.
(229.2 KB)
“The Use of Bulk States to Accelerate the Band Edge State Calculation of a Semiconductor Quantum Dot,”
Journal of Computational Physics, vol. 223, pp. 774-782, 2007.
(452.6 KB)
“The use of bulk states to accelerate the band edge state calculation of a semiconductor quantum dot,”
Journal of Computational Physics (submitted), January 2006.
(337.08 KB)
“User Level Failure Mitigation in MPI,”
Euro-Par 2012: Parallel Processing Workshops, vol. 7640, Rhodes Island, Greece, Springer Berlin Heidelberg, pp. 499-504, August 2012.
(136.15 KB)
“User-Defined Events for Hardware Performance Monitoring,”
Procedia Computer Science, vol. 4: Elsevier, pp. 2096-2104, May 2011.
DOI: 10.1016/j.procs.2011.04.229 (361.76 KB)
“Users' Guide to NetSolve v1.4.1,”
ICL Technical Report, no. ICL-UT-02-05, June 2002.
(328.01 KB)
“Using Additive Modifications in LU Factorization Instead of Pivoting,”
37th ACM International Conference on Supercomputing (ICS'23), Orlando, FL, ACM, June 2023.
DOI: 10.1145/3577193.3593731 (624.18 KB)
“Using Advanced Vector Extensions AVX-512 for MPI Reduction,”
EuroMPI/USA '20: 27th European MPI Users' Group Meeting, Austin, TX, September 2020.
DOI: 10.1145/3416315.3416316 (634.45 KB)
“Using Advanced Vector Extensions AVX-512 for MPI Reduction (Poster),”
EuroMPI/USA '20: 27th European MPI Users' Group Meeting, Austin, TX, September 2020.
(708.68 KB)