Publications
“Numerical Metadata API Reference,”
Innovative Computing Laboratory Technical Report, February 2007.
“The 'Weighted Modification' Incomplete Factorisation Method,”
University of Tennessee Computer Science Department Technical Report, no. UT-CS-99-436, December 1999.
“Polynomial Acceleration of Optimised Multi-grid Smoothers; Basic Theory,”
ICL Technical Report, vol. 156, no. ICL-UT-02-03, January 2002.
“On the Existence Problem of Incomplete Factorisation Methods,”
University of Tennessee Computer Science Department Technical Report, no. UT-CS-99-435, December 1999.
“Least Squares Solvers for Distributed-Memory Machines with GPU Accelerators,”
ACM International Conference on Supercomputing (ICS '19), Phoenix, Arizona, ACM, pp. 117–126, June 2019.
“Applying Aspect-Oriented Programming Concepts to a Component-based Programming Model,”
IPDPS 2003, Workshop on NSF-Next Generation Software, Nice, France, March 2003.
“Improvements in the Efficient Composition of Applications,”
IPDPS 2004, NGS Workshop (to appear), Santa Fe, 2004.
“Using Software-Based Performance Counters to Expose Low-Level Open MPI Performance Information,”
EuroMPI, Chicago, IL, ACM, September 2017.
“Providing GPU Capability to LU and QR within the ScaLAPACK Framework,”
University of Tennessee Computer Science Technical Report (also LAWN 272), no. UT-CS-12-699, September 2012.
“From CUDA to OpenCL: Towards a Performance-portable Solution for Multi-platform GPU Programming,”
Parallel Computing, vol. 38, no. 8, pp. 391-407, August 2012.
“Soft Error Resilient QR Factorization for Hybrid System with GPGPU,”
Journal of Computational Science, vol. 4, issue 6, pp. 457–464, November 2013.
“High Performance Dense Linear System Solver with Soft Error Resilience,”
IEEE Cluster 2011, Austin, TX, September 2011.
“Algorithm-Based Fault Tolerance for Dense Matrix Factorization,”
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012, New Orleans, LA, USA, ACM, pp. 225-234, February 2012.
“Algorithm-based Fault Tolerance for Dense Matrix Factorizations,”
University of Tennessee Computer Science Technical Report, no. UT-CS-11-676, Knoxville, TN, August 2011.
“Mixed-Tool Performance Analysis on Hybrid Multicore Architectures,”
First International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI 2010), San Diego, CA, September 2010.
“Optimal Checkpointing Strategies for Iterative Applications,”
IEEE Transactions on Parallel and Distributed Systems, vol. 33, issue 3, pp. 507-522, March 2022.
“High Performance Dense Linear System Solver with Resilience to Multiple Soft Errors,”
ICCS 2012, Omaha, NE, June 2012.
“Soft Error Resilient QR Factorization for Hybrid System with GPGPU,”
Journal of Computational Science, Seattle, WA, Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems at SC11, November 2011.
“Soft Error Resilient QR Factorization for Hybrid System,”
University of Tennessee Computer Science Technical Report, no. UT-CS-11-675, Knoxville, TN, July 2011.
“Soft Error Resilient QR Factorization for Hybrid System,”
UT-CS-11-675 (also LAPACK Working Note #252), no. ICL-CS-11-675, July 2011.
“Tuning Principal Component Analysis for GRASS GIS on Multi-core and GPU Architectures,”
FOSS4G 2010, Barcelona, Spain, September 2010.
“OpenCL Evaluation for Numerical Linear Algebra Library Development,”
Symposium on Application Accelerators in High-Performance Computing (SAAHPC '10), Knoxville, TN, July 2010.
“Robustness of the Young/Daly Formula for Stochastic Iterative Applications,”
49th International Conference on Parallel Processing (ICPP 2020), Edmonton, AB, Canada, ACM Press, August 2020.
“Preconditioners for Batched Iterative Linear Solvers on GPUs,”
Smoky Mountains Computational Sciences and Engineering Conference, vol. 169075: Springer Nature Switzerland, pp. 38 - 53, January 2023.
“Task Based Cholesky Decomposition on Xeon Phi Architectures using OpenMP,”
International Journal of Computational Science and Engineering (IJCSE), vol. 17, no. 3, October 2018.
“JLAPACK - Compiling LAPACK Fortran to Java,”
Scientific Programming, vol. 7, no. 2, pp. 111-138, October 2002.
“Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting,”
Concurrency and Computation: Practice and Experience, vol. 26, issue 7, pp. 1408-1431, May 2014.
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Department Technical Report, UT-CS-04-526, vol. –89-95, January 2006.
“HPCS Library Study Effort,”
University of Tennessee Computer Science Technical Report, UT-CS-08-617, January 2008.
“Translational Process: Mathematical Software Perspective,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-11, August 2020.
“Numerical Libraries and Tools for Scalable Parallel Cluster Computing,”
International Journal of High Performance Applications and Supercomputing, vol. 15, no. 2, pp. 175-180, January 2001.
“PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP,”
ACM Transactions on Mathematical Software, vol. 45, issue 2, June 2019.
“POMPEI: Programming with OpenMP4 for Exascale Investigations,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-09: University of Tennessee, December 2017.
“Hierarchical QR Factorization Algorithms for Multi-Core Cluster Systems,”
University of Tennessee Computer Science Technical Report (also LAWN 257), no. UT-CS-11-684, October 2011.
“Numerical Linear Algebra Algorithms and Software,”
Journal of Computational and Applied Mathematics, vol. 123, no. 1-2, pp. 489-514, October 1999.
“HPCG Benchmark: a New Metric for Ranking High Performance Computing Systems,”
University of Tennessee Computer Science Technical Report, no. ut-eecs-15-736: University of Tennessee, January 2015.
“Recent Advances in Parallel Virtual Machine and Message Passing Interface,”
Lecture Notes in Computer Science, vol. 2840: Springer-Verlag, Berlin, January 2003.
“HPC Challenge: Design, History, and Implementation Highlights,”
On the Road to Exascale Computing: Contemporary Architectures in High Performance Computing (to appear): Chapman & Hall/CRC Press, 2012.
“An Iterative Solver Benchmark,”
Scientific Programming (to appear), 2002.
“Self Adapting Numerical Algorithm for Next Generation Applications,”
International Journal of High Performance Computing Applications, vol. 17, no. 2, pp. 125-132, January 2003.
“Remembering Ken Kennedy,”
SciDAC Review, vol. 5, no. 2007, 2007.
“MAGMA MIC: Linear Algebra Library for Intel Xeon Phi Coprocessors,”
Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), November 2012.
“Trends in High Performance Computing,”
The Computer Journal, vol. 47, no. 4: The British Computer Society, pp. 399-403, 2004.
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Technical Report, UT-CS-89-85, 2010.
“How Elegant Code Evolves With Hardware: The Case Of Gaussian Elimination,”
in Beautiful Code: Leading Programmers Explain How They Think: O'Reilly Media, Inc., June 2007.
“Empirical Performance Tuning of Dense Linear Algebra Software,”
in Performance Tuning of Scientific Applications (to appear), 2010.
“MAGMA: A New Generation of Linear Algebra Library for GPU and Multicore Architectures,”
Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), Presentation, November 2012.
“Performance of Various Computers Using Standard Linear Equations Software,”
University of Tennessee Computer Science Technical Report, no. cs-89-85, February 2013.
“Recursive approach in sparse matrix LU factorization,”
Proceedings of 1st SGI Users Conference, Cracow, Poland (ACC Cyfronet UMM, 2000), pp. 409-418, January 2000.
“A New Metric for Ranking High-Performance Computing Systems,”
National Science Review, vol. 3, issue 1, pp. 30-35, January 2016.