Publications
Experiments with Scheduling Using Simulated Annealing in a Grid Environment,”
Grid Computing - GRID 2002, Third International Workshop, vol. 2536, Baltimore, MD, Springer, pp. 232-242, November 2002.
(66.91 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Cholesky Across Accelerators,”
17th IEEE International Conference on High Performance Computing and Communications (HPCC 2015), Elizabeth, NJ, IEEE, August 2015.
“GridSolve: The Evolution of Network Enabled Solver,”
Grid-Based Problem Solving Environments: IFIP TC2/WG 2.5 Working Conference on Grid-Based Problem Solving Environments (Prescott, AZ, July 2006): Springer, pp. 215-226, 00 2007.
(377.48 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Recent Developments in GridSolve,”
International Journal of High Performance Computing Applications (Special Issue: Scheduling for Large-Scale Heterogeneous Platforms), vol. 20, no. 1: Sage Science Press, 00 2006.
(496.69 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Porting the PLASMA Numerical Library to the OpenMP Standard,”
International Journal of Parallel Programming, June 2016.
(1.66 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Biological Sequence Alignment on the Computational Grid Using the GrADS Framework,”
Future Generation Computing Systems, vol. 21, no. 6: Elsevier, pp. 980-986, June 2005.
(147.29 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
QUARK Users' Guide: QUeueing And Runtime for Kernels,”
University of Tennessee Innovative Computing Laboratory Technical Report, no. ICL-UT-11-02, 00 2011.
(247.12 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Automatic Blocking of QR and LU Factorizations for Locality,”
2nd ACM SIGPLAN Workshop on Memory System Performance (MSP 2004), Washington, DC, ACM, June 2004.
(212.77 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Automated Empirical Tuning of a Multiresolution Analysis Kernel,”
ICL Technical Report, no. ICL-UT-07-01, pp. 10, January 2007.
(120.7 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Empirical Tuning of a Multiresolution Analysis Kernel using a Specialized Code Generator,”
ICL Technical Report, no. ICL-UT-07-02, January 2007.
(123.34 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Autotuned Parallel I/O for Highly Scalable Biosequence Analysis,”
TeraGrid'11, Salt Lake City, Utah, July 2011.
(275.34 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
An Effective Empirical Search Method for Automatic Software Tuning,”
ICL Technical Report, no. ICL-UT-05-02, January 2005.
(74.66 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
The Design of an Auto-tuning I/O Framework on Cray XT5 System,”
Cray Users Group Conference (CUG'11) (Best Paper Finalist), Fairbanks, Alaska, May 2011.
(459.57 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
The Impact of Paravirtualized Memory Hierarchy on Linear Algebra Computational Kernels and Software,”
ACM/IEEE International Symposium on High Performance Distributed Computing, Boston, MA., June 2008.
(403.89 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Paravirtualization Effect on Single- and Multi-threaded Memory-Intensive Linear Algebra Software,”
Cluster Computing Journal: Special Issue on High Performance Distributed Computing, vol. 12, no. 2: Springer Netherlands, pp. 101-122, 00 2009.
(451.07 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Solving Linear Diophantine Systems on Parallel Architectures,”
IEEE Transactions on Parallel and Distributed Systems, vol. 30, issue 5, pp. 1158-1169, May 2019.
(802.97 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Docker Container based PaaS Cloud Computing Comprehensive Benchmarks using LAPACK,”
Computer Modeling and Intelligent Systems CMIS-2020, Zaporizhzhoa, March 2020.
(451.33 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Efficient Communications in Training Large Scale Neural Networks,”
ACM MultiMedia Workshop 2017, Mountain View, CA, ACM, October 2017.
(1.41 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Using Advanced Vector Extensions AVX-512 for MPI Reduction,”
EuroMPI/USA '20: 27th European MPI Users' Group Meeting, Austin, TX, September 2020.
(634.45 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Using long vector extensions for MPI reductions,”
Parallel Computing, vol. 109, pp. 102871, March 2022.
“Using Arm Scalable Vector Extension to Optimize Open MPI,”
20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2020), Melbourne, Australia, IEEE/ACM, May 2020.
(359.95 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Runtime Level Failure Detection and Propagation in HPC Systems,”
European MPI Users' Group Meeting (EuroMPI '19), Zürich, Switzerland, ACM, September 2019.
(1.11 MB)
“![application/pdf](/modules/file/icons/application-pdf.png)
Using Advanced Vector Extensions AVX-512 for MPI Reduction (Poster)
, Austin, TX, EuroMPI/USA '20: 27th European MPI Users' Group Meeting, September 2020.
(708.68 KB)
![application/pdf](/modules/file/icons/application-pdf.png)
Predicting the electronic properties of 3D, million-atom semiconductor nanostructure architectures,”
J. Phys.: Conf. Ser. 46, vol. :101088/1742-6596/46/1/040, pp. 292-298, January 2006.
(644.1 KB)
“![application/pdf](/modules/file/icons/application-pdf.png)