Publications
Export 1283 results:
Filters: 10.1007 is 978-3-031-08751-6_5 [Clear All Filters]
Non-GPU-resident Dense Symmetric Indefinite Factorization,”
Concurrency and Computation: Practice and Experience, November 2016.
“Performance of Asynchronous Optimized Schwarz with One-sided Communication,”
Parallel Computing, vol. 86, pp. 66-81, August 2019.
(3.09 MB)
“Randomized Algorithms to Update Partial Singular Value Decomposition on a Hybrid CPU/GPU Cluster,”
The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.
“Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES,”
5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, New Orleans, LA, November 2014.
(465.52 KB)
“Mixed-precision orthogonalization scheme and adaptive step size for CA-GMRES on GPUs,”
VECPAR 2014 (Best Paper), Eugene, OR, June 2014.
(438.54 KB)
“Dynamic Task Execution on Shared and Distributed Memory Architectures
, 2012.
(3.29 MB)
Recent Developments in GridSolve,”
International Journal of High Performance Computing Applications (Special Issue: Scheduling for Large-Scale Heterogeneous Platforms), vol. 20, no. 1: Sage Science Press, 00 2006.
(496.69 KB)
“GridSolve: The Evolution of Network Enabled Solver,”
Grid-Based Problem Solving Environments: IFIP TC2/WG 2.5 Working Conference on Grid-Based Problem Solving Environments (Prescott, AZ, July 2006): Springer, pp. 215-226, 00 2007.
(377.48 KB)
“Biological Sequence Alignment on the Computational Grid Using the GrADS Framework,”
Future Generation Computing Systems, vol. 21, no. 6: Elsevier, pp. 980-986, June 2005.
(147.29 KB)
“Initial Integration and Evaluation of SLATE Parallel BLAS in LATTE,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-07: Innovative Computing Laboratory, University of Tennessee, June 2018.
(366.6 KB)
“QUARK Users' Guide: QUeueing And Runtime for Kernels,”
University of Tennessee Innovative Computing Laboratory Technical Report, no. ICL-UT-11-02, 00 2011.
(247.12 KB)
“SLATE Performance Report: Updates to Cholesky and LU Factorizations,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-14: University of Tennessee, October 2020.
(1.64 MB)
“An Empirical View of SLATE Algorithms on Scalable Hybrid System,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-08: University of Tennessee, Knoxville, September 2019.
(441.16 KB)
“Cholesky Across Accelerators,”
17th IEEE International Conference on High Performance Computing and Communications (HPCC 2015), Elizabeth, NJ, IEEE, August 2015.
“Experiments with Scheduling Using Simulated Annealing in a Grid Environment,”
Grid Computing - GRID 2002, Third International Workshop, vol. 2536, Baltimore, MD, Springer, pp. 232-242, November 2002.
(66.91 KB)
“Porting the PLASMA Numerical Library to the OpenMP Standard,”
International Journal of Parallel Programming, June 2016.
(1.66 MB)
“Automatic Blocking of QR and LU Factorizations for Locality,”
2nd ACM SIGPLAN Workshop on Memory System Performance (MSP 2004), Washington, DC, ACM, June 2004.
(212.77 KB)
“The Design of an Auto-tuning I/O Framework on Cray XT5 System,”
Cray Users Group Conference (CUG'11) (Best Paper Finalist), Fairbanks, Alaska, May 2011.
(459.57 KB)
“Automated Empirical Tuning of a Multiresolution Analysis Kernel,”
ICL Technical Report, no. ICL-UT-07-01, pp. 10, January 2007.
(120.7 KB)
“Autotuned Parallel I/O for Highly Scalable Biosequence Analysis,”
TeraGrid'11, Salt Lake City, Utah, July 2011.
(275.34 KB)
“Empirical Tuning of a Multiresolution Analysis Kernel using a Specialized Code Generator,”
ICL Technical Report, no. ICL-UT-07-02, January 2007.
(123.34 KB)
“An Effective Empirical Search Method for Automatic Software Tuning,”
ICL Technical Report, no. ICL-UT-05-02, January 2005.
(74.66 KB)
“The Impact of Paravirtualized Memory Hierarchy on Linear Algebra Computational Kernels and Software,”
ACM/IEEE International Symposium on High Performance Distributed Computing, Boston, MA., June 2008.
(403.89 KB)
“Paravirtualization Effect on Single- and Multi-threaded Memory-Intensive Linear Algebra Software,”
Cluster Computing Journal: Special Issue on High Performance Distributed Computing, vol. 12, no. 2: Springer Netherlands, pp. 101-122, 00 2009.
(451.07 KB)
“Solving Linear Diophantine Systems on Parallel Architectures,”
IEEE Transactions on Parallel and Distributed Systems, vol. 30, issue 5, pp. 1158-1169, May 2019.
(802.97 KB)
“Docker Container based PaaS Cloud Computing Comprehensive Benchmarks using LAPACK,”
Computer Modeling and Intelligent Systems CMIS-2020, Zaporizhzhoa, March 2020.
(451.33 KB)
“Efficient Communications in Training Large Scale Neural Networks,”
ACM MultiMedia Workshop 2017, Mountain View, CA, ACM, October 2017.
(1.41 MB)
“Using Arm Scalable Vector Extension to Optimize Open MPI,”
20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2020), Melbourne, Australia, IEEE/ACM, May 2020.
(359.95 KB)
“Runtime Level Failure Detection and Propagation in HPC Systems,”
European MPI Users' Group Meeting (EuroMPI '19), Zürich, Switzerland, ACM, September 2019.
(1.11 MB)
“Using Advanced Vector Extensions AVX-512 for MPI Reduction (Poster)
, Austin, TX, EuroMPI/USA '20: 27th European MPI Users' Group Meeting, September 2020.
(708.68 KB)
Using Advanced Vector Extensions AVX-512 for MPI Reduction,”
EuroMPI/USA '20: 27th European MPI Users' Group Meeting, Austin, TX, September 2020.
(634.45 KB)
“Using long vector extensions for MPI reductions,”
Parallel Computing, vol. 109, pp. 102871, March 2022.
“Predicting the electronic properties of 3D, million-atom semiconductor nanostructure architectures,”
J. Phys.: Conf. Ser. 46, vol. :101088/1742-6596/46/1/040, pp. 292-298, January 2006.
(644.1 KB)
“