Publications
Power Monitoring with PAPI for Extreme Scale Architectures and Dataflow-based Programming Models,”
2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-04, Madrid, Spain, IEEE, September 2014.
DOI: 10.1109/CLUSTER.2014.6968672 (3.45 MB)
“PTG: An Abstraction for Unhindered Parallelism,”
International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), New Orleans, LA, IEEE Press, November 2014.
(480.05 KB)
“Self-Adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures,”
VECPAR 2014, Eugene, OR, June 2014.
(430.56 KB)
“Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes,”
23rd International Heterogeneity in Computing Workshop, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
(807.33 KB)
“Task-Based Programming for Seismic Imaging: Preliminary Results,”
2014 IEEE International Conference on High Performance Computing and Communications (HPCC), Paris, France, IEEE, August 2014.
(625.86 KB)
“Utilizing Dataflow-based Execution for Coupled Cluster Methods,”
2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-02, Madrid, Spain, IEEE, September 2014.
(260.23 KB)
“Accelerating Linear System Solutions Using Randomization Techniques,”
ACM Transactions on Mathematical Software (also LAWN 246), vol. 39, issue 2, February 2013.
DOI: 10.1145/2427023.2427025 (358.79 KB)
“Beyond the CPU: Hardware Performance Counter Monitoring on Blue Gene/Q,”
International Supercomputing Conference 2013 (ISC'13), Leipzig, Germany, Springer, June 2013.
(624.58 KB)
“A Block-Asynchronous Relaxation Method for Graphics Processing Units,”
Journal of Parallel and Distributed Computing, vol. 73, issue 12, pp. 1613–1626, December 2013.
DOI: http://dx.doi.org/10.1016/j.jpdc.2013.05.008 (1.08 MB)
“Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach,”
Scalable Computing and Communications: Theory and Practice: John Wiley & Sons, pp. 699-735, March 2013.
(1.01 MB)
“Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems,”
Parallel Computing, vol. 39, issue 4-5, pp. 212-232, May 2013.
(1.43 MB)
“Hydrodynamic Computation with Hybrid Programming on CPU-GPU Clusters,”
University of Tennessee Computer Science Technical Report, no. ut-cs-13-714, July 2013.
(866.68 KB)
“Implementing a systolic algorithm for QR factorization on multicore clusters with PaRSEC,”
Lawn 277, no. UT-CS-13-709, May 2013.
(298.63 KB)
“An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware,”
University of Tennessee Computer Science Technical Report (also LAWN 283), no. ut-eecs-13-720: University of Tennessee, October 2013.
(1.23 MB)
“LU Factorization with Partial Pivoting for a Multicore System with Accelerators,”
IEEE Transactions on Parallel and Distributed Computing, vol. 24, issue 8, pp. 1613-1621, August 2013.
DOI: http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.242 (1.08 MB)
“Multithreading in the PLASMA Library,”
Multi and Many-Core Processing: Architecture, Programming, Algorithms, & Applications: Taylor & Francis, 00 2013.
(536.28 KB)
“