Publications

1999

Casanova, H., J. Plank, M. Beck, and J. Dongarra, “Deploying Fault-tolerance and Task Migration with NetSolve,” Future Generation Computer Systems, vol. 15, no. 5-6: Elsevier, pp. 745-755, October 1999.

(236 KB)

2000

Raman, G., and J. Dongarra, “Design and Implementation of NetSolve using DCOM as the Remoting Layer,” University of Tennessee Computer Science Department Technical Report, no. UT-CS-00-440, May 2000.

(65.45 KB)

D'Azevedo, E., and J. Dongarra, “The Design and Implementation of the Parallel Out of Core ScaLAPACK LU, QR, and Cholesky Factorization Routines,” Concurrency: Practice and Experience, vol. 12, no. 15, pp. 1481-1493, January 2000.

(374.18 KB)

Arnold, D., and J. Dongarra, “Developing an Architecture to Support the Implementation and Development of Scientific Computing Applications,” to appear in Proceedings of Working Conference 8: Software Architecture for Scientific Computing Applications, Ottawa, Canada, October 2000.

(176.25 KB)

2002

Roche, K., and J. Dongarra, “Deploying Parallel Numerical Library Routines to Cluster Computing in a Self Adapting Fashion,” Parallel Computing: Advances and Current Issues: Proceedings of the International Conference ParCo2001, London, England, Imperial College Press, January 2002.

(381.89 KB)

Kelleher, Jr., M., “Development of the PICMSS NetSolve Service,” ICL Technical Report, no. ICL-UT-02-04, April 2002.

(328.44 KB)

2003

Hiroyasu, T., M. Miki, M. Sano, H. Shimosaka, S. Tsutsui, and J. Dongarra, “Distributed Probabilistic Model-Building Genetic Algorithm,” Lecture Notes in Computer Science, vol. 2723: Springer-Verlag, Heidelberg, pp. 1015-1028, January 2003.

(288.91 KB)

Boehmann, T. B., “Distributed Storage in RIB,” ICL Tech Report, no. ICL-UT-03-01, March 2003.

(213.02 KB)

2004

Luszczek, P., and J. Dongarra, “Design of an Interactive Environment for Numerically Intensive Parallel Linear Algebra Calculations,” International Conference on Computational Science, Poland, Springer Verlag, June 2004.

(88.31 KB)

2005

Cronk, D., G. Fagg, S. Emeny, and S. Tucker, “Dynamic Process Management for Pipelined Applications,” Proceedings of DoD HPCMP UGC 2005 (to appear), Nashville, TN, IEEE, January 2005.

2007

Pjesivac-Grbovic, J., G. Bosilca, G. Fagg, T. Angskun, and J. Dongarra, “Decision Trees and MPI Collective Algorithm Selection Problem,” Euro-Par 2007, Rennes, France, Springer, pp. 105–115, August 2007.

(552.94 KB)

Dongarra, J., Z. Chen, G. Bosilca, and J. Langou, “Disaster Survival Guide in Petascale Computing: An Algorithmic Approach,” in Petascale Computing: Algorithms and Applications (to appear): Chapman & Hall - CRC Press, 2007.

(260.18 KB)

2008

Dongarra, J., R. Graybill, W. Harrod, R. Lucas, E. Lusk, P. Luszczek, J. McMahon, A. Snavely, J. Vetter, K. Yelick, et al., “DARPA's HPCS Program: History, Models, Tools, Languages,” in Advances in Computers, vol. 72: Elsevier, January 2008.

(3.61 MB)

Fürlinger, K., and S. Moore, “Detection and Analysis of Iterative Behavior in Parallel Applications,” Proceedings of the 2008 International Conference on Computational Science (ICCS 2008), vol. 5103, Krakow, Poland, pp. 261-267, January 2008.

(141.02 KB)

2009

Kurzak, J., H. Ltaief, J. Dongarra, and R. M. Badia, “Dependency-Driven Scheduling of Dense Matrix Factorizations on Shared-Memory Systems,” PPAM 2009, Poland, September 2009.

Song, F., A. YarKhan, and J. Dongarra, “Dynamic Task Scheduling for Linear Algebra Algorithms on Distributed-Memory Multicore Systems,” International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '09), Portland, OR, November 2009.

(502.49 KB)

2010

Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier, and J. Dongarra, “DAGuE: A generic distributed DAG engine for high performance computing,” Innovative Computing Laboratory Technical Report, no. ICL-UT-10-01, April 2010.

(830.85 KB)

Tomov, S., and J. Dongarra, “Dense Linear Algebra for Hybrid GPU-based Systems,” Scientific Computing with Multicore and Accelerators, Boca Raton, Florida, CRC Press, 2010.

Tomov, S., Dense Linear Algebra Solvers for Multicore with GPU Accelerators, Atlanta, GA, International Parallel and Distributed Processing Symposium (IPDPS 2010), April 2010.

(956.68 KB)

Tomov, S., R. Nath, H. Ltaief, and J. Dongarra, “Dense Linear Algebra Solvers for Multicore with GPU Accelerators,” Parallel Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010 IEEE International Symposium on, Atlanta, GA, pp. 1-8, 2010.

(1 MB)

Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemarinier, H. Ltaief, et al., “Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA,” University of Tennessee Computer Science Technical Report, UT-CS-10-660, September 2010.

(366.26 KB)

Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemarinier, H. Ltaief, et al., “Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project,” Innovative Computing Laboratory Technical Report, no. ICL-UT-10-02, 2010.

(400.75 KB)

Voemel, C., S. Tomov, and J. Dongarra, “Divide & Conquer on Hybrid GPU-Accelerated Multicore Systems,” SIAM Journal on Scientific Computing (submitted), August 2010.

Bosilca, G., A. Bouteiller, T. Herault, P. Lemarinier, and J. Dongarra, “Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols,” Proceedings of EuroMPI 2010, Stuttgart, Germany, Springer, September 2010.

(202.87 KB)

2011

Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier, and J. Dongarra, “DAGuE: A Generic Distributed DAG Engine for High Performance Computing,” Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1151-1158, 2011.

(830.85 KB)

You, H., Q. Liu, Z. Li, and S. Moore, “The Design of an Auto-tuning I/O Framework on Cray XT5 System,” Cray Users Group Conference (CUG'11) (Best Paper Finalist), Fairbanks, Alaska, May 2011.

(459.57 KB)

2012

Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier, and J. Dongarra, “DAGuE: A generic distributed DAG Engine for High Performance Computing,” Parallel Computing, vol. 38, no. 1-2: Elsevier, pp. 27-51, 2012.

(830.85 KB)

Dongarra, J., J. Kurzak, P. Luszczek, and S. Tomov, “Dense Linear Algebra on Accelerated Multicore Hardware,” High Performance Scientific Computing: Algorithms and Applications, London, UK, Springer-Verlag, 2012.

Voemel, C., S. Tomov, and J. Dongarra, “Divide and Conquer on Hybrid GPU-Accelerated Multicore Systems,” SIAM Journal on Scientific Computing, vol. 34(2), pp. C70-C82, April 2012.

YarKhan, A., Dynamic Task Execution on Shared and Distributed Memory Architectures, 2012.

(3.29 MB)

2013

Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Luszczek, and J. Dongarra, “Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach,” Scalable Computing and Communications: Theory and Practice: John Wiley & Sons, pp. 699-735, March 2013.

(1.01 MB)

Faverge, M., J. Herrmann, J. Langou, B. Lowery, Y. Robert, and J. Dongarra, “Designing LU-QR hybrid solvers for performance and stability,” University of Tennessee Computer Science Technical Report (also LAWN 282), no. UT-EECS-13-719: University of Tennessee, October 2013.

(4.11 MB)

Marin, G., C. McCurdy, and J. Vetter, “Diagnosis and Optimization of Application Prefetching Performance,” Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013.

(827.31 KB)

Donfack, S., S. Tomov, and J. Dongarra, “Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs,” University of Tennessee Computer Science Technical Report, no. UT-CS-13-713, July 2013.

(659.77 KB)

2014

Yamazaki, I., S. Tomov, and J. Dongarra, “Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, New Orleans, LA, November 2014.

(465.52 KB)

Yamazaki, I., J. Kurzak, P. Luszczek, and J. Dongarra, “Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime,” Workshop on Large-Scale Parallel Processing, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

(398.16 KB)

Cao, C., T. Herault, G. Bosilca, and J. Dongarra, “Design for a Soft Error Resilient Dynamic Task-based Runtime,” ICL Technical Report, no. ICL-UT-14-04: University of Tennessee, November 2014.

(2.61 MB)

Faverge, M., J. Herrmann, J. Langou, B. Lowery, Y. Robert, and J. Dongarra, “Designing LU-QR Hybrid Solvers for Performance and Stability,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

(4.2 MB)

Yamazaki, I., S. Rajamanickam, E. G. Boman, M. Hoemmen, M. A. Heroux, and S. Tomov, “Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC 14), New Orleans, LA, IEEE, November 2014.

Donfack, S., S. Tomov, and J. Dongarra, “Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs,” Fourth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2014, May 2014.

(490.08 KB)

2015

Haidar, A., J. Kurzak, G. Pichon, and M. Faverge, “A Data Flow Divide and Conquer Algorithm for Multicore Architecture,” 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.

(535.44 KB)

Guidry, M., and A. Haidar, On the Design, Autotuning, and Optimization of GPU Kernels for Kinetic Network Simulations Using Fast Explicit Integration and GPU Batched Computation, Oak Ridge, TN, Joint Institute for Computational Sciences Seminar Series, Presentation, September 2015.

(17.25 MB)

Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “On the Design, Development, and Analysis of Optimized Matrix-Vector Multiplication Routines for Coprocessors,” ISC High Performance 2015, Frankfurt, Germany, July 2015.

(1.49 MB)

Cao, C., G. Bosilca, T. Herault, and J. Dongarra, “Design for a Soft Error Resilient Dynamic Task-based Runtime,” 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.

(2.31 MB)

2016

Baboulin, M., J. Dongarra, A. Remy, S. Tomov, and I. Yamazaki, “Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures,” Lecture Notes in Computer Science, vol. 9573: Springer International Publishing, pp. 86-95, September 2015, 2016.

(327.14 KB)

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures,” The 17th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2016), IPDPS 2016, Chicago, IL, IEEE, May 2016.

(708.62 KB)

Anzt, H., E. Chow, D. Szyld, and J. Dongarra, “Domain Overlap for Iterative Sparse Triangular Solves on GPUs,” Software for Exascale Computing - SPPEXA, vol. 113: Springer International Publishing, pp. 527–545, September 2016.

2017

Jagode, H., “Dataflow Programming Paradigms for Computational Chemistry Methods,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-01, Knoxville, TN, University of Tennessee, May 2017.

Kurzak, J., P. Luszczek, I. Yamazaki, Y. Robert, and J. Dongarra, “Design and Implementation of the PULSAR Programming System for Large Scale Computing,” Supercomputing Frontiers and Innovations, vol. 4, issue 1, 2017.

(764.96 KB)

Dongarra, J., S. Hammarling, N. J. Higham, S. Relton, P. Valero-Lara, and M. Zounon, “The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems,” International Conference on Computational Science (ICCS 2017), Zürich, Switzerland, Elsevier, June 2017.

(446.14 KB)
