University of Tennessee Computer Science Technical Report

Yamazaki, I., S. Nooshabadi, S. Tomov, and J. Dongarra, “High Performance Realtime Convex Solver for Embedded Systems,” University of Tennessee Computer Science Technical Report, no. UT-EECS-16-745, October 2016.

Dongarra, J., “Report on the Sunway TaihuLight System,” University of Tennessee Computer Science Technical Report, no. UT-EECS-16-742: University of Tennessee, June 2016.

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs,” University of Tennessee Computer Science Technical Report, no. UT-EECS-16-739: University of Tennessee, February 2016.

Abdelfattah, A., M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, et al., “High-Performance Tensor Contractions for GPUs,” University of Tennessee Computer Science Technical Report, no. UT-EECS-16-738: University of Tennessee, January 2016.

Dongarra, J., M. A. Heroux, and P. Luszczek, “HPCG Benchmark: a New Metric for Ranking High Performance Computing Systems,” University of Tennessee Computer Science Technical Report, no. ut-eecs-15-736: University of Tennessee, January 2015.

Dongarra, J., T. Herault, and Y. Robert, “Fault Tolerance Techniques for High-performance Computing,” University of Tennessee Computer Science Technical Report (also LAWN 289), no. UT-EECS-15-734: University of Tennessee, May 2015.

Dongarra, J., J. Kurzak, P. Luszczek, and I. Yamazaki, “PULSAR Users’ Guide, Parallel Ultra-Light Systolic Array Runtime,” University of Tennessee EECS Technical Report, no. UT-EECS-14-733: University of Tennessee, November 2014.

Anzt, H., S. Tomov, and J. Dongarra, “Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,” University of Tennessee Computer Science Technical Report, no. UT-EECS-14-731: University of Tennessee, October 2014.

Anzt, H., S. Tomov, and J. Dongarra, “Implementing a Sparse Matrix Vector Product for the SELL-C/SELL-C-σ formats on NVIDIA GPUs,” University of Tennessee Computer Science Technical Report, no. UT-EECS-14-727: University of Tennessee, April 2014.

Faverge, M., J. Herrmann, J. Langou, B. Lowery, Y. Robert, and J. Dongarra, “Designing LU-QR hybrid solvers for performance and stability,” University of Tennessee Computer Science Technical Report (also LAWN 282), no. ut-eecs-13-719: University of Tennessee, October 2013.

Aupy, G., A. Benoit, T. Herault, Y. Robert, and J. Dongarra, “Optimal Checkpointing Period: Time vs. Energy,” University of Tennessee Computer Science Technical Report (also LAWN 281), no. ut-eecs-13-718: University of Tennessee, October 2013.

Moore, K., and J. Dongarra, “NetBuild,” University of Tennessee Computer Science Technical Report, no. UT-CS-01-461, January 2001.

Eijkhout, V., “The 'Weighted Modification' Incomplete Factorisation Method,” University of Tennessee Computer Science Department Technical Report, no. UT-CS-99-436, December 1999.

Eijkhout, V., “On the Existence Problem of Incomplete Factorisation Methods,” University of Tennessee Computer Science Department Technical Report, no. UT-CS-99-435, December 1999.

Dongarra, J., H. Meuer, and E. Strohmaier, “Top500 Supercomputer Sites (14th edition),” University of Tennessee Computer Science Department Technical Report, no. UT-CS-99-434, November 1999.

Elwasif, W., M. Beck, and J. Plank, “IBP - Internet Backplane Protocol: Infrastructure for Distributed Storage (V 0.2),” University of Tennessee Computer Science Department Technical Report, no. UT-CS-99-430, February 1999.

Dongarra, J., H. Meuer, and E. Strohmaier, “Top500 Supercomputer Sites (13th edition),” University of Tennessee Computer Science Department Technical Report, no. UT-CS-99-425, June 1999.

Arbenz, P., A. Cleary, J. Dongarra, and M. Hegland, “A Comparison of Parallel Solvers for Diagonally Dominant and General Narrow Banded Linear Systems II (LAPACK Working Note 143),” University of Tennessee Computer Science Department Technical Report, no. UT-CS-99-415, January 1999.

Arbenz, P., A. Cleary, J. Dongarra, and M. Hegland, “A Comparison of Parallel Solvers for General Narrow Banded Linear Systems (LAPACK Working Note 142),” University of Tennessee Computer Science Technical Report, no. UT-CS-99-414, January 1999.

Donfack, S., J. Dongarra, M. Faverge, M. Gates, J. Kurzak, P. Luszczek, and I. Yamazaki, “On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties,” University of Tennessee Computer Science Technical Report, no. UT-CS-13-715, July 2013.

Donfack, S., S. Tomov, and J. Dongarra, “Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs,” University of Tennessee Computer Science Technical Report, no. ut-cs-13-713, July 2013.

Cao, C., J. Dongarra, P. Du, M. Gates, P. Luszczek, and S. Tomov, “clMAGMA: High Performance Dense Linear Algebra with OpenCL,” University of Tennessee Technical Report (LAWN 275), no. UT-CS-13-706: University of Tennessee, March 2013.

Dongarra, J., T. Herault, and Y. Robert, “Revisiting the Double Checkpointing Algorithm,” University of Tennessee Computer Science Technical Report (LAWN 274), no. ut-cs-13-705, January 2013.

Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, “Extending the Scope of the Checkpoint-on-Failure Protocol for Forward Recovery in Standard MPI,” University of Tennessee Computer Science Technical Report, no. ut-cs-12-702, 2012.

Donfack, S., S. Tomov, and J. Dongarra, “Performance evaluation of LU factorization through hardware counter measurements,” University of Tennessee Computer Science Technical Report, no. ut-cs-12-700, October 2012.

Langou, J., B. Hoffman, and B. King, “How LAPACK library enables Microsoft Visual Studio support with CMake and LAPACKE,” University of Tennessee Computer Science Technical Report (also LAWN 270), no. UT-CS-12-698, July 2012.

Bosilca, G., A. Bouteiller, E. Brunet, F. Cappello, J. Dongarra, A. Guermouche, T. Herault, Y. Robert, F. Vivien, and D. Zaidouni, “Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,” University of Tennessee Computer Science Technical Report (also LAWN 269), no. UT-CS-12-697, June 2012.

Bland, W., G. Bosilca, A. Bouteiller, T. Herault, and J. Dongarra, “A Proposal for User-Level Failure Mitigation in the MPI-3 Standard,” University of Tennessee Electrical Engineering and Computer Science Technical Report, no. ut-cs-12-693: University of Tennessee, February 2012.

Anzt, H., S. Tomov, M. Gates, J. Dongarra, and V. Heuveline, “Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems,” no. UT-CS-11-689, December 2011.

Anzt, H., S. Tomov, J. Dongarra, and V. Heuveline, “A Block-Asynchronous Relaxation Method for Graphics Processing Units,” University of Tennessee Computer Science Technical Report, no. UT-CS-11-687 / LAWN 258, November 2011.

Du, P., A. Bouteiller, G. Bosilca, T. Herault, and J. Dongarra, “Algorithm-based Fault Tolerance for Dense Matrix Factorizations,” University of Tennessee Computer Science Technical Report, no. UT-CS-11-676, Knoxville, TN, August 2011.

Du, P., P. Luszczek, S. Tomov, and J. Dongarra, “Soft Error Resilient QR Factorization for Hybrid System,” University of Tennessee Computer Science Technical Report, no. UT-CS-11-675, Knoxville, TN, July 2011.

Dongarra, J., and P. Luszczek, “Reducing the time to tune parallel dense linear algebra routines with partial execution and performance modelling,” University of Tennessee Computer Science Technical Report, no. UT-CS-10-661, October 2010.

Jagode, H., J. Hein, and A. Trew, “Task placement of parallel multi-dimensional FFTs on a mesh communication network,” University of Tennessee Computer Science Technical Report, no. UT-CS-08-613, January 2008.

Kurzak, J., A. Buttari, P. Luszczek, and J. Dongarra, “The PlayStation 3 for High Performance Scientific Computing,” University of Tennessee Computer Science Technical Report, no. UT-CS-08-608, January 2008.

Baboulin, M., J. Dongarra, S. Gratton, and J. Langou, “Computing the Conditioning of the Components of a Linear Least Squares Solution,” University of Tennessee Computer Science Technical Report, no. UT-CS-07-604, (also LAPACK Working Note 193), January 2007.

Buttari, A., J. Langou, J. Kurzak, and J. Dongarra, “A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,” University of Tennessee Computer Science Technical Report, no. UT-CS-07-600 (also LAPACK Working Note 191), January 2007.

Kurzak, J., A. Buttari, and J. Dongarra, “Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization,” UT Computer Science Technical Report (Also LAPACK Working Note 184), no. UT-CS-07-596, January 2007.

Song, F., S. Moore, and J. Dongarra, “Modeling of L2 Cache Behavior for Thread-Parallel Scientific Programs on Chip Multi-Processors,” University of Tennessee Computer Science Technical Report, no. UT-CS-06-583, January 2006.

Kurzak, J., and J. Dongarra, “Implementation of the Mixed-Precision High Performance LINPACK Benchmark on the CELL Processor,” University of Tennessee Computer Science Tech Report, no. UT-CS-06-580, LAPACK Working Note #177, September 2006.

Langou, J., J. Langou, P. Luszczek, J. Kurzak, A. Buttari, and J. Dongarra, “Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy,” University of Tennessee Computer Science Tech Report, no. UT-CS-06-574, LAPACK Working Note #175, April 2006.

Bassi, A., M. Beck, J. Plank, and R. Wolski, “Internet Backplane Protocol: API 1.0,” University of Tennessee Computer Science Technical Report, no. UT-CS-01-464, January 2001.

Bassi, A., and X. Li, “Internet Backplane Protocol - Test Language v. 1.0,” University of Tennessee Computer Science Technical Report, no. UT-CS-01-464, January 2001.

Petitet, A., S. Blackford, J. Dongarra, B. Ellis, G. Fagg, K. Roche, and S. Vadhiyar, “Numerical Libraries and The Grid: The Grads Experiments with ScaLAPACK,” University of Tennessee Computer Science Technical Report, no. UT-CS-01-460, January 2001.

Eijkhout, V., “Automatic Determination of Matrix-Blocks,” LAPACK Working Note 151, University of Tennessee Computer Science Technical Report, no. UT-CS-01-458, January 2001.

Whaley, C., A. Petitet, and J. Dongarra, “Automated Empirical Optimizations of Software and the ATLAS Project (LAPACK Working Note 147),” University of Tennessee Computer Science Department Technical Report, no. UT-CS-00-448, September 2000.

Cronk, D., B. Ellis, and G. Fagg, “Metacomputing: An Evaluation of Emerging Systems,” University of Tennessee Computer Science Department Technical Report, no. UT-CS-00-445, July 2000.

Dongarra, J., H. Meuer, and E. Strohmaier, “Top500 Supercomputer Sites (15th edition),” University of Tennessee Computer Science Department Technical Report, no. UT-CS-00-442, June 2000.

Raman, G., and J. Dongarra, “Design and Implementation of NetSolve using DCOM as the Remoting Layer,” University of Tennessee Computer Science Department Technical Report, no. UT-CS-00-440, May 2000.

Millar, J., P. McMahan, and J. Dongarra, “RIBAPI - Repository in a Box Application Programmer's Interface,” University of Tennessee Computer Science Technical Report, no. UT-CS-00-438, 2001.

Dempsey, B., and D. Weiss, “Towards An Efficient, Scalable Replication Mechanism for the I2-DSI Project,” University of North Carolina School of Library and Information Science Technical Report, no. TR-1999-01, January 1999.

Anzt, H., E. Boman, J. Dongarra, G. Flegar, M. Gates, M. Heroux, M. Hoemmen, J. Kurzak, P. Luszczek, S. Rajamanickam, et al., “MAGMA-sparse Interface Design Whitepaper,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-05, September 2017.

Dongarra, J., “Report on the TianHe-2A System,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-04: University of Tennessee, September 2017.

Jagode, H., “Dataflow Programming Paradigms for Computational Chemistry Methods,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-01, Knoxville, TN, University of Tennessee, May 2017.

Aupy, G., and Y. Robert, “Scheduling for fault-tolerance: an introduction,” Innovative Computing Laboratory Technical Report, no. ICL-UT-15-02: University of Tennessee, January 2015.

Herault, T., A. Bouteiller, G. Bosilca, M. Gamell, K. Teranishi, M. Parashar, and J. Dongarra, “Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems: Formal Proof,” Innovative Computing Laboratory Technical Report, no. ICL-UT-15-01, April 2015.

Benoit, A., Y. Robert, and S. K. Raina, “Efficient checkpoint/verification patterns for silent error detection,” Innovative Computing Laboratory Technical Report, no. ICL-UT-14-03: University of Tennessee, May 2014.

Marin, G., “Performance Analysis of the MPAS-Ocean Code using HPCToolkit and MIAMI,” ICL Technical Report, no. ICL-UT-14-01: University of Tennessee, February 2014.

Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, “Assessing the impact of ABFT and Checkpoint composite strategies,” University of Tennessee Computer Science Technical Report, no. ICL-UT-13-03, 2013.

Nelson, J., “Analyzing PAPI Performance on Virtual Machines,” ICL Technical Report, no. ICL-UT-13-02, August 2013.

Bouteiller, A., F. Cappello, J. Dongarra, A. Guermouche, T. Herault, and Y. Robert, “Multi-criteria checkpointing strategies: optimizing response-time versus resource utilization,” University of Tennessee Computer Science Technical Report, no. ICL-UT-13-01, February 2013.

McCraw, H., “Performance Counter Monitoring for the Blue Gene/Q Architecture,” University of Tennessee Computer Science Technical Report, no. ICL-UT-12-01, 2012.

Becker, D., M. Baboulin, and J. Dongarra, “Reducing the Amount of Pivoting in Symmetric Indefinite Systems,” University of Tennessee Innovative Computing Laboratory Technical Report, no. ICL-UT-11-06, Knoxville, TN, Submitted to PPAM 2011, May 2011.

Bosilca, G., T. Herault, A. Rezmerita, and J. Dongarra, “On Scalability for MPI Runtime Systems,” University of Tennessee Computer Science Technical Report, no. ICL-UT-11-05, Knoxville, TN, May 2011.

Dongarra, J., M. Faverge, Y. Ishikawa, R. Namyst, F. Rue, and F. Trahay, “EZTrace: a generic framework for performance analysis,” ICL Technical Report, no. ICL-UT-11-01, December 2010.

Luszczek, P., and J. Dongarra, “Analysis of Various Scalar, Vector, and Parallel Implementations of RandomAccess,” Innovative Computing Laboratory (ICL) Technical Report, no. ICL-UT-10-03, June 2010.

Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier, and J. Dongarra, “DAGuE: A generic distributed DAG engine for high performance computing,” Innovative Computing Laboratory Technical Report, no. ICL-UT-10-01, April 2010.

Bosilca, G., C. Coti, T. Herault, P. Lemarinier, and J. Dongarra, “Constructing resilient communication infrastructure for runtime environments,” Innovative Computing Laboratory Technical Report, no. ICL-UT-09-02, July 2009.

Jagode, H., A. Knuepfer, J. Dongarra, M. Jurenz, M. S. Mueller, and W. E. Nagel, “Trace-based Performance Analysis for the Petascale Simulation Code FLASH,” Innovative Computing Laboratory Technical Report, no. ICL-UT-09-01, April 2009.

Li, Y., and J. Dongarra, “Request Sequencing: Enabling Workflow for Efficient Parallel Problem Solving in GridSolve,” ICL Technical Report, no. ICL-UT-08-01, April 2008.

You, H., K. Seymour, J. Dongarra, and S. Moore, “Empirical Tuning of a Multiresolution Analysis Kernel using a Specialized Code Generator,” ICL Technical Report, no. ICL-UT-07-02, January 2007.

You, H., K. Seymour, J. Dongarra, and S. Moore, “Automated Empirical Tuning of a Multiresolution Analysis Kernel,” ICL Technical Report, no. ICL-UT-07-01, pp. 10, January 2007.

Dewolfs, D., J. Broeckhove, V. Sunderam, and G. Fagg, “FT-MPI, Fault-Tolerant Metacomputing and Generic Name Services: A Case Study,” Lecture Notes in Computer Science, vol. 4192, no. ICL-UT-06-14: Springer Berlin / Heidelberg, pp. 133-140, 2006.

Pjesivac–Grbovic, J., G. Fagg, T. Angskun, G. Bosilca, and J. Dongarra, “MPI Collective Algorithm Selection and Quadtree Encoding,” Lecture Notes in Computer Science, vol. 4192, no. ICL-UT-06-13: Springer Berlin / Heidelberg, pp. 40-48, September 2006.

Angskun, T., G. Fagg, G. Bosilca, J. Pjesivac–Grbovic, and J. Dongarra, “Scalable Fault Tolerant Protocol for Parallel Runtime Environments,” 2006 Euro PVM/MPI, no. ICL-UT-06-12, Bonn, Germany, 2006.

Pjesivac–Grbovic, J., G. Fagg, T. Angskun, G. Bosilca, and J. Dongarra, “MPI Collective Algorithm Selection and Quadtree Encoding,” ICL Technical Report, no. ICL-UT-06-11, 2006.

Seymour, K., H. You, and J. Dongarra, “ATLAS on the BlueGene/L – Preliminary Results,” ICL Technical Report, no. ICL-UT-06-10, January 2006.

Tang, Y., “Technical Comparison between several representative checkpoint/rollback solutions for MPI programs,” ICL Technical Report, no. ICL-UT-06-09, January 2006.

Wolf, F., F. Freitag, B. Mohr, S. Moore, and B. Wylie, “Large Event Traces in Parallel Performance Analysis,” 8th Workshop 'Parallel Systems and Algorithms' (PASA), Lecture Notes in Informatics, no. ICL-UT-06-08, Frankfurt/Main, Germany, Gesellschaft für Informatik, March 2006.

Mohr, B., A. Kühnal, M-A. Hermanns, and F. Wolf, “Performance Analysis of One-sided Communication Mechanisms,” Mini-Symposium "Tools Support for Parallel Programming", Proceedings of Parallel Computing (ParCo), no. ICL-UT-06-07, Malaga, Spain, September 2005.

Browne, S., P. McMahan, and S. Wells, “Repository in a Box Toolkit for Software and Resource Sharing,” University of Tennessee Computer Science Department Technical Report, no. ICL-UT-05-05, 2001.

Meek, E., J. Larkin, and J. Dongarra, “Remote Software Toolkit Installer,” ICL Technical Report, no. ICL-UT-05-04, June 2005.

You, H., K. Seymour, and J. Dongarra, “An Effective Empirical Search Method for Automatic Software Tuning,” ICL Technical Report, no. ICL-UT-05-02, January 2005.

Buttari, A., V. Eijkhout, J. Langou, and S. Filippone, “Performance Optimization and Modeling of Blocked Sparse Kernels,” ICL Technical Report, no. ICL-UT-04-05, 2004.

Bosilca, G., Z. Chen, J. Dongarra, and J. Langou, “Recovery Patterns for Iterative Methods in a Parallel Unstable Environment,” ICL Technical Report, no. ICL-UT-04-04, January 2004.

Wolf, F., “EARL - API Documentation,” ICL Technical Report, no. ICL-UT-04-03, October 2004.

Moore, K., J. Dongarra, S. Moore, and E. Grosse, “NetBuild: Automated Installation and Use of Network-Accessible Software Libraries,” ICL Technical Report, no. ICL-UT-04-02, January 2004.

Song, F., and F. Wolf, “CUBE User Manual,” ICL Technical Report, no. ICL-UT-04-01, February 2004.

Eijkhout, V., and E. Fuentes, “A Proposed Standard for Matrix Metadata,” Innovative Computing Laboratory Technical Report, no. ICL-UT-03-02, Submitted to ACM TOMS, November 2003.

Boehmann, T. B., “Distributed Storage in RIB,” ICL Tech Report, no. ICL-UT-03-01, March 2003.

Seymour, K., H. Nakada, S. Matsuoka, J. Dongarra, C. Lee, and H. Casanova, “GridRPC: A Remote Procedure Call API for Grid Computing,” ICL Technical Report, no. ICL-UT-02-06, November 2002.

Agrawal, S., D. Arnold, S. Blackford, J. Dongarra, M. Miller, K. Sagi, Z. Shi, K. Seymour, and S. Vadhiyar, “Users' Guide to NetSolve v1.4.1,” ICL Technical Report, no. ICL-UT-02-05, June 2002.

Kelleher, Jr., M., “Development of the PICMSS NetSolve Service,” ICL Technical Report, no. ICL-UT-02-04, April 2002.

Eijkhout, V., “Polynomial Acceleration of Optimised Multi-grid Smoothers; Basic Theory,” ICL Technical Report, vol. 156, no. ICL-UT-02-03, January 2002.

Agrawal, S., “Hardware Software Server in NetSolve,” ICL Technical Report, no. ICL-UT-02-02, January 2002.

Du, P., P. Luszczek, S. Tomov, and J. Dongarra, “Soft Error Resilient QR Factorization for Hybrid System,” UT-CS-11-675 (also LAPACK Working Note #252), no. ICL-CS-11-675, July 2011.

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software, (Linpack Benchmark Report),” University of Tennessee Computer Science Technical Report, no. CS-89-85: University of Tennessee, June 2014.

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software,” University of Tennessee Computer Science Technical Report, no. cs-89-85, February 2013.

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),” University of Tennessee Computer Science Technical Report, no. CS-89-85, 2011.

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),” University of Tennessee Computer Science Department Technical Report, no. CS-89-85, January 2000.

Cunha, M., J. Telles, A. YarKhan, and J. Dongarra, “Grid Computing applied to the Boundary Element Method,” Proceedings of the First International Conference on Parallel, Distributed and Grid Computing for Engineering, vol. 27, no. :104203/9027, Stirlingshire, UK, Civil-Comp Press, 2009.

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs,” High Performance Computing: 31st International Conference, ISC High Performance 2016, Frankfurt, Germany, June 19-23, 2016, Proceedings, no. 9697: Springer International Publishing, pp. 21–38, 2016. DOI: 10.1007/978-3-319-41321-1_2

Kurzak, J., A. Buttari, and J. Dongarra, “Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 9, pp. 1-11, January 2008.

Dongarra, J., H. Meuer, H. D. Simon, and E. Strohmaier, “Biannual Top-500 Computer Lists Track Changing Environments for Scientific Computing,” SIAM News, vol. 34, no. 9, October 2002.

Fagg, G., and J. Dongarra, “HARNESS Fault Tolerant MPI Design, Usage and Performance Issues,” Future Generation Computer Systems, vol. 18, no. 8, pp. 1127-1142, January 2002.

Reed, D., and J. Dongarra, “Exascale Computing and Big Data,” Communications of the ACM, vol. 58, no. 7: ACM, pp. 56-68, July 2015. DOI: 10.1145/2699414

Baboulin, M., J. Dongarra, S. Gratton, and J. Langou, “Computing the Conditioning of the Components of a Linear Least-squares Solution,” Numerical Linear Algebra with Applications, vol. 16, no. 7, pp. 517-533, 2009.

Dongarra, J., J-F. Pineau, Y. Robert, Z. Shi, and F. Vivien, “Revisiting Matrix Product on Master-Worker Platforms,” International Journal of Foundations of Computer Science (IJFCS), vol. 19, no. 6, pp. 1317-1336, December 2008.

YarKhan, A., and J. Dongarra, “Biological Sequence Alignment on the Computational Grid Using the GrADS Framework,” Future Generation Computer Systems, vol. 21, no. 6: Elsevier, pp. 980-986, June 2005.

Berry, M., and J. Dongarra, “Atlanta Organizers Put Mathematics to Work For the Math Sciences Community,” SIAM News, vol. 32, no. 6, January 1999.

“Computational Science – ICCS 2009, Proceedings of the 9th International Conference,” Lecture Notes in Computer Science: Theoretical Computer Science and General Issues, no. 5544-5545, Baton Rouge, LA, May 2009.

Fagg, G., K. Moore, and J. Dongarra, “Scalable Networked Information Processing Environment (SNIPE),” Journal on Future Generation Computer Systems, vol. 15, no. 5/6, pp. 595-605, January 1999.

Giraud, L., A. Haidar, and S. Pralet, “Using multiple levels of parallelism to enhance the performance of domain decomposition solvers,” Parallel Computing, vol. 36, no. 5-6: Elsevier journals, pp. 285-296, 2010.

Casanova, H., J. Plank, M. Beck, and J. Dongarra, “Deploying Fault-tolerance and Task Migration with NetSolve,” Future Generation Computer Systems, vol. 15, no. 5-6: Elsevier, pp. 745-755, October 1999.

Beck, M., J. Dongarra, G. Fagg, A. Geist, P. Gray, J. Kohl, M. Migliardi, K. Moore, T. Moore, P. Papadopoulos, et al., “HARNESS: A Next Generation Distributed Virtual Machine,” International Journal on Future Generation Computer Systems, vol. 15, no. 5-6, pp. 571-582, January 1999.

Boulet, P., J. Dongarra, Y. Robert, and F. Vivien, “Static Tiling for Heterogeneous Computing Platforms,” Parallel Computing, vol. 25, no. 5, pp. 547-568, January 1999.

Agullo, E., C. Coti, T. Herault, J. Langou, S. Peyronnet, A. Rezmerita, F. Cappello, and J. Dongarra, “QCG-OMPI: MPI Applications on Grids,” Future Generation Computer Systems, vol. 27, no. 4, pp. 357-369, January 2011.

Agullo, E., C. Coti, T. Herault, J. Langou, S. Peyronnet, A. Rezmerita, F. Cappello, and J. Dongarra, “QCG-OMPI: MPI Applications on Grids,” Future Generation Computer Systems, vol. 27, no. 4, pp. 357-369, March 2010.

Dongarra, J., “Trends in High Performance Computing,” The Computer Journal, vol. 47, no. 4: The British Computer Society, pp. 399-403, 2004.

Moore, S., A. J. Baker, J. Dongarra, C. Halloy, and C. Ng, “Active Netlib: An Active Mathematical Software Collection for Inquiry-based Computational Science and Engineering Education,” Journal of Digital Information special issue on Interactivity in Digital Libraries, vol. 2, no. 4, 2002.

Berman, F., A. Chien, K. Cooper, J. Dongarra, I. Foster, D. Gannon, L. Johnsson, K. Kennedy, C. Kesselman, J. Mellor-Crummey, et al., “The GrADS Project: Software Support for High-Level Grid Application Development,” International Journal of High Performance Applications and Supercomputing, vol. 15, no. 4, pp. 327-344, January 2001.

Petitet, A., S. Blackford, J. Dongarra, B. Ellis, G. Fagg, K. Roche, and S. Vadhiyar, “Numerical Libraries and The Grid,” International Journal of High Performance Applications and Supercomputing, vol. 15, no. 4, pp. 359-374, January 2001.

Dongarra, J., V. Eijkhout, and H. van der Vorst, “Iterative Solver Benchmark (LAPACK Working Note 152),” Scientific Programming, vol. 9, no. 4, pp. 223-231, 2001.

Tomov, S., J. Langou, J. Dongarra, A. Canning, and L-W. Wang, “Conjugate-Gradient Eigenvalue Solvers in Computing Electronic Properties of Nanostructure Architectures,” International Journal of Computational Science and Engineering, vol. 2, no. 3/4, pp. 205-212, 2006.

Seymour, K., and J. Dongarra, “Automatic Translation of Fortran to JVM Bytecode,” Concurrency and Computation: Practice and Experience, vol. 15, no. 3-5, pp. 202-207, 2003.

Anzt, H., E. Chow, J. Saak, and J. Dongarra, “Updating Incomplete Factorization Preconditioners for Model Order Reduction,” Numerical Algorithms, vol. 73, no. 3, pp. 611–630, February 2016. DOI: 10.1007/s11075-016-0110-2

Yamazaki, I., S. Tomov, and J. Dongarra, “Mixed-Precision Cholesky QR Factorization and its Case Studies on Multicore CPU with Multiple GPUs,” SIAM Journal on Scientific Computing, vol. 37, no. 3, pp. C203-C330, May 2015. DOI: 10.1137/14M0973773

Lively, C., X. Wu, V. Taylor, S. Moore, H-C. Chang, and K. Cameron, “Energy and performance characteristics of different parallel implementations of scientific applications on multicore systems,” International Journal of High Performance Computing Applications, vol. 25, no. 3, pp. 342-350, 2011.

Angskun, T., G. Fagg, G. Bosilca, J. Pjesivac–Grbovic, and J. Dongarra, “Self-Healing Network for Scalable Fault-Tolerant Runtime Environments,” Future Generation Computer Systems, vol. 26, no. 3, pp. 479-485, March 2010.

Giraud, L., A. Haidar, and Y. Saad, “Sparse approximations of the Schur complement for parallel algebraic hybrid solvers in 3D,” Numerical Mathematics: Theory, Methods and Applications, vol. 3, no. 3, Beijing, Global Science Press, pp. 64-82, 2010.

Fürlinger, K., and S. Moore, “Capturing and Analyzing the Execution Control Flow of OpenMP Applications,” International Journal of Parallel Programming, vol. 37, no. 3, pp. 266-276, 2009.

Dongarra, J., and P. Luszczek, “High Performance Development for High End Computing with Python Language Wrapper (PLW),” International Journal for High Performance Computer Applications, vol. 21, no. 3, pp. 360-369, 2007.

Casanova, H., M H. Kim, J. Plank, and J. Dongarra, “Adaptive Scheduling for Task Farming with Grid Middleware,” International Journal of Supercomputer Applications and High-Performance Computing, vol. 13, no. 3, pp. 231-240, October 2002.

Dongarra, J., and D. W. Walker, “The Quest for Petascale Computing,” Computing in Science and Engineering, vol. 3, no. 3, pp. 32-39, May 2001.

Calland, P-Y., J. Dongarra, and Y. Robert, “Tiling on Systems with Communication/Computation Overlap,” Concurrency: Practice and Experience, vol. 11, no. 3, pp. 139-153, January 1999.

Anzt, H., E. Chow, and J. Dongarra, “On block-asynchronous execution on GPUs,” LAPACK Working Note, no. 291, November 2016.

Buttari, A., J. Dongarra, P. Husbands, J. Kurzak, and K. Yelick, “Multithreading for synchronization tolerance in matrix factorization,” Journal of Physics: Conference Series, SciDAC 2007, vol. 78, no. 2007, January 2007.

Supinski, B. R. de, J. K. Hollingsworth, S. Moore, and P. H. Worley, “Results of the PERI survey of SciDAC applications,” Journal of Physics: Conference Series, SciDAC 2007, vol. 78, no. 2007, January 2007.

Dongarra, J., et al., “Remembering Ken Kennedy,” SciDAC Review, vol. 5, no. 2007, 2007.

Tisseur, F., and J. Dongarra, “Parallelizing the Divide and Conquer Algorithm for the Symmetric Tridiagonal Eigenvalue Problem on Distributed Memory Architectures,” SIAM Journal on Scientific Computing, vol. 20, no. 6, pp. 2223-2236, October 2002.

Bosilca, G., Z. Chen, J. Dongarra, V. Eijkhout, G. Fagg, E. Fuentes, J. Langou, P. Luszczek, J. Pjesivac–Grbovic, K. Seymour, et al., “Self Adapting Numerical Software SANS Effort,” IBM Journal of Research and Development, vol. 50, no. 2/3, pp. 223-238, January 2006.

Vadhiyar, S., and J. Dongarra, “Self Adaptivity in Grid Computing,” Concurrency and Computation: Practice and Experience, Special Issue: Grid Performance, vol. 17, no. 2-4, pp. 235-257, 2005.

Arnold, D., S. Vadhiyar, and J. Dongarra, “On the Convergence of Computational and Data Grids,” Parallel Processing Letters, vol. 11, no. 2-3, pp. 187-202, January 2001.

Gustavson, F. G., J. Wasniewski, J. Dongarra, and J. Langou, “Rectangular Full Packed Format for Cholesky’s Algorithm: Factorization, Solution, and Inversion,” ACM Transactions on Mathematical Software (TOMS), vol. 37, no. 2, Atlanta, GA, April 2010.

(896.03 KB)

Youseff, L., K. Seymour, H. You, D. Zagorodnov, J. Dongarra, and R. Wolski, “Paravirtualization Effect on Single- and Multi-threaded Memory-Intensive Linear Algebra Software,” Cluster Computing Journal: Special Issue on High Performance Distributed Computing, vol. 12, no. 2: Springer Netherlands, pp. 101-122, 2009.

(451.07 KB)

Dongarra, J., G. H. Golub, E. Grosse, C. Moler, and K. Moore, “Netlib and NA-Net: Building a Scientific Computing Community,” IEEE Annals of the History of Computing, vol. 30, no. 2, pp. 30-41, January 2008.

(352.71 KB)

Hardt, M., K. Seymour, J. Dongarra, M. Zapf, and N. Ruiter, “Interactive Grid-Access Using Gridsolve and Giggle,” Computing and Informatics, vol. 27, no. 2, pp. 233-248, ISSN 1335-9150, 2008.

(533.4 KB)

Pjesivac-Grbovic, J., T. Angskun, G. Bosilca, G. Fagg, E. Gabriel, and J. Dongarra, “Performance Analysis of MPI Collective Operations,” Cluster Computing, vol. 10, no. 2: Springer Netherlands, pp. 127-143, June 2007.

(1018.28 KB)

Berman, F., H. Casanova, A. Chien, K. Cooper, H. Dail, A. Dasgupta, W. Deng, J. Dongarra, L. Johnsson, K. Kennedy, et al., “New Grid Scheduling and Rescheduling Methods in the GrADS Project,” International Journal of Parallel Programming, vol. 33, no. 2: Springer, pp. 209-229, June 2005.

(306.41 KB)

Eijkhout, V., E. Fuentes, T. Eidson, and J. Dongarra, “The Component Structure of a Self-Adapting Numerical Software System,” International Journal of Parallel Programming, vol. 33, no. 2, June 2005.

(64.88 KB)

Vadhiyar, S., and J. Dongarra, “SRS - A Framework for Developing Malleable and Migratable Parallel Software,” Parallel Processing Letters, vol. 13, no. 2, pp. 291-312, June 2003.

(211.6 KB)

Dongarra, J., and V. Eijkhout, “Self Adapting Numerical Algorithm for Next Generation Applications,” International Journal of High Performance Computing Applications, vol. 17, no. 2, pp. 125-132, January 2003.

(479.18 KB)

Blackford, S., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, et al., “An Updated Set of Basic Linear Algebra Subprograms (BLAS),” ACM Transactions on Mathematical Software, vol. 28, no. 2, pp. 135-151, December 2002. DOI: 10.1145/567806.567807

(228.33 KB)

Henry, G., D. Watkins, and J. Dongarra, “A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures,” SIAM Journal on Scientific Computing, vol. 16, no. 2, pp. 284-311, October 2002.

(224.7 KB)

Browne, S., J. Dongarra, and A. Trefethen, “Numerical Libraries and Tools for Scalable Parallel Cluster Computing,” International Journal of High Performance Applications and Supercomputing, vol. 15, no. 2, pp. 175-180, October 2002.

(37.38 KB)

Doolin, D., J. Dongarra, and K. Seymour, “JLAPACK - Compiling LAPACK Fortran to Java,” Scientific Programming, vol. 7, no. 2, pp. 111-138, October 2002.

(307.46 KB)

Dongarra, J., S. Moore, and A. Trefethen, “Numerical Libraries and Tools for Scalable Parallel Cluster Computing,” International Journal of High Performance Applications and Supercomputing, vol. 15, no. 2, pp. 175-180, January 2001.

(37.38 KB)

Dongarra, J., “Measuring Computer Performance: A Practitioner's Guide,” SIAM Review (book review), vol. 43, no. 2, pp. 383-384, 2001.

(558.9 KB)

Fischer, M., and J. Dongarra, “Experiences with Windows 95/NT as a Cluster Computing Platform for Parallel Computing,” Parallel and Distributed Computing Practices, Special Issue: Cluster Computing, vol. 2, no. 2: Nova Science Publishers, USA, pp. 119-128, February 1999.

(164.04 KB)

Boulet, P., J. Dongarra, F. Rastello, Y. Robert, and F. Vivien, “Algorithmic Issues on Heterogeneous Computing Platforms,” Parallel Processing Letters, vol. 9, no. 2, pp. 197-213, January 1999.

(301.17 KB)

Ltaeif, H., P. Luszczek, and J. Dongarra, “High Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures,” ACM Transactions on Mathematical Software (TOMS), vol. 39, issue 3, no. 16, 2013. DOI: 10.1145/2450153.2450154

(665.7 KB)

Voemel, C., S. Tomov, O. Marques, A. Canning, L-W. Wang, and J. Dongarra, “State-of-the-Art Eigensolvers for Electronic Structure Calculations of Large Scale Nano-Systems,” Journal of Computational Physics, vol. 227, no. 15, pp. 7113-7124, January 2008.

D'Azevedo, E., and J. Dongarra, “The Design and Implementation of the Parallel Out of Core ScaLAPACK LU, QR, and Cholesky Factorization Routines,” Concurrency: Practice and Experience, vol. 12, no. 15, pp. 1481-1493, January 2000.

(374.18 KB)

Seymour, K., A. YarKhan, S. Agrawal, and J. Dongarra, “NetSolve: Grid Enabling Scientific Computing Environments,” Grid Computing and New Frontiers of High Performance Processing, no. 14: Elsevier, 2005.

(425 KB)

Hiroyasu, T., M. Miki, J. Sawada, and J. Dongarra, “Optimization of Injection Schedule of Diesel Engine Using GridRPC,” Information Processing Society of Japan Symposium Series, vol. 2003, no. 14, pp. 189-197, January 2003.

(520.96 KB)

Hiroyasu, T., M. Miki, H. Saito, Y. Tanimura, and J. Dongarra, “Static Scheduling for ScaLAPACK on the Grid Using Genetic Algorithm,” Information Processing Society of Japan Symposium Series, vol. 2003, no. 14, pp. 3-10, January 2003.

(506.42 KB)

Dongarra, J., “High Performance Computing Trends, Supercomputers, Clusters, and Grids,” Information Processing Society of Japan Symposium Series, vol. 2003, no. 14, pp. 55-58, January 2003.

Moore, K., and J. Dongarra, “NetBuild: Transparent Cross-Platform Access to Computational Software Libraries,” Concurrency and Computation: Practice and Experience, Special Issue: Grid Computing Environments, vol. 14, no. 13-15, pp. 1445-1456, November 2002.

(74.84 KB)

Arnold, D., H. Casanova, and J. Dongarra, “Innovations of the NetSolve Grid Computing System,” Concurrency: Practice and Experience, vol. 14, no. 13-15, pp. 1457-1479, January 2002.

(311.31 KB)

Strohmaier, E., J. Dongarra, H. Meuer, and H. D. Simon, “The Marketplace for High-Performance Computers,” Parallel Computing, vol. 25, no. 13-14, pp. 1517-1545, October 2002.

(285.78 KB)

Chen, Z., and J. Dongarra, “Algorithm-Based Fault Tolerance for Fail-Stop Failures,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 12, January 2008.

(340.49 KB)

Petitet, A., and J. Dongarra, “Algorithmic Redistribution Methods for Block Cyclic Decompositions,” IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 12, pp. 201-220, October 2002.

(524.82 KB)

Beck, M., D. Arnold, A. Bassi, F. Berman, H. Casanova, J. Dongarra, T. Moore, G. Obertelli, J. Plank, M. Swany, et al., “Middleware for the Use of Storage in Communication,” Parallel Computing, vol. 28, no. 12, pp. 1773-1788, August 2002.

(87.97 KB)

Kennedy, K., B. Broom, K. Cooper, J. Dongarra, R. Fowler, D. Gannon, L. Johnsson, J. Mellor-Crummey, and L. Torczon, “Telescoping Languages: A Strategy for Automatic Generation of Scientific Problem-Solving Systems from Annotated Libraries,” Journal of Parallel and Distributed Computing, vol. 61, no. 12, pp. 1803-1826, December 2001.

(386.37 KB)

Chen, Z., J. Dongarra, P. Luszczek, and K. Roche, “Self Adapting Software for Numerical Linear Algebra and LAPACK for Clusters,” Parallel Computing, vol. 29, no. 11-12, pp. 1723-1743, November 2003.

(343.44 KB)

Wolf, F., B. Mohr, J. Dongarra, and S. Moore, “Automatic Analysis of Inefficiency Patterns in Parallel Applications,” Concurrency and Computation: Practice and Experience, vol. 19, no. 11, pp. 1481-1496, August 2007.

(233.31 KB)

Gerndt, M., and K. Fürlinger, “Specification and detection of performance problems with ASL,” Concurrency and Computation: Practice and Experience, vol. 19, no. 11: John Wiley and Sons Ltd., pp. 1451-1464, January 2007.

Fagg, G., A. Bukovsky, and J. Dongarra, “HARNESS and Fault Tolerant MPI,” Parallel Computing, vol. 27, no. 11, pp. 1479-1496, January 2001.

(164.2 KB)

Beck, M., H. Casanova, J. Dongarra, T. Moore, J. Plank, F. Berman, and R. Wolski, “Logistical Quality of Service in NetSolve,” Computer Communications, vol. 22, no. 11, pp. 1034-1044, January 1999.

(168.39 KB)

Kurzak, J., H. Anzt, M. Gates, and J. Dongarra, “Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs,” IEEE Transactions on Parallel and Distributed Systems, no. 1045-9219, November 2015.

Bouteiller, A., T. Herault, G. Bosilca, P. Du, and J. Dongarra, “Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures, and Accuracy,” ACM Transactions on Parallel Computing, vol. 1, issue 2, no. 10, pp. 10:1-10:28, January 2015. DOI: 10.1145/2686892

(1.14 MB)

Kurzak, J., and J. Dongarra, “Implementation of Mixed Precision in Solving Systems of Linear Equations on the Cell Processor,” Concurrency and Computation: Practice and Experience, vol. 19, no. 10, pp. 1371-1385, July 2007.

(453.78 KB)

Dongarra, J., “Network-Enabled Solvers: A Step Toward Grid-Based Computing,” SIAM News, vol. 34, no. 10, December 2001.

“Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard,” International Journal of High Performance Computing Applications: Special Issue - Part I & II, vol. 16, no. 1-2, pp. 1-199, January 2002.

Whaley, C., A. Petitet, and J. Dongarra, “Automated Empirical Optimization of Software and the ATLAS Project,” Parallel Computing, vol. 27, no. 1-2, pp. 3-25, January 2001.

(370.71 KB)

Dongarra, J., and V. Eijkhout, “Numerical Linear Algebra Algorithms and Software,” Journal of Computational and Applied Mathematics, vol. 123, no. 1-2, pp. 489-514, October 1999.

(258.62 KB)

Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, “Composing Resilience Techniques: ABFT, Periodic, and Incremental Checkpointing,” International Journal of Networking and Computing, vol. 5, no. 1, pp. 2-15, January 2015.

(755.54 KB)

Dongarra, J., T. Herault, and Y. Robert, “Performance and Reliability Trade-offs for the Double Checkpointing Algorithm,” International Journal of Networking and Computing, vol. 4, no. 1, pp. 32-41.

(859.04 KB)

Langou, J., and J. Dongarra, “The Problem with the Linpack Benchmark Matrix Generator,” International Journal of High Performance Computing Applications, vol. 23, no. 1, pp. 5-14, 2009.

(136.41 KB)

Jeannot, E., K. Seymour, A. YarKhan, and J. Dongarra, “Improved Runtime and Transfer Time Prediction Mechanisms in a Network Enabled Servers Middleware,” Parallel Processing Letters, vol. 17, no. 1, pp. 47-59, March 2007.

(718.4 KB)

Jeannot, E., K. Seymour, A. YarKhan, and J. Dongarra, “Improved Runtime and Transfer Time Prediction Mechanisms in a Network Enabled Server,” Parallel Processing Letters, vol. 17, no. 1, pp. 47-59, March 2006.

(718.4 KB)

YarKhan, A., K. Seymour, K. Sagi, Z. Shi, and J. Dongarra, “Recent Developments in GridSolve,” International Journal of High Performance Computing Applications (Special Issue: Scheduling for Large-Scale Heterogeneous Platforms), vol. 20, no. 1: Sage Science Press, 2006.

(496.69 KB)

Giraud, L., J. Langou, M. Rozložník, and J. van den Eshof, “Rounding Error Analysis of the Classical Gram-Schmidt Orthogonalization Process,” Numerische Mathematik, vol. 101, no. 1, pp. 87-100, January 2005.

(157.48 KB)

Casanova, H., T. Bartol, F. Berman, A. Birnbaum, J. Dongarra, M. Ellisman, M. Faerman, E. Gockay, M. Miller, G. Obertelli, et al., “The Virtual Instrument: Support for Grid-enabled Scientific Simulations,” International Journal of High Performance Computing Applications, vol. 18, no. 1, pp. 3-17, January 2004.

(282.16 KB)

Vadhiyar, S., G. Fagg, and J. Dongarra, “Towards an Accurate Model for Collective Communications,” International Journal of High Performance Applications, Special Issue: Automatic Performance Tuning, vol. 18, no. 1, pp. 159-167, January 2004.

(250.73 KB)

Henry, G., D. Watkins, and J. Dongarra, “A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures,” SIAM Journal on Scientific Computing, vol. 24, no. 1, pp. 284-311, January 2003.

(224.7 KB)

Casanova, H., M. G. Thomason, and J. Dongarra, “Stochastic Performance Prediction for Iterative Algorithms in Distributed Environments,” Journal of Parallel and Distributed Computing, vol. 98, no. 1, pp. 68-91, October 2002.

(266.82 KB)

Dongarra, J., V. Eijkhout, and P. Luszczek, “Recursive Approach in Sparse Matrix LU Factorization,” Scientific Programming, vol. 9, no. 1, pp. 51-60, 2001.

(217.16 KB)

Casanova, H., M. H. Kim, J. Plank, and J. Dongarra, “Stochastic Performance Prediction for Iterative Algorithms in Distributed Environments,” Journal of Parallel and Distributed Computing, vol. 98, no. 1, pp. 68-91, January 1999.

(257.5 KB)

Aupy, G., A. Benoit, H. Casanova, and Y. Robert, “Scheduling Computational Workflows on Failure-prone Platforms,” International Journal of Networking and Computing, vol. 6, no. 1, pp. 2-26, 2016.

(503.81 KB)

Benoit, A., A. Cavelan, V. Le Fèvre, Y. Robert, and H. Sun, “Towards Optimal Multi-Level Checkpointing,” IEEE Transactions on Computers, vol. 66, issue 7, pp. 1212–1226, July 2017. DOI: 10.1109/TC.2016.2643660

(1.39 MB)

Benoit, A., L. Pottier, and Y. Robert, “Resilient Co-Scheduling of Malleable Applications,” International Journal of High Performance Computing Applications (IJHPCA), May 2017. DOI: 10.1177/1094342017704979

(1.62 MB)

Baboulin, M., J. Dongarra, A. Remy, S. Tomov, and I. Yamazaki, “Solving Dense Symmetric Indefinite Systems using GPUs,” Concurrency and Computation: Practice and Experience, vol. 29, issue 9, March 2017. DOI: 10.1002/cpe.4055

(1.94 MB)

Yamazaki, I., S. Tomov, and J. Dongarra, “Non-GPU-resident Dense Symmetric Indefinite Factorization,” Concurrency and Computation: Practice and Experience, November 2016. DOI: 10.1002/cpe.4012

Yamazaki, I., S. Tomov, and J. Dongarra, “Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU,” ACM Transactions on Mathematical Software (TOMS), vol. 43, issue 2, October 2016.

Benoit, A., A. Cavelan, Y. Robert, and H. Sun, “Assessing General-purpose Algorithms to Cope with Fail-stop and Silent Errors,” ACM Transactions on Parallel Computing, August 2016. DOI: 10.1145/2897189

(573.71 KB)

Dongarra, J., The HPL Benchmark: Past, Present & Future, ISC High Performance, Frankfurt, Germany, July 2016.

(3.41 MB)

YarKhan, A., J. Kurzak, P. Luszczek, and J. Dongarra, “Porting the PLASMA Numerical Library to the OpenMP Standard,” International Journal of Parallel Programming, June 2016. DOI: 10.1007/s10766-016-0441-6

(1.66 MB)

Abdelfattah, A., H. Ltaeif, D. Keyes, and J. Dongarra, “Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs,” Concurrency and Computation: Practice and Experience, vol. 28, issue 12, pp. 3447-3465, May 2016. DOI: 10.1002/cpe.3874

(3.21 MB)

Dongarra, J., M. A. Heroux, and P. Luszczek, “High Performance Conjugate Gradient Benchmark: A new Metric for Ranking High Performance Computing Systems,” International Journal of High Performance Computing Applications, vol. 30, issue 1, pp. 3-10, February 2016. DOI: 10.1177/1094342015593158

(277.51 KB)

Herrmann, J., G. Bosilca, T. Herault, L. Marchal, Y. Robert, and J. Dongarra, “Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results,” Parallel Computing, vol. 52, pp. 22-41, February 2016. DOI: 10.1016/j.parco.2015.09.005

(2.06 MB)

Anzt, H., B. Haugen, J. Kurzak, P. Luszczek, and J. Dongarra, “Experiences in autotuning matrix multiplication for energy minimization on GPUs,” Concurrency and Computation: Practice and Experience, vol. 27, issue 17, pp. 5096-5113, December 2015. DOI: 10.1002/cpe.3516

(1.98 MB)

Strohmaier, E., H. Meuer, J. Dongarra, and H. D. Simon, “The TOP500 List and Progress in High-Performance Computing,” IEEE Computer, vol. 48, issue 11, pp. 42-49, November 2015. DOI: 10.1109/MC.2015.338

Faverge, M., J. Herrmann, J. Langou, B. Lowery, Y. Robert, and J. Dongarra, “Mixing LU-QR Factorization Algorithms to Design High-Performance Dense Linear Algebra Solvers,” Journal of Parallel and Distributed Computing, vol. 85, pp. 32-46, November 2015. DOI: 10.1016/j.jpdc.2015.06.007

(5.06 MB)

Baboulin, M., J. Dongarra, A. Remy, S. Tomov, and I. Yamazaki, “Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures,” Lecture Notes in Computer Science, vol. 9573: Springer International Publishing, pp. 86-95, September 2015, 2016. DOI: 10.1007/978-3-319-32149-3_9

(327.14 KB)

Song, F., and J. Dongarra, “A Scalable Approach to Solving Dense Linear Algebra Problems on Hybrid CPU-GPU Systems,” Concurrency and Computation: Practice and Experience, vol. 27, issue 14, pp. 3702-3723, September 2015. DOI: 10.1002/cpe.3403

(8.16 MB)

Benoit, A., S. K. Raina, and Y. Robert, “Efficient Checkpoint/Verification Patterns,” International Journal of High Performance Computing Applications, July 2015. DOI: 10.1177/1094342015594531

(392.76 KB)

Haidar, A., T. Dong, P. Luszczek, S. Tomov, and J. Dongarra, “Batched matrix computations on hardware accelerators based on GPUs,” International Journal of High Performance Computing Applications, February 2015. DOI: 10.1177/1094342014567546

(2.16 MB)

Haidar, A., J. Dongarra, K. Kabir, M. Gates, P. Luszczek, S. Tomov, and Y. Jia, “HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi,” Scientific Programming, vol. 23, issue 1, January 2015. DOI: 10.3233/SPR-140404

(553.94 KB)

Aliaga, J. I., H. Anzt, M. Castillo, J. C. Fernández, G. León, J. Pérez, and E. S. Quintana-Orti, “Unveiling the Performance-energy Trade-off in Iterative Linear System Solvers for Multithreaded Processors,” Concurrency and Computation: Practice and Experience, vol. 27, issue 4, pp. 885-904, September 2014. DOI: 10.1002/cpe.3341

(1.83 MB)

Luszczek, P., J. Kurzak, and J. Dongarra, “Looking Back at Dense Linear Algebra Software,” Journal of Parallel and Distributed Computing, vol. 74, issue 7, pp. 2548–2560, July 2014. DOI: 10.1016/j.jpdc.2013.10.005

(1.79 MB)

Anzt, H., and E. S. Quintana-Orti, “Improving the Energy Efficiency of Sparse Linear System Solvers on Multicore and Manycore Systems,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 372, issue 2018, July 2014. DOI: 10.1098/rsta.2013.0279

(779.57 KB)

Haidar, A., R. Solcà, M. Gates, S. Tomov, T. C. Schulthess, and J. Dongarra, “A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks,” International Journal of High Performance Computing Applications, vol. 28, issue 2, pp. 196-209, May 2014. DOI: 10.1177/1094342013502097

(1.74 MB)

Nelson, J., “Analyzing PAPI Performance on Virtual Machines,” VMWare Technical Journal, vol. Winter 2013, January 2014.

Bland, W., A. Bouteiller, T. Herault, J. Hursey, G. Bosilca, and J. Dongarra, “An evaluation of User-Level Failure Mitigation support in MPI,” Computing, vol. 95, issue 12, pp. 1171-1184, December 2013. DOI: 10.1007/s00607-013-0331-3

(311.23 KB)

Bosilca, G., A. Bouteiller, E. Brunet, F. Cappello, J. Dongarra, A. Guermouche, T. Herault, Y. Robert, F. Vivien, and D. Zaidouni, “Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,” Concurrency and Computation: Practice and Experience, November 2013. DOI: 10.1002/cpe.3173

(894.61 KB)

Du, P., P. Luszczek, S. Tomov, and J. Dongarra, “Soft Error Resilient QR Factorization for Hybrid System with GPGPU,” Journal of Computational Science, vol. 4, issue 6, pp. 457–464, November 2013. DOI: http://dx.doi.org/10.1016/j.jocs.2013.01.004

(995.45 KB)

Yamazaki, I., T. Dong, R. Solcà, S. Tomov, J. Dongarra, and T. C. Schulthess, “Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems,” Concurrency and Computation: Practice and Experience, October 2013.

(1.71 MB)

Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, “Extending the scope of the Checkpoint-on-Failure protocol for forward recovery in standard MPI,” Concurrency and Computation: Practice and Experience, July 2013. DOI: 10.1002/cpe.3100

(3.89 MB)

Aupy, G., A. Benoit, T. Herault, Y. Robert, F. Vivien, and D. Zaidouni, “On the Combination of Silent Error Detection and Checkpointing,” UT-CS-13-710: University of Tennessee Computer Science Technical Report, June 2013.

(1.29 MB)

Heroux, M. A., and J. Dongarra, “Toward a New Metric for Ranking High Performance Computing Systems,” SAND2013-4744, June 2013.

(225.32 KB)

Li, Y., A. YarKhan, J. Dongarra, K. Seymour, and A. Hurault, “Enabling Workflows in GridSolve: Request Sequencing and Service Trading,” Journal of Supercomputing, vol. 64, issue 3, pp. 1133-1152, June 2013. DOI: 10.1007/s11227-010-0549-1

(821.29 KB)

Bouteiller, A., T. Herault, G. Bosilca, and J. Dongarra, “Correlated Set Coordination in Fault Tolerant Message Logging Protocols,” Concurrency and Computation: Practice and Experience, vol. 25, issue 4, pp. 572-585, March 2013. DOI: 10.1002/cpe.2859

(636.68 KB)

Danalis, A., P. Luszczek, G. Marin, J. Vetter, and J. Dongarra, “BlackjackBench: Portable Hardware Characterization with Automated Results Analysis,” The Computer Journal, March 2013. DOI: 10.1093/comjnl/bxt057

(408.45 KB)

Gustavson, F. G., J. Wasniewski, J. Dongarra, J. Herrero, and J. Langou, “Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms,” ACM Transactions on Mathematical Software (TOMS), vol. 39, issue 2, February 2013. DOI: 10.1145/2427023.2427026

(439.46 KB)

Bland, W., A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, “Post-failure recovery of MPI communication capability: Design and rationale,” International Journal of High Performance Computing Applications, vol. 27, issue 3, pp. 244-254, January 2013. DOI: 10.1177/1094342013488238

(285.77 KB)

Anzt, H., P. Luszczek, J. Dongarra, and V. Heuveline, “GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement,” University of Tennessee Computer Science Technical Report UT-CS-11-690 (also LAWN 260), December 2011.

(662.98 KB)

Haidar, A., H. Ltaeif, and J. Dongarra, “Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels,” University of Tennessee Computer Science Technical Report, UT-CS-11-677 (also LAWN 254), August 2011.

(636.01 KB)

Ma, T., G. Bosilca, A. Bouteiller, B. Goglin, J. Squyres, and J. Dongarra, “Kernel Assisted Collective Intra-node Communication Among Multicore and Manycore CPUs,” University of Tennessee Computer Science Technical Report, UT-CS-10-663, November 2010.

(384.75 KB)

Dongarra, J., and P. Beckman, “International Exascale Software Project Roadmap v1.0,” University of Tennessee Computer Science Technical Report, UT-CS-10-654, May 2010.

(719.74 KB)

Kurzak, J., H. Ltaeif, J. Dongarra, and R. M. Badia, “Scheduling Linear Algebra Operations on Multicore Processors,” University of Tennessee Computer Science Department Technical Report, UT-CS-09-636 (Also LAPACK Working Note 213), 2009.

(716.18 KB)

Kurzak, J., and J. Dongarra, “Fully Dynamic Scheduler for Numerical Computing on Multicore Processors,” University of Tennessee Computer Science Department Technical Report, UT-CS-09-643 (Also LAPACK Working Note 220), 2009.

(488.24 KB)

Dongarra, J., and J. Langou, “The Problem with the Linpack Benchmark Matrix Generator,” University of Tennessee Computer Science Technical Report, UT-CS-08-621 (also LAPACK Working Note 206), June 2008.

(136.41 KB)

Gustavson, F. G., J. Wasniewski, and J. Dongarra, “Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,” University of Tennessee Computer Science Technical Report, UT-CS-08-614 (also LAPACK Working Note 199), April 2008.

(896.03 KB)

Bosilca, G., R. Delmas, J. Dongarra, and J. Langou, “Algorithmic Based Fault Tolerance Applied to High Performance Computing,” University of Tennessee Computer Science Technical Report, UT-CS-08-620 (also LAPACK Working Note 205), January 2008.

(313.55 KB)

Baboulin, M., and S. Gratton, “Using dual techniques to derive componentwise and mixed condition numbers for a linear functional of a linear least squares solution,” University of Tennessee Computer Science Technical Report, UT-CS-08-622 (also LAPACK Working Note 207), January 2008.

(159.65 KB)

Dongarra, J., J. Demmel, P. Husbands, and P. Luszczek, “HPCS Library Study Effort,” University of Tennessee Computer Science Technical Report, UT-CS-08-617, January 2008.

(73.22 KB)

Song, F., S. Moore, and J. Dongarra, “Analytical Modeling for Affinity-Based Thread Scheduling on Multicore Platforms,” University of Tennessee Computer Science Technical Report, UT-CS-08-626, January 2008.

(650.75 KB)

Eijkhout, V., “Numerical Metadata API Reference,” Innovative Computing Laboratory Technical Report, February 2007.

(454.79 KB)

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),” University of Tennessee Computer Science Dept. Technical Report CS-89-85, 2007.

(6.42 MB)

Buttari, A., P. Luszczek, J. Kurzak, J. Dongarra, and G. Bosilca, “SCOP3: A Rough Guide to Scientific Computing On the PlayStation 3,” University of Tennessee Computer Science Dept. Technical Report, UT-CS-07-595, 2007.

(1.74 MB)

Buttari, A., J. Dongarra, and J. Kurzak, “Limitations of the PlayStation 3 for High Performance Cluster Computing,” University of Tennessee Computer Science Technical Report, UT-CS-07-597 (Also LAPACK Working Note 185), 2007.

(171.01 KB)

Dongarra, J., G. H. Golub, E. Grosse, C. Moler, and K. Moore, “Twenty-Plus Years of Netlib and NA-Net,” University of Tennessee Computer Science Department Technical Report, UT-CS-04-526, 2006.

(62.79 KB)

Chen, Z., and J. Dongarra, “Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Computations on Volatile Resources,” University of Tennessee Computer Science Department Technical Report, vol. –05-561, November 2005.

(266.54 KB)

Bosilca, G., Z. Chen, J. Dongarra, and J. Langou, “Recovery Patterns for Iterative Methods in a Parallel Unstable Environment,” University of Tennessee Computer Science Department Technical Report, UT-CS-04-538, 2005.

(241.36 KB)

Chen, Z., and J. Dongarra, “Condition Numbers of Gaussian Random Matrices,” University of Tennessee Computer Science Department Technical Report, vol. –04-539, 2005.

(186.46 KB)

Chen, Z., and J. Dongarra, “Numerically Stable Real-Number Codes Based on Random Matrices,” University of Tennessee Computer Science Department Technical Report, vol. –04-526, October 2004.

(91.66 KB)

Emad, N., S. A. S. Fazeli, and J. Dongarra, “An Asynchronous Algorithm on NetSolve Global Computing System,” PRiSM - Laboratoire de recherche en informatique, Université de Versailles St-Quentin Technical Report, March 2004.

(377.33 KB)

Dongarra, J., and V. Eijkhout, “Finite-choice Algorithm Optimization in Conjugate Gradients (LAPACK Working Note 159),” University of Tennessee Computer Science Technical Report, UT-CS-03-502, January 2003.

(64.52 KB)

Chen, Z., J. Dongarra, P. Luszczek, and K. Roche, “Self Adapting Software for Numerical Linear Algebra and LAPACK for Clusters (LAPACK Working Note 160),” University of Tennessee Computer Science Technical Report, UT-CS-03-499, January 2003.

(343.44 KB)

Arnold, D., S. Browne, J. Dongarra, G. Fagg, and K. Moore, “Secure Remote Access to Numerical Software and Computation Hardware,” University of Tennessee Computer Science Technical Report, UT-CS-00-446, July 2000.

(402.31 KB)

Browne, S., J. Dongarra, N. Garner, K. London, and P. Mucci, “A Portable Programming Interface for Performance Evaluation on Modern Processors,” University of Tennessee Computer Science Technical Report, UT-CS-00-444, July 2000.

(655.17 KB)

Berman, F., A. Chien, K. Cooper, J. Dongarra, I. Foster, D. Gannon, L. Johnsson, K. Kennedy, C. Kesselman, D. Reed, et al., “The GrADS Project: Software Support for High-Level Grid Application Development,” Technical Report, February 2000.

(347.41 KB)

Anzt, H., B. Haugen, J. Kurzak, P. Luszczek, and J. Dongarra, “Experiences in Autotuning Matrix Multiplication for Energy Minimization on GPUs,” Concurrency and Computation: Practice and Experience, vol. 27, issue 17, pp. 5096-5113, Oct 12, 2015. DOI: 10.1002/cpe.3516

(1.99 MB)

Dongarra, J., M. A. Heroux, and P. Luszczek, “High-Performance Conjugate-Gradient Benchmark: A New Metric for Ranking High-Performance Computing Systems,” The International Journal of High Performance Computing Applications, 2015. DOI: 10.1177/1094342015593158

(336.19 KB)

YarKhan, A., Dynamic Task Execution on Shared and Distributed Memory Architectures, 2012.

(3.29 MB)

Dongarra, J., A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, and A. YarKhan, “Model-Driven One-Sided Factorizations on Multicore, Accelerated Systems,” Supercomputing Frontiers and Innovations, vol. 1, issue 1, 2014. DOI: http://dx.doi.org/10.14529/jsfi1401

(1.86 MB)

Anzt, H., M. Baboulin, J. Dongarra, Y. Fournier, F. Hulsemann, A. Khabou, and Y. Wang, “Accelerating the Conjugate Gradient Algorithm with GPU in CFD Simulations,” VECPAR, 2016.

Anzt, H., W. Sawyer, S. Tomov, P. Luszczek, and J. Dongarra, “Acceleration of GPU-based Krylov solvers via Data Transfer Reduction,” International Journal of High Performance Computing Applications, 2015.

Yamazaki, I., S. Tomov, and J. Dongarra, “Computing Low-rank Approximation of a Dense Matrix on Multicore CPUs with a GPU and its Application to Solving a Hierarchically Semiseparable Linear System of Equations,” Scientific Programming, 2015.

(648.87 KB)

Tomov, S., and J. Dongarra, “Dense Linear Algebra for Hybrid GPU-based Systems,” Scientific Computing with Multicore and Accelerators, Boca Raton, Florida, CRC Press, 2010.

Hoefler, T., J. M. Squyres, G. Fagg, G. Bosilca, W. Rehm, and A. Lumsdaine, “A New Approach to MPI Collective Communication Implementations,” Distributed and Parallel Systems: Springer US, pp. 45-54, 2007. DOI: 10.1007/978-0-387-69858-8_5

(140.2 KB)

Dongarra, J., N. J. Higham, M. R. Dennis, P. Glendinning, P. A. Martin, F. Santosa, and J. Tanner, “High-Performance Computing,” The Princeton Companion to Applied Mathematics, Princeton, New Jersey, Princeton University Press, pp. 839-842, 2015.

Dongarra, J., and P. Luszczek, “HPC Challenge: Design, History, and Implementation Highlights,” Contemporary High Performance Computing: From Petascale Toward Exascale, Boca Raton, FL, Taylor and Francis, 2013.

(790.01 KB)

Vetter, J., R. Glassbrook, K. Schwan, S. Yalamanchili, M. Horton, A. Gavrilovska, M. Slawinska, J. Dongarra, J. Meredith, P. Roth, et al., “Keeneland: Computational Science Using Heterogeneous GPU Computing,” Contemporary High Performance Computing: From Petascale Toward Exascale, Boca Raton, FL, Taylor and Francis, 2013.

(2.7 MB)

Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, J. Kurzak, P. Luszczek, S. Tomov, and J. Dongarra, “Scalable Dense Linear Algebra on Heterogeneous Hardware,” HPC: Transition Towards Exascale Processing, in the series Advances in Parallel Computing, 2013.

(760.32 KB)

Nath, R., S. Tomov, and J. Dongarra, “BLAS for GPUs,” Scientific Computing with Multicore and Accelerators, Boca Raton, Florida, CRC Press, 2010.

(1.05 MB)

Bai, Z., J. Demmel, J. Dongarra, J. Langou, and J. Wang, “LAPACK,” Handbook of Linear Algebra, Second Edition, Boca Raton, FL, CRC Press, 2013.

(223.21 KB)

Dongarra, J., M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, and I. Yamazaki, “Accelerating Numerical Dense Linear Algebra Calculations with GPUs,” Numerical Computations with GPUs: Springer International Publishing, pp. 3-28, 2014. DOI: 10.1007/978-3-319-06548-9_1

(1.06 MB)

Zhao, Y., L. Wan, W. Wu, G. Bosilca, R. Vuduc, J. Ye, W. Tang, and Z. Xu, “Efficient Communications in Training Large Scale Neural Networks,” ACM MultiMedia Workshop 2017, Mountain View, CA, ACM, October 2017.

(1.41 MB)

Fang, A., A. Cavelan, Y. Robert, and A. Chien, “Resilience for Stencil Computations with Latent Errors,” International Conference on Parallel Processing (ICPP), Bristol, UK, IEEE Computer Society Press, August 2017.

(1.19 MB)

Yamazaki, I., M. Hoemmen, P. Luszczek, and J. Dongarra, “Comparing performance of s-step and pipelined GMRES on distributed-memory multicore CPUs,” Pittsburgh, Pennsylvania, SIAM Annual Meeting, July 2017.

(748 KB)

Benoit, A., F. Cappello, A. Cavelan, Y. Robert, and H. Sun, “Identifying the Right Replication Level to Detect and Correct Silent Errors at Scale,” 2017 Workshop on Fault-Tolerance for HPC at Extreme Scale, Washington, DC, ACM, June 2017. DOI: 10.1145/3086157.3086162

(865.68 KB)

Benoit, A., A. Cavelan, V. Le Fèvre, and Y. Robert, “Optimal Checkpointing Period with replicated execution on heterogeneous platforms,” 2017 Workshop on Fault-Tolerance for HPC at Extreme Scale, Washington, DC, IEEE Computer Society Press, June 2017. DOI: 10.1145/3086157.3086165

(1.02 MB)

Faverge, M., J. Langou, Y. Robert, and J. Dongarra, “Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), Orlando, FL, IEEE, May 2017. DOI: 10.1109/IPDPS.2017.46

(328.15 KB)

Aupy, G., A. Benoit, L. Pottier, P. Raghavan, Y. Robert, and M. Shantharam, “Co-Scheduling Algorithms for Cache-Partitioned Systems,” 19th Workshop on Advances in Parallel and Distributed Computational Models, Orlando, FL, IEEE Computer Society Press, May 2017. DOI: 10.1109/IPDPSW.2017.60

(584.76 KB)

Anzt, H., J. Dongarra, G. Flegar, and E. S. Quintana-Orti, “Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioner Generation on GPUs,” Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, New York, NY, USA, ACM, pp. 1–10, February 2017. DOI: 10.1145/3026937.3026940

(552.62 KB)

Anzt, H., E. Chow, T. Huckle, and J. Dongarra, “Batched Generation of Incomplete Sparse Approximate Inverses on GPUs,” Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, pp. 49–56, November 2016. DOI: 10.1109/ScalA.2016.11

Lopez, M. G., V. Larrea, W. Joubert, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra, “Towards Achieving Performance Portability Using Directives for Accelerators,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Third Workshop on Accelerator Programming Using Directives (WACCPD), Salt Lake City, Utah, Innovative Computing Laboratory, University of Tennessee, November 2016.

(567.02 KB)

Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, “Failure Detection and Propagation in HPC Systems,” Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Salt Lake City, Utah, IEEE Press, pp. 27:1-27:11, November 2016.

Haidar, A., S. Tomov, K. Arturov, M. Guney, S. Story, and J. Dongarra, “LU, QR, and Cholesky Factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi,” IEEE High Performance Extreme Computing Conference (HPEC'16), Waltham, MA, IEEE, September 2016.

(943.23 KB)

Anzt, H., E. Chow, D. Szyld, and J. Dongarra, “Domain Overlap for Iterative Sparse Triangular Solves on GPUs,” Software for Exascale Computing - SPPEXA, vol. 113: Springer International Publishing, pp. 527–545, September 2016. DOI: 10.1007/978-3-319-40528-5_24

Haidar, A., B. Brock, S. Tomov, M. Guidry, J. Jay Billings, D. Shyles, and J. Dongarra, “Performance Analysis and Acceleration of Explicit Integration for Large Kinetic Networks using Batched GPU Computations,” 2016 IEEE High Performance Extreme Computing Conference (HPEC '16), Waltham, MA, IEEE, September 2016.

(480.29 KB)

Wu, W., G. Bosilca, R. vandeVaart, S. Jeaugey, and J. Dongarra, “GPU-Aware Non-contiguous Data Movement In Open MPI,” 25th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'16), Kyoto, Japan, ACM, June 2016. DOI: 10.1145/2907294.2907317

(482.32 KB)

Luszczek, P., M. Gates, J. Kurzak, A. Danalis, and J. Dongarra, “Search Space Generation and Pruning System for Autotuners,” 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.

(555.44 KB)

Haugen, B., “Performance Analysis and Modeling of Task-Based Runtimes,” Department of Electrical Engineering and Computer Science, vol. PhD, Knoxville, University of Tennessee, May 2016.

(5.14 MB)

Anzt, H., J. Dongarra, M. Kreutzer, G. Wellein, and M. Kohler, “Efficiency of General Krylov Methods on GPUs – An Experimental Study,” 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 683-691, May 2016. DOI: 10.1109/IPDPSW.2016.45

Anzt, H., J. Dongarra, M. Kreutzer, G. Wellein, and M. Kohler, “Efficiency of General Krylov Methods on GPUs – An Experimental Study,” The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), Chicago, IL, IEEE, May 2016. DOI: 10.1109/IPDPSW.2016.45

(285.28 KB)

Jia, Y., P. Luszczek, and J. Dongarra, “Hessenberg Reduction with Transient Error Resilience on GPU-Based Hybrid Architectures,” 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.

(535.72 KB)

Benoit, A., A. Cavelan, Y. Robert, and H. Sun, “Optimal Resilience Patterns to Cope with Fail-stop and Silent Errors,” 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016. DOI: 10.1109/IPDPS.2016.39

(603.58 KB)

Anzt, H., E. Ponce, G. D. Peterson, and J. Dongarra, “GPU-accelerated Co-design of Induced Dimension Reduction: Algorithmic Fusion and Kernel Overlap,” 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing, Austin, TX, ACM, November 2015.

(1.46 MB)

Yamazaki, I., J. Kurzak, P. Luszczek, and J. Dongarra, “Randomized Algorithms to Update Partial Singular Value Decomposition on a Hybrid CPU/GPU Cluster,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.

Herault, T., A. Bouteiller, G. Bosilca, M. Gamell, K. Teranishi, M. Parashar, and J. Dongarra, “Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.

(550.96 KB)

Haidar, A., Y. Jia, P. Luszczek, S. Tomov, A. YarKhan, and J. Dongarra, “Weighted Dynamic Scheduling with Many Parallelism Grains for Offloading of Numerical Workloads to Multiple Varied Accelerators,” Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA'15), vol. No. 5, Austin, TX, ACM, November 2015.

(347.6 KB)

Solcà, R., A. Kozhevnikov, A. Haidar, S. Tomov, T. C. Schulthess, and J. Dongarra, “Efficient Implementation Of Quantum Materials Simulations On Distributed CPU-GPU Systems,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.

(1.09 MB)

Anzt, H., J. Dongarra, and E. S. Quintana-Orti, “Tuning Stationary Iterative Solvers for Fault Resilience,” 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA15), Austin, TX, ACM, November 2015.

(1.28 MB)

Anzt, H., J. Dongarra, and E. S. Quintana-Orti, “Adaptive Precision Solvers for Sparse Linear Systems,” 3rd International Workshop on Energy Efficient Supercomputing (E2SC '15), Austin, TX, ACM, November 2015.

Yamazaki, I., S. Tomov, J. Kurzak, J. Dongarra, and J. Barlow, “Mixed-precision Block Gram Schmidt Orthogonalization,” 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Austin, TX, ACM, November 2015.

(235.69 KB)

Mary, T., I. Yamazaki, J. Kurzak, P. Luszczek, S. Tomov, and J. Dongarra, “Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.

Gates, M., H. Anzt, J. Kurzak, and J. Dongarra, “Accelerating Collaborative Filtering for Implicit Feedback Datasets using GPUs,” 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, IEEE, November 2015.

(1.02 MB)

Anzt, H., E. Chow, D. Szyld, and J. Dongarra, “Random-Order Alternating Schwarz for Sparse Triangular Solves,” 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.

(1.53 MB)

Yamazaki, I., J. Barlow, S. Tomov, J. Kurzak, and J. Dongarra, “Mixed-precision orthogonalization process Performance on multicore CPUs with GPUs,” 2015 SIAM Conference on Applied Linear Algebra, Atlanta, GA, SIAM, October 2015.

(301.01 KB)

Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Efficient Eigensolver Algorithms on Accelerator Based Architectures,” 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.

(6.98 MB)

Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra, “Batched Matrix Computations on Hardware Accelerators Based on GPUs,” 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.

(9.36 MB)

Gates, M., S. Tomov, and A. Haidar, “Comparing Hybrid CPU-GPU and Native GPU-only Acceleration for Linear Algebra,” 2015 SIAM Conference on Applied Linear Algebra, Atlanta, GA, SIAM, October 2015.

(4.7 MB)

Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Batched Matrix Computations on Hardware Accelerators,” EuroMPI/Asia 2015 Workshop, Bordeaux, France, September 2015.

(589.05 KB)

Bouteiller, A., G. Bosilca, and J. Dongarra, “Plan B: Interruption of Ongoing MPI Operations to Support Failure Recovery,” 22nd European MPI Users' Group Meeting, Bordeaux, France, ACM, September 2015. DOI: 10.1145/2802658.2802668

(543.32 KB)

Haidar, A., S. Tomov, P. Luszczek, and J. Dongarra, “MAGMA Embedded: Towards a Dense Linear Algebra Library for Energy Efficient Extreme Computing,” 2015 IEEE High Performance Extreme Computing Conference (HPEC ’15), (Best Paper Award), Waltham, MA, IEEE, September 2015.

(678.86 KB)

Anzt, H., E. Chow, and J. Dongarra, “Iterative Sparse Triangular Solves for Preconditioning,” EuroPar 2015, Vienna, Austria, Springer Berlin, August 2015. DOI: 10.1007/978-3-662-48096-0_50

(322.36 KB)

YarKhan, A., A. Haidar, C. Cao, P. Luszczek, S. Tomov, and J. Dongarra, “Cholesky Across Accelerators,” 17th IEEE International Conference on High Performance Computing and Communications (HPCC 2015), Elizabeth, NJ, IEEE, August 2015.

Haidar, A., A. YarKhan, C. Cao, P. Luszczek, S. Tomov, and J. Dongarra, “Flexible Linear Algebra Development and Scheduling with Cholesky Factorization,” 17th IEEE International Conference on High Performance Computing and Communications, Newark, NJ, August 2015.

(494.31 KB)

Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “On the Design, Development, and Analysis of Optimized Matrix-Vector Multiplication Routines for Coprocessors,” ISC High Performance 2015, Frankfurt, Germany, July 2015.

(1.49 MB)

Haidar, A., T. Dong, S. Tomov, P. Luszczek, and J. Dongarra, “Framework for Batched and GPU-resident Factorization Algorithms to Block Householder Transformations,” ISC High Performance, Frankfurt, Germany, Springer, July 2015.

(778.26 KB)

Chow, E., H. Anzt, and J. Dongarra, “Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs,” International Supercomputing Conference (ISC 2015), Frankfurt, Germany, July 2015.

Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “Performance Analysis and Optimization of Two-Sided Factorization Algorithms for Heterogeneous Platform,” International Conference on Computational Science (ICCS 2015), Reykjavík, Iceland, June 2015.

(1.12 MB)

Haidar, A., J. Kurzak, G. Pichon, and M. Faverge, “A Data Flow Divide and Conquer Algorithm for Multicore Architecture,” 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.

(535.44 KB)

Anzt, H., S. Tomov, and J. Dongarra, “Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,” Spring Simulation Multi-Conference 2015 (SpringSim'15), Alexandria, VA, SCS, April 2015.

(1.46 MB)

Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “Performance Analysis and Design of a Hessenberg Reduction using Stabilized Blocked Elementary Transformations for New Architectures,” The Spring Simulation Multi-Conference 2015 (SpringSim'15), Best Paper Award, Alexandria, VA, April 2015.

(608.44 KB)

Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Towards Batched Linear Solvers on Accelerated Hardware Platforms,” 8th Workshop on General Purpose Processing Using GPUs (GPGPU 8) co-located with PPOPP 2015, San Francisco, CA, ACM, February 2015.

(403.74 KB)

Haidar, A., T. Dong, P. Luszczek, S. Tomov, and J. Dongarra, “Optimization for Performance and Energy for Batched Matrix Computations on GPUs,” 8th Workshop on General Purpose Processing Using GPUs (GPGPU 8), San Francisco, CA, ACM, February 2015. DOI: 10.1145/2716282.2716288

(699.5 KB)

Anzt, H., S. Tomov, and J. Dongarra, “Energy Efficiency and Performance Frontiers for Sparse Computations on GPU Supercomputers,” Sixth International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM '15), San Francisco, CA, ACM, February 2015. DOI: 10.1145/2712386.2712387

(2.29 MB)

Yamazaki, I., S. Tomov, and J. Dongarra, “Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, New Orleans, LA, November 2014.

(465.52 KB)

Haidar, A., C. Cao, I. Yamazaki, J. Dongarra, M. Gates, P. Luszczek, and S. Tomov, “Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads Across Accelerators, Coprocessors, and Multicore Processors,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '14), New Orleans, LA, IEEE, November 2014. DOI: 10.1109/ScalA.2014.8

(407.5 KB)

Yamazaki, I., S. Rajamanickam, E. G. Boman, M. Hoemmen, M. A. Heroux, and S. Tomov, “Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC 14), New Orleans, LA, IEEE, November 2014.

Yamazaki, I., T. Mary, J. Kurzak, S. Tomov, and J. Dongarra, “Access-averse Framework for Computing Low-rank Matrix Approximations,” First International Workshop on High Performance Big Graph Data Management, Analysis, and Mining, Washington, DC, October 2014.

Bouteiller, A., T. Herault, and G. Bosilca, “A Multithreaded Communication Substrate for OpenSHMEM,” 8th International Conference on Partitioned Global Address Space Programming Models (PGAS), Eugene, OR, October 2014.

(261.66 KB)

Haugen, B., and J. Kurzak, “Search Space Pruning Constraints Visualization,” VISSOFT'14: 2nd IEEE Working Conference on Software Visualization, Victoria, BC, Canada, IEEE, September 2014.

(1.32 MB)

Dong, T., A. Haidar, S. Tomov, and J. Dongarra, “A Fast Batched Cholesky Factorization on a GPU,” International Conference on Parallel Processing (ICPP-2014), Minneapolis, MN, September 2014.

(1.37 MB)

Dong, T., A. Haidar, P. Luszczek, J. Harris, S. Tomov, and J. Dongarra, “LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU,” 16th IEEE International Conference on High Performance Computing and Communications (HPCC), Paris, France, IEEE, August 2014.

(684.73 KB)

Genet, D., A. Guermouche, and G. Bosilca, “Assembly Operations for Multicore Architectures using Task-Based Runtime Systems,” Euro-Par 2014, Porto, Portugal, Springer International Publishing, August 2014.

(481.52 KB)

Baboulin, M., J. Dongarra, and R. Lacroix, “Computing Least Squares Condition Numbers on Hybrid Multicore/GPU Systems,” International Interdisciplinary Conference on Applied Mathematics, Modeling and Computational Science (AMMCS), Waterloo, Ontario, Canada, August 2014.

(130.18 KB)

Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Heterogeneous Acceleration for Linear Algebra in Multi-Coprocessor Environments,” VECPAR 2014, Eugene, OR, June 2014.

(276.52 KB)

Gates, M., A. Haidar, and J. Dongarra, “Accelerating Eigenvector Computation in the Nonsymmetric Eigenvalue Problem,” VECPAR 2014, Eugene, OR, June 2014.

(199.44 KB)

Song, F., and J. Dongarra, “Scaling Up Matrix Computations on Shared-Memory Manycore Systems with 1000 CPU Cores,” International conference on Supercomputing, Munich, Germany, ACM, pp. 333-342, June 2014. DOI: 10.1145/2597652.2597670

(2.9 MB)

Yamazaki, I., H. Anzt, S. Tomov, M. Hoemmen, and J. Dongarra, “Improving the performance of CA-GMRES on multicores with multiple GPUs,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

(333.82 KB)

Tomov, S., P. Luszczek, I. Yamazaki, J. Dongarra, H. Anzt, and W. Sawyer, “Optimizing Krylov Subspace Solvers on Graphics Processing Units,” Fourth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

(536.32 KB)

Lukarski, D., H. Anzt, S. Tomov, and J. Dongarra, “Hybrid Multi-Elimination ILU Preconditioners on GPUs,” International Heterogeneity in Computing Workshop (HCW), IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

(1.67 MB)

Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, “Assessing the Impact of ABFT and Checkpoint Composite Strategies,” 16th Workshop on Advances in Parallel and Distributed Computational Models, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

(1.02 MB)

Dong, T., V. Dobrev, T. Kolev, R. Rieben, S. Tomov, and J. Dongarra, “A Step towards Energy Efficient Computing: Redesigning A Hydrodynamic Application on CPU-GPU,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

(1.01 MB)

Haidar, A., P. Luszczek, and J. Dongarra, “New Algorithm for Computing Eigenvectors of the Symmetric Eigenvalue Problem,” Workshop on Parallel and Distributed Scientific and Engineering Computing, IPDPS 2014 (Best Paper), Phoenix, AZ, IEEE, May 2014. DOI: 10.1109/IPDPSW.2014.130

(2.33 MB)

Yamazaki, I., J. Kurzak, P. Luszczek, and J. Dongarra, “Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime,” Workshop on Large-Scale Parallel Processing, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

(398.16 KB)

Haidar, A., C. Cao, J. Dongarra, P. Luszczek, and S. Tomov, “Unified Development for Mixed Multi-GPU and Multi-Coprocessor Environments using a Lightweight Runtime Environment,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

(1.51 MB)

Cao, C., J. Dongarra, P. Du, M. Gates, P. Luszczek, and S. Tomov, “clMAGMA: High Performance Dense Linear Algebra with OpenCL,” International Workshop on OpenCL, Bristol University, England, May 2014.

(460.91 KB)

Marin, G., J. Dongarra, and D. Terpstra, “MIAMI: A Framework for Application Performance Diagnosis,” ISPASS-2014, Monterey, CA, IEEE, March 2014. DOI: 10.1109/ISPASS.2014.6844480

(1010.75 KB)

Jia, Y., P. Luszczek, G. Bosilca, and J. Dongarra, “CPU-GPU Hybrid Bidiagonal Reduction With Soft Error Resilience,” ScalA '13 Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Montpellier, France, November 2013.

(238.58 KB)

Jia, Y., G. Bosilca, P. Luszczek, and J. Dongarra, “Parallel Reduction to Hessenberg Form with Algorithm-Based Fault Tolerance,” International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE-SC 2013, Denver, CO, November 2013.

(147.09 KB)

Haidar, A., P. Luszczek, J. Kurzak, and J. Dongarra, “An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware,” Supercomputing 2013, Denver, CO, November 2013.

Mattson, T., D. Bader, J. Berry, A. Buluc, J. Dongarra, C. Faloutsos, J. Feo, J. Gilbert, J. Gonzalez, B. Hendrickson, et al., “Standards for Graph Algorithm Primitives,” 17th IEEE High Performance Extreme Computing Conference (HPEC '13), Waltham, MA, IEEE, September 2013. DOI: 10.1109/HPEC.2013.6670338

(108.86 KB)

Turchenko, V., G. Bosilca, A. Bouteiller, and J. Dongarra, “Efficient Parallelization of Batch Pattern Training Algorithm on Many-core and Cluster Architectures,” 7th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems, Berlin, Germany, September 2013.

(102.51 KB)

Dongarra, J., M. Gates, A. Haidar, Y. Jia, K. Kabir, P. Luszczek, and S. Tomov, “Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi,” PPAM 2013, Warsaw, Poland, September 2013.

(284.97 KB)

Bouteiller, A., F. Cappello, J. Dongarra, A. Guermouche, T. Herault, and Y. Robert, “Multi-criteria Checkpointing Strategies: Response-Time versus Resource Utilization,” Euro-Par 2013, Aachen, Germany, Springer, August 2013.

(431.84 KB)

Ma, T., G. Bosilca, A. Bouteiller, and J. Dongarra, “Kernel-assisted and topology-aware MPI collective communications on multi-core/many-core platforms,” Journal of Parallel and Distributed Computing, vol. 73, issue 7, pp. 1000-1010, July 2013. DOI: 10.1016/j.jpdc.2013.01.015

(1.4 MB)

Haidar, A., S. Tomov, J. Dongarra, R. Solcà, and T. C. Schulthess, “Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations,” International Supercomputing Conference (ISC), Lecture Notes in Computer Science, vol. 7905, Leipzig, Germany, Springer Berlin Heidelberg, pp. 67-80, June 2013. DOI: 10.1007/978-3-642-38750-0_6

(2.14 MB)

Haidar, A., M. Gates, S. Tomov, and J. Dongarra, “Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication,” Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013. DOI: 10.1145/2464996.2465438

(1.27 MB)

Marin, G., C. McCurdy, and J. Vetter, “Diagnosis and Optimization of Application Prefetching Performance,” Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013. DOI: 10.1145/2464996.2465014

(827.31 KB)

Wang, Y., M. Baboulin, J. Falcou, Y. Fraigneau, and O. Le Maître, “A Parallel Solver for Incompressible Fluid Flows,” International Conference on Computational Science (ICCS 2013), Barcelona, Spain, Elsevier B.V., June 2013. DOI: 10.1016/j.procs.2013.05.207

(588.79 KB)

Dongarra, J., T. Herault, and Y. Robert, “Revisiting the Double Checkpointing Algorithm,” 15th Workshop on Advances in Parallel and Distributed Computational Models, at the IEEE International Parallel & Distributed Processing Symposium, Boston, MA, May 2013.

(591.1 KB)

Kurzak, J., P. Luszczek, M. Gates, I. Yamazaki, and J. Dongarra, “Virtual Systolic Array for QR Decomposition,” 15th Workshop on Advances in Parallel and Distributed Computational Models, IEEE International Parallel & Distributed Processing Symposium (IPDPS 2013), Boston, MA, IEEE, May 2013. DOI: 10.1109/IPDPS.2013.119

(749.84 KB)

Yamazaki, I., D. Becker, J. Dongarra, A. Druinsky, I. Peled, S. Toledo, G. Ballard, J. Demmel, and O. Schwartz, “Implementing a Blocked Aasen’s Algorithm with a Dynamic Scheduler on Multicore Architectures,” IPDPS 2013 (submitted), Boston, MA, 2013.

(1.22 MB)

Dong, T., T. Kolev, R. Rieben, V. Dobrev, S. Tomov, and J. Dongarra, “Acceleration of the BLAST Hydro Code on GPU,” Supercomputing '12 (poster), Salt Lake City, Utah, SC12, November 2012.

Dongarra, J., H. Ltaief, P. Luszczek, and V. M. Weaver, “Energy Footprint of Advanced Dense Numerical Linear Algebra using Tile Algorithms on Multicore Architecture,” The 2nd International Conference on Cloud and Green Computing (submitted), Xiangtan, Hunan, China, November 2012.

(329.5 KB)

Solcà, R., A. Haidar, S. Tomov, J. Dongarra, and T. C. Schulthess, “A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks,” Supercomputing '12 (poster), Salt Lake City, Utah, November 2012.

Agullo, E., G. Bosilca, C. Castagnède, J. Dongarra, H. Ltaief, and S. Tomov, “Matrices Over Runtime Systems at Exascale,” Supercomputing '12 (poster), Salt Lake City, Utah, November 2012.

Luszczek, P., and J. Dongarra, “Anatomy of a Globally Recursive Embedded LINPACK Benchmark,” 2012 IEEE High Performance Extreme Computing Conference, Waltham, MA, pp. 1-6, September 2012. DOI: 10.1109/HPEC.2012.6408679

(204.74 KB)

Ltaief, H., P. Luszczek, and J. Dongarra, “Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures using Tree Reduction,” Lecture Notes in Computer Science, vol. 7203, pp. 661-670, September 2012.

(185.77 KB)

Bland, W., A. Bouteiller, T. Herault, J. Hursey, G. Bosilca, and J. Dongarra, “An Evaluation of User-Level Failure Mitigation Support in MPI,” Proceedings of Recent Advances in Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Vienna, Austria, Springer, September 2012.

Bosilca, G., J. Dongarra, and H. Ltaief, “Power Profiling of Cholesky and QR Factorizations on Distributed Memory Systems,” Third International Conference on Energy-Aware High Performance Computing, Hamburg, Germany, September 2012.

(290.27 KB)

Bland, W., “User Level Failure Mitigation in MPI,” Euro-Par 2012: Parallel Processing Workshops, vol. 7640, Rhodes Island, Greece, Springer Berlin Heidelberg, pp. 499-504, August 2012.

(136.15 KB)

Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, “A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI,” 18th International European Conference on Parallel and Distributed Computing (Euro-Par 2012) (Best Paper Award), Rhodes, Greece, Springer-Verlag, August 2012.

(289.32 KB)

Anzt, H., S. Tomov, J. Dongarra, and V. Heuveline, “Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems,” Tenth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (Best Paper), Rhodes Island, Greece, August 2012.

(764.02 KB)

Abdelfattah, A., J. Dongarra, D. Keyes, and H. Ltaief, “Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators,” VECPAR 2012, Kobe, Japan, July 2012.

(737.28 KB)

Haidar, A., H. Ltaief, and J. Dongarra, “Toward High Performance Divide and Conquer Eigensolver for Dense Symmetric Matrices,” SIAM Journal on Scientific Computing (Accepted), July 2012.

Du, P., P. Luszczek, and J. Dongarra, “High Performance Dense Linear System Solver with Resilience to Multiple Soft Errors,” ICCS 2012, Omaha, NE, June 2012.

(1.27 MB)

Dongarra, J., and A. J. van der Steen, “High Performance Computing Systems: Status and Outlook,” Acta Numerica, vol. 21, Cambridge, UK, Cambridge University Press, pp. 379-474, May 2012.

(1.48 MB)

Haidar, A., H. Ltaief, P. Luszczek, and J. Dongarra, “A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction,” IPDPS 2012, Shanghai, China, May 2012.

(480.43 KB)

Bland, W., “Enabling Application Resilience With and Without the MPI Standard,” 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Ottawa, Canada, May 2012.

(262.93 KB)

Baboulin, M., D. Becker, and J. Dongarra, “A Parallel Tiled Solver for Symmetric Indefinite Systems On Multicore Architectures,” IPDPS 2012, Shanghai, China, May 2012.

(544.09 KB)

Ma, T., G. Bosilca, A. Bouteiller, and J. Dongarra, “HierKNEM: An Adaptive Framework for Kernel-Assisted and Topology-Aware Collective Communications on Many-core Clusters,” IPDPS 2012 (Best Paper), Shanghai, China, May 2012.

(165.9 KB)

Anzt, H., J. Dongarra, and V. Heuveline, “Weighted Block-Asynchronous Relaxation for GPU-Accelerated Systems,” SIAM Journal on Computing (submitted), March 2012.

(811.01 KB)

Du, P., A. Bouteiller, G. Bosilca, T. Herault, and J. Dongarra, “Algorithm-Based Fault Tolerance for Dense Matrix Factorization,” Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012, New Orleans, LA, USA, ACM, pp. 225-234, February 2012. DOI: 10.1145/2145816.2145845

(865.79 KB)

Becker, D., M. Baboulin, and J. Dongarra, “Reducing the Amount of Pivoting in Symmetric Indefinite Systems,” Parallel Processing and Applied Mathematics, Lecture Notes in Computer Science (PPAM 2011), vol. 7203: Springer-Verlag Berlin Heidelberg, pp. 133-142, 2012.

(145.76 KB)

Kurzak, J., R. Nath, P. Du, and J. Dongarra, “An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs,” Applied Parallel and Scientific Computing, vol. 7133, pp. 248-257, 2012.

(623.5 KB)

Dongarra, J., and P. Luszczek, “HPC Challenge: Design, History, and Implementation Highlights,” On the Road to Exascale Computing: Contemporary Architectures in High Performance Computing (to appear): Chapman & Hall/CRC Press, 2012.

(469.92 KB)

“Recent Advances in the Message Passing Interface: 19th European MPI Users' Group Meeting, EuroMPI 2012,” Lecture Notes in Computer Science, vol. 7490, Vienna, Austria, 2012.

Dongarra, J., J. Kurzak, P. Luszczek, and S. Tomov, “Dense Linear Algebra on Accelerated Multicore Hardware,” High Performance Scientific Computing: Algorithms and Applications, London, UK, Springer-Verlag, 2012.

Kurzak, J., P. Luszczek, S. Tomov, and J. Dongarra, “Preliminary Results of Autotuning GEMM Kernels for the NVIDIA Kepler Architecture,” LAWN 267, 2012.

(1.14 MB)

Luszczek, P., J. Kurzak, and J. Dongarra, “Looking Back at Dense Linear Algebra Software,” Perspectives on Parallel and Distributed Processing: Looking Back and What's Ahead (to appear), 2012.

(235.91 KB)

“Parallel Processing and Applied Mathematics, 9th International Conference, PPAM 2011,” Lecture Notes in Computer Science, vol. 7203, Torun, Poland, 2012.

Du, P., P. Luszczek, S. Tomov, and J. Dongarra, “Soft Error Resilient QR Factorization for Hybrid System with GPGPU,” Journal of Computational Science, Seattle, WA, Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems at SC11, November 2011.

(965.88 KB)

Dongarra, J., M. Faverge, H. Ltaief, and P. Luszczek, “High Performance Matrix Inversion Based on LU Factorization for Multicore Architectures,” Proceedings of MTAGS11, Seattle, WA, November 2011.

(879.49 KB)

Ltaief, H., P. Luszczek, and J. Dongarra, “Profiling High Performance Dense Linear Algebra Algorithms on Multicore Architectures for Power and Energy Efficiency,” International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September 2011.

(1.27 MB)

Chaarawi, M., E. Gabriel, R. Keller, R. L. Graham, G. Bosilca, and J. Dongarra, “OMPIO: A Modular Software Architecture for MPI I/O,” 18th EuroMPI, Santorini, Greece, Springer, pp. 81-89, September 2011.

Du, P., P. Luszczek, and J. Dongarra, “High Performance Dense Linear System Solver with Soft Error Resilience,” IEEE Cluster 2011, Austin, TX, September 2011.

(1.27 MB)

Bosilca, G., T. Herault, P. Lemarinier, J. Dongarra, and A. Rezmerita, “Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure,” Proceedings of Recent Advances in the Message Passing Interface - 18th European MPI Users' Group Meeting, EuroMPI 2011, vol. 6960, Santorini, Greece, Springer, pp. 342-344, September 2011.

(115.75 KB)

Bosilca, G., T. Herault, A. Rezmerita, and J. Dongarra, “On Scalability for MPI Runtime Systems,” International Conference on Cluster Computing (CLUSTER), Austin, TX, USA, IEEE, pp. 187-195, September 2011.

(898.76 KB)

Ma, T., G. Bosilca, A. Bouteiller, B. Goglin, J. Squyres, and J. Dongarra, “Kernel Assisted Collective Intra-node MPI Communication Among Multi-core and Many-core CPUs,” Int'l Conference on Parallel Processing (ICPP '11), Taipei, Taiwan, September 2011.

Lively, C., X. Wu, V. Taylor, S. Moore, H-C. Chang, C-Y. Su, and K. Cameron, “Power-Aware Prediction Models of Hybrid (MPI/OpenMP) Scientific Applications,” International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September 2011.

(479.49 KB)

Coulomb, K., A. Degomme, M. Faverge, and F. Trahay, “An open-source tool-chain for performance analysis,” Parallel Tools Workshop, Dresden, Germany, September 2011.

(622.1 KB)

Ma, T., A. Bouteiller, G. Bosilca, and J. Dongarra, “Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW,” 18th EuroMPI, Santorini, Greece, Springer, pp. 247-254, September 2011.

Bouteiller, A., T. Herault, G. Bosilca, and J. Dongarra, “Correlated Set Coordination in Fault Tolerant Message Logging Protocols,” Proceedings of 17th International Conference, Euro-Par 2011, Part II, vol. 6853, Bordeaux, France, Springer, pp. 51-64, August 2011.

(486.68 KB)

Luszczek, P., E. Meek, S. Moore, D. Terpstra, V. M. Weaver, and J. Dongarra, “Evaluation of the HPC Challenge Benchmarks in Virtualized Environments,” 6th Workshop on Virtualization in High-Performance Cloud Computing, Bordeaux, France, August 2011.

(114.73 KB)

Baboulin, M., J. Dongarra, J. Herrmann, and S. Tomov, “Accelerating Linear System Solutions Using Randomization Techniques,” INRIA RR-7616 / LAWN #246 (presented at International AMMCS’11), Waterloo, Ontario, Canada, July 2011.

(358.79 KB)

You, H., B. Rekapalli, Q. Liu, and S. Moore, “Autotuned Parallel I/O for Highly Scalable Biosequence Analysis,” TeraGrid'11, Salt Lake City, Utah, July 2011.

(275.34 KB)

Danalis, A., P. Luszczek, G. Marin, J. Vetter, and J. Dongarra, “BlackjackBench: Hardware Characterization with Portable Micro-Benchmarks and Automatic Statistical Analysis of Results,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.

You, H., Q. Liu, Z. Li, and S. Moore, “The Design of an Auto-tuning I/O Framework on Cray XT5 System,” Cray Users Group Conference (CUG'11) (Best Paper Finalist), Fairbanks, Alaska, May 2011.

(459.57 KB)

Luszczek, P., H. Ltaief, and J. Dongarra, “Two-stage Tridiagonal Reduction for Dense Symmetric Matrices using Tile Algorithms on Multicore Architectures,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.

Bosilca, G., A. Bouteiller, T. Herault, P. Lemarinier, N. Ohm Saengpatsa, S. Tomov, and J. Dongarra, “A Unified HPC Environment for Hybrid Manycore/GPU Distributed Systems,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.

White, J. B., and J. Dongarra, “Overlapping Computation and Communication for Advection on a Hybrid Parallel Computer,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.

Agullo, E., L. Giraud, A. Guermouche, A. Haidar, S. Lanteri, and J. Roman, “Algebraic Schwarz Preconditioning for the Schur Complement: Application to the Time-Harmonic Maxwell Equations Discretized by a Discontinuous Galerkin Method,” The Twentieth International Conference on Domain Decomposition Methods, La Jolla, California, February 2011.

Sourbier, F., A. Haidar, L. Giraud, H. Ben-Hadj-Ali, S. Operto, and J. Virieux, “Three-dimensional parallel frequency-domain visco-acoustic wave modelling based on a hybrid direct/iterative solver,” Geophysical Prospecting (to appear), 2011.

(1.04 MB)

Haidar, A., L. Giraud, H. Ben-Hadj-Ali, F. Sourbier, S. Operto, and J. Virieux, “3-D parallel frequency-domain visco-acoustic wave modelling based on a hybrid direct/iterative solver,” 73rd EAGE Conference & Exhibition incorporating SPE EUROPEC 2011, Vienna, Austria, 23-26 May 2011.

Haidar, A., H. Ltaief, and J. Dongarra, “Toward High Performance Divide and Conquer Eigensolver for Dense Symmetric Matrices,” SIAM Journal on Scientific Computing (SISC) (submitted), 2011.

Ma, T., T. Herault, G. Bosilca, and J. Dongarra, “Process Distance-aware Adaptive MPI Collective Communications,” IEEE Int'l Conference on Cluster Computing (Cluster 2011), Austin, Texas, 2011.

Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier, and J. Dongarra, “DAGuE: A Generic Distributed DAG Engine for High Performance Computing,” Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1151-1158, 2011.

(830.85 KB)

Agullo, E., L. Giraud, A. Guermouche, A. Haidar, and J. Roman, “Parallel algebraic domain decomposition solver for the solution of augmented systems,” Parallel, Distributed, Grid and Cloud Computing for Engineering, Ajaccio, Corsica, France, 12-15 April 2011.

Luszczek, P., J. Kurzak, and J. Dongarra, “Changes in Dense Linear Algebra Kernels - Decades Long Perspective,” in Solving the Schrodinger Equation: Has everything been tried? (to appear): Imperial College Press, 2011.

Bosilca, G., A. Bouteiller, T. Herault, P. Lemarinier, and J. Dongarra, “Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols,” Proceedings of EuroMPI 2010, Stuttgart, Germany, Springer, September 2010.

(202.87 KB)

“Recent Advances in the Message Passing Interface, Lecture Notes in Computer Science (LNCS),” EuroMPI 2010 Proceedings, vol. 6305, Stuttgart, Germany, Springer, September 2010.

Ma, T., A. Bouteiller, G. Bosilca, and J. Dongarra, “Locality and Topology aware Intra-node Communication Among Multicore CPUs,” Proceedings of the 17th EuroMPI conference, Stuttgart, Germany, LNCS, September 2010.

(327.01 KB)

“8th International Conference on Parallel Processing and Applied Mathematics, Lecture Notes in Computer Science (LNCS),” PPAM 2009 Proceedings, vol. 6067, Wroclaw, Poland, Springer, September 2010.

Agullo, E., L. Giraud, A. Guermouche, A. Haidar, J. Roman, and Y. Lee-Tin-Yien, “MaPHyS or the Development of a Parallel Algebraic Domain Decomposition Solver in the Course of the Solstice Project,” Sparse Days 2010 Meeting at CERFACS, Toulouse, France, June 2010.

Bouteiller, A., G. Bosilca, and J. Dongarra, “Redesigning the Message Logging Model for High Performance,” Concurrency and Computation: Practice and Experience (online version), June 2010.

(438.42 KB)

Bernholc, J., M. Hodak, W. Lu, S. Moore, and S. Tomov, “Scalability Study of a Quantum Simulation Code,” PARA 2010, Reykjavik, Iceland, June 2010.

Hurault, A., and A. YarKhan, “Intelligent Service Trading and Brokering for Distributed Network Services in GridSolve,” VECPAR 2010, 9th International Meeting on High Performance Computing for Computational Science, Berkeley, CA, June 2010.

(256.04 KB)

Turchenko, V., L. Grandinetti, G. Bosilca, and J. Dongarra, “Improvement of parallelization efficiency of batch pattern BP training algorithm using Open MPI,” Proceedings of International Conference on Computational Science, ICCS 2010 (to appear), Amsterdam, The Netherlands, Elsevier, June 2010.

(125.01 KB)

Agullo, E., L. Giraud, A. Guermouche, A. Haidar, and J. Roman, “Towards a Complexity Analysis of Sparse Hybrid Linear Solvers,” PARA 2010, Reykjavik, Iceland, June 2010.

Dongarra, J., “LINPACK on Future Manycore and GPU Based Systems,” PARA 2010, Reykjavik, Iceland, June 2010.

Tomov, S., W. Lu, J. Bernholc, S. Moore, and J. Dongarra, “Performance Evaluation for Petascale Quantum Simulation Tools,” Proceedings of the Cray Users' Group Meeting, Atlanta, GA, May 2010.

“Proceedings of the International Conference on Computational Science,” ICCS 2010, Amsterdam, Elsevier, May 2010.

Ltaief, H., J. Kurzak, and J. Dongarra, “Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures,” IEEE Transactions on Parallel and Distributed Systems, pp. 417-423, April 2010.

(208.16 KB)

Agullo, E., C. Coti, J. Dongarra, T. Herault, and J. Langou, “QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment,” 24th IEEE International Parallel and Distributed Processing Symposium (also LAWN 224), Atlanta, GA, April 2010.

(261.55 KB)

Brady, T., A. Lastovetsky, K. Seymour, M. Guidolin, and J. Dongarra, “SmartGridRPC: The new RPC model for high performance Grid Computing and Its Implementation in SmartGridSolve,” Concurrency and Computation: Practice and Experience (to appear), January 2010.

(1.08 MB)

Jagode, H., A. Knuepfer, J. Dongarra, M. Jurenz, M. S. Mueller, and W. E. Nagel, “Trace-based Performance Analysis for the Petascale Simulation Code FLASH,” International Journal of High Performance Computing Applications (to appear), 2010.

(887.54 KB)

Gustavson, F. G., J. Wasniewski, and J. Dongarra, “Level-3 Cholesky Kernel Subroutine of a Fully Portable High Performance Minimal Storage Hybrid Format Cholesky Algorithm,” ACM TOMS (submitted), also LAPACK Working Note (LAWN) 211, 2010.

(190.2 KB)

Hadri, B., E. Agullo, and J. Dongarra, “Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,” 24th IEEE International Parallel and Distributed Processing Symposium (submitted), 2010.

(313.98 KB)

Dongarra, J., and S. Moore, “Empirical Performance Tuning of Dense Linear Algebra Software,” in Performance Tuning of Scientific Applications (to appear), 2010.

Buttari, A., J. Langou, J. Kurzak, and J. Dongarra, “A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,” Parallel Computing (to appear), 2010.

(612.23 KB)

Lemarinier, P., G. Bosilca, C. Coti, T. Herault, and J. Dongarra, “Constructing Resilient Communication Infrastructure for Runtime Environments,” ParCo 2009, Lyon, France, September 2009.

Kurzak, J., H. Ltaief, J. Dongarra, and R. M. Badia, “Dependency-Driven Scheduling of Dense Matrix Factorizations on Shared-Memory Systems,” PPAM 2009, Poland, September 2009.

Song, F., S. Moore, and J. Dongarra, “Analytical Modeling and Optimization for Affinity Based Thread Scheduling on Multicore Systems,” IEEE Cluster 2009, New Orleans, August 2009.

(395.53 KB)

Bouteiller, A., T. Ropars, G. Bosilca, C. Morin, and J. Dongarra, “Reasons for a Pessimistic or Optimistic Message Logging Protocol in MPI Uncoordinated Failure Recovery,” CLUSTER '09, New Orleans, IEEE, August 2009. DOI: 10.1109/CLUSTR.2009.5289157

(191.36 KB)

Alam, S., R. F. Barrett, H. Jagode, J. A. Kuehn, S. W. Poole, and R. Sankaran, “Impact of Quad-core Cray XT4 System and Software Stack on Scientific Computation,” Euro-Par 2009, Lecture Notes in Computer Science, vol. 5704/2009, Delft, The Netherlands, Springer Berlin / Heidelberg, pp. 334-344, August 2009.

(312.74 KB)

Supinski, B. R. de, S. Alam, D. Bailey, L. Carrington, C. Daley, A. Dubey, T. Gamblin, D. Gunter, P. D. Hovland, H. Jagode, et al., “Modeling the Office of Science Ten Year Facilities Plan: The PERI Architecture Tiger Team,” SciDAC 2009, Journal of Physics: Conference Series, vol. 180(2009)012039, San Diego, California, IOP Publishing, July 2009.

(906.39 KB)

Dongarra, J., P. Beckman, P. Aerts, F. Cappello, T. Lippert, S. Matsuoka, P. Messina, T. Moore, R. Stevens, A. Trefethen, et al., “The International Exascale Software Project: A Call to Cooperative Action by the Global High Performance Community,” International Journal of High Performance Computing Applications (to appear), July 2009.

(203.04 KB)

Jagode, H., S. Moore, D. Terpstra, J. Dongarra, A. Knuepfer, M. Jurenz, M. S. Mueller, and W. E. Nagel, “I/O Performance Analysis for the Petascale Simulation Code FLASH,” ISC'09, Hamburg, Germany, June 2009.

(88.88 KB)

Danalis, A., L. Pollock, M. Swany, and J. Cavazos, “MPI-aware Compiler Optimizations for Improving Communication-Computation Overlap,” Proceedings of the 23rd annual International Conference on Supercomputing (ICS '09), Yorktown Heights, NY, USA, ACM, pp. 316-325, June 2009.

(308.92 KB)

Portillo, R., P. J. Teller, D. Cronk, and S. Moore, “Making Performance Analysis and Tuning Part of the Software Development Cycle,” Proceedings of DoD HPCMP UGC 2009, San Diego, CA, IEEE, June 2009.

Jagode, H., J. Dongarra, S. Alam, J. Vetter, W. Spear, and A. D. Malony, “A Holistic Approach for Performance Measurement and Analysis for Petascale Applications,” ICCS 2009 Joint Workshop: Tools for Program Development and Analysis in Computational Science and Software Engineering for Large-Scale Computing, vol. 2009, Baton Rouge, Louisiana, Springer-Verlag Berlin Heidelberg 2009, pp. 686-695, May 2009.

(3.96 MB)

(208.16 KB)

Tomov, S., W. Lu, J. Bernholc, S. Moore, and J. Dongarra, “Performance evaluation for petascale quantum simulation tools,” Proceedings of CUG09, Atlanta, GA, May 2009.

(1.09 MB)

Dai, Y-S., and J. Dongarra, “Reliability and Performance Modeling and Analysis for Grid Computing,” in Handbook of Research on Scalable Computing Technologies (to appear): IGI Global, pp. 219-245, 2009.

(200.57 KB)

Demmel, J., J. Dongarra, A. Fox, S. Williams, V. Volkov, and K. Yelick, “Accelerating Time-To-Solution for Computational Science and Engineering,” SciDAC Review, 2009.

(739.11 KB)

Ramakrishnan, L., D. Nurmi, A. Mandal, C. Koelbel, D. Gannon, M. Huang, Y-S. Kee, G. Obertelli, K. Thyagaraja, R. Wolski, et al., “VGrADS: Enabling e-Science Workflows on Grids and Clouds with Fault Tolerance,” SC’09 The International Conference for High Performance Computing, Networking, Storage and Analysis (to appear), Portland, OR, 2009.

(648.82 KB)

Kurzak, J., H. Ltaief, J. Dongarra, and R. M. Badia, “Scheduling Linear Algebra Operations on Multicore Processors,” Concurrency Practice and Experience (to appear), 2009.

(716.18 KB)

Fürlinger, K., and S. Moore, “Recording the Control Flow of Parallel Applications to Determine Iterative and Phase-Based Behavior,” Future Generation Computing Systems, vol. 26, pp. 162-166, 2009.

Seymour, K., A. YarKhan, and J. Dongarra, “Transparent Cross-Platform Access to Software Services using GridSolve and GridRPC,” in Cloud Computing and Software Services: Theory and Techniques (to appear): CRC Press, 2009.

Hoefler, T., Y-S. Dai, and J. Dongarra, “Towards Efficient MapReduce Using MPI,” Lecture Notes in Computer Science, Recent Advances in Parallel Virtual Machine and Message Passing Interface - 16th European PVM/MPI Users' Group Meeting, vol. 5759, Espoo, Finland, Springer Berlin / Heidelberg, pp. 240-249, 2009.

Agullo, E., B. Hadri, H. Ltaief, and J. Dongarra, “Comparative Study of One-Sided Factorizations with Multiple Software Packages on Multi-Core Hardware,” 2009 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '09) (to appear), 2009.

(515.63 KB)

Gustavson, F. G., J. Wasniewski, J. Dongarra, and J. Langou, “Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,” ACM TOMS (to appear), 2009.

(896.03 KB)

Alvaro, W., J. Kurzak, and J. Dongarra, “Optimizing Matrix Multiplication for a Short-Vector SIMD Architecture - CELL Processor,” Parallel Computing, vol. 35, pp. 138-150, 2009.

(591.16 KB)

Dongarra, J., H. Meuer, H. D. Simon, and E. Strohmaier, “Recent Trends in High Performance Computing,” in Birth of Numerical Analysis (to appear), 2009.

Kurzak, J., and J. Dongarra, “QR Factorization for the CELL Processor,” Scientific Programming (to appear), 2009.

(234.02 KB)

Dongarra, J., G. Bosilca, R. Delmas, and J. Langou, “Algorithmic Based Fault Tolerance Applied to High Performance Computing,” Journal of Parallel and Distributed Computing, vol. 69, pp. 410-416, 2009.

(313.55 KB)

Seymour, K., H. You, and J. Dongarra, “A Comparison of Search Heuristics for Empirical Code Optimization,” The 3rd International Workshop on Automatic Performance Tuning, Tsukuba, Japan, October 2008.

(772.48 KB)

Li, Y., J. Dongarra, K. Seymour, and A. YarKhan, “Request Sequencing: Enabling Workflow for Efficient Problem Solving in GridSolve,” International Conference on Grid and Cooperative Computing (GCC 2008) (submitted), Shenzhen, China, October 2008.

(1.64 MB)

Youseff, L., K. Seymour, H. You, J. Dongarra, and R. Wolski, “The Impact of Paravirtualized Memory Hierarchy on Linear Algebra Computational Kernels and Software,” ACM/IEEE International Symposium on High Performance Distributed Computing, Boston, MA, June 2008.

(403.89 KB)

Canning, A., J. Dongarra, J. Langou, O. Marques, S. Tomov, C. Voemel, and L-W. Wang, “Interior State Computation of Nano Structures,” PARA 2008, 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing, Trondheim, Norway, May 2008.

(137.12 KB)

Fürlinger, K., and S. Moore, “OpenMP-centric Performance Analysis of Hybrid Applications,” Proc. 2008 IEEE International Conference on Cluster Computing (CLUSTER 2008), Tsukuba, Japan, January 2008.

(218.63 KB)

Dongarra, J., R. Graybill, W. Harrod, R. Lucas, E. Lusk, P. Luszczek, J. McMahon, A. Snavely, J. Vetter, K. Yelick, et al., “DARPA's HPCS Program: History, Models, Tools, Languages,” in Advances in Computers, vol. 72: Elsevier, January 2008.

(3.61 MB)

Kurzak, J., A. Buttari, P. Luszczek, and J. Dongarra, “The PlayStation 3 for High Performance Scientific Computing,” Computing in Science and Engineering, pp. 80-83, January 2008.

(2.45 MB)

Dongarra, J., J-F. Pineau, Y. Robert, and F. Vivien, “Matrix Product on Heterogeneous Master Worker Platforms,” 2008 PPoPP Conference, Salt Lake City, Utah, January 2008.

Dongarra, J., “A Tribute to Gene Golub,” Computing in Science and Engineering: IEEE, pp. 5, January 2008.

Fürlinger, K., and S. Moore, “Visualizing the Program Execution Control Flow of OpenMP Applications,” Proc. 4th International Workshop on OpenMP (IWOMP 2008), West Lafayette, Indiana, Lecture Notes in Computer Science 5004, pp. 181-190, January 2008.

(194.25 KB)

7th International Parallel Processing and Applied Mathematics Conference, Lecture Notes in Computer Science, vol. 4967, Gdansk, Poland, Springer Berlin, January 2008.

Wolf, F., B. Wylie, E. Abraham, W. Frings, K. Fürlinger, M. Geimer, M-A. Hermanns, B. Mohr, S. Moore, and M. Pfeifer, “Usage of the Scalasca Toolset for Scalable Performance Analysis of Large-scale Parallel Applications,” Proceedings of the 2nd International Workshop on Tools for High Performance Computing, Stuttgart, Germany, Springer, pp. 157-167, January 2008.

(229.2 KB)

Bouteiller, A., G. Bosilca, and J. Dongarra, “Redesigning the Message Logging Model for High Performance,” International Supercomputer Conference (ISC 2008), Dresden, Germany, January 2008.

(622.1 KB)

Buttari, A., J. Dongarra, J. Kurzak, J. Langou, J. Langou, P. Luszczek, and S. Tomov, “Exploiting Mixed Precision Floating Point Hardware in Scientific Computations,” in High Performance Computing and Grids in Action, Amsterdam, IOS Press, January 2008.

(92.95 KB)

Jagode, H., and J. Hein, “Custom assignment of MPI ranks for parallel multi-dimensional FFTs: Evaluation of BG/P versus BG/L,” Proceedings of the 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA-08), Sydney, Australia, IEEE Computer Society, pp. 271-283, January 2008.

(2.6 MB)

8th International Conference on Computational Science (ICCS), Proceedings Parts I, II, and III, Lecture Notes in Computer Science, vol. 5101, Krakow, Poland, Springer Berlin, January 2008.

Buttari, A., J. Langou, J. Kurzak, and J. Dongarra, “Parallel Tiled QR Factorization for Multicore Architectures,” Concurrency and Computation: Practice and Experience, vol. 20, pp. 1573-1590, January 2008.

(277.92 KB)

Bouteiller, A., and F. Desprez, “Fault Tolerance Management for a Hierarchical GridRPC Middleware,” 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), Lyon, France, January 2008.

(319.79 KB)

Baboulin, M., J. Dongarra, S. Gratton, and J. Langou, “Computing the Conditioning of the Components of a Linear Least Squares Solution,” VECPAR '08, High Performance Computing for Computational Science, Toulouse, France, January 2008.

(374.97 KB)

Dongarra, J., and P. Luszczek, “How Elegant Code Evolves With Hardware: The Case Of Gaussian Elimination,” in Beautiful Code Leading Programmers Explain How They Think (Chapter 14), pp. 243-282, January 2008.

(257 KB)

Bailey, D., J. Chame, C. Chen, J. Dongarra, M. Hall, J. K. Hollingsworth, P. D. Hovland, S. Moore, K. Seymour, J. Shin, et al., “PERI Auto-tuning,” Proc. SciDAC 2008, vol. 125, Seattle, Washington, Journal of Physics, January 2008.

(873.75 KB)

15th European PVM/MPI Users' Group Meeting, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, vol. 5205, Dublin, Ireland, Springer Berlin, January 2008.

Fürlinger, K., and S. Moore, “Detection and Analysis of Iterative Behavior in Parallel Applications,” Proceedings of the 2008 International Conference on Computational Science (ICCS 2008), vol. 5103, Krakow, Poland, pp. 261-267, January 2008.

(141.02 KB)

Caniou, Y., E. Caron, F. Desprez, H. Nakada, Y. Tanaka, and K. Seymour, “High Performance GridRPC Middleware,” Recent developments in Grid Technology and Applications: Nova Science Publishers, 2008.

(923.06 KB)

Hernandez, O., F. Song, B. Chapman, J. Dongarra, B. Mohr, S. Moore, and F. Wolf, “Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications,” Lecture Notes in Computer Science, OpenMP Shared Memory Parallel Programming, vol. 4315: Springer Berlin / Heidelberg, 2008.

(350.9 KB)

Dongarra, J., P. Luszczek, and A. Petitet, “The LINPACK Benchmark: Past, Present, and Future,” Concurrency: Practice and Experience, vol. 15, pp. 803-820, 2008.

(94.86 KB)

Angskun, T., G. Bosilca, B. Vander Zanden, and J. Dongarra, “Optimal Routing in Binomial Graph Networks,” The International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), Adelaide, Australia, IEEE Computer Society, December 2007.

Angskun, T., G. Bosilca, and J. Dongarra, “Self-Healing in Binomial Graph Networks,” 2nd International Workshop On Reliability in Decentralized Distributed Systems (RDDS 2007), Vilamoura, Algarve, Portugal, November 2007.

(322.39 KB)

Graham, R. L., R. Brightwell, B. Barrett, G. Bosilca, and J. Pjesivac–Grbovic, “An Evaluation of Open MPI's Matching Transport Layer on the Cray XT,” EuroPVM/MPI 2007, September 2007.

(369.01 KB)

Bouteiller, A., G. Bosilca, and J. Dongarra, “Retrospect: Deterministic Replay of MPI Applications for Interactive Distributed Debugging,” Accepted for Euro PVM/MPI 2007: Springer, September 2007.

Dongarra, J., G. H. Golub, C. Moler, and K. Moore, “Netlib and NA-Net: building a scientific computing community,” In IEEE Annals of the History of Computing (to appear), August 2007.

(352.71 KB)

Angskun, T., G. Bosilca, and J. Dongarra, “Binomial Graph: A Scalable and Fault-Tolerant Logical Network Topology,” Proceedings of The Fifth International Symposium on Parallel and Distributed Processing and Applications (ISPA07), Niagara Falls, Canada, Springer, August 2007.

(480.47 KB)

Buttari, A., J. Dongarra, J. Langou, J. Langou, P. Luszczek, and J. Kurzak, “Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems,” International Journal of High Performance Computing Applications (to appear), August 2007.

(157.4 KB)

Pjesivac–Grbovic, J., G. Bosilca, G. Fagg, T. Angskun, and J. Dongarra, “Decision Trees and MPI Collective Algorithm Selection Problem,” Euro-Par 2007, Rennes, France, Springer, pp. 105–115, August 2007.

(552.94 KB)

Dongarra, J., and P. Luszczek, “How Elegant Code Evolves With Hardware: The Case Of Gaussian Elimination,” in Beautiful Code Leading Programmers Explain How They Think: O'Reilly Media, Inc., June 2007.

(257 KB)

Song, F., S. Moore, and J. Dongarra, “Feedback-Directed Thread Scheduling with Memory Considerations,” IEEE International Symposium on High Performance Distributed Computing, Monterey Bay, CA, June 2007.

(297.24 KB)

Mellor-Crummey, J., P. Beckman, J. Dongarra, B. Miller, and K. Yelick, “Creating Software Technology to Harness the Power of Leadership-class Computing Systems,” DOE SciDAC Review (to appear), June 2007.

(617.02 KB)

Dongarra, J., E. Jeannot, E. Saule, and Z. Shi, “Bi-objective Scheduling Algorithms for Optimizing Makespan and Reliability on Heterogeneous Systems,” 19th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA) (submitted), San Diego, CA, June 2007.

(223.82 KB)

Graham, R. L., G. Bosilca, and J. Pjesivac–Grbovic, “A Comparison of Application Performance Using Open MPI and Cray MPI,” Cray User Group, CUG 2007, May 2007.

(248.83 KB)

Angskun, T., G. Bosilca, G. Fagg, J. Pjesivac–Grbovic, and J. Dongarra, “Reliability Analysis of Self-Healing Network using Discrete-Event Simulation,” Proceedings of Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07): IEEE Computer Society, pp. 437-444, May 2007.

Langou, J., Z. Chen, G. Bosilca, and J. Dongarra, “Recovery Patterns for Iterative Methods in a Parallel Unstable Environment,” SIAM SISC (to appear), May 2007.

(241.36 KB)

Chen, Z., M. Yang, G. Francia, III, and J. Dongarra, “Self Adapting Application Level Fault Tolerance for Parallel and Distributed Computing,” Proceedings of Workshop on Self Adapting Application Level Fault Tolerance for Parallel and Distributed Computing at IPDPS, pp. 1-8, March 2007.

(162.47 KB)

Shende, S., A. D. Malony, S. Moore, and D. Cronk, “Memory Leak Detection in Fortran Applications using TAU,” Proc. DoD HPCMP Users Group Conference (HPCMP-UGC'07), Pittsburgh, PA, IEEE Computer Society, January 2007.

Fürlinger, K., J. Dongarra, and M. Gerndt, “On Using Incremental Profiling for the Performance Analysis of Shared Memory Parallel Applications,” Proceedings of the 13th International Euro-Par Conference on Parallel Processing (Euro-Par '07), Rennes, France, Springer LNCS, January 2007.

Song, F., S. Moore, and J. Dongarra, “L2 Cache Modeling for Scientific Applications on Chip Multi-Processors,” Proceedings of the 2007 International Conference on Parallel Processing, Xi'an, China, IEEE Computer Society, January 2007.

(654.11 KB)

Fürlinger, K., and S. Moore, “Continuous Runtime Profiling of OpenMP Applications,” Proceedings of the 2007 Conference on Parallel Computing (PARCO 2007), Juelich and Aachen, Germany, January 2007.

(408.01 KB)

Voemel, C., S. Tomov, L-W. Wang, O. Marques, and J. Dongarra, “The Use of Bulk States to Accelerate the Band Edge State Calculation of a Semiconductor Quantum Dot,” Journal of Computational Physics, vol. 223, pp. 774-782, 2007.

(452.6 KB)

YarKhan, A., J. Dongarra, and K. Seymour, “GridSolve: The Evolution of Network Enabled Solver,” Grid-Based Problem Solving Environments: IFIP TC2/WG 2.5 Working Conference on Grid-Based Problem Solving Environments (Prescott, AZ, July 2006): Springer, pp. 215-226, 2007.

(377.48 KB)

Dongarra, J., Z. Chen, G. Bosilca, and J. Langou, “Disaster Survival Guide in Petascale Computing: An Algorithmic Approach,” in Petascale Computing: Algorithms and Applications (to appear): Chapman & Hall - CRC Press, 2007.

(260.18 KB)

Pjesivac–Grbovic, J., G. Bosilca, G. Fagg, T. Angskun, and J. Dongarra, “MPI Collective Algorithm Selection and Quadtree Encoding,” Parallel Computing (Special Edition: EuroPVM/MPI 2006): Elsevier, 2007.

(308.39 KB)

Buttari, A., J. Dongarra, J. Kurzak, J. Langou, J. Langou, P. Luszczek, and S. Tomov, “Exploiting Mixed Precision Floating Point Hardware in Scientific Computations,” In High Performance Computing and Grids in Action (to appear), Amsterdam, IOS Press, 2007.

(122.01 KB)

(248.66 KB)

Luszczek, P., D. Bailey, J. Dongarra, J. Kepner, R. Lucas, R. Rabenseifner, and D. Takahashi, “The HPC Challenge (HPCC) Benchmark Suite,” SC06 Conference Tutorial, Tampa, Florida, IEEE, November 2006.

(1.08 MB)

Keller, R., G. Bosilca, G. Fagg, M. Resch, and J. Dongarra, “Implementation and Usage of the PERUSE-Interface in Open MPI,” Euro PVM/MPI 2006, Bonn, Germany, September 2006.

(310.76 KB)

Shipman, G. M., G. Bosilca, and A. B. Maccabe, “High Performance RDMA Protocols in HPC,” Euro PVM/MPI 2006, Bonn, Germany, September 2006.

(1.06 MB)

Graham, R. L., G. M. Shipman, B. Barrett, R. Castain, G. Bosilca, and A. Lumsdaine, “A High-Performance, Heterogeneous MPI,” HeteroPar 2006, Barcelona, Spain, September 2006.

(193.73 KB)

Demmel, J., J. Dongarra, B. Parlett, W. Kahan, M. Gu, D. Bindel, Y. Hida, X. Li, O. Marques, J. E. Riedy, et al., “Prospectus for the Next LAPACK and ScaLAPACK Libraries,” PARA 2006, Umea, Sweden, June 2006.

(460.11 KB)

Zunger, A., A. Franceschetti, G. Bester, W. B. Jones, K. Kim, P. A. Graf, L-W. Wang, A. Canning, O. Marques, C. Voemel, et al., “Predicting the electronic properties of 3D, million-atom semiconductor nanostructure architectures,” J. Phys.: Conf. Ser., vol. 46, pp. 292-298, January 2006. DOI: 10.1088/1742-6596/46/1/040

(644.1 KB)

Chen, Z., and J. Dongarra, “Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Computations on Volatile Resources,” IPDPS 2006, 20th IEEE International Parallel and Distributed Processing Symposium, Rhodes Island, Greece, January 2006.

(266.54 KB)

Canning, A., J. Dongarra, J. Langou, O. Marques, S. Tomov, C. Voemel, and L-W. Wang, “Towards bulk based preconditioning for quantum dot computations,” IEEE/ACM Proceedings of HPCNano SC06 (to appear), January 2006.

(172.46 KB)

Fagg, G., J. Pjesivac–Grbovic, G. Bosilca, T. Angskun, and J. Dongarra, “Flexible collective communication tuning architecture applied to Open MPI,” 2006 Euro PVM/MPI (submitted), Bonn, Germany, January 2006.

(206.58 KB)

Hernandez, O., F. Song, B. Chapman, J. Dongarra, B. Mohr, S. Moore, and F. Wolf, “Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications,” Second International Workshop on OpenMP, Reims, France, January 2006.

(350.9 KB)

Canning, A., J. Dongarra, J. Langou, O. Marques, S. Tomov, C. Voemel, and L-W. Wang, “Performance evaluation of eigensolvers in nano-structure computations,” IEEE/ACM Proceedings of HPCNano SC06 (to appear), January 2006.

(120.61 KB)

Song, F., J. Dongarra, and S. Moore, “Experiments with Strassen's Algorithm: From Sequential to Parallel,” 18th IASTED International Conference on Parallel and Distributed Computing and Systems PDCS 2006 (submitted), Dallas, Texas, January 2006.

(514.33 KB)

Voemel, C., S. Tomov, L-W. Wang, O. Marques, and J. Dongarra, “The use of bulk states to accelerate the band edge state calculation of a semiconductor quantum dot,” Journal of Computational Physics (submitted), January 2006.

(337.08 KB)

Tang, Y., G. Fagg, and J. Dongarra, “Proposal of MPI operation level Checkpoint/Rollback and one implementation,” Proceedings of IEEE CCGrid 2006: IEEE Computer Society, January 2006.

(277.27 KB)

Angskun, T., G. Fagg, G. Bosilca, J. Pjesivac–Grbovic, and J. Dongarra, “Self-Healing Network for Scalable Fault Tolerant Runtime Environments,” DAPSYS 2006, 6th Austrian-Hungarian Workshop on Distributed and Parallel Systems, Innsbruck, Austria, January 2006.

(162.83 KB)

Kurzak, J., and J. Dongarra, “Implementing Linear Algebra Routines on Multi-Core Processors with Pipelining and a Look Ahead,” University of Tennessee Computer Science Tech Report, UT-CS-06-581, LAPACK Working Note #178, January 2006.

(304.4 KB)

Luszczek, P., “High Performance Development for High End Computing with Python Language Wrapper (PLW),” International Journal of High Performance Computing Applications (to appear), 2006.

(179.32 KB)

Bhowmick, S., V. Eijkhout, Y. Freund, E. Fuentes, and D. Keyes, “Application of Machine Learning to the Selection of Sparse Linear Solvers,” International Journal of High Performance Computing Applications (submitted), 2006.

(392.96 KB)

Shende, S., A. D. Malony, A. Morris, and F. Wolf, “Performance Profiling Overhead Compensation for MPI Programs,” In Proc. of the 12th European Parallel Virtual Machine and Message Passing Interface Conference: Springer LNCS, September 2005.

(220.26 KB)

Fagg, G., T. Angskun, G. Bosilca, J. Pjesivac–Grbovic, and J. Dongarra, “Scalable Fault Tolerant MPI: Extending the Recovery Algorithm,” Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI, vol. 3666, Sorrento (Naples), Italy, Springer-Verlag Berlin, pp. 67, September 2005.

(144.86 KB)

Moore, S., F. Wolf, J. Dongarra, S. Shende, A. D. Malony, and B. Mohr, “A Scalable Approach to MPI Application Performance Analysis,” In Proc. of the 12th European Parallel Virtual Machine and Message Passing Interface Conference: Springer LNCS, September 2005.

(988.58 KB)

Bosilca, G., J. Dongarra, G. Fagg, and J. Langou, “Hash Functions for Datatype Signatures in MPI,” Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI, vol. 3666, Sorrento (Naples), Italy, Springer-Verlag Berlin, pp. 76-83, September 2005.

(304.2 KB)

Wolf, F., A. D. Malony, S. Shende, and A. Morris, “Trace-Based Parallel Performance Overhead Compensation,” In Proc. of the International Conference on High Performance Computing and Communications (HPCC), Sorrento (Naples), Italy, September 2005.

(306.88 KB)

Hermanns, M-A., B. Mohr, and F. Wolf, “Event-based Measurement and Analysis of One-sided Communication,” In Proceedings of the European Conference on Parallel Computing (Euro-Par), Lisbon, Portugal, Springer, August 2005.

(403.44 KB)

Bhatia, N., F. Song, F. Wolf, J. Dongarra, B. Mohr, and S. Moore, “Automatic Experimental Analysis of Communication Patterns in Virtual Topologies,” In Proceedings of the International Conference on Parallel Processing, Oslo, Norway, IEEE Computer Society, June 2005.

(227.13 KB)

Worley, P. H., J. Candy, L. Carrington, K. Huck, T. Kaiser, K. Mahinthakumar, A. D. Malony, S. Moore, D. Reed, P. C. Roth, et al., “Performance Analysis of GYRO: A Tool Evaluation,” In Proceedings of the 2005 SciDAC Conference, San Francisco, CA, June 2005.

(172.07 KB)

Bhatia, N., S. Moore, F. Wolf, J. Dongarra, and B. Mohr, “A Pattern-Based Approach to Automated Application Performance Analysis,” Workshop on Patterns in High Performance Computing, University of Illinois at Urbana-Champaign, May 2005.

(3.47 MB)

Pjesivac–Grbovic, J., T. Angskun, G. Bosilca, G. Fagg, E. Gabriel, and J. Dongarra, “Performance Analysis of MPI Collective Operations,” 4th International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS '05), Denver, Colorado, April 2005.

(1018.28 KB)

Luszczek, P., J. Dongarra, D. Koester, R. Rabenseifner, B. Lucas, J. Kepner, J. McCalpin, D. Bailey, and D. Takahashi, Introduction to the HPC Challenge Benchmark Suite, March 2005.

(124.86 KB)

Moore, S., F. Wolf, J. Dongarra, and B. Mohr, “Improving Time to Solution with Automated Performance Analysis,” Second Workshop on Productivity and Performance in High-End Computing (P-PHEC) at 11th International Symposium on High Performance Computer Architecture (HPCA-2005), San Francisco, February 2005.

(112.63 KB)

Tomov, S., J. Langou, A. Canning, L-W. Wang, and J. Dongarra, “Comparison of Nonlinear Conjugate-Gradient methods for computing the Electronic Properties of Nanostructure Architectures,” Proceedings of 5th International Conference on Computational Science (ICCS), Atlanta, GA, USA, Springer's Lecture Notes in Computer Science, pp. 317-325, January 2005.

(172.86 KB)

Chen, Z., and J. Dongarra, “Numerically Stable Real Number Codes Based on Random Matrices,” The International Conference on Computational Science, Atlanta, GA, LNCS 3514, Springer-Verlag, January 2005.

(166.2 KB)

Shimosaka, H., T. Hiroyasu, M. Miki, and J. Dongarra, “Optimization Problem Solving System Using GridRPC,” IEEE Transactions on Parallel and Distributed Systems (submitted), January 2005.

(740.57 KB)

Giraud, L., J. Langou, and G. Sylvand, “On the Parallel Solution of Large Industrial Wave Propagation Problems,” Journal of Computational Acoustics (to appear), January 2005.

(1.08 MB)

Chen, Z., G. Fagg, E. Gabriel, J. Langou, T. Angskun, G. Bosilca, and J. Dongarra, “Fault Tolerant High Performance Computing by a Coding Approach,” Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (to appear), Chicago, Illinois, January 2005.

(209.37 KB)

Chen, Z., and J. Dongarra, “Condition Numbers of Gaussian Random Matrices,” SIAM Journal on Matrix Analysis and Applications (to appear), January 2005.

(186.46 KB)

Pjesivac–Grbovic, J., T. Angskun, G. Bosilca, G. Fagg, E. Gabriel, and J. Dongarra, “Performance Analysis of MPI Collective Operations,” Cluster Computing Journal (to appear), January 2005.

(1018.28 KB)

Tomov, S., J. Langou, A. Canning, L-W. Wang, and J. Dongarra, “Conjugate-Gradient Eigenvalue Solvers in Computing Electronic Properties of Nanostructure Architectures,” International Journal of Computational Science and Engineering (to appear), January 2005.

(428.21 KB)

Luszczek, P., and D. Koester, “HPC Challenge v1.x Benchmark Suite,” SC|05 Tutorial - S13, Seattle, Washington, January 2005.

(2.94 MB)

Demmel, J., and J. Dongarra, LAPACK 2005 Prospectus: Reliable and Scalable Software for Linear Algebra Computations on High End Computers: LAPACK Working Note 164, January 2005.

(172.59 KB)

Cronk, D., G. Fagg, S. Emeny, and S. Tucker, “Dynamic Process Management for Pipelined Applications,” Proceedings of DoD HPCMP UGC 2005 (to appear), Nashville, TN, IEEE, January 2005.

Dongarra, J., “A Not So Simple Matter of Software,” NCSA Access Online: NCSA, 2005.

(457.69 KB)

Wolf, F., B. Mohr, J. Dongarra, and S. Moore, “Automatic analysis of inefficiency patterns in parallel applications,” Concurrency and Computation: Practice and Experience, Special issue "Automatic Performance Analysis" (submitted), 2005.

(233.31 KB)

Wolf, F., B. Mohr, J. Dongarra, and S. Moore, “Efficient Pattern Search in Large Traces through Successive Refinement,” Proceedings of Euro-Par 2004, Pisa, Italy, Springer-Verlag, August 2004.

(177.46 KB)

Song, F., F. Wolf, N. Bhatia, J. Dongarra, and S. Moore, “An Algebra for Cross-Experiment Performance Analysis,” 2004 International Conference on Parallel Processing (ICPP-04), Montreal, Quebec, Canada, August 2004.

(166.12 KB)

Fagg, G., E. Gabriel, G. Bosilca, T. Angskun, Z. Chen, J. Pjesivac–Grbovic, K. London, and J. Dongarra, “Extending the MPI Specification for Process Fault Tolerance on High Performance Computing Systems,” Proceedings of ISC2004 (to appear), Heidelberg, Germany, June 2004.

(548.38 KB)

Luszczek, P., and J. Dongarra, “Design of an Interactive Environment for Numerically Intensive Parallel Linear Algebra Calculations,” International Conference on Computational Science, Poland, Springer Verlag, June 2004. DOI: 10.1007/978-3-540-25944-2_35

(88.31 KB)

Fagg, G., E. Gabriel, Z. Chen, T. Angskun, G. Bosilca, J. Pjesivac–Grbovic, and J. Dongarra, “Process Fault-Tolerance: Semantics, Design and Applications for High Performance Computing,” International Journal for High Performance Applications and Supercomputing (to appear), April 2004.

(186.9 KB)

Chen, Z., J. Dongarra, P. Luszczek, and K. Roche, “LAPACK for Clusters Project: An Example of Self Adapting Numerical Software,” Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS 04'), vol. 9, Big Island, Hawaii, pp. 90282, January 2004.

(80.97 KB)

Agarwal, P., R. A. Alexander, E. Apra, S. Balay, A. S. Bland, J. Colgan, E. D'Azevedo, J. Dongarra, T. Dunigan, M. Fahey, et al., “Cray X1 Evaluation Status Report,” Oak Ridge National Laboratory Report, vol. /-2004/13, January 2004.

(817.33 KB)

Mucci, P., “Memory Bandwidth and the Performance of Scientific Applications: A Study of the AMD Opteron Processor,” 2005 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (submitted), January 2004.

(210.29 KB)

Beck, M., J. Dongarra, J. Huang, T. Moore, and J. Plank, “Active Logistical State Management in the GridSolve/L,” 4th International Symposium on Cluster Computing and the Grid (CCGrid 2004) (submitted), Chicago, Illinois, January 2004.

(123.69 KB)

Moore, K., “Recommendations for Automatic Responses to Electronic Mail,” RFC 3834: Internet Engineering Task Force (IETF), January 2004.

(174.76 KB)

Dongarra, J., and A. Lastovetsky, “An Overview of Heterogeneous High Performance and Grid Computing,” Engineering the Grid (to appear): Nova Science Publishers, Inc., 2004.

(199.93 KB)

Demmel, J., J. Dongarra, V. Eijkhout, E. Fuentes, A. Petitet, R. Vuduc, C. Whaley, and K. Yelick, “Self Adapting Linear Algebra Algorithms and Software,” IEEE Proceedings (to appear), 2004.

(587.67 KB)

Fagg, G., and J. Dongarra, “Building and using a Fault Tolerant MPI implementation,” International Journal of High Performance Applications and Supercomputing (to appear), 2004.

Eidson, T., V. Eijkhout, and J. Dongarra, “Improvements in the Efficient Composition of Applications,” IPDPS 2004, NGS Workshop (to appear), Santa Fe, 2004.

(42.85 KB)

Wolf, F., and B. Mohr, “Automatic performance analysis of hybrid MPI/OpenMP applications,” Journal of Systems Architecture, Special Issue 'Evolutions in parallel distributed and network-based processing', vol. 49(10-11): Elsevier, pp. 421-439, November 2003.

Fagg, G., E. Gabriel, Z. Chen, T. Angskun, G. Bosilca, A. Bukovsky, and J. Dongarra, “Fault Tolerant Communication Library and Applications for High Performance Computing,” Los Alamos Computer Science Institute (LACSI) Symposium 2003 (presented), Santa Fe, NM, October 2003.

(146.05 KB)

Gabriel, E., G. Fagg, and J. Dongarra, “Evaluating The Performance Of MPI-2 Dynamic Communicators And One-Sided Communication,” Lecture Notes in Computer Science, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI User's Group Meeting, vol. 2840, Venice, Italy, Springer-Verlag, Berlin, pp. 88-97, September 2003.

(254.08 KB)

Mohr, B., and F. Wolf, “KOJAK - A Tool Set for Automatic Performance Analysis of Parallel Applications,” Proc. of the European Conference on Parallel Computing (EuroPar), vol. 2790, Klagenfurt, Austria, Springer-Verlag, pp. 1301-1304, August 2003.

(196.05 KB)

Dongarra, J., and V. Eijkhout, “Self-Adapting Numerical Software and Automatic Tuning of Heuristics,” Lecture Notes in Computer Science, vol. 2660, Melbourne, Australia, Springer Verlag, pp. 759-770, June 2003.

(45.95 KB)

Sloot, P. M., D. Abramson, A. V. Bogdanov, J. Dongarra, A. Zomaya, and Y. Gorbachev, “Computational Science — ICCS 2003,” Lecture Notes in Computer Science, vol. 2657-2660, ICCS 2003, International Conference. Melbourne, Australia, Springer-Verlag, Berlin, June 2003.

Gabriel, E., G. Fagg, A. Bukovsky, T. Angskun, and J. Dongarra, “A Fault-Tolerant Communication Library for Grid Environments,” 17th Annual ACM International Conference on Supercomputing (ICS'03) International Workshop on Grid Computing and e-Science, San Francisco, June 2003.

(377.14 KB)

Vadhiyar, S., “A Performance Oriented Migration Framework for the Grid,” Proceedings of the 3rd International Symposium on Cluster Computing and the Grid, Tokyo, Japan, pp. 130-137, May 2003.

(113.6 KB)

Eidson, T., J. Dongarra, and V. Eijkhout, “Applying Aspect-Oriented Programming Concepts to a Component-based Programming Model,” IPDPS 2003, Workshop on NSF-Next Generation Software, Nice, France, March 2003.

(66.99 KB)

Vadhiyar, S., and J. Dongarra, “GrADSolve - A Grid-based RPC System for Remote Invocation of Parallel Software,” Journal of Parallel and Distributed Computing (submitted), March 2003.

(241.3 KB)

Dail, H., O. Sievert, F. Berman, H. Casanova, A. YarKhan, S. Vadhiyar, J. Dongarra, C. Liu, L. Yang, D. Angulo, et al., “Scheduling in the Grid Application Development Software Project,” Resource Management in the Grid: Kluwer Publishers, March 2003.

(375.92 KB)

Vadhiyar, S., and J. Dongarra, “Self Adaptability in Grid Computing,” Concurrency: Practice and Experience (submitted), March 2003.

(258.89 KB)

Hiroyasu, T., M. Miki, K. Kodama, J. Uekawa, and J. Dongarra, “A Simple Installation and Administration Tool for Large-scaled PC Cluster System,” ClusterWorld Conference and Expo, San Jose, CA, March 2003.

(275.97 KB)

Hiroyasu, T., M. Miki, S. Ogura, K. Aoi, T. Yoshida, Y. Okamoto, and J. Dongarra, “Energy Minimization of Protein Tertiary Structure by Parallel Simulated Annealing using Genetic Crossover,” Special Issue on Biological Applications of Genetic and Evolutionary Computation (submitted), March 2003.

(438.68 KB)

Beck, M., J. Dongarra, V. Eijkhout, M. Langston, T. Moore, and J. Plank, “Scalable, Trustworthy Network Computing Using Untrusted Intermediaries: A Position Paper,” DOE/NSF Workshop on New Directions in Cyber-Security in Large-Scale Networks: Development Obstacles, National Conference Center - Lansdowne, Virginia, March 2003.

(54.62 KB)

Hiroyasu, T., M. Miki, H. Shimosaka, and J. Dongarra, “Optimization Problem Solving System using Grid RPC,” 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, Tokyo, Japan, March 2003.

(71.6 KB)

Dongarra, J., D. Laforenza, and S. Orlando, “Recent Advances in Parallel Virtual Machine and Message Passing Interface,” Lecture Notes in Computer Science, vol. 2840: Springer-Verlag, Berlin, January 2003.

Hiroyasu, T., M. Miki, M. Sano, H. Shimosaka, S. Tsutsui, and J. Dongarra, “Distributed Probabilistic Model-Building Genetic Algorithm,” Lecture Notes in Computer Science, vol. 2723: Springer-Verlag, Heidelberg, pp. 1015-1028, January 2003.

(288.91 KB)

Dongarra, J., “High Performance Computing Trends and Self Adapting Numerical Software,” Lecture Notes in Computer Science, High Performance Computing, 5th International Symposium ISHPC, vol. 2858, Tokyo-Odaiba, Japan, Springer-Verlag, Heidelberg, pp. 1-9, January 2003.

Plank, J., M. Beck, J. Dongarra, R. Wolski, and H. Casanova, “Optimizing Performance and Reliability in Distributed Computing Systems Through Wide Spectrum Storage,” Proceedings of the IPDPS 2003, NGS Workshop, Nice, France, pp. 209, January 2003.

Palma, J., J. Dongarra, and V. Hernández, “High Performance Computing for Computational Science,” Lecture Notes in Computer Science, vol. 2565, VECPAR 2002, 5th International Conference June 26-28, 2002, Springer-Verlag, Berlin, January 2003.

Vadhiyar, S., J. Dongarra, and A. YarKhan, “GrADSolve - RPC for High Performance Computing on the Grid,” Lecture Notes in Computer Science, Proceedings of the 9th International Euro-Par Conference, vol. 2790, Klagenfurt, Austria, Springer-Verlag, Berlin, pp. 394-403, January 2003. DOI: 10.1007/978-3-540-45209-6_58

(125.96 KB)

“The Future of Supercomputing: An Interim Report,” National Research Council, Washington, D.C., The National Academies Press, January 2003.

Lee, DW., and J. Dongarra, “VisPerf: Monitoring Tool for Grid Computing,” Lecture Notes in Computer Science, vol. 2659: Springer Verlag, Heidelberg, pp. 233-243, 2003.

(835.09 KB)

Heinrich, K., M. Berry, J. Dongarra, and S. Vadhiyar, “The Semantic Conference Organizer,” Statistical Data Mining and Knowledge Discovery: CRC Press, 2003.

(998.12 KB)

Agrawal, S., J. Dongarra, K. Seymour, and S. Vadhiyar, “NetSolve: Past, Present, and Future - A Look at a Grid Enabled Server,” Making the Global Infrastructure a Reality: Wiley Publishing, 2003.

(158.19 KB)

YarKhan, A., and J. Dongarra, “Experiments with Scheduling Using Simulated Annealing in a Grid Environment,” Grid Computing - GRID 2002, Third International Workshop, vol. 2536, Baltimore, MD, Springer, pp. 232-242, November 2002.

(66.91 KB)

Hiroyasu, T., M. Miki, H. Shimosaka, Y. Tanimura, and J. Dongarra, “Optimization System Using Grid RPC,” Meeting of the Japan Society of Mechanical Engineers, Kyoto University, Kyoto, Japan, October 2002.

Casanova, H., T. Bartol, F. Berman, A. Birnbaum, J. Dongarra, M. Ellisman, M. Faerman, E. Gockay, M. Miller, G. Obertelli, et al., “The Virtual Instrument: Support for Grid-enabled Scientific Simulations,” Journal of Parallel and Distributed Computing (submitted), October 2002.

(282.16 KB)

Bassi, A., M. Beck, G. Fagg, T. Moore, J. Plank, M. Swany, and R. Wolski, “The Internet BackPlane Protocol: A Study in Resource Sharing,” Proceedings of the second IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID 2002), Berlin, Germany, October 2002.

Hiroyasu, T., M. Miki, H. Shimosaka, M. Sano, Y. Tanimura, Y. Mimura, S. Yoshimura, and J. Dongarra, “Truss Structural Optimization Using NetSolve System,” Meeting of the Japan Society of Mechanical Engineers, Kyoto University, Kyoto, Japan, October 2002.

(450.65 KB)

Arbenz, P., A. Cleary, J. Dongarra, and M. Hegland, “A Comparison of Parallel Solvers for General Narrow Banded Linear Systems,” Parallel and Distributed Computing Practices, vol. 2, pp. 385-400, October 2002.

(304.96 KB)

Cuenca, J., D. Giménez, J. González, J. Dongarra, and K. Roche, “Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load,” EuroPar 2002, Paderborn, Germany, August 2002.

(92.59 KB)

Vadhiyar, S., and J. Dongarra, “A Metascheduler For The Grid,” Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC 2002), Edinburgh, Scotland, IEEE Computer Society, pp. 343-351, July 2002.

(99.53 KB)

Kennedy, K., J. Mellor-Crummey, K. Cooper, L. Torczon, F. Berman, A. Chien, D. Angulo, I. Foster, D. Gannon, L. Johnsson, et al., “Toward a Framework for Preparing and Executing Adaptive Grid Programs,” International Parallel and Distributed Processing Symposium: IPDPS 2002 Workshops, Fort Lauderdale, FL, pp. 0171, April 2002.

(64.5 KB)

Seymour, K., H. Nakada, S. Matsuoka, J. Dongarra, C. Lee, and H. Casanova, “Overview of GridRPC: A Remote Procedure Call API for Grid Computing,” Proceedings of the Third International Workshop on Grid Computing, pp. 274-278, January 2002.

(221.82 KB)

Roche, K., and J. Dongarra, “Deploying Parallel Numerical Library Routines to Cluster Computing in a Self Adapting Fashion,” Parallel Computing: Advances and Current Issues: Proceedings of the International Conference ParCo2001, London, England, Imperial College Press, January 2002.

(381.89 KB)

Dongarra, J., V. Eijkhout, and H. van der Vorst, “An Iterative Solver Benchmark,” Scientific Programming (to appear), 2002.

(142.67 KB)

Dongarra, J., H. Meuer, H. D. Simon, and E. Strohmaier, “High Performance Computing Trends,” HERMIS, vol. 2, pp. 155-163, November 2001.

Beck, M., D. Arnold, A. Bassi, F. Berman, H. Casanova, J. Dongarra, T. Moore, G. Obertelli, J. Plank, M. Swany, et al., “Logistical Computing and Internetworking: Middleware for the Use of Storage in Communication,” submitted to SC2001, Denver, Colorado, November 2001.

(41.79 KB)

Vadhiyar, S., G. Fagg, and J. Dongarra, “Performance Modeling for Self Adapting Collective Communications for MPI,” LACSI Symposium 2001, Santa Fe, NM, October 2001.

(105.49 KB)

Fagg, G., E. Gabriel, and M. Resch, “Parallel IO Support for Meta-Computing Applications: MPI_Connect IO Applied to PACX-MPI,” 8th European PVM/MPI User's Group Meeting, Lecture Notes in Computer Science, vol. 2131, Greece, Springer Verlag, Berlin, September 2001.

(129.3 KB)

Moore, S., D. Arnold, and D. Cronk, “Metacomputing Support for the SARA3D Structural Acoustics Application,” Department of Defense Users' Group Conference (to appear), Biloxi, Mississippi, June 2001.

(64.58 KB)

Seymour, K., and J. Dongarra, “Automatic Translation of Fortran to JVM Bytecode,” Joint ACM Java Grande - ISCOPE 2001 Conference (submitted), Stanford University, California, June 2001.

(185.8 KB)

Cronk, D., G. Fagg, and S. Moore, “Parallel I/O for EQM Applications,” Department of Defense Users' Group Conference Proceedings (to appear), Biloxi, Mississippi, June 2001.

(81.41 KB)

Beck, M., T. Moore, L. Abrahamsson, C. Achouiantz, and P. Johansson, “Enabling Full Service Surrogates Using the Portable Channel Representation,” Tenth International World Wide Web Conference Proceedings (to appear), Hong Kong, May 2001.

(267.23 KB)

Miller, M., C. Moulding, J. Dongarra, and C. Johnson, “Grid-Enabling Problem Solving Environments: A Case Study of SCIRUN and NetSolve,” Proceedings of the High Performance Computing Symposium (HPC 2001) in 2001 Advanced Simulation Technologies Conference, Seattle, Washington, Society for Modeling and Simulation International, April 2001.

(144.19 KB)

Casanova, H., S. Matsuoka, and J. Dongarra, “Network-Enabled Server Systems: Deploying Scientific Simulations on the Grid,” 2001 High Performance Computing Symposium (HPC'01), part of the Advanced Simulation Technologies Conference, Seattle, Washington, April 2001.

(175.23 KB)

Blackford, S., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, et al., “Basic Linear Algebra Subprograms (BLAS),” (an update), submitted to ACM TOMS, February 2001.

(228.33 KB)

Steen, A J. van der, and J. Dongarra, “Overview of High Performance Computers,” Handbook of Massive Data Sets: Kluwer Academic Publishers, pp. 791-852, January 2001.

(442.71 KB)

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard, January 2001.

Fagg, G., A. Bukovsky, and J. Dongarra, “Fault Tolerant MPI for the HARNESS Meta-Computing System,” Proceedings of International Conference of Computational Science - ICCS 2001, Lecture Notes in Computer Science, vol. 2073, Berlin, Springer Verlag, pp. 355-366, 2001. DOI: 10.1007/3-540-45545-0_44

Browne, S., J. Dongarra, N. Garner, K. London, and P. Mucci, “A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters,” Proceedings of SuperComputing 2000 (SC'00), Dallas, TX, November 2000.

(178.15 KB)

Vadhiyar, S., G. Fagg, and J. Dongarra, “Automatically Tuned Collective Communications,” Proceedings of SuperComputing 2000 (SC'2000), Dallas, TX, November 2000.

(232.69 KB)

Arnold, D., and J. Dongarra, “Developing an Architecture to Support the Implementation and Development of Scientific Computing Applications,” to appear in Proceedings of Working Conference 8: Software Architecture for Scientific Computing Applications, Ottawa, Canada, October 2000.

(176.25 KB)

Arnold, D., and J. Dongarra, “The NetSolve Environment: Progressing Towards the Seamless Grid,” 2000 International Conference on Parallel Processing (ICPP-2000), Toronto, Canada, August 2000.

(148.85 KB)

Arnold, D., S. Blackford, J. Dongarra, V. Eijkhout, and T. Xu, “Seamless Access to Adaptive Solver Algorithms,” Proceedings of 16th IMACS World Congress 2000 on Scientific Computing, Applications Mathematics and Simulation, Lausanne, Switzerland, August 2000.

(151.42 KB)

Dongarra, J., and P. Raghavan, “A New Recursive Implementation of Sparse Cholesky Factorization,” Proceedings of 16th IMACS World Congress 2000 on Scientific Computing, Applications Mathematics and Simulation, Lausanne, Switzerland, August 2000.

Arnold, D., S. Browne, J. Dongarra, G. Fagg, and K. Moore, “Secure Remote Access to Numerical Software and Computational Hardware,” Proceedings of the DoD HPC Users Group Conference (HPCUG) 2000, Albuquerque, NM, June 2000.

(172.6 KB)

Arnold, D., W. Lee, J. Dongarra, and M. Wheeler, “Providing Infrastructure and Interface to High Performance Applications in a Distributed Setting,” ASTC-HPC 2000, Washington, DC, April 2000.

(96.04 KB)

Fagg, G., and J. Dongarra, “FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World,” Lecture Notes in Computer Science: Proceedings of EuroPVM-MPI 2000, (Hungary: Springer Verlag, 2000), vol. 1908, pp. 346-353, January 2000.

(51.95 KB)

Arnold, D., D. Bachmann, and J. Dongarra, “Request Sequencing: Optimizing Communication for the Grid,” Lecture Notes in Computer Science: Proceedings of 6th International Euro-Par Conference 2000, Parallel Processing, (Germany: Springer Verlag 2000), vol. 1900, pp. 1213-1222, January 2000.

(165.92 KB)

Dongarra, J., H. Meuer, H. D. Simon, and E. Strohmaier, “High Performance Computing Today,” FOMMS 2000: Foundations of Molecular Modeling and Simulation Conference (to appear), January 2000.

(66 KB)

Dongarra, J., V. Eijkhout, and P. Luszczek, “Recursive approach in sparse matrix LU factorization,” Proceedings of 1st SGI Users Conference, Cracow, Poland (ACC Cyfronet UMM, 2000), pp. 409-418, January 2000.

(176.14 KB)

Beck, M., T. Moore, J. Plank, and M. Swany, “Logistical Networking: Sharing More Than the Wires,” In Active Middleware Services, Ed. Salim Hariri, Craig A. Lee, Cauligi S. Raghavendra (2000), Kluwer Academic, Norwell, MA, January 2000.

(84.69 KB)

Dongarra, J., P. Kacsuk, and N. Podhorszki, “Recent Advances in Parallel Virtual Machine and Message Passing Interface,” Lecture Notes in Computer Science: Proceedings of 7th European PVM/MPI Users' Group Meeting 2000, (Hungary: Springer Verlag), vol. 1908, January 2000.

Dongarra, J., G. Fagg, R. Hempel, and D. W. Walker, “Message Passing Software Systems,” Encyclopedia of Electrical and Electronics Engineering, Supplement 1: John Wiley & Sons, Inc., 2000.

(289.38 KB)

Dongarra, J., and V. Eijkhout, “Numerical Linear Algebra,” Encyclopedia of Computer Science and Technology, eds. Kent, A., Williams, J., vol. 41, pp. 207-233, August 1999.

(262 KB)

Browne, S., C. Deane, G. Ho, and P. Mucci, “PAPI: A Portable Interface to Hardware Performance Counters,” Proceedings of Department of Defense HPCMP Users Group Conference, June 1999.

(57.77 KB)

Petitet, A., H. Casanova, C. Whaley, J. Dongarra, and Y. Robert, “A Numerical Linear Algebra Problem Solving Environment Designer's Perspective (LAPACK Working Note 139),” SIAM Annual Meeting, Atlanta, GA, May 1999.

(319.71 KB)

Beck, M., R. Chawla, B. Dempsey, and T. Moore, “Portable Representation of Internet Content Channels in I2-DSI,” 4th Intl. Web Caching Workshop, San Diego, CA, March 1999.

Anderson, E., Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, et al., “LAPACK Users' Guide, 3rd ed.,” Philadelphia: Society for Industrial and Applied Mathematics, January 1999.

Browne, S., J. Dongarra, and A. Trefethen, “Numerical Libraries and Tools for Scalable Parallel Cluster Computing,” IEEE Cluster Computing BOF at SC99, Portland, Oregon, January 1999.

(37.38 KB)

Petitet, A., H. Casanova, J. Dongarra, Y. Robert, and C. Whaley, “Parallel and Distributed Scientific Computing: A Numerical Linear Algebra Problem Solving Environment Designer's Perspective,” Handbook on Parallel and Distributed Processing, January 1999.

(323.01 KB)

Whaley, C., and J. Dongarra, “Automatically Tuned Linear Algebra Software,” 1998 ACM/IEEE conference on Supercomputing (SC '98), Orlando, FL, IEEE Computer Society, November 1998.

Browne, S., J. Dongarra, J. Horner, P. McMahan, and S. Wells, “National HPCC Software Exchange (NHSE): Uniting the High Performance Computing and Communications Community,” D-Lib Magazine, January 1998.

(56.15 KB)

Tomov, S., R. Nath, H. Ltaief, and J. Dongarra, “Dense Linear Algebra Solvers for Multicore with GPU Accelerators,” Parallel Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010 IEEE International Symposium on, Atlanta, GA, pp. 1-8, 2010. DOI: 10.1109/IPDPSW.2010.5470941

(1 MB)

Fürlinger, K., M. Gerndt, and J. Dongarra, “Scalability Analysis of the SPEC OpenMP Benchmarks on Large-Scale Shared Memory Multiprocessors,” Proceedings of the 2007 International Conference on Computational Science (ICCS 2007), vol. 4487-4490, Beijing, China, Springer LNCS, pp. 815-822, 2007. DOI: 10.1007/978-3-540-72586-2_115

(145.84 KB)

Bosilca, G., C. Coti, T. Herault, P. Lemarinier, and J. Dongarra, “Constructing Resiliant Communication Infrastructure for Runtime Environments in Advances in Parallel Computing,” Advances in Parallel Computing - Parallel Computing: From Multicores and GPU's to Petascale, vol. 19, pp. 441-451, 2010. DOI: 10.3233/978-1-60750-530-3-441