|
2016/09/30 |
Azzam Haidar |
|
A note on the Power and Performance Analysis of Dense Linear Algebra on Intel Xeon Phi Processors |
|
|
2016/09/22 |
Phil Vaccaro |
|
PAPI Component: Powercap |
philip-vaccaro-slides-09-22-2016.pdf |
|
2016/09/16 |
Dmitry Lyakh |
ORNL |
Dense/sparse numeric tensor algebra: Scalable, hardware-agnostic design for performance portability |
dmitry-lyakh-slides-09-16-2016.pdf |
|
2016/09/09 |
Yves Robert |
INRIA |
Computing the expected longest path of task graphs in the presence of silent errors |
yves-robert-slides-09-09-2016.pdf |
|
2016/08/05 |
Joe Dorris |
ICL |
Patent Data Visualization and Processing |
|
|
2016/07/22 |
Myungho Lee |
Soongsil University |
Memory-Efficient Parallelization of 3D Lattice Boltzmann Flow Solver on a GPU |
|
|
2016/07/01 |
Emmanuel Jeannot |
INRIA |
Topology-Aware Data Management |
Jeannot-Topology-Aware-Data-Management-07-01-16.pdf |
|
2016/06/17 |
Julien Langou |
University of Colorado |
A Makespan Lower Bound for the Scheduling of the Tiled Cholesky Factorization based on ALAP scheduling |
Langou-A-Makespan-Lower-Bound-for-the-Scheduling-of-the-Tiled-Cholesky-Factorization-based-on-ALAP-scheduling-06-17-16.pdf |
|
2016/06/10 |
Emmanuel Agullo |
INRIA |
Overview of Task-based Sparse and Data-sparse Solvers on Top of Runtime Systems |
|
|
2016/05/27 |
Azzam Haidar |
ICL |
Heterogeneous Computation: The Current Challenge |
|
|
2016/05/20 |
Iain Duff |
Numerical Analysis Group, Scientific Computing Department, Science and Technology Facilities Council (UK) |
Scalability of Sparse Direct Codes |
Duff-Scalability-of-Sparse-Direct-Codes-05-20-16.pdf |
|
2016/05/13 |
George Bosilca |
ICL |
PaRSEC - Yet another runtime? |
Bosilca-PaRSEC-Yet-Another-Runtime-05-13-16.pdf |
|
2016/05/06 |
Oleg Shylo |
Department of Industrial & Systems Engineering at UTK |
Scalable Communication for Parallel Optimization |
|
|
2016/05/04 |
Yaohung Tsai |
ICL |
AlphaGo: The Go AI from Google DeepMind |
|
|
2016/04/29 |
Wei Wu |
ICL |
Accelerator Integration with Programming Models |
Wu-Accelerator-Integration-with-Programming-Models-04-29-16.pdf |
|
2016/04/22 |
Chongxiao Cao |
ICL |
Fault Tolerant Design for a Task-based Runtime |
Cao-Fault-Tolerant-Design-for-a-Task-based-Runtime-04-22-16.pdf |
|
2016/04/15 |
Miro Stoyanov |
ORNL |
Resilient Solvers for Partial Differential Equations |
Stoyanov-Resilient-Solvers-for-Partial-Differential-Equations-04-15-16.pdf |
|
2016/04/08 |
Piotr Luszczek |
ICL |
Search Space Description, Generation, and Pruning System for Autotuners |
Luszczek-Programming-Autotuners-with-BEAST-Search-Space-Description-Generation-and-Pruning-System-for-Autotuners-04-08-16.pdf |
|
2016/04/01 |
Ahmad Ahmad |
ICL |
On the Development of Variable-Size Batched Computation for Heterogeneous Parallel Architectures |
Ahmad-On-the-Development-of-Variable-Size-Batched-Computation-for-Heterogeneous-Parallel-Architectures-01-04-2016.pdf |
|
2016/03/24 |
Phil Mucci |
Minimal Metrics |
Systems Performance @ Sandia |
Mucci-Minimal-Metrics-Systems-Performance@Sandia-03-24-16.pdf |
|
2016/03/18 |
Tim Davis |
Texas A&M University |
Sparse Matrix Algorithms: Combinatorics + Numerical Methods + Applications |
Davis-Sparse-Matrix-Algorithms-03-18-2016.pdf |
|
2016/03/18 |
Sanjay Ranka |
University of Florida |
A Genetic Algorithm Based Approach for Multi-objective Hardware/Software Co-optimization |
Ranka-A-Genetic-Algorithm-Based-Approach-for-Multi-objective-Hardware_Software-Co-optimization-03-18-2016.pdf |
|
2016/03/11 |
Ichitaro Yamazaki |
ICL |
Preconditioning a Communication-avoiding Krylov solver |
Yamazaki-Preconditioning-Communication-Avoiding-Krylov-Methods-03-21-2016.pdf |
|
2016/03/04 |
Hartwig Anzt |
ICL |
Solving Sparse Linear Systems on GPUs - The Good, the Bad, and the Ugly |
|
|
2016/02/26 |
Peter Liaw |
UTK Department of Materials Science and Engineering |
|
|
|
2016/02/19 |
Thomas Herault |
ICL |
Practical Scalable Consensus for Pseudo Synchronous Distributed Systems |
Herault-Practical-Scalable-Consensus-for-Pseudo-Synchronous-Distributed-Systems-02-19-2016.pdf |
|
2016/02/12 |
Hartwig Anzt |
ICL |
A New Parallel Threshold ILU |
|
|
2016/02/05 |
Mathieu Faverge |
INRIA |
Massively Parallel Cartesian Discrete Ordinates Method for Neutron Transport Simulation |
Faverge-Massively-Parallel-Cartesian-Discrete-Ordinates-Method-for-Neutron-Transport-Simulation-02-05-2016.pdf |
|
2016/01/29 |
Joe Dorris |
ICL |
PLASMA OpenMP on Xeon Phi and A Case Study with Cholesky Decomposition |
Dorris-PLASMA-OpenMP-on-Xeon-Phi-and-A-Case-Study-with-Cholesky-Decomposition-01-29-16.pdf |
|
2016/01/22 |
Aurelien Bouteiller |
ICL |
Plan B: Interruption of Ongoing MPI Operations to Support Failure Recovery |
Bouteiller-Revoke-Plan-B-01-22-2016.pdf |
|
2016/01/14 |
David Keffer |
UTK Department of Materials Science and Engineering |
Algorithms for 3D-3D Registration with Known and Unknown References: Applications to Materials Science |
Keffer-Algorithms-for-3D-3D-Registration-with-Known-and-Unknown-References-01-14-2016.pdf |
|
2016/01/08 |
Yves Robert |
INRIA |
Which Verification for Silent Error Detection? |
Robert-Which-verification-for-soft-error-detection-01-08-2016.pdf |
|
2015/12/11 |
Kalyan Perumalla |
ORNL |
|
|
|
2015/12/04 |
Azzam Haidar |
ICL |
Batched Matrix Computations on Hardware Accelerators |
Amhad-GPU-Accelerated-Memory-bound-Linear-Algebra-Kernels-2015-04-17.pdf |
|
2015/11/13 |
Sticks Mabakane |
University of Cape Town |
Novel Visualizations for Optimization of Parallel Programs |
Mabakane-Novel-visualizations-for-optimization-of-parallel-programs-11-13-2015.pdf |
|
2015/11/06 |
Moritz Kreutzer |
Friedrich-Alexander University Erlangen-Nürnberg |
Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems |
Kreutzer-Performance-Engineering-of-the-Kernel-Polynomial-Method-on-Large-Scale-CPU-GPU-Systems-11-06-2015.pdf |
|
2015/11/03 |
Takeshi Fukaya |
Hokkaido University |
CholeskyQR2: Cholesky QR factorization with reorthogonalization |
Fukaya-CholeskyQR2-Cholesky-QR-factorization-with-reorthogonalization-11-03-2015.pdf |
|
2015/11/03 |
Toshiyuki Imamura |
RIKEN AICS |
ASPEN.K2+MUBLAS: level2 CUDA BLAS kernels |
Imamura-ASPEN_K2_MUBLAS_level2-CUDA-BLAS-kernels-11-03-2015.pdf |
|
2015/10/30 |
Michael Barton |
United States Army Research Laboratory (ARL) |
Data Intensive Science and Computing |
Barton-Data-Intensive-Science-and-Computing-10-30-2015.pdf |
|
2015/10/23 |
Bob Muenchen |
UTK |
Monitoring Trends in Tools for Data Science |
Muenchen-Monitoring-Trends-in-Tools-for-Data-Science-10-23-2015.pdf |
|
2015/10/16 |
Pierre Sens |
LIP6 |
Probabilistic Byzantine Tolerance for Cloud Computing |
Sens-Probabilistic-Byzantine-Tolerance-for-Cloud-Computing-10-16-2015.pdf |
|
2015/10/12 |
Edmond Chow |
Georgia Tech |
Very Fine-grained Parallelization of Sparse Linear Algebra Computations |
Chow-Very-Fine-Grained-Parallelization-of-Approximate-Sparse-Matrix-Computations-10-12-2015.pdf |
|
2015/10/09 |
Mike Jantz |
EECS |
Cross-Layer Memory Management to Achieve Power and Performance Goals |
Jantz-Cross-Layer-Memory-Management-to-Achieve-Power-and-Performance-Goals-10-09-2015.pdf |
|
2015/10/02 |
Mike Guidry |
ORNL |
Fast New Methods for Solving Large Sets of Coupled Differential Equations at Scale in Scientific Applications |
Guidry-Fast-New-Methods-for-Solving-Large-Sets-of-Coupled-Differential-Equations-at-Scale-in-Scientific-Applications-10-02-2015.pdf |
|
2015/09/25 |
Ichitaro Yamazaki |
ICL |
Random Sampling to Update Truncated SVD |
Yamazaki-Random-Sampling-to-Update-Partial-SVD-9-25-2015.pdf |
|
2015/09/18 |
Mark Gates |
ICL |
Accelerating Collaborative Filtering Using Concepts from High Performance Computing |
Gates-Accelerating-collaborative-filtering-using-HPC-concepts-09-18-2015.pdf |
|
2015/09/11 |
Asim YarKhan |
ICL |
OpenMP Tasks and PLASMA |
YarKhan-OpenMP-Tasks-and-PLASMA-09-11-15.pdf |
|
2015/09/04 |
Mathieu Faverge |
Inria |
Blocking Strategy Optimizations for Sparse Direct Linear Solver on Heterogeneous Architectures |
Faverge-Blocking-Strategy-Optimizations-for-Sparse-Direct-Linear-Solver-on-Heterogeneous-Architectures-09-04-2015.pdf |
|
2015/08/28 |
Tingxing Dong |
ICL |
Batched Linear Algebra Problems on Hardware Accelerators Based on GPUs |
Dong-Batched-Linear-Algebra-Problems-on-Hardware-Accelerators-Based-on-GPUs-08-28-2015.pdf |
|
2015/08/21 |
Yaohung Tsai |
ICL |
Convolutional Layers in RaPyDLI |
Tsai-Convolutional-Layers-in-RaPyDLI-08-21-2015.pdf |
|
2015/08/07 |
Ian Masliah |
University of Paris-Sud |
Towards C++ and Beyond |
Masliah-Towards-C++-and-Beyond-08-07-2015.pdf |
|
2015/07/31 |
Joseph Schuchart |
TU Dresden |
HPC energy-efficiency research at ZIH, Or: What the HAEC is HDEEM? |
Schuchart-Energy-Efficiency-Research-at-ZIH-07-31-2015.pdf |
|
2015/07/17 |
Sangamesh Ragate |
ICL |
PC Sampling in GPU |
Ragate-PC-Sampling-in-GPU-2015-17-07.pdf |
|
2015/07/01 |
Ed Valeev |
Virginia Tech |
Tensor Computation for Chemistry Sparsity and More |
Valeev-Tensor-Computation-for-Chemistry-Sparsity-and-More-2015-07-01.pdf |
|
2015/07/01 |
Torsten Hoefler |
ETH Zürich |
Towards Fully Automated Interpretable Performance Models |
Hoefler-Towards-Fully-Automated-Interpretable-Performance-Models-2015-07-01.pdf |
|
2015/06/26 |
Reazul Hoque |
ICL |
Dynamic Task Discovery in PaRSEC |
Hoque-Dynamic-Task-Discovery-in-PaRSEC-2015-06-26.pdf |
|
2015/06/12 |
Damien Genet |
ICL |
Design of Generic Modular Solutions for PDE Solvers for Modern Architectures |
Ganet-Design-of-Generic-Modular-Solutions-for-PDE-Solvers-for-Modern-Architectures-2015-06-12.pdf |
|
2015/06/05 |
Nageswara Rao |
ORNL |
Fault Diagnosis of Hybrid CPU-GPU Computing Systems Using Chaotic Maps |
Rao-Chaotic-Map-Method-for-Detection-and-Diagnosis-of-CPU-GPU-Hybrid-Computing-Systems-2015-06-05.pdf |
|
2015/05/29 |
Chad Steed |
ORNL |
Extreme Scale Visual Data Science |
Steed-Visual-Data-Science-2015-05-29.pdf |
|
2015/05/15 |
Eduardo Ponce |
EECS |
IDR(s)-Biortho: A Case Study of MAGMA Sparse Iterative Solvers |
Ponce-IDR-Solver-for-MAGMA Sparse-Iter-Package-2015-05-15.pdf |
|
2015/05/08 |
Chunyan Tang |
ICL |
From MPI to OpenSHMEM: Porting LAMMPS |
Tang-From-MPI-to-openSHMEM-porting-LAMMPS-2015-05-08.pdf |
|
2015/05/01 |
Wei Wu |
ICL |
Hierarchical DAG Scheduling for Hybrid Distributed Systems |
Wu-Hierarchical-DAG-scheduling-for-Hybrid-Distributed-Systems-2015-05-01.pdf |
|
2015/04/24 |
Manish Parashar |
Rutgers |
Big Data Challenges in Simulation-based Science |
Parashar-Big-Data-Challenges-in-Simulation-based-Science-2015-04-15.pdf |
|
2015/04/17 |
Ahmad Ahmad |
ICL |
GPU Accelerated Memory-bound Linear Algebra Kernels |
Amhad-GPU-Accelerated-Memory-bound-Linear-Algebra-Kernels-2015-04-17.pdf |
|
2015/04/10 |
Tingxing Dong |
ICL |
Batched One-sided Factorizations on Hardware Accelerators Based on GPUs |
Dong-Batched-One-sided-Factorizations-on-Hardware-Accelerators-Based-on-GPUs.pdf |
|
2015/03/27 |
Yves Robert |
INRIA |
Voltage Overscaling Algorithms for Energy-Efficient Workflow Computations With Timing Errors |
Robert-Voltage-Overscaling-Algorithms-for-Energy-Efficient-Workflow-Computations-With-Timing-Errors-2015-03-27.pdf |
|
2015/03/20 |
Anthony Danalis |
ICL |
Using PaRSEC to Develop Non-static Applications |
|
|
2015/03/13 |
Audris Mockus |
EECS |
Evidence Engineering |
Mockus-Evidence-Engineering-2015-03-13.pdf |
|
2015/03/06 |
Azzam Haidar |
ICL |
Performance Bounds in Symmetric Eigenvector Calculations |
Haidar-PLASMA-MAGMA-PARSEC-Performance-Bounds-in-Symmetric-Eigensolver-2015-03-06.pdf |
|
2015/02/27 |
Piotr Luszczek |
ICL |
Deep Neural Networks for Image Classification – A Primer |
Luszczek-Deep-Neural-Net-Primer-2015-02-25.pdf |
|
2015/02/13 |
Yves Robert |
ICL |
Scheduling Computational Workflows on Failure-prone Platforms |
Robert-Scheduling-Computational-Workflows-on-Failure-prone-Platforms-2015-02-13.pdf |
|
2015/02/06 |
Amina Guermouche |
ICL |
FoREST-mn: Runtime DVFS Beyond Communication Slack |
Guermouche-FoREST-mn-Runtime-DVFS-Beyond-Communication-Slack-2015-02-06.pdf |
|
2015/01/23 |
George Bosilca |
ICL |
Building Blocks for Resilient Applications |
Bosilca-Building-Blocks-for-Resilient-Applications-2015-01-23.pdf |
|
2015/01/16 |
Emmanuel Jeannot |
INRIA |
Topology Aware Data Management |
Jeannot-Topology-Aware-Data-Management-2015-01-16.pdf |
|
2015/01/08 |
Tony Hey |
|
The Fourth Paradigm: Data-Intensive Scientific Discovery, Open Science and the Cloud |
Hey-The-Fourth-Paradigm-Data-Intensive-Scientific-Discovery-Open-Science-and-the-Cloud-2015-01-08.pdf |
|
2014/12/12 |
Ichitaro Yamazaki |
ICL |
Mixed-precision orthogonalization scheme and its case-studies with GPUs |
|
|
2014/12/05 |
Asim YarKhan |
ICL |
Latest Developments in the PAPI Performance Monitoring Library |
YarKhan-PAPI-Performance-Application-Programming-Interface-2014-12-05.pdf |
|
2014/11/14 |
Chongxiao Cao |
ICL |
Design for a Soft Error Resilient Dynamic Task-based Runtime |
Cao-Design-for-a-Soft-Error-Resilient-Dynamic-Task-based-Runtime-2014-11-14.pdf |
|
2014/11/07 |
Adrien Remy |
LRI |
Using Random Butterfly Transformation to Solve Dense Linear Systems Using Accelerators |
Remy-Using-Random-Butterfly-Transformation-to-Solve-Dense-Linear-Systems-Using-Accelerators-2014-11-07.pdf |
|
2014/10/31 |
Aurelien Bouteiller |
ICL |
UCCS: A Communication Substrate for Open SHMEM (and more) |
Bouteiller-UCCS-A-Communication-Substrate-for-Open-SHMEM-2014-10-31.pdf |
|
2014/10/24 |
Yves Robert |
ICL |
Assessing general-purpose algorithms to cope with fail-stop and silent errors |
Robert-Algorithms-for-coping-with-silent-errors-2014-10-24.pdf |
|
2014/10/17 |
Florent Lopez |
ENSEEIHT |
Sparse direct solvers on top of runtime systems |
Lopez-Sparse-direct-solvers-on-top-of-runtime-systems-2014-10-17.pdf |
|
2014/10/10 |
Alfredo Buttari |
ENSEEIHT |
Improving multifrontal solvers by means of Block Low-Rank approximations |
Buttari-Improving-multifrontal-solvers-by-means-of-Block-Low-Rank-approximations-2014-10-10.pdf |
|
2014/10/03 |
Hartwig Anzt |
ICL |
Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs |
Anzt-Asynchronous-Iterative-Algorithm-for-Computing-Incomplete-Factorizations-on-GPUs-2014-10-03.pdf |
|
2014/09/26 |
Azzam Haidar |
ICL |
Towards Batched Linear Solvers on Accelerated Hardware Platforms |
Haidar-Towards-Batched-Linear-Solvers-on-Accelerated-Hardware-Platforms-2014-09-26.pdf |
|
2014/09/19 |
Simplice Donfack |
ICL |
Improve the applicability of highly efficient stencil compilers to a wider class of problems |
Donfack-Improve-the-applicability-of-highly-efficient-stencil-compilers-2014-09-19.pdf |
|
2014/09/12 |
George Ostrouchov |
ORNL |
Taking R to Big Platforms and Supercomputers with pbdR |
|
|
2014/09/05 |
Theo Mary |
INP-ENSEEIHT |
Performance Study of a Randomized Low-rank Approximation using multi-GPU |
Mary-Randomized-Low-rank-Approximation-using-multi-GPU-2014-09-05.pdf |
|
2014/08/29 |
Gregoire Pichon |
INRIA |
Divide and Conquer: a symmetric tridiagonal eigensolver in PLASMA |
Pichon-Divide-and-Conquer-a-symmetric-tridiagonal-eigensolver-in-PLASMA-2014-08-29.pdf |
|
2014/08/22 |
Tracy Rafferty |
ICL |
Conference travel |
Rafferty-Conference-Travel-2014-08-22.pdf |
|
2014/07/11 |
George Bosilca |
ICL |
Combining Recent HPC Techniques for 3D Geophysics Acceleration |
|
|
2014/06/27 |
Ryan Glasby |
JICS |
Comparison of SU/PG and DG Finite-Element Techniques for the Compressible Navier-Stokes Equations on Anisotropic Unstructured Meshes |
Glasby-Comparison-of-SUPG-and-DG-Finite-Element-Techniques-2014-06-27.pdf |
|
2014/06/20 |
Yves Robert |
ICL |
Algorithms for coping with silent errors |
Robert-Algorithms-for-coping-with-silent-errors-2014-06-20.pdf |
|
2014/06/13 |
Tingxing Dong |
ICL |
A Step towards Energy Efficient Computing: Redesigning A Hydrodynamic Application on CPU-GPU |
Dong-A-Step-towards-Energy-Efficient-Computing-2014-06-14.pdf |
|
2014/06/06 |
Kris Garrett |
ORNL |
A Nonlinear QR Algorithm for Banded Nonlinear Eigenvalue Problems |
Garrett-Nonlinear-QR-Algorithm-for-Banded-Nonlinear-Eigenvalue-Problems-2014-06-06.pdf |
|
2014/05/30 |
Grigori Fursin |
INRIA |
Collective Mind: community-driven systematization and automation of program optimization |
Fursin-Collective-Mind-program-optimization-2014-05-30.pdf |
|
2014/05/16 |
Azzam Haidar |
ICL |
MAGMA: LU Factorization for Small Matrices |
haidar_may17_2014.pdf |
|
2014/05/14 |
Thomas Herault |
ICL |
DPLASMA/PaRSEC |
|
|
2014/05/09 |
Hartwig Anzt |
ICL |
Hybrid Multi-Elimination ILU Preconditioners on GPUs |
Anzt-Hybrid-Multi-Elimination-ILU-Preconditioners-on-GPUs-2014-05-09.pdf |
|
2014/05/02 |
Ichitaro Yamazaki |
ICL |
Performance of s-step GMRES to avoid communication on/between GPUs |
Yamazaki-Performance-of-s-step-GMRES-to-avoid-communication-on-GPUs-2014-05-02.pdf |