| Title | Comparing and Contrasting User and Runtime Directed Data Placement Strategies for Owner-Compute, Multi-accelerator Distributed Task Based Scheduling |
| Publication Type | Conference Proceedings |
| Year of Publication | 2025 |
| Authors | Bouteiller, A., Q. Cao, J. Schuchart, and T. Herault |
| Editor | Diehl, P., Q. Cao, T. Herault, and G. Bosilca |
| Conference Name | Workshop on Asynchronous Many-Task Systems and Applications |
| Edition | 1 |
| Pagination | 140 - 153 |
| Date Published | 2025-10 |
| Publisher | Springer Cham |
| Conference Location | St. Louis, MO |
| ISBN Number | 978-3-031-97195-2 |
| Keywords | accelerator, Cholesky Factorization, Matrix computations, task-based runtime |
| Abstract | Given GPU accelerators’ high arithmetic capacity, reducing data motion and optimizing locality are critical to achieving high performance. The task-based programming paradigm, as employed in the PaRSEC micro-task runtime system, enables the decoupling of data distribution and computation mapping to resources from the algorithm’s base expression. In this paper, we leverage this capability to explore the performance impact of several data placement strategies (some automatic and runtime-directed, some user-directed) for the owner-compute scheduling model in the context of split-memory accelerators. We implement three different strategies for data and task mapping: a randomized first-touch policy that assigns data randomly to an accelerator, a load-balancing strategy that assigns data to the accelerator with the lowest load, and, for comparison, a user-directed strategy that minimizes cross-accelerator traffic by placing tasks according to a cross-memory bandwidth minimizing strategy. We carry out the evaluation on a variety of multi-GPU accelerated systems, including the Frontier system, and demonstrate that runtime-directed automatic data placement can improve locality compared to naive strategies, but also highlight that having easily modifiable user-directed data placement is crucial to achieving peak performance. |
| URL | https://link.springer.com/chapter/10.1007/978-3-031-97196-9_12 |
| DOI | 10.1007/978-3-031-97196-9_12 |



