Comparing and Contrasting User and Runtime Directed Data Placement Strategies for Owner-Compute, Multi-accelerator Distributed Task Based Scheduling

Title: Comparing and Contrasting User and Runtime Directed Data Placement Strategies for Owner-Compute, Multi-accelerator Distributed Task Based Scheduling
Publication Type: Conference Proceedings
Year of Publication: 2025
Authors: Bouteiller, A., Q. Cao, J. Schuchart, and T. Herault
Editor: Diehl, P., Q. Cao, T. Herault, and G. Bosilca
Conference Name: Workshop on Asynchronous Many-Task Systems and Applications
Edition: 1
Pagination: 140 - 153
Date Published: 2025-10
Publisher: Springer Cham
Conference Location: St. Louis, MO
ISBN Number: 978-3-031-97195-2
Keywords: accelerator, Cholesky Factorization, Matrix computations, task-based runtime
Abstract

Given GPU accelerators’ high arithmetic capacity, reducing data motion and optimizing locality are critical to achieving high performance. The task-based programming paradigm, as employed in the PaRSEC micro-task runtime system, enables the decoupling of data distribution and computation mapping to resources from the algorithm’s base expression. In this paper, we leverage this capability to explore the performance impact of several data placement strategies (some automatic and runtime-directed, some user-directed) for the owner-compute scheduling model in the context of split-memory accelerators. We implement and compare three strategies for data and task mapping: a randomized first-touch policy that assigns data randomly to an accelerator, a load-balancing strategy that assigns data to the accelerator with the lowest load, and a user-directed strategy that minimizes cross-accelerator traffic by placing tasks so as to minimize cross-memory bandwidth use. We carry out the evaluation on a variety of multi-GPU accelerated systems, including the Frontier system, and demonstrate that runtime-directed automatic data placement can improve locality compared to naive strategies. We also highlight that the ability to easily modify user-directed data placement is crucial to achieving peak performance.
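
As an illustration only, the three kinds of placement policy named in the abstract can be sketched as small functions that pick an owner accelerator for each data tile. This is not the PaRSEC API and not the paper's implementation: the function names, the use of tile count as the "load" metric, and the 2D block-cyclic mapping standing in for a user-directed strategy are all assumptions made for this sketch.

```python
import random


def random_first_touch(tile, loads, rng):
    """Pick an accelerator uniformly at random when a tile is first touched."""
    return rng.randrange(len(loads))


def least_loaded(tile, loads, rng=None):
    """Pick the accelerator with the lowest accumulated load (ties: lowest index)."""
    return min(range(len(loads)), key=lambda g: loads[g])


def block_cyclic(tile, loads, rng=None, grid=(2, 2)):
    """Hypothetical user-directed mapping: 2D block-cyclic over a p x q grid.

    Assumes len(loads) == p * q. The paper's actual user-directed
    strategy is not described in the abstract; this is a stand-in.
    """
    p, q = grid
    i, j = tile
    return (i % p) * q + (j % q)


def place(tiles, num_gpus, policy, cost=lambda tile: 1, seed=0):
    """Assign each tile an owner exactly once.

    Under owner-compute, tasks that write a tile would then run on its owner.
    The load of an accelerator is assumed to be the summed cost of its tiles.
    """
    rng = random.Random(seed)
    loads = [0] * num_gpus
    owner = {}
    for tile in tiles:
        g = policy(tile, loads, rng)
        owner[tile] = g
        loads[g] += cost(tile)
    return owner, loads


# Lower-triangular tiles of a 4 x 4 tiled matrix, as in a Cholesky factorization.
tiles = [(i, j) for i in range(4) for j in range(i + 1)]
owner_ll, loads_ll = place(tiles, 4, least_loaded)
owner_bc, loads_bc = place(tiles, 4, block_cyclic)
```

The two automatic policies need no knowledge of the algorithm, whereas the user-directed one encodes structure (here, tile coordinates) that the runtime cannot infer by itself.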

URL: https://link.springer.com/chapter/10.1007/978-3-031-97196-9_12
DOI: 10.1007/978-3-031-97196-9_12