%0 Journal Article
%J IEEE Transactions on Parallel and Distributed Systems
%D 2017
%T Argobots: A Lightweight Low-Level Threading and Tasking Framework
%A Sangmin Seo
%A Abdelhalim Amer
%A Pavan Balaji
%A Cyril Bordage
%A George Bosilca
%A Alex Brooks
%A Philip Carns
%A Adrian Castello
%A Damien Genet
%A Thomas Herault
%A Shintaro Iwasaki
%A Prateek Jindal
%A Sanjay Kale
%A Sriram Krishnamoorthy
%A Jonathan Lifflander
%A Huiwei Lu
%A Esteban Meneses
%A Marc Snir
%A Yanhua Sun
%A Kenjiro Taura
%A Pete Beckman
%K Argobots
%K context switch
%K I/O
%K interoperability
%K lightweight
%K MPI
%K OpenMP
%K stackable scheduler
%K tasklet
%K user-level thread
%X In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by the user or high-level programming model. We describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.
%B IEEE Transactions on Parallel and Distributed Systems
%8 2017-10
%G eng
%U http://ieeexplore.ieee.org/document/8082139/
%R 10.1109/TPDS.2017.2766062