Hardware accelerators, such as Graphics Processing Units (GPUs) from Nvidia and AMD and Xeon Phi coprocessors from Intel, offer an order of magnitude more computing power and memory bandwidth than standard processors, creating an unprecedented opportunity for breakthroughs in science and engineering.
However, an accelerator is a different kind of beast when it comes to performance tuning, and its architectural features usually pose unique programming challenges. For instance, an accelerator’s performance cannot be unleashed without applying the Single Instruction Multiple Threads (SIMT) paradigm or the Single Instruction Multiple Data (SIMD) paradigm. Accelerators have massive numbers of simple cores with static pipelines and no branch prediction, along with a multitude of constraints that can obliterate performance in numerous situations.
The objective of BEAST is to embrace the nature of accelerators instead of fighting it. Use BEAST to write high-performance kernels in a tunable manner, and let BEAST unleash its power of heuristic autotuning: sweep through a large search space, collect massive amounts of performance data, and plow through that data with machine learning techniques. Use BEAST's heavy machinery to optimize your code to the metal, without descending into the dark abyss of assembly.
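
To make the sweep-and-measure idea concrete, here is a minimal sketch in plain Python. It is not BEAST's interface: the names `tiled_matmul`, `search_space`, `autotune`, and `max_tile_elems` are hypothetical, the kernel is a CPU stand-in for an accelerator kernel, and the machine learning analysis of the collected data is omitted. It only illustrates the pattern of enumerating tunable parameters, pruning configurations that violate a constraint, timing the rest, and keeping the fastest.

```python
import itertools
import time


def tiled_matmul(a, b, n, ti, tj):
    """Multiply two n x n matrices, visiting the output in ti x tj tiles.

    ti and tj are the tunable parameters; the result is the same for any
    positive tile shape, only the traversal order changes.
    """
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, ti):
        for j0 in range(0, n, tj):
            for i in range(i0, min(i0 + ti, n)):
                for j in range(j0, min(j0 + tj, n)):
                    s = 0.0
                    for k in range(n):
                        s += a[i][k] * b[k][j]
                    c[i][j] = s
    return c


def search_space(n, max_tile_elems):
    """Yield (ti, tj) pairs that satisfy the constraints.

    max_tile_elems is a hypothetical stand-in for a hardware limit,
    such as the amount of fast on-chip memory available to one tile.
    """
    candidates = [1, 2, 4, 8, 16, 32]
    for ti, tj in itertools.product(candidates, repeat=2):
        if ti > n or tj > n:
            continue  # tile larger than the problem: prune
        if ti * tj > max_tile_elems:
            continue  # violates the (assumed) resource limit: prune
        yield ti, tj


def autotune(n=24, max_tile_elems=64, repeats=2):
    """Time every valid configuration and return (best, all_results)."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    results = []
    for ti, tj in search_space(n, max_tile_elems):
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            tiled_matmul(a, b, n, ti, tj)
            best = min(best, time.perf_counter() - t0)
        results.append(((ti, tj), best))
    return min(results, key=lambda r: r[1]), results


if __name__ == "__main__":
    (cfg, seconds), results = autotune()
    print(f"tried {len(results)} configurations, fastest tile shape: {cfg}")
```

On a real accelerator the search space would typically span many more dimensions, which is why pruning invalid configurations before timing them matters, and why the resulting performance data is large enough to warrant automated analysis.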