General questions

What is HPL?
HPL is a portable implementation of the Linpack benchmark.

What is the license for the HPL code?

Where do I send my suggestions or questions about HPCC?

What does HPL measure?

What software in addition to the benchmark program is needed?

What is Gflop/s?

What is the theoretical peak performance?
Why are my performance results below the theoretical peak?
The performance of a computer is a complicated issue, a
function of many interrelated quantities. These quantities include the
application, the algorithm, the size of the problem, the high-level
language, the implementation, the human level of effort used to
optimize the program, the compiler's ability to optimize, the age of
the compiler, the operating system, the architecture of the computer,
and the hardware characteristics. The results presented for this
benchmark suite should not be extolled as measures of total system
performance (unless enough analysis has been performed to indicate a
reliable correlation of the benchmarks to the workload of interest)
but, rather, as reference points for further evaluations.

Why are the performance results for my computer different than some other machine with the same characteristics?

What about the one processor case?

Can HPL be outperformed?

How do I tune HPL?

What is the relation of this benchmark to the Linpack benchmark?

Input file

What problem size (matrix dimension N) should I use?
In order to find the best performance of your system, aim for the
largest problem size that fits in memory. The amount of memory used by
HPL is essentially the size of the coefficient matrix. For example, if
you have 4 nodes with 256 MB of memory each, this corresponds to 1 GB
total, i.e., roughly 134 million double precision (8-byte) elements.
The square root of that number is 11585. One definitely needs to leave
some memory for the OS as well as for other things, so a problem size
of 10000 is likely to fit. As a rule of thumb, 80% of the total amount
of memory is a good guess. If the problem size you pick is too large,
swapping will occur and the performance will drop. If multiple
processes are spawned on each node (say you have 2 processors per
node), what counts is the amount of memory available to each process.
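The rule of thumb above can be sketched as a small calculation; the function name and default fraction are illustrative, not part of HPL:

```python
import math

def suggest_problem_size(nodes, mem_per_node_bytes, fraction=0.80):
    """Suggest an HPL problem size N so that the N x N double precision
    coefficient matrix (8 bytes per element) uses about `fraction` of
    the total memory, leaving the rest for the OS and other processes."""
    total_bytes = nodes * mem_per_node_bytes
    elements = (total_bytes * fraction) / 8  # number of doubles that fit
    return int(math.sqrt(elements))

# 4 nodes x 256 MB each, keeping 20% aside: N a bit above 10000
n = suggest_problem_size(4, 256 * 2**20)
```

With `fraction=1.0` the same inputs give 11585, the square root quoted above; in practice you may also want to round N to a multiple of your block size NB.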

What block size NB should I use?

What process grid (P x Q) should I use?
This depends on the physical interconnection network you have. Assuming
a mesh or a switch, HPL "likes" a 1:k ratio, with k in [1..3]. In other
words, P and Q should be approximately equal, with Q slightly larger
than P. Examples: 2 x 2, 2 x 4, 2 x 5, 3 x 4, 4 x 4, 4 x 6, 5 x 6,
4 x 8 ... If you are running on a simple Ethernet network, there is
only one wire through which all the messages are exchanged. On such a
network, the performance and scalability of HPL are strongly limited,
and very flat process grids are likely to be the best choices:
1 x 4, 1 x 8, 2 x 4 ...
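For a mesh or switch, the "approximately square, Q slightly larger" advice can be sketched as follows; this helper is an illustration, not something HPL provides:

```python
def choose_grid(nprocs):
    """Return (P, Q) with P * Q == nprocs, P <= Q, and P as close to
    sqrt(nprocs) as possible -- the near-square grids HPL prefers on
    a mesh or switched network."""
    best = (1, nprocs)
    p = 1
    while p * p <= nprocs:
        if nprocs % p == 0:
            best = (p, nprocs // p)  # later p is closer to sqrt(nprocs)
        p += 1
    return best

# e.g. 12 processes -> a 3 x 4 grid
p, q = choose_grid(12)
```

On plain Ethernet you would instead pick a flat grid by hand (1 x nprocs or 2 x nprocs/2), as noted above.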