CTWatch Quarterly

What’s Working in HPC: Investigating HPC User Behavior and Productivity

Nicole Wolter, San Diego Supercomputer Center
Michael O. McCracken, San Diego Supercomputer Center
Allen Snavely, San Diego Supercomputer Center
Lorin Hochstein, University of Nebraska, Lincoln
Taiga Nakamura, University of Maryland, College Park
Victor Basili, University of Maryland, College Park

Conjecture 5: HPC programmers would demand dramatic performance improvements to consider major structural changes to their code.

Programmers we surveyed showed a surprising indifference to the risks of rewriting their code bases, which are often very large, in a new language or of changing their communication model. Although many successful projects run for decades without significant changes, 8 of the 12 Summer Institute attendees said they were willing to make major code changes in exchange for surprisingly small system performance improvements or policy changes.

Many of the codes discussed were larger than 100,000 lines of code (LOC). Although all eight codes with more than 10,000 LOC had checkpoint/restart capabilities, three users were willing to rewrite their code in exchange for the ability to run a single job longer, and two of them asked for only a factor-of-two extension of the job runtime limit. These respondents were likely unaware that most sites will grant such small exceptions on a temporary basis. Of the three respondents with more than 100,000 LOC, only one would consider such a major change, and in return asked for a factor-of-ten improvement in either processor speed or job time limits. These results suggest that although performance, judged by time to solution, is not always the main goal of HPC users, users would be very willing to invest effort in exchange for guaranteed system and policy changes.

Although some responses conflict, HPC centers may be able to capitalize on these attitudes by offering minor compensation to cooperative users. Such incentives could encourage users to adopt profiling tools and spend some effort improving the performance of their own codes, which would ultimately reduce queue wait times. In some cases there also seems to be a gap between what users want and what they think they can get. Better communication with users about site policies and resource options could bridge this gap.

Conjecture 6: A computer science background is crucial to success in performance optimization.

It may seem obvious that if you want to improve your code's performance, you should take it to a computer scientist or software engineer. However, of the developers we interviewed on successful projects, only one had a formal computer science background. In fact, many of these successful projects operate at very large scale without any personnel who have formal computer science or software engineering training.

Interviewees generally agreed that without competence in the domain science, changes to the project are destined to fail. Two subjects even contracted out the fixing of serious code bugs and relied on library abstractions to avoid needing in-depth knowledge of parallel programming. One such library was written by a computer scientist; once it matured, users without a parallel computing background were able to achieve high performance with it.

HPC centers and software developers should keep in mind that their target audience is not computer scientists, but rather physical scientists with a primary focus on scientific results.

