Two methods were chosen to collect data: one quantitative, one qualitative. Quantitative information on programmer activity (time-on-task details) was collected using HackyStat, an in-process software engineering measurement and analysis tool.28 HackyStat recorded hours of event traces from development tools (for example, “open file” and “build”) while HPC professionals developed code. Additional, qualitative data was collected in the form of real-time, time-stamped journals written by professional code developers who agreed to record a personal narrative of their work.
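To make the telemetry concrete, the following minimal Python sketch shows how event traces of this kind might be aggregated into time-on-task figures. The record layout, field names, and five-minute idle cutoff are illustrative assumptions, not HackyStat's actual data format or API.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event-trace records in the spirit of the telemetry described
# above; the fields and values are illustrative, not HackyStat's schema.
events = [
    {"ts": "2005-03-01T09:00:12", "tool": "editor", "event": "open file"},
    {"ts": "2005-03-01T09:04:40", "tool": "editor", "event": "edit"},
    {"ts": "2005-03-01T09:06:02", "tool": "make",   "event": "build"},
    {"ts": "2005-03-01T10:30:00", "tool": "editor", "event": "edit"},
]

IDLE_CUTOFF_S = 300  # assumed cutoff: longer gaps are not counted as active time

def time_on_task(events):
    """Sum active seconds per tool, skipping long idle gaps between events."""
    totals = defaultdict(float)
    times = [datetime.fromisoformat(e["ts"]) for e in events]
    for prev, cur, e in zip(times, times[1:], events[1:]):
        gap = (cur - prev).total_seconds()
        if gap <= IDLE_CUTOFF_S:
            totals[e["tool"]] += gap
    return dict(totals)

print(time_on_task(events))  # -> {'editor': 268.0, 'make': 82.0}
```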
By combining HackyStat telemetry data, which measured activity, with the programmer journals, the team was able to corroborate, validate, and interpret its results. For example, the significance of expertise gaps and bottlenecks to the HPC productivity problem became apparent when studying individual professionals and their time usage, while the journal entries supplied real-time accounts of code development from the programmer’s perspective. These patterns were used to define a typical HPC development workflow, identify where in the workflow the most effort is expended, characterize the expertise profiles associated with workflow tasks, and draw conclusions about productivity bottlenecks and their root causes.
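As a rough illustration of this corroboration step, the sketch below pairs each time-stamped journal entry with the telemetry events recorded within a surrounding window. The data, field names, and ten-minute window are assumptions made for the example, not the study's actual analysis procedure.

```python
from datetime import datetime, timedelta

# Illustrative inputs: a journal entry and nearby telemetry events.
journal = [
    {"ts": "2005-03-01T09:05:30", "note": "fighting a linker error"},
]
telemetry = [
    {"ts": "2005-03-01T09:04:40", "event": "edit"},
    {"ts": "2005-03-01T09:06:02", "event": "build"},
]

WINDOW = timedelta(minutes=10)  # assumed alignment window

def corroborate(journal, telemetry):
    """Pair each journal entry with telemetry events near its timestamp."""
    pairs = []
    for j in journal:
        jt = datetime.fromisoformat(j["ts"])
        nearby = [e for e in telemetry
                  if abs(datetime.fromisoformat(e["ts"]) - jt) <= WINDOW]
        pairs.append((j["note"], [e["event"] for e in nearby]))
    return pairs

print(corroborate(journal, telemetry))
# -> [('fighting a linker error', ['edit', 'build'])]
```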
These results are summarized in Figure 2, which illustrates the typical workflow that developers go through in creating and optimizing HPC applications, along with the skill sets required to perform each activity. A typical workflow includes understanding the problem that needs to be solved, formulating an initial computational solution, empirically evaluating the proposed solution through prototyping or experimentation, coding for sequential execution, evaluating the overall computational approach, and finally coding and optimizing for a parallel platform.
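One way to picture this workflow is as an ordered sequence of stages, each paired with the skills it demands; the short sketch below encodes that structure. The stage names follow the text, while the skill labels are illustrative assumptions rather than the study's actual taxonomy.

```python
# Ordered workflow stages (from the text) paired with assumed skill sets.
WORKFLOW = [
    ("understand the problem",           {"domain science"}),
    ("formulate computational solution", {"domain science", "numerics"}),
    ("prototype and experiment",         {"numerics", "programming"}),
    ("code for sequential execution",    {"programming"}),
    ("evaluate computational approach",  {"numerics", "performance analysis"}),
    ("parallelize and optimize",         {"parallel programming", "architecture"}),
]

for i, (stage, skills) in enumerate(WORKFLOW, start=1):
    print(f"{i}. {stage}: {', '.join(sorted(skills))}")
```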
The study also identified the activities that consume the greatest proportion of resources (effort, time, and expertise) within the overall programming effort; a sketch of tallying effort by category follows the list:
- Developing correct scientific programs: activities associated with translating an understanding of the scientific problem that must be solved (e.g., a predictive weather model) into code.
- Code optimization and tuning: activities associated with refining a serial version of the code to ensure correctness and achieve desired levels of accuracy and efficiency.
- Code parallelization and optimization: activities associated with parallelizing the code and tuning to achieve high machine utilization and rapid execution.
- Porting: where a solution exists, this comprises the activities associated with translating the existing solution to a representation appropriate for a new computing platform.
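As noted above, a simple way to locate where effort concentrates is to tally logged hours by activity category and rank the totals, as in the following sketch. The session records, hours, and the mapping from sessions to the four categories are hypothetical, intended only to show the shape of the analysis.

```python
from collections import Counter

# Hypothetical (category, hours) records derived from logged work sessions.
sessions = [
    ("developing correct scientific programs", 6.5),
    ("serial optimization and tuning",         3.0),
    ("parallelization and optimization",       9.0),
    ("parallelization and optimization",       4.5),
    ("porting",                                2.0),
]

effort = Counter()
for category, hours in sessions:
    effort[category] += hours

# Rank categories by total effort to surface likely bottlenecks.
for category, hours in effort.most_common():
    print(f"{category}: {hours:.1f} h")
```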
Once the expertise problem was understood and the activities that consumed the most time and effort for those experts had been pinpointed, the question arose of how widely these findings applied. Full validation required mapping the extent of the expertise gap, which called for a large quantitative survey of the sort suited to the next stage of the research framework.