Like rapid ethnographic research, evaluative research has its own history, one that can be traced back to the middle of the 20th century. In 1967, Michael Scriven proposed that all evaluation could be broken down into two distinct types: formative and summative.29 Formative evaluation validates and improves upon an idea or hypothesis. Summative evaluation answers the question, “to what extent?” Evaluative researchers draw on a toolbox of methods from many of the social sciences to validate a hypothesis or to determine its extent in a population; these methods are typically quantitative. Again, the HPCS research provides an example, although this phase of the program is in its very early stages and the results are still preliminary.
One of the patterns that emerged from the case study research suggested that HPCS code teams were more concerned about programming correctness than about performance, and the workflow analysis pinpointed the places where a programmer was most likely to encounter difficulty. Up to this point, the conventional wisdom that performance was the programmer’s paramount concern had been accepted without question. To evaluate the extent of this pattern, uncovered in Stages 1 and 2, a survey was administered to HPC programmers at national laboratories and at private institutions where large, highly parallel code is written. Quantitative statistical procedures were used to analyze the survey data. Table 3 shows the distribution of responses when programmers were asked about the top issues they face.
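The study does not specify which statistical procedures were applied, but one common choice for a response distribution of this kind is a chi-square goodness-of-fit test. The sketch below illustrates that procedure only; the counts and the expected distribution are invented placeholders, not the Table 3 data.

    # Hypothetical illustration: a chi-square goodness-of-fit test comparing
    # observed top-issue responses against the distribution the conventional
    # wisdom would predict. All counts are invented, not taken from Table 3.
    from scipy.stats import chisquare

    observed = [46, 41, 13]   # correctness, performance, other (hypothetical)
    expected = [20, 60, 20]   # "performance is paramount" expectation (assumed)

    stat, p = chisquare(f_obs=observed, f_exp=expected)
    print(f"chi-square = {stat:.2f}, p = {p:.4f}")

A small p-value in such a test would indicate that the observed responses are unlikely under the conventional-wisdom distribution, which is the kind of evidence the survey was designed to gather.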
The case studies in the first stage of this research had provided information about what programmers said, and the empirical studies of programmers in the second stage supported those findings and validated the programmer’s workflow with quantitative data. Not surprisingly, the survey data collected in the third stage confirmed that performance is important in HPC code, but it also confirmed that programming correctly is at least as important, as reported by 90% of those surveyed. Performance is a strong shared value in the HPC community; the concern for correctness, however, appears to be at least as strong, although it is verbalized less often.
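Because these results are preliminary, a figure such as the reported 90% is best read together with its sampling uncertainty. The sketch below shows one way to attach a confidence interval to that proportion; the sample size is assumed for illustration and is not from the study.

    # Hypothetical sketch: a Wilson confidence interval for the reported 90%.
    # The sample size n is an assumption for illustration only.
    from statsmodels.stats.proportion import proportion_confint

    n = 120                      # assumed number of respondents
    successes = round(0.90 * n)  # those rating correctness at least as important
    low, high = proportion_confint(successes, n, alpha=0.05, method="wilson")
    print(f"95% CI for the correctness proportion: [{low:.2f}, {high:.2f}]")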
This example highlights the need to follow the research stages of hypothesis development (Stage 1) and question generation (Stage 2) before attempting to quantify. Unfortunately, there is a tendency to jump straight to quantitative research because of the supposed reliability of numbers. Quantitative results, however, are only as good as the models of the phenomena that provide the context for interpreting the data. In this example, without the logical progression from hypothesis to validation, the significant concern for program correctness might have been overlooked. And, of course, the link between correctness and the need for expertise is clear. Such an oversight might have led to hardware and software design decisions that proved counterproductive for program correctness.