Discovery is the most open-ended and the most time-consuming of the three research stages. The goal is to uncover and understand the socio-cultural system that frames human action. That understanding must be consistent with the way local people understand the system, and it must be expressed in terms of the local (emic) categories people use to describe and categorize their own reality. Researchers collect and analyze verbal, observational, and contextual information to characterize what people say and do in their natural environment. Consistencies and, more frequently, inconsistencies help identify unarticulated or unrecognized needs and gaps, as well as the adaptations often called “work-arounds” and “disconnects.” Translating disconnects into the frame of reference of socio-cultural systems allows the researcher to identify and neutralize well-established assumptions. Left unexamined, such assumptions would be taken as given, which leads to stereotypic treatment and precludes understanding of the important and difficult issues.
The key to successful discovery in a context such as HPC software development is the combination of two approaches: case studies and rapid ethnographic methods.
The case study approach is a well-understood method for gathering initial information about a situation. Robert K. Yin defines it as “an in-depth look at one or more specific incidents or examples.”5 The case study typically employs a set of qualitative, open-ended methodologies to explore a topic or problem domain and develop hypotheses. Methods may include data collection techniques such as document reviews, observation, collection of contextual artifacts, self-reporting, and interviews. The breadth of data that can be collected provides the foundational knowledge from which hypotheses are developed. The case study approach has been used in software engineering6 7 8 9 and has played a significant role in the HPCS productivity research program.10 11 12 13 14
The objective of rapid ethnographic assessment in discovery research is typically to construct a socio-cultural model of the local living system.15 16 All rapid ethnographic approaches share three important characteristics: “(1) a system perspective, (2) triangulated data collection, and (3) iterative data collection and analysis.”15 4 The case study and rapid ethnographic assessment approaches are increasingly valued; they have been used by the National Center for Atmospheric Research,17 the Department of Energy,18 19 20 and NASA,21 and for describing technical change at the Department of Defense.22
The application of these approaches stresses open-ended interviews, site tours (contextual observation), participant observation, literature reviews, cultural history, and semiotic (content) analysis. The following example from the HPCS research demonstrates how discovery research generates cultural insights, and how those insights lead to better understanding and help develop hypotheses that can be tested in the subsequent research stage.
Anecdotal evidence from DARPA and the HPCS Mission Partners suggested that an “expertise gap” lay at the heart of the crisis in HPC application development.23 Case studies were conducted first to explore the expertise issue, starting with a detailed look at how professional HPC programmers and teams spend their time. Qualitative data collection methods included semi-structured interviews with individual HPC programmers and contextual observations at sites where HPC programmers work. Of course, the raw data from these methods did not directly yield the kind of insights that are the goal of this research stage. By combining and comparing the data, the team began to identify patterns across individuals and teams, plot bottlenecks, and create models of HPC programmers, all based on information taken directly from the HPC professionals and the context of their work.
From five case studies, a pattern emerged in the area of expertise. In every case, at least one founding team member had been recruited for specialized scientific knowledge, but in each case the scientist was not an HPC programmer and had little or no knowledge of FORTRAN or C++. The scientist’s first task was either to learn one of these programming languages or to build a working relationship with someone who knew it. Either way, the learning process took considerable time before the individual or pair could perform effectively. Project management was typically taken on by another person whose role was to “run interference” by keeping the sponsor happy and negotiating for time on a shared large machine. Teams in this context typically take about four to six years to produce working code. Success is commonly attributed to having the right mix of expertise: teams were successful only when they had the appropriate mix of knowledge, spanning four areas:
- Science
- Programming
- Scaling/Optimizing
- Management
Even having this full range of knowledge is not sufficient. Effective communication and collaboration among the experts can be very challenging, yet it is crucial to project success.
Underlying the ethnographic case study approach is the understanding that all people belong to one or more networks of interlocking social relationships in which members share a common or core set of beliefs, values, and behaviors. Anthropologists and other trained ethnographers uncover these core sets of beliefs using various methods, including:
- Gathering individual (emic) perspectives from members of these socio-cultural groups;
- Examining the collected information to identify patterns of shared beliefs, behaviors, values and rules;
- Constructing group "mental models" from identified patterns to understand the meaning at the core of the system; and
- Interpreting how the members of a socio-cultural network use their mental models to construct and express appropriate shared behaviors, beliefs, and values, and to provide a contextual frame of meaning for products and services.
The case studies of HPC software development revealed recognizable patterns that might explain why HPC expertise is so scarce. The resulting hypothesis postulated that domain-specific expertise in at least four different areas is needed to use highly parallel machines. As machines get bigger and more complex, the pool of experts narrows, and very few people possess the complete skill set. Team approaches are the best strategy at present, but by themselves they do not appear to represent a long-term solution. The next research step was to design a more focused study to test the hypothesis and to understand in more detail when and how the various areas of expertise were used; this study takes place in the second stage of the research framework.