Producing models of programmer workflows requires analyzing a large volume of data. We are therefore developing automated tools for the visualization, modelling, and simulation of TMMs to support the kind of analysis described in earlier sections.
Two main types of data are collected in the experiments. Physical activities such as code edits, compiles, and executions are captured automatically by the instrumented development environment. In some experiments, the students are also asked to record the time they spend during development on logical activities such as thinking, serial coding, parallel coding, and testing. It is these logical activities that we use to create TMMs of the workflows. Alternatively, physical activities can be mapped to logical activities using a set of heuristics.
Whether the logical activities come from student logs [11] or from heuristic mapping [9, 12], the end result is a list of activities with the associated effort (measured in hours), as shown in Figure 6. We have created a Python program that parses this list of activities for each student and counts the transitions and dwell times for each activity. In the example shown, the student starts in the planning stage and then transitions to serial coding. This is represented in the transition matrix as T12 = 1. Consecutive entries for the same activity are combined. Thus, in the dwell time matrix, the amount of time spent in the planning state before transitioning to the serial coding state is represented as D12 = 1 + 3 = 4. These transitions and dwell times can be aggregated across students and similar assignments to create a larger sample for analysis.
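The counting step described above can be illustrated with a minimal Python sketch. It is not the program used in the study: the function name `count_transitions`, the `(activity, hours)` input format, and every sample entry after the two planning entries are assumptions made for illustration. The sketch also assumes that the time spent in the final activity, which has no outgoing transition, is not recorded in the dwell time matrix.

```python
from collections import defaultdict


def count_transitions(log):
    """Count transitions and dwell times from a chronological activity log.

    log: list of (activity, hours) pairs (hypothetical input format).
    Returns (T, D), both keyed by (source, destination):
      T[(i, j)] is the number of transitions from i to j,
      D[(i, j)] is the total hours spent in i before moving to j.
    """
    # Combine consecutive entries for the same activity.
    merged = []
    for activity, hours in log:
        if merged and merged[-1][0] == activity:
            merged[-1][1] += hours
        else:
            merged.append([activity, hours])

    transitions = defaultdict(int)
    dwell = defaultdict(float)
    # Pair each merged entry with its successor; the last entry has no
    # successor, so its hours are dropped (an assumption of this sketch).
    for (src, hours), (dst, _) in zip(merged, merged[1:]):
        transitions[(src, dst)] += 1
        dwell[(src, dst)] += hours
    return dict(transitions), dict(dwell)


# Only the first two entries follow the example in the text;
# the remaining entries are invented for illustration.
log = [
    ("planning", 1),
    ("planning", 3),
    ("serial coding", 2),
    ("testing", 1),
    ("serial coding", 2),
]
T, D = count_transitions(log)
print(T[("planning", "serial coding")])  # number of planning -> serial coding transitions
print(D[("planning", "serial coding")])  # hours in planning before serial coding
```

Keying both matrices by `(source, destination)` pairs rather than by integer indices keeps the sketch independent of any particular state numbering. Aggregating across students then amounts to summing the dictionaries entry by entry.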
We calculate the probability for each state transition from the transition matrix as:
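With T_{ij} denoting the number of observed transitions from state i to state j, the standard row-normalized estimate would be the following. The symbol P_{ij} and the summation index k are notation assumed here, as is the choice to normalize over all transitions leaving state i.

```latex
\[
  P_{ij} = \frac{T_{ij}}{\sum_{k} T_{ik}}
\]
```

Under this reading, the probabilities leaving any state sum to one. If planning to serial coding were the only transition observed out of the planning state, then P_{12} = 1.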
Similarly, the average dwell time for each transition is calculated as:
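With D_{ij} denoting the total time spent in state i before transitions to state j, and T_{ij} the number of such transitions, the natural per-transition average would be the following. The symbol \bar{D}_{ij} is notation assumed here, and the expression is defined only where T_{ij} > 0.

```latex
\[
  \bar{D}_{ij} = \frac{D_{ij}}{T_{ij}}
\]
```

For the single-student example above, D_{12} = 4 and T_{12} = 1 give an average dwell time of 4 hours in planning before moving to serial coding.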
Once the transition probabilities and dwell times have been computed, the next step is to generate a graph description that can be used to visualize the TMM. Our initial choice for visualization was the Graphviz tool, which uses the DOT language for graph description. Figure 7 shows the student workflow from Figure 6 visualized as a TMM using Graphviz. Using Graphviz, we have also created a graphical browser for rapid visualization of multiple data sets (see Figure 8).
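As a rough illustration of the graph-description step, the sketch below emits DOT text for a TMM from transition probabilities and average dwell times. The function name `tmm_to_dot`, the dictionary inputs keyed by `(source, destination)`, and the edge label format are assumptions made for illustration. The layout and styling used for Figures 7 and 8 may differ.

```python
def tmm_to_dot(prob, mean_dwell, name="TMM"):
    """Return a DOT digraph with one labelled edge per transition.

    prob, mean_dwell: dicts keyed by (source, destination)
    (hypothetical format; both must share the same keys).
    """
    lines = [f"digraph {name} {{"]
    for (src, dst), p in sorted(prob.items()):
        # Each edge is labelled with its probability and mean dwell time.
        label = f"p={p:.2f}, t={mean_dwell[(src, dst)]:.1f}h"
        lines.append(f'    "{src}" -> "{dst}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


# Illustrative values only, not measured data.
prob = {("planning", "serial coding"): 1.0}
mean_dwell = {("planning", "serial coding"): 4.0}
print(tmm_to_dot(prob, mean_dwell))
```

The resulting text can be saved to a file such as `tmm.dot` and rendered with the Graphviz command-line tool, for example `dot -Tpng tmm.dot -o tmm.png`.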