As described in the Bethel, Johnson et al. article, the Visualization and Analytics Center for Enabling Technologies (VACET) group is developing solutions to this problem that combine “query-driven” strategies, which pre-filter the data to be visualized for relevance and interest, with “context and focus” user interface designs, which enable scientists to control their field of attention while navigating complex data spaces. The success of this approach obviously depends on finding fast and efficient ways to index and search targeted data sets; VACET is collaborating closely with other centers in researching this problem. The work of the Institute for Ultrascale Visualization (Ultraviz Institute), described in Kwan-Liu Ma’s article, also (by necessity) puts the question of interface design at the center of its research agenda, especially for cases requiring the exploration of time-varying multivariate volume data. The Institute’s investigation of “in situ” visualization attempts to address problems at the other end of the visualization pipeline. To overcome the severe problems of data logistics involved in managing the rendering of multi-terabyte data sets in networked environments, in situ visualization performs the necessary calculations while data still resides on the supercomputer that was used to generate it.
Similar problems of petascale data logistics are central to the mission of the three CETs that focus on large-scale data management for distributed environments. As the leaders of the Center for Enabling Distributed Petascale Science (CEDPS) make clear in their article (Schopf et al.), such questions of “data placement” are central to the end-to-end effectiveness of SciDAC’s highly distributed collaboration environments. The authors describe their development of a policy-driven data placement service, which builds on their experience working with several leading application communities, including HEP, Fusion Energy, Combustion, and Earth Systems. Complementary efforts on automated scientific workflow, using the well-known Kepler middleware, are also underway at the Scientific Data Management (SDM) Center. But in order to help investigators manage and analyze the data deluge they confront, the SDM Center research portfolio extends farther down the storage middleware stack. In their article (Shoshani et al.), the SDM Center researchers describe the ensemble of software tools and middleware that they are developing to help scientists explore their data through automatic feature extraction and highly scalable indexing of massive data sets, and to optimize their use of storage resources through low-level parallel I/O libraries and in situ processing on the storage nodes (“active storage”).
The third CET focused on data management – the Earth System Grid Center for Enabling Technologies (ESG-CET) – revolves, as the name suggests, around a single major application community, viz. the climate research community. This community has been at the forefront of the data grid movement for many years, aggressively developing and deploying data grid technology to show how high-impact data sharing can be implemented on a global scale, even while the volume of data continues to escalate. Their discussion (Williams et al.) of the past successes, the current implementation, and the future plans for the Earth System Grid describes a model that several other application communities would do well to emulate as we enter the era of petascale data.
Reflecting on the range and diversity of the work on software cyberinfrastructure presented in this issue of CTWatch Quarterly, it’s hard to avoid the conclusion that the relentless movement toward petascale science, in which the DOE SciDAC program has played such a leading role, has generated a software ecosystem whose continued vitality seems more and more essential to success on the new frontiers of research. But we cannot be complacent. The push beyond petascale is just around the corner and, as before, the effort to scale up even further is certain to bring up uniquely difficult problems that we have not yet anticipated. We must hope, therefore, that the next generation of enabling software technology researchers contains the same kind of energetic, dedicated, and creative pioneers who have led the current one.






