CTWatch
November 2007
Software Enabling Technologies for Petascale Science
E. Wes Bethel, Lawrence Berkeley National Laboratory
Chris Johnson, University of Utah
Cecilia Aragon, Lawrence Berkeley National Laboratory
Prabhat, Lawrence Berkeley National Laboratory
Oliver Rübel, Lawrence Berkeley National Laboratory
Gunther Weber, Lawrence Berkeley National Laboratory
Valerio Pascucci, Lawrence Livermore National Laboratory
Hank Childs, Lawrence Livermore National Laboratory
Peer-Timo Bremer, Lawrence Livermore National Laboratory
Brad Whitlock, Lawrence Livermore National Laboratory
Sean Ahern, Oak Ridge National Laboratory
Jeremey Meredith, Oak Ridge National Laboratory
George Ostrouchov, Oak Ridge National Laboratory
Ken Joy, University of California, Davis
Bernd Hamann, University of California, Davis
Christoph Garth, University of California, Davis
Martin Cole, University of Utah
Charles Hansen, University of Utah
Steven Parker, University of Utah
Allen Sanderson, University of Utah
Claudio Silva, University of Utah
Xavier Tricoche, University of Utah

3
Query-Driven Visualization

The term “query-driven visualization” (QDV) refers to the process of limiting visual data analysis processing to “data of interest” [8]. In brief, QDV combines software machinery with flexible, highly usable interfaces to reduce the amount of information that must be analyzed. The basis for the reduction varies from domain to domain, but it boils down to one question: what subset of the large dataset is really of interest for the problem being studied? This notion is closely related to that of “feature detection and analysis,” where “features” can be thought of as subsets of the larger population that exhibit characteristics that are either intrinsic to individuals within the population (e.g., data points where there is high pressure and high velocity) or defined as relations between individuals within the population (e.g., the temperature gradient changes sign at a given data point).
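To make the two kinds of features concrete, the following minimal Python sketch contrasts them on a tiny one-dimensional dataset. The sample values, the thresholds, and the function names `intrinsic_feature` and `gradient_sign_changes` are illustrative assumptions, not part of any tool described in this article.

```python
# Hypothetical 1-D sample data; values and thresholds are illustrative only.
pressure = [1.0, 5.2, 6.1, 2.0, 7.3]
velocity = [0.5, 9.0, 1.0, 8.5, 9.9]
temperature = [300.0, 310.0, 325.0, 320.0, 305.0]


def intrinsic_feature(p, v, p_min=5.0, v_min=8.0):
    """Indices whose own values satisfy the condition (high pressure AND high velocity)."""
    return [i for i, (pi, vi) in enumerate(zip(p, v)) if pi > p_min and vi > v_min]


def gradient_sign_changes(t):
    """Interior indices where the finite-difference gradient changes sign.

    This feature depends on a point's neighbors, not on the point alone.
    """
    hits = []
    for i in range(1, len(t) - 1):
        left = t[i] - t[i - 1]
        right = t[i + 1] - t[i]
        if left * right < 0:
            hits.append(i)
    return hits


print(intrinsic_feature(pressure, velocity))  # [1, 4]
print(gradient_sign_changes(temperature))     # [2]
```

The first function can be evaluated one data point at a time, which is what makes range queries a natural fit for it; the second needs neighboring values and so requires a different kind of analysis.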

For this discussion, we focus on the first category of features. The second category is also of great interest to our team: we have developed new technologies for topological data analysis [9] that have proven very useful as a basis for scientific knowledge discovery.

Broadly speaking, QDV consists of three conceptual elements. The first is how one goes about specifying what is “interesting.” The second is how one displays and analyzes that subset of data. The third is the process of storing, indexing, querying and retrieving data subsets from large data archives.
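As a rough illustration of the third element, the sketch below builds a per-variable index once and then answers range queries without scanning every record. It uses a plain sorted list with binary search purely for illustration; the article does not say this is the indexing technology used, and the class name `SortedIndex` and the sample values are assumptions.

```python
from bisect import bisect_left, bisect_right


class SortedIndex:
    """Hypothetical index: maps one variable's values to record ids via a sorted list."""

    def __init__(self, values):
        pairs = sorted((v, i) for i, v in enumerate(values))
        self._keys = [v for v, _ in pairs]
        self._ids = [i for _, i in pairs]

    def query(self, low, high):
        """Return the set of record ids with low <= value <= high."""
        lo = bisect_left(self._keys, low)
        hi = bisect_right(self._keys, high)
        return set(self._ids[lo:hi])


# Illustrative data: record i has temperature[i] and density[i].
temperature = [950.0, 1200.0, 1010.0, 400.0]
density = [0.9, 0.85, 1.3, 0.95]

t_index = SortedIndex(temperature)
d_index = SortedIndex(density)

# Each condition is answered by its own index; AND becomes set intersection.
subset = t_index.query(1000.0, float("inf")) & d_index.query(0.8, 1.0)
print(subset)  # {1}
```

The design point is that the cost of a query depends on the size of the answer and a logarithmic search, not on the size of the archive, which is what makes retrieving small subsets from large datasets practical.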

Specifying Queries

In many scientific data analysis applications, “interesting” data can be defined by compound boolean range queries of the form “(temperature > 1000) AND (0.8 <= density <= 1.0)”. One could enter such an SQL-like query manually, but doing so is clumsy from an interface perspective and requires that the user already know something about the data characteristics. In many instances, users are quite familiar with their data, so the expectation of a priori knowledge is not unreasonable. Even so, we propose that a visual interface for specifying queries, rather than typed queries, will result in greater scientific productivity and better serve our mission of enabling data exploration and knowledge discovery.
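The example query above can be expressed as a small data structure and evaluated by brute force, as in the following sketch. The `Range` class, the `select` function, and the sample records are hypothetical names and data introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """One range condition on a single variable (hypothetical helper)."""
    variable: str
    low: float = float("-inf")
    high: float = float("inf")
    low_inclusive: bool = False
    high_inclusive: bool = False

    def matches(self, record):
        x = record[self.variable]
        above = x >= self.low if self.low_inclusive else x > self.low
        below = x <= self.high if self.high_inclusive else x < self.high
        return above and below


def select(records, conditions):
    """Return the records that satisfy every condition (logical AND)."""
    return [r for r in records if all(c.matches(r) for c in conditions)]


# (temperature > 1000) AND (0.8 <= density <= 1.0)
query = [
    Range("temperature", low=1000.0),
    Range("density", low=0.8, high=1.0, low_inclusive=True, high_inclusive=True),
]

records = [
    {"temperature": 1200.0, "density": 0.9},
    {"temperature": 900.0, "density": 0.9},
    {"temperature": 1500.0, "density": 1.2},
    {"temperature": 1000.0, "density": 0.8},   # fails: temperature is not > 1000
    {"temperature": 1001.0, "density": 0.8},
]

hits = select(records, query)
print(len(hits))  # 2
```

A visual interface would, in effect, build the `query` list from user interaction instead of from typed text.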

We have implemented several different types of visual interfaces for specifying queries. The general theme in these implementations is that the visual interface helps the user formulate queries while at the same time gaining an overall sense of data characteristics. This type of interaction is a variation on a well-known usability design principle called “context and focus,” where a given presentation affords the opportunity to see overviews of data (the context) as well as details about specific data of interest (the focus). Numerous works have applied this principle to the effective navigation of complex dataspaces, for example in browsing hierarchical filesystems [10].
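One simple way to picture the context-and-focus idea in this setting is a histogram that summarizes a whole variable (the context) alongside the values selected by a range the user picks from it (the focus). The sketch below assumes this histogram-based reading; the function names, bin count, and data are illustrative and do not describe the interfaces mentioned above.

```python
def histogram(values, lo, hi, bins):
    """Count values in equal-width bins over [lo, hi]; the overview, or context."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls in the last bin.
            k = min(int((v - lo) / width), bins - 1)
            counts[k] += 1
    return counts


def focus(values, lo, hi):
    """Values inside the user-selected range; the detail, or focus."""
    return [v for v in values if lo <= v <= hi]


# Illustrative temperature samples.
temperature = [120.0, 450.0, 980.0, 1020.0, 1100.0, 1150.0, 1900.0]

context = histogram(temperature, 0.0, 2000.0, bins=4)
print(context)  # [2, 1, 3, 1]

# The overview shows most samples in the third bin (1000 to 1500),
# so a user might select that range as the query.
selected = focus(temperature, 1000.0, 1500.0)
print(selected)  # [1020.0, 1100.0, 1150.0]
```

Here the overview both suggests where a useful query range lies and gives the user a sense of the overall distribution without any prior knowledge of the data.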

One example of formulating queries along these lines is an application for exploring large collections of particle-based datasets produced by the Gyrokinetic Toroidal Code (GTC), which is used to model microturbulence in magnetically confined fusion plasmas [11]. GTC output comprises on the order of tens of millions of particles per timestep on present-day computational platforms; this figure is expected to rise at a rate commensurate with growth in computational capacity. From this output, fusion researchers are interested in studying two kinds of phenomena: the formation and evolution of turbulent structures (eddies, vortices, etc.), and how particle “trapping” and “untrapping” in magnetic fields through microturbulence leads to an erosion of energy efficiency.


Reference this article
Bethel, E. W., Johnson, C., Aragon, C., Prabhat, Rübel, O., Weber, G., Pascucci, V., Childs, H., Bremer, P.-T., Whitlock, B., Ahern, S., Meredith, J., Ostrouchov, G., Joy, K., Hamann, B., Garth, C., Cole, M., Hansen, C., Parker, S., Sanderson, A., Silva, C., Tricoche, X. "DOE's SciDAC Visualization and Analytics Center for Enabling Technologies - Strategy for Petascale Visual Data Analysis Success," CTWatch Quarterly, Volume 3, Number 4, November 2007. http://www.ctwatch.org/quarterly/articles/2007/11/does-scidac-visualization-and-analytics-center-for-enabling-technologies-strategy-for-petascale-visual-data-analysis-success/
