CTWatch
November 2007
Software Enabling Technologies for Petascale Science
Kwan-Liu Ma, University of California, Davis

In-Situ Visualization

Due to the size of the data output by a large-scale simulation, visualization is almost exclusively done as a post-processing step. Even though it is desirable to monitor and validate some of the simulation stages, the cost of moving the simulation output to a visualization machine can be too high for interactive visualization to be feasible. A better approach is not to move the data at all, or to keep the data that must be moved to a minimum. That is, the simulation and visualization calculations run on the same parallel supercomputer so that the data can be shared, as shown in Figure 5. Such in-situ processing can render images directly, or extract features, which are much smaller than the full raw data, and store them for on-the-fly or later examination. Reducing data transfer and storage costs this early in the data analysis pipeline can thus streamline the overall scientific discovery process.
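To make the data flow concrete, the following is a minimal sketch in C++ of this coupling pattern; the names, grid size, and toy "simulation" are illustrative assumptions, not taken from any particular code. The field stays in the simulation's memory, and at selected time steps a visualization routine reads it in place and writes out only a small image, here a maximum-intensity projection saved as a PGM file, rather than the full raw data. In a production setting the rendering would itself be parallel and the partial images would be composited across nodes, but the principle is the same: the full-resolution field never leaves the machine.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// A scalar field on a regular grid, owned by the simulation.
struct Field {
    int nx, ny, nz;
    std::vector<float> v;
    float at(int i, int j, int k) const { return v[(k * ny + j) * nx + i]; }
};

// In-situ "rendering": a maximum-intensity projection along z, written as a
// small grayscale PGM image.  The field is read in place; nothing is copied.
void insitu_render(const Field& f, int step) {
    std::vector<unsigned char> img(f.nx * f.ny, 0);
    for (int j = 0; j < f.ny; ++j)
        for (int i = 0; i < f.nx; ++i) {
            float m = 0.f;
            for (int k = 0; k < f.nz; ++k) m = std::max(m, f.at(i, j, k));
            img[j * f.nx + i] = static_cast<unsigned char>(255.f * std::min(m, 1.f));
        }
    char name[64];
    std::snprintf(name, sizeof(name), "vis_%05d.pgm", step);
    if (FILE* fp = std::fopen(name, "wb")) {
        std::fprintf(fp, "P5\n%d %d\n255\n", f.nx, f.ny);
        std::fwrite(img.data(), 1, img.size(), fp);
        std::fclose(fp);
    }
}

int main() {
    Field f{32, 32, 32, std::vector<float>(32 * 32 * 32)};
    const int num_steps = 50, vis_interval = 10;
    for (int step = 0; step < num_steps; ++step) {
        // Stand-in for one simulation time step: a Gaussian blob drifting in x.
        for (int k = 0; k < f.nz; ++k)
            for (int j = 0; j < f.ny; ++j)
                for (int i = 0; i < f.nx; ++i) {
                    float dx = i - (4 + step) % f.nx;
                    float dy = j - f.ny / 2.f, dz = k - f.nz / 2.f;
                    f.v[(k * f.ny + j) * f.nx + i] =
                        std::exp(-(dx * dx + dy * dy + dz * dz) / 50.f);
                }
        // Visualization runs in the same address space as the simulation;
        // only small images leave the time loop, never the raw field.
        if (step % vis_interval == 0) insitu_render(f, step);
    }
    return 0;
}
```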

In practice, however, this approach has rarely been adopted, for two reasons. First, most scientists have been reluctant to spend their supercomputer time on visualization calculations. Second, it can take significant effort to couple a legacy parallel simulation code with an in-situ visualization code. In particular, the domain decomposition optimized for the simulation is often unsuitable for parallel visualization, creating a need to replicate data in order to speed up the visualization calculations. Hence, the common practice for scientists has been to store only a small fraction of the data or to study the stored data at a coarser resolution, which defeats the original purpose of performing the high-resolution simulations. To enable scientists to study the full extent of the data generated by their simulations, and to make steering of extreme-scale simulations a realistic prospect, we should begin investigating the option of in-situ processing and visualization. Many scientists have become convinced that simulation-time feature extraction, in particular, is a feasible solution to their large-data problem. An important fact is that during the simulation, all relevant data about the simulated field are readily available for the extraction calculations.
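As one concrete illustration of simulation-time feature extraction, the sketch below scans the in-memory field once and keeps only the voxels above a threshold; the threshold test is a hypothetical stand-in for whatever feature definition an application would actually use, and the small main merely substitutes for a call from inside the simulation's time loop. What gets written to disk is the compact feature list rather than the raw field.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// A voxel kept by the extraction pass: its grid location and value.
struct FeatureVoxel { int i, j, k; float value; };

// Scan the in-memory field once and keep only the voxels above a threshold.
// For a localized feature this list is far smaller than the raw field.
std::vector<FeatureVoxel> extract_features(const float* field,
                                           int nx, int ny, int nz,
                                           float threshold) {
    std::vector<FeatureVoxel> out;
    for (int k = 0; k < nz; ++k)
        for (int j = 0; j < ny; ++j)
            for (int i = 0; i < nx; ++i) {
                float v = field[(std::size_t(k) * ny + j) * nx + i];
                if (v > threshold) out.push_back({i, j, k, v});
            }
    return out;
}

// Store the compact feature list for on-the-fly or later examination.
void store_features(const std::vector<FeatureVoxel>& features, int step) {
    char name[64];
    std::snprintf(name, sizeof(name), "features_%05d.bin", step);
    if (FILE* fp = std::fopen(name, "wb")) {
        std::fwrite(features.data(), sizeof(FeatureVoxel), features.size(), fp);
        std::fclose(fp);
    }
}

int main() {
    // Stand-in for the field held in memory during a simulation step.
    std::vector<float> field(32 * 32 * 32, 0.1f);
    field[(16 * 32 + 16) * 32 + 16] = 0.9f;   // a single "feature" voxel
    store_features(extract_features(field.data(), 32, 32, 32, 0.5f), 0);
    return 0;
}
```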

Figure 5. Left: The conventional way to visualize a large-scale simulation running on a supercomputer. Right: In-situ processing and visualization of large-scale simulations.

In many cases, it is also desirable and feasible to render the data in-situ for monitoring and steering a simulation. Even when runtime monitoring is not practical because of the length of the simulation run or the nature of the calculations, it can still be worthwhile to generate an animation characterizing selected parts of the simulation. This in-situ visualization capability is especially helpful when a significant amount of the data is to be discarded. Along with restart files, such animations can capture the integrity of the simulation with respect to a particularly important aspect of the modeled phenomenon.

We have been studying in-situ processing and visualization for selected applications to understand the impact of this new approach on ultra-scale simulations, on subsequent visualization tasks, and on how scientists do their work. Compared with a traditional visualization task performed as post-processing, in-situ visualization brings some unique challenges. First of all, the visualization code must interact directly with the simulation code, which requires both the scientist and the visualization specialist to commit to the integration effort. To optimize memory usage, we have to find a way for the simulation and visualization codes to share the same data structures and avoid replicating data. Second, visualization workload balancing is more difficult to achieve, since the visualization has to comply with the simulation's architecture and be tightly coupled with it. Unlike parallelizing visualization algorithms for standalone processing, where we can partition and distribute the data in the way best suited to the visualization calculations, in in-situ visualization the simulation code dictates the data partitioning and distribution, and moving data frequently among processors is not an option. We need to rethink how to balance the visualization workload so that the visualization is at least as scalable as the simulation. Finally, the visualization calculations must be low cost, with decoupled I/O for delivering the rendering results while the simulation is running. Since the visualization calculations on the supercomputer cannot be hardware accelerated, we must find other ways to simplify them so that adding visualization takes away only a very small fraction of the supercomputer time allocated to the scientist.
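One plausible way to attack the data-sharing and partitioning issues, sketched below under the assumption of a regular-grid code with ghost layers, is to hand the visualization a non-owning view of each block the simulation already holds: a pointer into the simulation's own arrays plus the local extents. The visualization then traverses the data in the layout and partitioning the simulation dictates, and only small per-process results, here a local maximum, would need to be combined across processors (for example with an MPI reduction). The structure names are hypothetical, not those of any existing library.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// A non-owning view of a field block owned by the simulation: a pointer, the
// local extents, and the ghost-layer width.  Nothing is copied or repartitioned.
struct FieldView {
    const float* data;   // points into the simulation's own arrays
    int nx, ny, nz;      // local block dimensions, including ghost layers
    int ghost;           // ghost-cell width on each face
    float at(int i, int j, int k) const {
        return data[(std::size_t(k) * ny + j) * nx + i];
    }
};

// The visualization operates only on the block this process already owns,
// traversing the interior (non-ghost) region in the simulation's own layout;
// per-process results would later be combined across processors.
float local_maximum(const FieldView& f) {
    float m = 0.f;
    for (int k = f.ghost; k < f.nz - f.ghost; ++k)
        for (int j = f.ghost; j < f.ny - f.ghost; ++j)
            for (int i = f.ghost; i < f.nx - f.ghost; ++i)
                if (f.at(i, j, k) > m) m = f.at(i, j, k);
    return m;
}

int main() {
    std::vector<float> block(20 * 20 * 20, 0.25f);   // stand-in for the
    block[(10 * 20 + 10) * 20 + 10] = 0.9f;          // simulation's local block
    FieldView view{block.data(), 20, 20, 20, 2};     // wrap, don't copy
    std::printf("local maximum: %g\n", local_maximum(view));
    return 0;
}
```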


Reference this article
Ma, K.-L. "Emerging Visualization Technologies for Ultra-Scale Simulations," CTWatch Quarterly, Volume 3, Number 4, November 2007. http://www.ctwatch.org/quarterly/articles/2007/11/emerging-visualization-technologies-for-ultra-scale-simulations/
