News and Announcements
Supercomputing ’12
This year’s Supercomputing Conference will be held November 10th – 16th at the Salt Palace Convention Center in Salt Lake City, UT. As usual, we expect to have a considerable presence at the conference with BoFs, posters, and talks. Additionally, the University of Tennessee will have its own booth this year, where ICL’s research will be featured alongside that of other UT research centers. Below is a schedule of ICL-related activities. For a complete list of activities, visit the SC12 schedule page.
| Sunday, 11th | 3rd Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Workshop, Room 255-B, 9:00am – 5:30pm |
| | An overview of fault-tolerant techniques for HPC, Tutorial, Room 355-F, 1:30pm – 5:00pm |
| Monday, 12th | Linear Algebra Libraries for High-Performance Computing: Scientific Computing with Multicore and Accelerators, Tutorial, Room 355-D, 8:30am – 5:00pm |
| Tuesday, 13th | The 2012 HPC Challenge Awards, BoF, Room 355-A, 12:15pm – 1:15pm |
| | MAGMA: A New Generation of Linear Algebra Libraries for GPU and Multicore Architectures, NVIDIA Booth (2217), 2:00pm – 2:30pm |
| | Matrices Over Runtime Systems at Exascale, Poster, East Entrance, 5:15pm – 7:00pm |
| | Acceleration of the BLAST Hydro Code on GPU, Poster, East Entrance, 5:15pm – 7:00pm |
| | A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks, Poster, East Entrance, 5:15pm – 7:00pm |
| | TOP500 Supercomputers, BoF, Ballroom-EFGH, 5:30pm – 7:00pm |
| Wednesday, 14th | Open MPI State of the Union, BoF, Room 155-B, 12:15pm – 1:15pm |
| | Power and Energy Measurement and Modeling on the Path to Exascale, BoF, Room 255-EF, 5:30pm – 7:00pm |
| | ICL Alumni Dinner, Faustina, 454 East Broadway, Salt Lake City, UT, 7:00pm |
| Friday, 16th | Extreme-Scale Performance Tools, Workshop, Room 155-F, 8:30am – 12:30pm |
ORNL’s Titan is Online
ORNL’s Jaguar has finally given way to Titan, a new Cray XK7 supercomputer that uses both AMD Opteron CPUs and NVIDIA K20 GPUs in hopes of achieving 20+ petaflop/s of performance. The machine itself is made up of 18,688 nodes, each of which contains a single 16-core AMD Opteron CPU and an NVIDIA K20 GPU. According to ORNL, 90% of the machine’s theoretical performance comes from the GPUs. If Titan’s performance goal is achieved, it will likely be the fastest supercomputer on the planet and the next #1 on the TOP500.
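As a rough sanity check on those numbers (the per-chip figures here are approximate estimates, not from ORNL’s announcement): a K20-class GPU offers on the order of 1.2–1.3 teraflop/s of double-precision peak, while a 16-core Opteron contributes roughly 0.14 teraflop/s, so the GPU supplies about 90% of each node’s peak. Multiplied across 18,688 nodes, that puts the full machine’s theoretical peak comfortably above 20 petaflop/s.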
NPR’s All Things Considered recently caught up with Buddy Bland, director of ORNL’s Leadership Computing Facility, to discuss the new hybrid supercomputer. Click on the play button below to listen to the interview, or follow this link to NPR’s website.
[audio:http://icl.cs.utk.edu/newsletter/files/2012-11/Buddy_Bland_Titan_NPR_All_Things_Considered.mp3]
Conference Reports
ICMS 2012
On October 9th – 11th, Jack Dongarra attended the International Computational Mechanics Symposium (ICMS 2012) in Kobe, Japan, where he gave an invited talk. ICMS 2012 covered many topics in computational mechanics, but this year’s symposium focused on the ever-increasing speed and scale of high performance computers. Jack was one of five plenary speakers, invited from around the world, to address state-of-the-art developments in high performance computing and computational mechanics.
Jack’s talk, Algorithmic and Software Challenges when Moving Towards Exascale, focused on how high performance computing has changed over the last 10 years and what the future may look like based on current trends. These changes have had, and will continue to have, a major impact on software development for high performance computing, including the need to redesign software for multicore and hybrid architectures, automatically tuned application software, fault tolerance, communication-avoiding algorithms, and the exploitation of mixed precision for performance.
CASC Fall 2012 Meeting
ICL’s Terry Moore recently attended the Fall 2012 meeting of the Coalition for Academic Scientific Computation (CASC) in Alexandria, VA, where he gave a talk about UTK’s Interdisciplinary Graduate Minor in Computational Science (IGMCS). CASC is an organization that represents many of the nation’s most forward-thinking universities and computing centers, advocating for advanced computing technology as a means of accelerating scientific discovery and enhancing national competitiveness, global security, and economic success.
Since workforce development is becoming an increasingly critical issue for this community, this meeting featured talks and a panel discussion (in which Terry also participated) about how different universities are moving to build up their Computational Science education programs. Many friends of ICL (e.g., several people from NICS) were at the meeting, as one would expect in light of the makeup of the CASC community.
Recent Releases
clMAGMA 1.0 Released
Matrix Algebra on GPU and Multicore Architectures (MAGMA) is a collection of next generation linear algebra (LA) libraries for hybrid architectures. The MAGMA package supports interfaces for current LA libraries and standards, e.g., LAPACK and BLAS, to allow computational scientists to easily port any LA-reliant software components to heterogeneous systems. clMAGMA is an OpenCL port of the MAGMA library.
clMAGMA 1.0 is now available and includes the following new functionality (a brief usage sketch follows the list):
- Eigen and singular value problem solvers in both real and complex arithmetic, single and double precision (routines magma_{z|c}heevd, magma_{d|s}syevd, magma_{z|c|d|s}geev, and magma_{z|c|d|s}gesvd);
- Matrix inversion routines (routines magma_{z|c|d|s}trtri_gpu, magma_{z|c|d|s}getri_gpu, magma_{z|c|d|s}potri_gpu);
- Orthogonal transformation routines ({z|c}unmqr_gpu, {d|s}ormqr_gpu, {z|c}ungqr, {d|s}orgqr, {z|c}unmtr, {d|s}ormtr, {z|c}unmqr, {d|s}ormqr, {z|c}unmql, {d|s}ormql, {z|c}unghr, and {d|s}orghr).
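For readers who want to try the new eigensolvers, here is a minimal sketch of how a call to the double-precision symmetric eigensolver might look. It follows the standard LAPACK dsyevd calling convention (a workspace query with lwork = -1, then the actual solve); the exact clMAGMA prototype, in particular whether the option arguments are characters or typed constants and whether a queue/device argument is required, should be checked against the magma.h header shipped with the release.

```c
/* Hedged sketch: dense symmetric eigenproblem with clMAGMA's magma_dsyevd.
 * The argument list mirrors LAPACK's dsyevd; verify the exact prototype
 * (characters vs. typed constants, possible queue argument) in magma.h. */
#include <stdio.h>
#include <stdlib.h>
#include "magma.h"              /* clMAGMA header */

int main(void)
{
    magma_init();               /* initialize clMAGMA / OpenCL (details may differ by release) */

    magma_int_t n = 1000, lda = 1000, info = 0;
    double *A = malloc((size_t)lda * n * sizeof(double));
    double *w = malloc((size_t)n * sizeof(double));      /* eigenvalues */
    /* ... fill A with a symmetric matrix here ... */

    /* Workspace query: lwork = liwork = -1 returns the optimal sizes. */
    double      lwork_opt;
    magma_int_t liwork_opt;
    magma_dsyevd('V', 'L', n, A, lda, w,
                 &lwork_opt, -1, &liwork_opt, -1, &info);

    magma_int_t lwork  = (magma_int_t) lwork_opt;
    magma_int_t liwork = liwork_opt;
    double      *work  = malloc((size_t)lwork  * sizeof(double));
    magma_int_t *iwork = malloc((size_t)liwork * sizeof(magma_int_t));

    /* Compute all eigenvalues and eigenvectors of A (lower triangle used). */
    magma_dsyevd('V', 'L', n, A, lda, w,
                 work, lwork, iwork, liwork, &info);
    if (info != 0)
        fprintf(stderr, "magma_dsyevd returned info = %d\n", (int) info);

    free(A); free(w); free(work); free(iwork);
    magma_finalize();
    return 0;
}
```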
Visit the MAGMA software page to download the tarball.
HPCC 1.4.2 Released
The HPC Challenge (HPCC) benchmark suite is designed to assess performance bounds that are relevant to many real-world applications. The suite includes tests for sustained floating-point rate, memory bandwidth, the rate of random memory updates, interconnect latency, and interconnect bandwidth.
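To give a concrete sense of what the “rate of random memory updates” test (RandomAccess, reported in GUP/s) stresses, here is a small illustrative C sketch of the textbook update rule the benchmark is built around. This is not the HPCC source itself, just the core idea: a pseudo-random stream drives scattered read-modify-write updates to a large table, which exercises memory latency rather than bandwidth.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define POLY 0x0000000000000007ULL   /* primitive polynomial used by the benchmark's random stream */

int main(void)
{
    const uint64_t table_size = 1ULL << 20;       /* must be a power of two */
    const uint64_t n_updates  = 4 * table_size;   /* the benchmark performs 4x the table size in updates */

    uint64_t *table = malloc(table_size * sizeof *table);
    if (!table) return 1;
    for (uint64_t i = 0; i < table_size; i++)
        table[i] = i;

    uint64_t ran = 1;                             /* illustrative seed; HPCC computes per-stream starting values */
    for (uint64_t i = 0; i < n_updates; i++) {
        /* Advance the pseudo-random sequence (64-bit shift register over POLY). */
        ran = (ran << 1) ^ ((int64_t) ran < 0 ? POLY : 0);
        /* Scattered read-modify-write: this is what makes the test latency-bound. */
        table[ran & (table_size - 1)] ^= ran;
    }

    printf("performed %llu random updates\n", (unsigned long long) n_updates);
    free(table);
    return 0;
}
```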
A new version of the benchmark suite, HPCC 1.4.2, is now available. Some changes to the source code include:
- Increased sizes of scratch vectors for local FFT tests to account for runs on systems with large main memory (reported by IBM, SGI, and Intel).
- Reduced vector size for local FFT tests due to larger scratch space needed.
- Added a type cast to prevent overflow of a 32-bit integer vector size in FFT data generation routine (reported by IBM).
- Fixed variable types to handle array sizes that overflow 32-bit integers in RandomAccess (reported by IBM and SGI).
- Changed the time-bound code to be used by default in Global RandomAccess; it can be switched off with a compile-time flag if necessary.
- Cleaned up the RandomAccess test code so that it compiles without warnings.
- Changed communication code in PTRANS to avoid large message sizes that caused problems in some MPI implementations.
- Updated documentation in README.txt and README.html files.
Visit the HPCC software page to download the tarball.
HPL 2.1 Released
The High Performance LINPACK (HPL) benchmark is a software package that solves a (random) dense linear system in double precision arithmetic on distributed-memory computers. Written in portable ANSI C and requiring an MPI implementation as well as either a BLAS or VSIPL library, HPL is often one of the first programs run on large computer installations, producing a result that can be submitted to the TOP500.
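As a side note on how the submitted figure is derived: HPL reports performance as the standard operation count for the LU factorization plus the triangular solve, (2/3)N^3 + 2N^2 floating-point operations, divided by the wall-clock time. The tiny helper below illustrates the conversion; the N and time used in main() are made-up example values, not measurements.

```c
#include <stdio.h>

/* Convert an HPL problem size N and wall-clock time into Gflop/s using
 * the benchmark's documented operation count: (2/3)*N^3 + 2*N^2. */
static double hpl_gflops(double n, double seconds)
{
    return ((2.0 / 3.0) * n * n * n + 2.0 * n * n) / seconds / 1.0e9;
}

int main(void)
{
    /* Hypothetical example: N = 100,000 solved in one hour. */
    printf("%.1f Gflop/s\n", hpl_gflops(100000.0, 3600.0));
    return 0;
}
```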
A new version of the benchmark, HPL 2.1, is now available. Some changes to the source code include:
- Introduced exact time stamping for HPL_pdgesv():
- [M] dist/include/hpl_misc.h
- [M] dist/testing/ptest/HPL_pdtest.c
- Fixed out-of-bounds access in the data spreading functions:
- [M] dist/src/pgesv/HPL_spreadN.c
- [M] dist/src/pgesv/HPL_spreadT.c
Visit the HPL software page to download the tarball.
Interview

Volodymyr Turchenko
Where are you from, originally?
I am from Ternopil, in the western part of Ukraine. Ternopil is the administrative center of the Ternopil region (province), and is located approximately 200 km from the border with Poland. Ternopil is the cleanest and greenest city in Ukraine and has a population of around 200,000 people. I lived there most of my life, except for the 5 years I studied at Brest Polytechnic Institute in the Republic of Belarus, and the 2 years I lived in Italy.
Can you summarize your educational background?
I earned an engineering diploma with honors in System Engineering from Brest Polytechnic Institute, Brest, Belarus in 1995. I say “engineering diploma” because at the time it was still the old Soviet Union educational system, with no Bachelor’s or Master’s degrees like in the US. Since I studied for 5 years, this engineering diploma more or less corresponds to a US Master’s degree. The funny title of my university specialty (major, which was chosen in the first year of study) was: “Electronic and Computing Machines, Systems, Complexes and Networks.” I received my PhD degree in Computer Engineering from Lviv Polytechnic National University, Lviv, Ukraine in 2001.
Tell us how you first learned about ICL.
First, of course, I learned about Jack. I believe it was in 1994, when, during my study at the institute, we had a course on “High Performance Computers.” I remember I read something about the Message Passing Interface and saw his name. To be honest, we learned HPC only theoretically, because at that time we did not have any HPC equipment, Internet access, or even Windows or Unix operating systems (only Microsoft DOS). I cannot explain how, under these conditions, some documentation about MPI was provided for our study, but it was. Later on, after I joined the Department of Information Computing Systems and Control at my university in Ternopil as a PhD student, my boss, Prof. Sachenko, participated several times in the HPC Workshops in Cetraro, Italy (1996, 1998, and 2000), organized by Prof. Lucio Grandinetti. Each time he brought back a poster, all of which were placed on a wall in my office, and there again I saw Jack’s name listed.
I met Jack personally at the same HPC Workshop at Cetraro in 2004. By that time, I was already working on my postdoctoral research topic, the parallelization of neural network training, so everything related to parallelization was very interesting to me. Since then, I have realized that ICL is a unique lab where people know exactly how to develop highly efficient parallel algorithms and how to run them effectively on any kind of parallel architecture. This was confirmed by my short visit to ICL in September 2009, when George Bosilca and Thomas Herault immediately showed me improvements to my parallel algorithm.
What are your research interests?
My research interests are the application of neural networks to practical tasks and the parallelization (speedup) of neural network training on high performance computing systems. A new research direction I am starting to explore is biologically inspired neural network architectures.
What are you working on during your visit with ICL?
My project focuses on improving the parallelization efficiency of batch and single-pattern neural network training algorithms, both by using the enhanced collective communication functions of Open MPI and by implementing these algorithms on a GPU using CUDA. The results of the project could be used to develop a library for parallel neural network training capable of significantly speeding up neural-network-based scientific computations on general-purpose and hybrid HPC systems.
What are your interests/hobbies outside work?
Traveling, of course! Who does not enjoy traveling? I also enjoy group sports like soccer and basketball, and I like to play tennis and ping-pong as well. I am very interested in history and read historical books, historical materials on the Internet, and watch historical films. I am also fascinated with space exploration.
Tell us something about yourself that might surprise people.
In 1993, when I was studying at the university, I received a semester project in systems programming to create a Tetris game that astronauts could take into space on their computers. The main requirements were that the executable file had to be smaller than 2000 bytes and that the program had to work in the color (text or graphics) mode of an EGA monitor (very old hardware). I don’t know whether my version was ever chosen for the astronauts, but I wrote it in assembly, with an executable file size of 1996 bytes.