News and Announcements

Supercomputing ’12

This year’s Supercomputing Conference (SC12) will be held November 10th – 16th at the Salt Palace Convention Center in Salt Lake City, UT. As usual, we expect to have a considerable presence at the conference, with BoFs, posters, and talks. Additionally, the University of Tennessee will have its own booth this year, where ICL’s research will be featured alongside that of other UT research centers. Below is a schedule of ICL-related activities. For the complete list of conference activities, visit the SC12 schedule page.

Sunday, November 11th
  • 3rd Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Workshop, Room 255-B, 9:00am – 5:30pm
  • An overview of fault-tolerant techniques for HPC, Tutorial, Room 355-F, 1:30pm – 5:00pm

Monday, November 12th
  • Linear Algebra Libraries for High-Performance Computing: Scientific Computing with Multicore and Accelerators, Tutorial, Room 355-D, 8:30am – 5:00pm

Tuesday, November 13th
  • The 2012 HPC Challenge Awards, BoF, Room 355-A, 12:15pm – 1:15pm
  • MAGMA: A New Generation of Linear Algebra Libraries for GPU and Multicore Architectures, Presentation, NVIDIA Booth (2217), 2:00pm – 2:30pm
  • Matrices Over Runtime Systems at Exascale, Poster, East Entrance, 5:15pm – 7:00pm
  • Acceleration of the BLAST Hydro Code on GPU, Poster, East Entrance, 5:15pm – 7:00pm
  • A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks, Poster, East Entrance, 5:15pm – 7:00pm
  • TOP500 Supercomputers, BoF, Ballroom-EFGH, 5:30pm – 7:00pm

Wednesday, November 14th
  • Open MPI State of the Union, BoF, Room 155-B, 12:15pm – 1:15pm
  • Power and Energy Measurement and Modeling on the Path to Exascale, BoF, Room 255-EF, 5:30pm – 7:00pm
  • ICL Alumni Dinner, Faustina, 454 East Broadway, Salt Lake City, UT, 7:00pm

Friday, November 16th
  • Extreme-Scale Performance Tools, Workshop, Room 155-F, 8:30am – 12:30pm

ORNL’s Titan is Online

ORNL’s Jaguar has finally given way to Titan, a new Cray XK7 supercomputer that pairs AMD Opteron CPUs with NVIDIA K20 GPUs in hopes of achieving 20+ petaflop/s of performance. The machine is made up of 18,688 nodes, each of which contains a single 16-core AMD Opteron CPU and an NVIDIA K20 GPU. According to ORNL, 90% of the machine’s theoretical peak performance comes from the GPUs. If Titan’s performance goal is achieved, it will likely be the fastest supercomputer on the planet and the next #1 on the TOP500.

NPR’s All Things Considered recently caught up with Buddy Bland, director of ORNL’s Leadership Computing Facility, to discuss the new hybrid supercomputer. Click on the play button below to listen to the interview, or follow this link to NPR’s website.

[audio:http://icl.cs.utk.edu/newsletter/files/2012-11/Buddy_Bland_Titan_NPR_All_Things_Considered.mp3]

Conference Reports

ICMS 2012

On October 9th – 11th, Jack Dongarra attended the International Computational Mechanics Symposium in Kobe, Japan, where he gave an invited talk. ICMS 2012 covered many topics in computational mechanics, but this year’s symposium focused on the ever-increasing speed and scale of high performance computers. Jack was one of five plenary speakers invited from around the world to address state-of-the-art developments in high performance computing and computational mechanics.

Jack’s talk, Algorithmic and Software Challenges when Moving Towards Exascale, focused on how high performance computing has changed over the last 10 years and what the future may look like based on current trends. These changes have had, and will continue to have, a major impact on software development for high performance computing, including the need to redesign software for multicore and hybrid architectures, automatically tuned application software, fault tolerance, communication-avoiding algorithms, and exploiting mixed precision for performance.

CASC Fall 2012 Meeting

ICL’s Terry Moore recently attended the Fall 2012 meeting of the Coalition for Academic Scientific Computation (CASC) in Alexandria, VA, where he gave a talk about UTK’s Interdisciplinary Graduate Minor in Computational Science (IGMCS). CASC is an organization that represents many of the nation’s most forward-thinking universities and computing centers, advocating for advanced computing technology to accelerate scientific discovery and enhance national competitiveness, global security, and economic success.

Since workforce development is becoming an increasingly critical issue for this community, this meeting featured talks and a panel discussion (in which Terry also participated) about how different universities are moving to build up their Computational Science education programs. Many friends of ICL (e.g., several people from NICS) were at the meeting, as one would expect in light of the makeup of the CASC community.

Recent Releases

clMAGMA 1.0 Released

Matrix Algebra on GPU and Multicore Architectures (MAGMA) is a collection of next generation linear algebra (LA) libraries for hybrid architectures. The MAGMA package supports interfaces for current LA libraries and standards, e.g., LAPACK and BLAS, to allow computational scientists to easily port any LA-reliant software components to heterogeneous systems. clMAGMA is an OpenCL port of the MAGMA library.

clMAGMA 1.0 is now available and includes the following new functionalities:

  • Eigenvalue and singular value problem solvers in both real and complex arithmetic, single and double precision (routines magma_{z|c}heevd, magma_{d|s}syevd, magma_{z|c|d|s}geev, and magma_{z|c|d|s}gesvd);
  • Matrix inversion routines (magma_{z|c|d|s}trtri_gpu, magma_{z|c|d|s}getri_gpu, and magma_{z|c|d|s}potri_gpu);
  • Orthogonal transformation routines ({z|c}unmqr_gpu, {d|s}ormqr_gpu, {z|c}ungqr, {d|s}orgqr, {z|c}unmtr, {d|s}ormtr, {z|c}unmqr, {d|s}ormqr, {z|c}unmql, {d|s}ormql, {z|c}unghr, and {d|s}orghr).
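
To give a feel for the interface, below is a minimal sketch of calling the new symmetric eigensolver from C. It follows the LAPACK dsyevd calling convention that the routine names above advertise; the header name, the initialization calls, and the exact argument lists are assumptions here and should be checked against the clMAGMA 1.0 headers.

    #include <stdio.h>
    #include <stdlib.h>
    #include "magma.h"   /* clMAGMA header (name assumed) */

    int main(void)
    {
        magma_init();    /* initialize the library / OpenCL runtime (assumed) */

        magma_int_t n = 1000, lda = n, info = 0;
        double *A = malloc((size_t)lda * n * sizeof(double));
        double *w = malloc((size_t)n * sizeof(double));   /* eigenvalues */

        /* ... fill A with a symmetric matrix ... */

        /* Workspace query: lwork = liwork = -1 returns the optimal
           sizes in work[0] and iwork[0], exactly as in LAPACK. */
        double work_query;
        magma_int_t iwork_query;
        magma_dsyevd('V', 'L', n, A, lda, w,
                     &work_query, -1, &iwork_query, -1, &info);

        magma_int_t  lwork  = (magma_int_t)work_query;
        magma_int_t  liwork = iwork_query;
        double      *work   = malloc((size_t)lwork  * sizeof(double));
        magma_int_t *iwork  = malloc((size_t)liwork * sizeof(magma_int_t));

        /* Compute all eigenvalues and eigenvectors of A. */
        magma_dsyevd('V', 'L', n, A, lda, w, work, lwork, iwork, liwork, &info);
        if (info != 0)
            fprintf(stderr, "magma_dsyevd returned info = %d\n", (int)info);

        free(A); free(w); free(work); free(iwork);
        magma_finalize();
        return 0;
    }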

Visit the MAGMA software page to download the tarball.

HPCC 1.4.2 Released

The HPC Challenge (HPCC) benchmark suite is designed to assess the bounds of performance on many real-world applications. Included in the benchmark suite are tests for sustained floating point operations, memory bandwidth, rate of random memory updates, interconnect latency, and interconnect bandwidth.

A new version of the benchmark suite, HPCC 1.4.2, is now available. Some changes to the source code include:

  • Increased the sizes of scratch vectors for local FFT tests to account for runs on systems with large main memory (reported by IBM, SGI, and Intel).
  • Reduced the vector size for local FFT tests to accommodate the larger scratch space required.
  • Added a type cast to prevent overflow of a 32-bit integer vector size in the FFT data generation routine (reported by IBM); the sketch after this list illustrates the class of bug involved.
  • Fixed variable types to handle array sizes that overflow 32-bit integers in RandomAccess (reported by IBM and SGI).
  • Made the time-bound code the default in Global RandomAccess, with a compile-time flag to switch it off if necessary.
  • Cleaned up the RandomAccess test code so it compiles without warnings.
  • Changed the communication code in PTRANS to avoid large message sizes that caused problems in some MPI implementations.
  • Updated documentation in the README.txt and README.html files.
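
Several of these fixes address the same class of bug: a product of 32-bit quantities that overflows before it is stored into a 64-bit variable. The fragment below is not HPCC source code, just a generic illustration of why the added casts and widened types matter.

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Plausible sizes on a large-memory system: each factor fits
           in 32 bits, but the product does not. */
        int32_t n_local = 1 << 28;   /* local vector length */
        int32_t nranks  = 64;        /* number of MPI processes */

        /* Bug: the multiply happens in 32-bit arithmetic and
           overflows before the result is widened. */
        int64_t bad  = (int64_t)(n_local * nranks);

        /* Fix: cast one operand first so the multiply is carried out
           in 64-bit arithmetic. */
        int64_t good = (int64_t)n_local * nranks;

        printf("without cast: %lld\n", (long long)bad);
        printf("with cast:    %lld\n", (long long)good);
        return 0;
    }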

Visit the HPCC software page to download the tarball.

HPL 2.1 Released

The High Performance LINPACK (HPL) benchmark is a software package that solves a (random) dense linear system in double precision arithmetic on distributed-memory computers. Written in portable ANSI C and requiring an MPI implementation as well as either the BLAS or VSIPL library, HPL is often one of the first programs run on large computer installations, producing a result that can be submitted to the TOP500.
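
For readers unfamiliar with where the submitted figure comes from: HPL charges (2/3)n³ + 2n² floating-point operations for factoring and solving an n × n system, and divides by the measured wall-clock time. Below is a minimal sketch of that arithmetic; the problem size and time are made up.

    #include <stdio.h>

    /* Operation count HPL charges for an n-by-n system: (2/3)n^3 for
       the LU factorization plus 2n^2 for the triangular solves. */
    static double hpl_flops(double n)
    {
        return (2.0 / 3.0) * n * n * n + 2.0 * n * n;
    }

    int main(void)
    {
        double n = 100000.0;   /* hypothetical problem size */
        double t = 3600.0;     /* hypothetical wall-clock time, seconds */
        printf("HPL result: %.1f Gflop/s\n", hpl_flops(n) / t / 1e9);
        return 0;
    }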

A new version of the benchmark, HPL 2.1, is now available. Some changes to the source code include:

  • Introduced exact time stamping for HPL_pdgesv():
    • [M] dist/include/hpl_misc.h
    • [M] dist/testing/ptest/HPL_pdtest.c
  • Fixed out-of-bounds access in the data spreading functions:
    • [M] dist/src/pgesv/HPL_spreadN.c
    • [M] dist/src/pgesv/HPL_spreadT.c

Visit the HPL software page to download the tarball.

Interview

Volodymyr Turchenko

Where are you from, originally?

I am from Ternopil, in the western part of Ukraine. Ternopil is the administrative center of the Ternopil region (province), and is located approximately 200 km from the border with Poland. Ternopil is the cleanest and greenest city in Ukraine and has a population of around 200,000 people. I lived there most of my life, except for the 5 years I studied at Brest Polytechnic Institute in the Republic of Belarus, and the 2 years I lived in Italy.

Can you summarize your educational background?

I earned an engineering diploma with honors in System Engineering from Brest Polytechnic Institute, Brest, Belarus in 1995. I say “engineering diploma” because at the time the old Soviet educational system was still in place, with no Bachelor’s or Master’s degrees like in the US. Since I studied for 5 years, this engineering diploma more or less corresponds to a US Master’s degree. The amusing title of my university specialty (the major, chosen in the first year of study) was “Electronic and Computing Machines, Systems, Complexes and Networks.” I received my PhD in Computer Engineering from Lviv Polytechnic National University, Lviv, Ukraine in 2001.

Tell us how you first learned about ICL.

First, of course, I learned about Jack. I believe it was in 1994, during my studies at the institute, when we had a course on “High Performance Computers.” I remember reading something about the Message Passing Interface and seeing his name. To be honest, we learned HPC only in theory, because at that time we had no HPC equipment, no Internet, and no Windows or Unix operating systems (only Microsoft DOS). I cannot explain how, under these conditions, documentation about MPI was provided for our study, but it was. Later, after I joined the Department of Information Computing Systems and Control at my university in Ternopil as a PhD student, my boss, Prof. Sachenko, participated several times in the HPC Workshops in Cetraro, Italy (1996, 1998, and 2000) organized by Prof. Lucio Grandinetti. Each time he brought back a placard, which was placed on a wall in my office, and there again I saw Jack’s name.

I met Jack personally at the same HPC Workshop in Cetraro in 2004. By that time, I was already working on my postdoctoral research topic, the parallelization of neural network training, so everything related to parallelization was very interesting to me. Since then, I have realized that ICL is a unique lab where people know exactly how to develop highly efficient parallel algorithms and how to run them effectively on any kind of parallel architecture. This was confirmed during my short visit to ICL in September 2009, when George Bosilca and Thomas Herault immediately showed me how to improve my parallel algorithm.

What are your research interests?

My research interests are the application of neural networks to practical tasks and the parallelization (speedup) of neural network training on high performance computing systems. A new research direction I am starting to explore is biologically inspired neural network architectures.

What are you working on during your visit with ICL?

My project focuses on improving the parallelization efficiency of parallel batch and single-pattern neural network training algorithms, both by using the enhanced collective communication functions of Open MPI and by implementing these algorithms on a GPU using CUDA. The results of this project could be used to develop a library for parallel neural network training capable of significantly speeding up neural network-based scientific computations on general-purpose and hybrid HPC systems.
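
For readers unfamiliar with batch-parallel training, here is a generic sketch of the pattern described above (it is not the project’s code): each rank computes gradients over its own share of the training patterns, a single collective sums them, and every rank applies the same weight update. The efficiency of that collective step is precisely what the project targets.

    #include <mpi.h>

    #define NWEIGHTS 1024   /* hypothetical network size */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        double w[NWEIGHTS] = {0}, g[NWEIGHTS], gsum[NWEIGHTS];
        const double lr = 0.01;   /* learning rate */

        for (int epoch = 0; epoch < 100; epoch++) {
            /* Toy stand-in for backpropagation over this rank's slice
               of the batch; a real code would compute g from local
               training patterns. */
            for (int i = 0; i < NWEIGHTS; i++)
                g[i] = w[i] - 1.0;

            /* The collective step whose cost dominates batch-parallel
               training: sum the partial gradients from all ranks. */
            MPI_Allreduce(g, gsum, NWEIGHTS, MPI_DOUBLE, MPI_SUM,
                          MPI_COMM_WORLD);

            /* Identical update on every rank keeps weights in sync. */
            for (int i = 0; i < NWEIGHTS; i++)
                w[i] -= lr * gsum[i];
        }

        MPI_Finalize();
        return 0;
    }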

What are your interests/hobbies outside work?

Traveling, of course! Who does not enjoy traveling? I also enjoy group sports like soccer and basketball, and I like to play tennis and ping-pong as well. I am very interested in history and read historical books, historical materials on the Internet, and watch historical films. I am also fascinated with space exploration.

Tell us something about yourself that might surprise people.

In 1993, while studying at the university, I was given a semester project in system programming: create a Tetris game that astronauts could bring into space on their computers. The main requirements were that the executable file be smaller than 2000 bytes and that the routine work in the color (text or graphic) mode of an EGA (very old) monitor. I don’t know if my routine was ever chosen for the astronauts, but I wrote it in assembly, with an executable file size of 1996 bytes.

Recent Papers

  1. Donfack, S., S. Tomov, and J. Dongarra, “Performance evaluation of LU factorization through hardware counter measurements,” University of Tennessee Computer Science Technical Report, no. ut-cs-12-700, October 2012.
  2. Solcà, R., A. Haidar, S. Tomov, J. Dongarra, and T. C. Schulthess, “A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks,” Supercomputing ’12 (poster), Salt Lake City, UT, November 2012.
  3. Dong, T., T. Kolev, R. Rieben, V. Dobrev, S. Tomov, and J. Dongarra, “Acceleration of the BLAST Hydro Code on GPU,” Supercomputing ’12 (poster), Salt Lake City, UT, November 2012.
  4. Kurzak, J., S. Tomov, and J. Dongarra, “Autotuning GEMM Kernels for the Fermi GPU,” IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 11, November 2012. DOI: 10.1109/TPDS.2011.311
  5. Dongarra, J., H. Ltaief, P. Luszczek, and V. M. Weaver, “Energy Footprint of Advanced Dense Numerical Linear Algebra using Tile Algorithms on Multicore Architecture,” The 2nd International Conference on Cloud and Green Computing (submitted), Xiangtan, Hunan, China, November 2012.
  6. Dongarra, J., M. Gates, Y. Jia, K. Kabir, P. Luszczek, and S. Tomov, “MAGMA MIC: Linear Algebra Library for Intel Xeon Phi Coprocessors,” The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), Salt Lake City, UT, November 2012.
  7. Dongarra, J., T. Dong, M. Gates, A. Haidar, S. Tomov, and I. Yamazaki, “MAGMA: A New Generation of Linear Algebra Library for GPU and Multicore Architectures,” presentation at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), Salt Lake City, UT, November 2012.
  8. Agullo, E., G. Bosilca, C. Castagnède, J. Dongarra, H. Ltaief, and S. Tomov, “Matrices Over Runtime Systems at Exascale,” Supercomputing ’12 (poster), Salt Lake City, UT, November 2012.

Recent Lunch Talks

  1. October 5 – Piotr Luszczek, Anatomy of a Globally Recursive Embedded LINPACK Benchmark (PDF)
  2. October 12 – Yves Robert, Impact of fault prediction on checkpointing strategies (PDF)
  3. October 19 – Nicholas Nagle (Department of Geography), Machine learning perspectives for Sample Surveys and Small Area Estimation (PDF)
  4. October 26 – Gabriel Marin (ORNL), How fast should my application run? (PDF)
  5. November 2 – Mitch Horton (GATech), Domain Science at Scale on Keeneland (PDF)
  6. November 9 – George Bosilca, Communication patterns and their integration at different levels of the software stack (PDF)
  7. November 30 – Azzam Haidar, MAGMA: toward fast Eigensolver (PDF)

Upcoming Lunch Talks

  1. December 7 – Ichitaro Yamazaki, Using sparse and dense direct solvers to solve coupled linear systems
  2. December 14 – Peng Du, Matt Johnson, Teng Ma, and Asim YarKhan (ICL graduates)

People

  1. George Bosilca
    George Bosilca is back in the lab for the next couple of weeks and will be attending SC12 with his fellow ICLers.

Dates to Remember

Thanksgiving Holidays

Just as a reminder, the UT campus will be closed on November 22nd and 23rd to observe the Thanksgiving holiday. ICL’s Thanksgiving lunch will be served on Wednesday, November 21st, in Claxton 233.

SC12 Alumni Dinner

The SC12 ICL Alumni Dinner will be held on Wednesday, November 14th at 7:00pm at Faustina, 454 East Broadway, Salt Lake City, UT. Please RSVP to Tracy Rafferty by Monday, November 12th.