LAPACK/ScaLAPACK Development

Posted: **Wed Apr 06, 2005 4:48 pm**

Hello,

I am trying to compile lapack using ifort, but am having difficulty.

Depending on the OPTS and NOOPT settings in make.inc, the compile either hangs in an infinite loop during the testing routines, or it causes an error during the timing process.

I posted to comp.lang.fortran and they referred me here.

cell@block$ uname -a
Linux 2.4.25-13mdk #1 Tue Jan 18 15:37:15 MST 2005 i686 unknown unknown GNU/Linux

cell@block$ cat /proc/cpuinfo | grep "model name"
model name : AMD Athlon(TM) XP 2200+

cell@block$ ifort -V
Intel(R) Fortran Compiler for 32-bit applications, Version 8.1 Build 20040803 Z Package ID: l_fc_p_8.1.018

I made the following modification to LAPACK/Makefile:

cell@block$ diff Makefile Makefile.orig
11,12c11,12
< #lib: lapacklib tmglib
< lib: blaslib lapacklib tmglib
---
> lib: lapacklib tmglib
> #lib: blaslib lapacklib tmglib

I am using the make.inc.LINUX file, with FORTRAN = ifort, and have tried variations on the OPTS and NOOPT flags like so:

# from http://scipy.net/cgi-bin/viewcvsx.cgi/s ... t?rev=1.18
# causes infinite loop in Testing DOUBLE PRECISION LAPACK linear equation routines
OPTS=-Vaxlib -O3 -unroll -mp1 -static-libcxa
NOOPT=-mp -O0 -fltconsistency -fp_port

# from http://www.iup.physik.uni-bremen.de/sciatran/make.inc
# causes infinite loop in Testing Symmetric Eigenvalue Problem routines
OPTS = -O0
NOOPT = -O0

as well as my own:

OPTS = -O0 -mp -unroll0
NOOPT = -O0 -mp -unroll0

Has anyone else gotten this to work?

thanks.

Posted: **Thu Apr 07, 2005 3:34 pm**

We are at the same point as you. We have problems passing the LAPACK 3.0 test suite with the Intel Fortran compiler ifort 8.1, and we have not yet found a way to pass the tests (even with -O0 -mp -unroll0, as you mentioned).

There are some workarounds. For example, use LAPACK3E: the installation is fine with ifort/icc + (OPT = -O3, NOOPT = ) and it passes its own tests (which are close to those of LAPACK 3.0) successfully. Another workaround is to use MKL, which passes the LAPACK test suite as well.

Finally, note that even though the tests have not passed successfully, it is reasonable to hope that the LAPACK library created with ' ifort -O3 -mp1 ' will be OK for your application.

We are working on improving the situation for LAPACK 3.0; if anybody has a hint, they are welcome to post it.

Julien

Posted: **Thu Apr 07, 2005 4:21 pm**

I tried compiling lapack3e, but am getting lots of undefined references, like the following:

/workspace/LAPACK3E/liblapack3e.a(clals0.o)(.text+0x13b3): In function `clals0_' :
: undefined reference to `ccopy_'
make[1]: *** [../xlintsts] Error 1
make[1]: Leaving directory `/workspace/LAPACK3E/TESTING/LIN'

Posted: **Thu Apr 07, 2005 5:28 pm**

It happens during this part of the install procedure:

cd $(LAPACKPATH)/TESTING/LIN; make

Posted: **Thu Apr 07, 2005 6:53 pm**

This is normal; in your case you need to do a bit of work on BLAS/SRC/Makefile.

By default LAPACK3E only goes into the BLAS/SRC directory to get the xNRM2 files. If you want to build the whole BLAS from LAPACK3E (your case, I guess), you will need the following lines in BLAS/SRC/Makefile:

Code:

SBLAS1 = isamax.o sasum.o saxpy.o scopy.o sdot.o snrm2.o \
        srot.o srotg.o sscal.o sswap.o srotm.o
#SBLAS1 = snrm2.o
$(SBLAS1): $(FRC)

CBLAS1 = scasum.o scnrm2.o icamax.o caxpy.o ccopy.o \
        cdotc.o cdotu.o csscal.o crotg.o cscal.o cswap.o
#CBLAS1 = scnrm2.o
$(CBLAS1): $(FRC)

DBLAS1 = idamax.o dasum.o daxpy.o dcopy.o ddot.o dnrm2.o \
        drot.o drotg.o dscal.o dswap.o drotm.o
#DBLAS1 = dnrm2.o
$(DBLAS1): $(FRC)

ZBLAS1 = dcabs1.o dzasum.o dznrm2.o izamax.o zaxpy.o zcopy.o \
        zdotc.o zdotu.o zdscal.o zrotg.o zscal.o zswap.o
#ZBLAS1 = dznrm2.o
$(ZBLAS1): $(FRC)

CB1AUX = isamax.o sasum.o saxpy.o scopy.o snrm2.o sscal.o
#CB1AUX = snrm2.o
$(CB1AUX): $(FRC)

ZB1AUX = idamax.o dasum.o daxpy.o dcopy.o dnrm2.o dscal.o
#ZB1AUX = dnrm2.o
$(ZB1AUX): $(FRC)

#---------------------------------------------------------------------
# The following line defines auxiliary routines needed by both the
# Level 2 and Level 3 BLAS. Comment it out only if you already have
# both the Level 2 and 3 BLAS.
#---------------------------------------------------------------------
ALLBLAS = lsame.o xerbla.o
# ALLBLAS =
$(ALLBLAS) : $(FRC)

#---------------------------------------------------------
# Comment out the next 4 definitions if you already have
# the Level 2 BLAS.
#---------------------------------------------------------
SBLAS2 = sgemv.o sgbmv.o ssymv.o ssbmv.o sspmv.o \
        strmv.o stbmv.o stpmv.o strsv.o stbsv.o stpsv.o \
        sger.o ssyr.o sspr.o ssyr2.o sspr2.o
# SBLAS2 =
$(SBLAS2): $(FRC)
#
CBLAS2 = cgemv.o cgbmv.o chemv.o chbmv.o chpmv.o \
        ctrmv.o ctbmv.o ctpmv.o ctrsv.o ctbsv.o ctpsv.o \
        cgerc.o cgeru.o cher.o chpr.o cher2.o chpr2.o
# CBLAS2 =
$(CBLAS2): $(FRC)
#
DBLAS2 = dgemv.o dgbmv.o dsymv.o dsbmv.o dspmv.o \
        dtrmv.o dtbmv.o dtpmv.o dtrsv.o dtbsv.o dtpsv.o \
        dger.o dsyr.o dspr.o dsyr2.o dspr2.o
$(DBLAS2): $(FRC)
#
ZBLAS2 = zgemv.o zgbmv.o zhemv.o zhbmv.o zhpmv.o \
        ztrmv.o ztbmv.o ztpmv.o ztrsv.o ztbsv.o ztpsv.o \
        zgerc.o zgeru.o zher.o zhpr.o zher2.o zhpr2.o
$(ZBLAS2): $(FRC)
#
#---------------------------------------------------------
# Comment out the next 4 definitions if you already have
# the Level 3 BLAS.
#---------------------------------------------------------
SBLAS3 = sgemm.o ssymm.o ssyrk.o ssyr2k.o strmm.o strsm.o
# SBLAS3 =
$(SBLAS3): $(FRC)
#
CBLAS3 = cgemm.o csymm.o csyrk.o csyr2k.o ctrmm.o ctrsm.o \
        chemm.o cherk.o cher2k.o
# CBLAS3 =
$(CBLAS3): $(FRC)
#
DBLAS3 = dgemm.o dsymm.o dsyrk.o dsyr2k.o dtrmm.o dtrsm.o
$(DBLAS3): $(FRC)
#
ZBLAS3 = zgemm.o zsymm.o zsyrk.o zsyr2k.o ztrmm.o ztrsm.o \
        zhemm.o zherk.o zher2k.o
$(ZBLAS3): $(FRC)
#

(Note that I am not sure LAPACK3E will really solve the problem; it will pass its own tests, that's all.)

Posted: **Thu Apr 21, 2005 11:26 am**

With the Intel compiler version 8.1.02[8-9] and earlier versions as well, I found that I had to make a number of alterations to certain lapack routines before the test suites would run to conclusion.

I will only mention a few of the routines for the moment: slasd4 and dlasd4 (had to change the convergence criterion, and always return info=0, whether converged or not), and the scaling routine slascl (and its cousins). Without the changes I made, the routines would sometimes go into an infinite loop. The test suites are a pretty good stress test, and some of the problems presented would cause the infinite loops to occur.

The info=1 return from slasd4 and dlasd4 caused an infinite loop until I eliminated it, and I also found that I had to increase the maximum number of iterations allowed in order to get good performance.

The scaling routines get into trouble when one of the arguments comes through as Infinity, because they never converge. I fixed this by adding an iteration count, and having it max out. I use -fpe:3 so that NaN's and Infinity can both appear as legitimate values.
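The failure mode with Infinity can be sketched in a few lines. The Python below is only an illustration, not the Fortran source: the loop is modeled on my reading of the reference xLASCL scaling logic, `scale_factors` and `max_iter` are hypothetical names, and `sys.float_info.min` is an assumed stand-in for the safe minimum that LAPACK obtains from xLAMCH. With an infinite `cfrom`, the first branch is taken on every pass, so only the iteration cap ends the loop.

```python
import sys

# Assumed stand-ins for the values LAPACK gets from xLAMCH('S').
SMLNUM = sys.float_info.min
BIGNUM = 1.0 / SMLNUM


def scale_factors(cfrom, cto, max_iter=None):
    """Return the multipliers that together scale by cto/cfrom.

    The ratio is applied in several safe steps so that no single
    multiplication overflows or underflows. max_iter (hypothetical)
    is the iteration cap described in the post; None means no cap.
    """
    cfromc, ctoc = cfrom, cto
    factors = []
    iterations = 0
    while True:
        if max_iter is not None and iterations >= max_iter:
            raise RuntimeError("scaling loop did not converge")
        iterations += 1

        cfrom1 = cfromc * SMLNUM
        cto1 = ctoc / BIGNUM
        if abs(cfrom1) > abs(ctoc) and ctoc != 0.0:
            # Ratio too small for one step: scale down by SMLNUM first.
            mul, done = SMLNUM, False
            cfromc = cfrom1
        elif abs(cto1) > abs(cfromc):
            # Ratio too large for one step: scale up by BIGNUM first.
            mul, done = BIGNUM, False
            ctoc = cto1
        else:
            # Remaining ratio is representable: finish in one step.
            mul, done = ctoc / cfromc, True
        factors.append(mul)
        if done:
            return factors


print(scale_factors(2.0, 4.0))  # [2.0]

try:
    # inf * SMLNUM is still inf, so the first branch repeats forever
    # unless the cap stops it.
    scale_factors(float("inf"), 1.0, max_iter=50)
except RuntimeError as exc:
    print("stopped:", exc)
```

Raising an error at the cap is one possible choice; returning a status code, as the LAPACK routines do through INFO, would be closer to the Fortran convention.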

However, there are still other routines that I have not yet fixed, which have potential infinite loops and no iteration count, depending on some convergence criterion in order to get out of the loop.

In my humble opinion, distributing code like this is dangerous, since there can always be some unanticipated situation with a process/compiler that stops the criterion from being met. But that's the way lapack version 3 is distributed, and a value of Infinity causes the trouble.

I have also found serious differences in performance between an AMD processor and an Intel Pentium 4 processor, especially with code optimized for the Pentium 4. The differences are puzzling, and I cannot tell if it is the processor, the compiled code, or some combination of the two which is causing the difficulties.

I hope this helps.

Posted: **Mon Oct 10, 2005 5:57 pm**

11/9/2005 The problem was identified by Intel as an effect from floating point precision. In order to guarantee IEEE behavior, you have to use -Op! And the lapack library is built with that floating point arithmetic in mind. So, ifort.cfg (or the command line) needs to have

-fltconsistency -Qfl_port -Op

Otherwise, the library will not work correctly
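To illustrate why floating-point consistency matters here: LAPACK's xLAMCH routines determine machine parameters at run time with loops of roughly the following kind, which is also why make.inc carries a separate NOOPT setting for them. The Python below is only a sketch, and `machine_epsilon` is a hypothetical name. Python floats are always IEEE doubles, so this shows the intended behavior; in compiled code that keeps intermediates in wider registers, the same comparison can be evaluated at higher precision and the loop can settle on a different value.

```python
import sys


def machine_epsilon():
    """Find the spacing between 1.0 and the next larger double by halving.

    The loop stops at the first eps for which 1.0 + eps/2 is no longer
    distinguishable from 1.0 in the working precision.
    """
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps


# In strict IEEE double arithmetic the loop matches the language's own constant.
print(machine_epsilon() == sys.float_info.epsilon)  # True
```

If the sum `1.0 + eps / 2.0` were held in an extended-precision register instead of being rounded to double, the loop would keep halving past this point, and any tolerance derived from the result would be too tight for values actually stored as doubles.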

10/10/2005 I reported to Intel an optimizer bug when running a test of sgelsd to solve a particular 30x30 set of equations. The results were all NaN's (I use -fpe:3 as my standard option).

However, the problem went away, and I got the correct solution, when I used the following additional compiler options:

-fltconsistency -Qprec -Qfl_port

which are not the defaults (in my opinion, they ought to be). I did not think of these; they were suggested by Sven Hammarling of the NAG.

As a result of this, I strongly recommend updating the ifort.cfg file to include those options, along with -O3 which can always be overridden by including -Od or some other optimization choice on the command line when ifort is invoked.

Posted: **Thu Dec 21, 2006 2:04 pm**

Hi guys,

I have been trying to install lapack3e for ifort 8.1, apparently in vain until now. Your posts might turn out to be very useful for my case as well.

I am using ifort 8.1 - 9.0 (32 bit) on a 64 bit 3.2 GHz P4 linux box. The man pages for ifort don't show any flag like -Qfl_port, -Qprec or -Op. Are you using any special version of ifort?

Would appreciate your reply.

Regards,
Khosrow

Thumbtack wrote:

11/9/2005 The problem was identified by Intel as an effect from floating point precision. In order to guarantee IEEE behavior, you have to use -Op! And the lapack library is built with that floating point arithmetic in mind. So, ifort.cfg (or the command line) needs to have

-fltconsistency -Qfl_port -Op

Otherwise, the library will not work correctly

10/10/2005 I reported to Intel an optimizer bug when running a test of sgelsd to solve a particular 30x30 set of equations. The results were all NaN's (I use -fpe:3 as my standard option).

However, the problem went away, and I got the correct solution, when I used the following additional compiler options:

-fltconsistency -Qprec -Qfl_port

which are not the defaults (in my opinion, they ought to be). I did not think of these; they were suggested by Sven Hammarling of the NAG.

As a result of this, I strongly recommend updating the ifort.cfg file to include those options, along with -O3 which can always be overridden by including -Od or some other optimization choice on the command line when ifort is invoked.

Posted: **Fri Dec 22, 2006 4:03 pm**

Hello Khosrow,
you will probably find very little support for LAPACK3E on this forum. I hope somebody can help you out. The 'updated' correct flags for LAPACK3.1 and the Intel Fortran compiler ifort are given by Julie at http://icl.cs.utk.edu/lapack-forum/viewtopic.php?t=295. Good luck with LAPACK3E.
Julien

Posted: **Sat Dec 23, 2006 10:27 am**

If you use ifort /?, a complete list (or almost complete; I'm not sure) of the options should appear, and the ones I have indicated should be there.

The version I am currently using is 9.1.032 for windows xp, but the switches I've indicated should work on back versions. It is the professional version (originally came with the IMSL libraries), but the standard version should be the same. As far as I know, the standard version is the same compiler; it just has some extras left out which come with the professional version.

It is possible that they have changed the options, or the appearance of the options, for the linux version, but it should be the same compiler if it runs on an Intel chip.

Whatever the options, make sure that you are compliant with the IEEE standard (no Intel shortcuts to save time). I had more than one routine (not just lapack routines) that wouldn't work properly until I set those switches to enforce the IEEE standard.

compiling LAPACK with the intel fortran compiler (ifort)
