by error5772 » Tue Oct 18, 2011 1:46 pm
True 64bit LAPACK on Mr. Goto's true 64bit BLAS!
Here is what I found out:
The following 3 steps lead to a WORKING LAPACK/Goto-BLAS on a 64bit dual Intel E5645 ("Westmere" -> use TARGET=NEHALEM)
with 2x6 cores and 2x12 threads, using matrices with "long" (64-bit integer) array indices. I call it "true 64bit".
1. Install true64bit-Goto-BLAS by editing "Makefile.rule":
TARGET=NEHALEM, CC=gcc, FC=gfortran, BINARY=64, USE_THREAD=1, NUM_THREADS=6, INTERFACE64=1
Call the lib "libgoto6t.a", and compile some others as "libgotoXt.a" in the same way with NUM_THREADS=X. Copy them to "/usr/local/lib64".
If you use too many threads (e.g. 12), Goto-BLAS will crash with "segmentation fault"! You can use NUM_THREADS=11
and call that BLAS "libgoto11t.a" - it will be your fastest BLAS, but it will sometimes crash (try again).
2. Install true64bit-LAPACK-3.3.1 by editing "make.inc":
FORTRAN=gfortran -m64 -m128bit-long-double -fdefault-integer-8 -fimplicit-none,
OPTS=-O3 -funroll-all-loops, DRVOPTS=$(OPTS), NOOPT=-O0,
LOADER=gfortran, LOADOPTS=-L/usr/local/lib64/ -lgoto6t -lpthread
TIMER=INT_ETIME
BLASLIB=/usr/local/lib64/libgoto6t.a -lpthread
Compile it with: make lapack_install, (make variants), make lapacklib, make tmglib
and make lapack_testing (this should work with "libgoto6t.a"; "libgoto11t.a" is not good here).
make blas_testing does not work (-fdefault-integer-8 is a problem for the testers).
Call the resulting library "liblapackgoto.a" and copy it to "/usr/local/lib64".
3. Have fun with extremely large matrices! You can link against the 6-thread Goto-BLAS with
"-llapackgoto -lgoto6t -lpthread -lgfortran" or against your fastest one with
"-llapackgoto -lgoto11t -lpthread -lgfortran", which sometimes "jumps out of the threads" (crashes)
but is unbeatably fast!
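To get a feeling for when the 64-bit integer interface starts to matter, here is a minimal C++ sketch. The helper names element_count and fits_in_int32 are made up for illustration, and an LP64 platform (e.g. 64-bit Linux, where "long" is 8 bytes) is assumed. A square matrix of order n has n*n elements, and that count no longer fits into a 32-bit signed integer once n exceeds 46340.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: LP64 platform (e.g. 64-bit Linux), where long is 8 bytes and
// therefore matches gfortran's INTEGER under -fdefault-integer-8.
static_assert(sizeof(long) == 8, "long must be 64 bits wide");

// Hypothetical helper: number of elements of a square matrix of order n.
long element_count(long n) {
    assert(n >= 0);
    return n * n;
}

// Hypothetical helper: true if that element count can still be held in a
// 32-bit signed integer, i.e. in a default Fortran INTEGER.
bool fits_in_int32(long n) {
    return element_count(n) <= std::numeric_limits<std::int32_t>::max();
}
```

For example, fits_in_int32(46340) is true while fits_in_int32(46341) is false, so counts of that size (and workspace sizes derived from them) are where a default 32-bit INTEGER can overflow.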
No need to compile LAPACKE - just call the Fortran function two times:
first as a workspace query (lwork = -1), then, after resizing the workspace arrays, to run the actual computation. Use your own
LAPACK header in "lapacke.h" style for C++ (examples):
#include <complex>
extern "C" {
// with -fdefault-integer-8 every Fortran INTEGER is 64 bit -> "long" on 64bit Linux
double dlamch_( char* );
void dsyev_( char*, char*, long*, double*, long*, double*, double*, long*, long* );
void zheev_( char*, char*, long*, std::complex<double>*, long*, double*,
             std::complex<double>*, long*, double*, long* );
}
These 3 functions can be called in main, e.g.:
char sw = 'S';
double tol = 2.0*dlamch_(&sw);
(Sorry, no example for dsyev_ and zheev_.)
Even the complex functions work without problems if you pass the
pointer to the first element of the std::complex<double> array that holds
your matrix in column-major order, and likewise for the workspace array.
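To fill in the missing dsyev_ example, here is a minimal sketch of the two-call pattern (workspace query, then the real run). The helper sym_eigen and the aliases lapack_int and dsyev_fn are made up for illustration. The routine is passed in as a function pointer so that the sketch compiles on its own; with the header above you would pass dsyev_ itself. It again assumes that "long" is 8 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 64-bit LAPACK integer; assumes an LP64 platform where long is 8 bytes.
using lapack_int = long;

// Pointer type matching the dsyev_ prototype from the header above.
using dsyev_fn = void (*)(char*, char*, lapack_int*, double*, lapack_int*,
                          double*, double*, lapack_int*, lapack_int*);

// Hypothetical helper: eigen-decomposition of a symmetric n x n matrix that
// is stored column-major in a. On success w holds the eigenvalues and a is
// overwritten with the eigenvectors. Returns LAPACK's info code.
lapack_int sym_eigen(dsyev_fn dsyev, lapack_int n, std::vector<double>& a,
                     std::vector<double>& w) {
    assert(n >= 0 && a.size() == static_cast<std::size_t>(n) * n);
    char jobz = 'V';  // compute eigenvectors as well as eigenvalues
    char uplo = 'U';  // read the upper triangle of a
    lapack_int lda = n > 1 ? n : 1;
    lapack_int info = 0;
    w.resize(static_cast<std::size_t>(n));

    // Call 1: workspace query. With lwork = -1 the routine only reports the
    // optimal workspace size in work[0] and does no computation.
    double work_size = 0.0;
    lapack_int lwork = -1;
    dsyev(&jobz, &uplo, &n, a.data(), &lda, w.data(), &work_size, &lwork, &info);
    if (info != 0) return info;

    // Call 2: resize the workspace and run the actual computation.
    lwork = static_cast<lapack_int>(work_size);
    std::vector<double> work(static_cast<std::size_t>(lwork));
    dsyev(&jobz, &uplo, &n, a.data(), &lda, w.data(), work.data(), &lwork, &info);
    return info;
}
```

A call would then look like sym_eigen(dsyev_, n, a, w), where a.data() is exactly the "pointer to the first element" of the column-major array. zheev_ follows the same pattern, with a std::complex<double> workspace plus a real rwork array of at least max(1, 3n-2) elements.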
I had to write this report for other people with the same problems.
Thanks to the forum for the two decisive tips.
Michael