I had been using MAGMA 1.0 RC5 for a couple of months.
Now that I've switched to the final release, I get "!!!! device memory allocation error (magma_zhetrd)" as soon as I call magmaf_zheevd.
(I checked: I call cublasInit properly beforehand, and I use very small matrix sizes.)
The bundled tests also fail:
testing_zheevd ==> !!!! device memory allocation error (magma_zhetrd)
testing_zheevd_gpu ==> !!!! device memory allocation error (magma_zheevd_gpu)
testing_zhegvd ==> !!!! device memory allocation error (magma_zheevd_gpu)
but they work for magma_zhetrd and zhetrd_gpu.
It also seems that zpotrf_gpu doesn't work on my Tesla T10 based cluster (the only time I tested it with RC5, I got incorrect results):
magma-1.0_cuda3.2/testing/testing_zpotrf_gpu
device 0: Tesla T10 Processor, 1440.0 MHz clock, 4095.8 MB memory
device 1: Tesla T10 Processor, 1440.0 MHz clock, 4095.8 MB memory
Usage:
testing_zpotrf_gpu -N 1024
N CPU GFlop/s GPU GFlop/s ||R||_F / ||A||_F
========================================================
Argument 6 of magma_zpotrf had an illegal value.
1024 9.16 159422.58 2.685846e+01
!!!! device memory allocation error
Re: !!!! device memory allocation error
I've partially solved the problem.
I think these allocation errors are related to a known bug between ICC and CUBLAS.
The CUDA 4 release notes say:
* (Linux) There is a known bug in ICC with respect to passing 16-byte aligned types by value to GCC-built code such as the CUDA Toolkit libraries e.g. CUBLAS. At this time, passing a double2 or cuDoubleComplex or any other 16-byte aligned type by value to GCC-built code from ICC-built code will pass incorrect data. Intel has been informed of this bug. As a workaround, a GCC-built wrapper function that accepts the data by reference from the ICC-built code can be linked with the ICC-built code; the GCC-built wrapper can then, in turn, pass the data by value to the CUDA Toolkit libraries.
Since I build MAGMA with ICC and MKL (to get the highest hybrid performance), it's not surprising that I get errors.
The error I mentioned in the first post occurs in every double-complex routine with CUDA 3.2, but only in zgemm with CUDA 4.
Were the MAGMA developers aware of this bug?
Has anyone encountered a similar issue and found a workaround?
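For reference, the workaround described in the release notes could look roughly like the sketch below. It is only an illustration of the by-reference wrapper pattern, not MAGMA or CUBLAS code: `zcomplex_t` is a stand-in for cuDoubleComplex, and `lib_zscal_by_value` and `wrap_zscal_by_ref` are hypothetical names. In a real build, the wrapper file would be compiled with GCC and would call the actual CUBLAS routine that takes its scalar by value, while the ICC-built code would call only the wrapper.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for cuDoubleComplex (assumption): two doubles, 16-byte aligned. */
typedef struct __attribute__((aligned(16))) {
    double x; /* real part */
    double y; /* imaginary part */
} zcomplex_t;

/* Hypothetical stand-in for a GCC-built library routine that receives a
 * 16-byte aligned scalar BY VALUE. It scales *v in place: v = alpha * v. */
void lib_zscal_by_value(zcomplex_t alpha, zcomplex_t *v)
{
    double re = alpha.x * v->x - alpha.y * v->y;
    double im = alpha.x * v->y + alpha.y * v->x;
    v->x = re;
    v->y = im;
}

/* Wrapper intended to be compiled with GCC. The ICC-built caller passes only
 * pointers, so no 16-byte aligned value crosses the ICC/GCC boundary. The
 * by-value call happens here, inside GCC-built code. */
void wrap_zscal_by_ref(const zcomplex_t *alpha, zcomplex_t *v)
{
    assert(alpha != NULL && v != NULL);
    lib_zscal_by_value(*alpha, v);
}
```

The ICC-built side would then call `wrap_zscal_by_ref(&alpha, &v)` instead of calling the by-value routine directly. Each affected routine would need its own wrapper of this kind.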