Where are ZSYTRF and ZSYTRI?

ning_an · Post by **ning_an** » Thu Jul 30, 2015 3:38 pm

Dear Developer,

I'm planning to compute the inverse of a double-complex symmetric square matrix. I can't find MAGMA functions similar to the LAPACK subroutines zsytrf(...) and zsytri(...). I only found magma_zpotrf(...) and magma_zpotri(...), which work for Hermitian matrices. Are the LAPACK functions zsytrf(...) and zsytri(...) implemented in MAGMA? Please help.

Ning

mgates3 · Post by **mgates3** » Tue Aug 04, 2015 2:01 pm

We don't as yet have the complex symmetric version, zsytrf, implemented. Only the Hermitian (zhetrf) and Hermitian positive definite (zpotrf) are currently available. We'll look into providing the symmetric version in future releases.
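To make the distinction concrete: a complex symmetric matrix satisfies A = A^T, while a Hermitian matrix satisfies A = A^H (conjugate transpose), so a routine written for one does not apply to the other. Below is a minimal sketch in plain C, assuming column-major storage as in LAPACK. The helper names are hypothetical and are not part of MAGMA or LAPACK.

```c
#include <assert.h>
#include <complex.h>
#include <stdbool.h>
#include <stddef.h>

/* True if A == A^T (complex symmetric).
 * A is n x n, column-major, with leading dimension lda. */
bool is_complex_symmetric(int n, const double complex *A, int lda)
{
    assert(n >= 0 && lda >= n);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < j; ++i)
            if (A[i + (size_t)j * lda] != A[j + (size_t)i * lda])
                return false;
    return true;
}

/* True if A == A^H (Hermitian): A(i,j) equals the conjugate of A(j,i),
 * which also forces the diagonal to be real. */
bool is_hermitian(int n, const double complex *A, int lda)
{
    assert(n >= 0 && lda >= n);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i <= j; ++i) {
            double complex a = A[i + (size_t)j * lda];
            double complex b = A[j + (size_t)i * lda];
            if (creal(a) != creal(b) || cimag(a) != -cimag(b))
                return false;
        }
    return true;
}
```

A matrix such as [[2+i, i], [i, 3]] passes the first check but not the second, because its diagonal is not real.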

Also, are you sure you need the inverse? If you are solving a system, Ax = b, or equivalently computing x = A^{-1} b, it is usually both faster and more accurate to compute a factorization and a solve (zsytrf & zsytrs), rather than a factorization, inverse, and multiply (zsytrf & zsytri & zsymm).
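As a rough illustration of "factor and solve" without ever forming the inverse, here is a self-contained sketch in plain C. It uses Gaussian elimination with partial pivoting as a stand-in; it is not the symmetric factorization that zsytrf performs and it does not exploit symmetry. The function names are hypothetical.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Squared magnitude, used only to compare pivot candidates. */
double mag2(double complex z)
{
    return creal(z) * creal(z) + cimag(z) * cimag(z);
}

/* Solve A x = b for a small dense n x n complex matrix stored column-major
 * with leading dimension n. A is overwritten by the elimination and b is
 * overwritten with x. Returns 0 on success, or k+1 if no nonzero pivot
 * exists in column k. */
int solve_in_place(int n, double complex *A, double complex *b)
{
    assert(n > 0 && A != NULL && b != NULL);
#define AT(i, j) A[(size_t)(i) + (size_t)(j) * (size_t)n]
    for (int k = 0; k < n; ++k) {
        /* Partial pivoting: pick the largest entry in column k. */
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (mag2(AT(i, k)) > mag2(AT(p, k)))
                p = i;
        if (mag2(AT(p, k)) == 0.0)
            return k + 1;
        if (p != k) {
            for (int j = k; j < n; ++j) {
                double complex t = AT(k, j);
                AT(k, j) = AT(p, j);
                AT(p, j) = t;
            }
            double complex t = b[k];
            b[k] = b[p];
            b[p] = t;
        }
        /* Eliminate entries below the pivot and update the right-hand side. */
        for (int i = k + 1; i < n; ++i) {
            double complex m = AT(i, k) / AT(k, k);
            for (int j = k + 1; j < n; ++j)
                AT(i, j) -= m * AT(k, j);
            b[i] -= m * b[k];
        }
    }
    /* Back substitution on the upper triangle. */
    for (int i = n - 1; i >= 0; --i) {
        double complex s = b[i];
        for (int j = i + 1; j < n; ++j)
            s -= AT(i, j) * b[j];
        b[i] = s / AT(i, i);
    }
#undef AT
    return 0;
}
```

For the complex symmetric matrix [[2+i, i], [i, 3]] and right-hand side (1+i, 4i), this recovers x = (1, i) up to rounding, with no inverse and no extra matrix multiply.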

-mark

ning_an · Post by **ning_an** » Wed Aug 05, 2015 3:26 pm

Sorry, I clicked the wrong button. Please see the next post.

ning_an · Post by **ning_an** » Wed Aug 05, 2015 3:29 pm

Hi, Mark,

Thanks for your reply. Yes, I need to invert the dense matrix, which is unusual, but the theory requires it.
As the matrix size increases, I get the error below.

Code:

D:\magma\build\testing\Release>testing_zpotri.exe --lapack
MAGMA 1.6.2  compiled for CUDA capability >= 2.0
CUDA runtime 7000, driver 7050. OpenMP threads 32. MKL 11.0.5, MKL threads 16.
ndevices 2
device 0: GeForce GTX 980, 1215.5 MHz clock, 4096.0 MB memory, capability 5.2
device 1: GeForce GTX 750 Ti, 1254.5 MHz clock, 2048.0 MB memory, capability 5.0

Usage: testing_zpotri.exe [options] [-h|--help]

uplo = Upper
    N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   ||R||_F / ||A||_F
=================================================================
magma_zpotri returned error -113: cannot allocate memory on GPU device.
18072    159.34 ( 148.18)   1293.69 (  18.25)   4.04e-001   failed

My machine configuration is below.

Processor: Intel(R) Xeon CPU E5-2687W 0@3.10GHz (2 Processors)
Memory(RAM): 512GB
System Type: 64-bit OS, Windows 8.1 Pro
Graphics Cards: 2
GeForce GTX 980, 1215.5 MHz clock, 4096.0 MB memory, capability 5.2 (for computing)
GeForce GTX 750 Ti, 1254.5 MHz clock, 2048.0 MB memory, capability 5.0 (for display)
CUDA: CUDA 7.0
MKL: Intel MKL 11.0.5
Compiler: Visual Studio 2013 Community version

I searched the MAGMA forum and found the thread "viewtopic.php?f=2&t=1042", in which you said:

magma_cgesv tries to be smart about memory. If you use one GPU and the matrix fits on one GPU, it uses magma_cgetrf_gpu and magma_cgetrs_gpu (essentially magma_cgesv_gpu). If you request multiple GPUs or the matrix does not fit on one GPU, it uses the multi-GPU, out-of-GPU-core magma_cgetrf. This distributes the matrix across the GPUs, and can cycle portions of the matrix through the GPUs if it doesn't fit in GPU memory.

I have three questions.
Q1: I found that "testing_zpotri" does not take advantage of symmetry to reduce memory usage. That is fine for small matrices, but it consumes a large portion of memory for large ones. Do you plan to develop a pair of functions like ZSPTRF + ZSPTRI in LAPACK (symmetric indefinite, packed storage)?

Q2: The error message indicates there is not enough memory on the GPU [4983.5 MB (needed) > 4096.0 MB (installed)]. Are magma_zpotrf/magma_zpotri smart about managing memory? If not, are zgetrf/zgetri smart about managing memory?

Q3: If zgetrf/zgetri are not smart about managing memory, please give me an example showing how to write code that manages memory as smartly as magma_cgesv does.

Thanks.

Ning

mgates3 · Post by **mgates3** » Wed Aug 05, 2015 3:46 pm

We do not currently have plans to make a packed version, due to the poor performance with that memory access pattern. Possibly we could use rectangular full packed storage, but there are no current plans for that. See www.netlib.org/lapack/lawnspdf/lawn199.pdf.
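For reference, LAPACK's packed format stores only one triangle, column by column, so an n x n matrix takes n(n+1)/2 entries instead of n^2. Here is a small sketch of the upper-triangle index mapping, translated to 0-based C indices. The function names are hypothetical.

```c
#include <assert.h>

/* Number of stored entries for an n x n matrix in packed storage. */
unsigned long long packed_len(unsigned long long n)
{
    return n * (n + 1) / 2;
}

/* 0-based position of A(i,j), i <= j, in upper packed storage.
 * Columns are stored one after another; column j holds j+1 entries. */
unsigned long long packed_upper_index(unsigned long long i,
                                      unsigned long long j)
{
    assert(i <= j);
    return i + j * (j + 1) / 2;
}
```

For N = 18072 this gives 163,307,628 entries, or about 2492 MB at 16 bytes per double-complex entry, roughly half of the full storage.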

magma_zpotrf and magma_zgetrf are smart about running out of GPU memory. However, the inverse routines magma_zpotri and magma_zgetri require the entire matrix to fit in the GPU memory.
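A back-of-the-envelope check of whether a full N x N double-complex matrix fits in GPU memory, using the N = 18072 and 4096 MB figures from earlier in this thread. The function names are hypothetical, and the estimate counts only the matrix itself, not any padding or workspace MAGMA may allocate.

```c
#include <assert.h>

/* Bytes needed for a dense n x n double-complex matrix
 * (two 8-byte doubles per entry, no padding). */
unsigned long long dense_zmatrix_bytes(unsigned long long n)
{
    assert(n <= 1000000ULL); /* keep n*n*16 well inside 64 bits */
    return n * n * 2ULL * sizeof(double);
}

/* Nonzero if the matrix alone would fit in device_bytes of GPU memory. */
int zmatrix_fits(unsigned long long n, unsigned long long device_bytes)
{
    return dense_zmatrix_bytes(n) <= device_bytes;
}
```

With N = 18072 the matrix needs 5,225,554,944 bytes, about 4983.5 MB, which exceeds the 4096 MB on the GTX 980 and matches the figure quoted in the question.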

After using MAGMA to factor a matrix, you can use LAPACK's zpotri and zgetri to invert the matrix on the CPU.

-mark

ning_an · Post by **ning_an** » Wed Aug 05, 2015 3:49 pm

Thanks Mark,

I will try what you suggested, and report the results.

Have a great day.

Ning

ning_an · Post by **ning_an** » Wed Aug 05, 2015 4:23 pm

Hi, Mark,

I just commented out the call "magma_zpotri( opts.uplo, N, h_R, lda, &info );" and replaced it with "lapackf77_zpotri(lapack_uplo_const(opts.uplo), &N, h_R, &lda, &info);", as shown below.
Old:

Code:

            gpu_time = magma_wtime();
            /* factorize matrix */
            magma_zpotrf( opts.uplo, N, h_R, lda, &info );
            if (info != 0)
                printf("magma_zpotrf returned error %d: %s.\n",
                       (int) info, magma_strerror( info ));

            magma_zpotri( opts.uplo, N, h_R, lda, &info );
            gpu_time = magma_wtime() - gpu_time;
            gpu_perf = gflops / gpu_time;
            if (info != 0)
                printf("magma_zpotri returned error %d: %s.\n",
                       (int) info, magma_strerror( info ));

Changed to:

Code:

            gpu_time = magma_wtime();
            /* factorize matrix */
            magma_zpotrf( opts.uplo, N, h_R, lda, &info );
            if (info != 0)
                printf("magma_zpotrf returned error %d: %s.\n",
                       (int) info, magma_strerror( info ));

            /* invert on the CPU with LAPACK instead of magma_zpotri */
            // magma_zpotri( opts.uplo, N, h_R, lda, &info );
            lapackf77_zpotri( lapack_uplo_const(opts.uplo), &N, h_R, &lda, &info );
            gpu_time = magma_wtime() - gpu_time;
            gpu_perf = gflops / gpu_time;
            if (info != 0)
                printf("lapackf77_zpotri returned error %d: %s.\n",
                       (int) info, magma_strerror( info ));

But the result is wrong, as shown below.

Code:

D:\magma\build\testing\Release>testing_zpotri.exe --lapack
MAGMA 1.6.2  compiled for CUDA capability >= 2.0
CUDA runtime 7000, driver 7050. OpenMP threads 32. MKL 11.0.5, MKL threads 16.
ndevices 2
device 0: GeForce GTX 980, 1215.5 MHz clock, 4096.0 MB memory, capability 5.2
device 1: GeForce GTX 750 Ti, 1254.5 MHz clock, 2048.0 MB memory, capability 5.0

Usage: testing_zpotri.exe [options] [-h|--help]

    N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   ||R||_F / ||A||_F
=================================================================
18072    151.66 ( 155.68)    179.80 ( 131.32)   3.55e+169   failed

I'm new to MAGMA. Should I do something to the factorized matrix (h_R) before calling lapackf77_zpotri(...)?
I think that somehow lapackf77_zpotri(...) does not accept the factorized matrix (h_R) from magma_zpotrf(...) directly.

Any suggestions are welcome.

Ning
