DSYSV and ZHESV stability/performance

Open discussion for MAGMA library (Matrix Algebra on GPU and Multicore Architectures)

Post by mcalderara » Mon Mar 02, 2015 12:13 pm

Hi everyone,

I'd like to use the symmetric/Hermitian solvers DSYSV/ZHESV for some medium-sized problems, that is, matrices with 3000 to 10000 rows/columns and a similar number of right-hand sides. A few questions came up while testing these routines with the test programs supplied in testing/.

The pivoting variants of these routines seem to run much slower than the corresponding general solvers (~10 GFlop/s for ZHESV vs. ~500 GFlop/s for ZGESV), while the non-pivoting variants do significantly better. The non-pivoting routines seem to be reasonably accurate for one RHS [1], even at the targeted system size, but the scaled residual grows by roughly 14 orders of magnitude when a second RHS is added [2]. Given that there is no pivoting, one probably shouldn't be surprised by stability issues for larger systems, but why does the solve get so much worse with one more RHS and then stay constant when further RHS are added? Are there any estimates of the system sizes or condition numbers for which these non-pivoting methods are safe to use? Also, the reported performance is beyond the theoretical peak of the device used.
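To illustrate why skipping pivoting is risky for indefinite matrices in general, here is a minimal sketch in Python (not MAGMA code). The 2x2 matrix, the helper name `ldlt_solve_2x2`, and the simple symmetric interchange are made up for illustration; LAPACK's ZHESV uses Bunch-Kaufman diagonal pivoting, which is more elaborate. It shows the general failure mode only and is not meant to explain the jump between one and two RHS.

```python
def ldlt_solve_2x2(a11, a21, a22, b1, b2):
    """Solve [[a11, a21], [a21, a22]] x = b via LDL^T with no pivoting."""
    d1 = a11
    l21 = a21 / d1              # huge multiplier when a11 is tiny
    d2 = a22 - l21 * l21 * d1   # a22 is swamped by the update term
    # Forward substitution: L y = b
    y1 = b1
    y2 = b2 - l21 * y1
    # Diagonal solve: D z = y
    z1 = y1 / d1
    z2 = y2 / d2
    # Back substitution: L^T x = z
    x2 = z2
    x1 = z1 - l21 * x2          # catastrophic cancellation happens here
    return x1, x2


eps = 1e-20

# A = [[eps, 1], [1, 1]], b = [1, 2]; the exact solution is very close to (1, 1).
x_nopiv = ldlt_solve_2x2(eps, 1.0, 1.0, 1.0, 2.0)

# Same system after swapping rows and columns 1 and 2 (symmetric interchange):
# A' = [[1, 1], [1, eps]], b' = [2, 1]; the unknowns come back as (x2, x1).
x2_p, x1_p = ldlt_solve_2x2(1.0, 1.0, eps, 2.0, 1.0)
x_piv = (x1_p, x2_p)

print("no pivoting:      ", x_nopiv)
print("with interchange: ", x_piv)
```

Without the interchange, the first component comes out badly wrong even though the matrix is well conditioned; with it, both components are recovered.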

I looked at the code to see what's going on but haven't yet found where the problem is, other than that the reported performance for non-pivoted ZHETRF is computed incorrectly. I'm going to compare its output to a reference implementation to see whether the error is in ZHETRF or in ZHETRS (or one of its subroutines).
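A rough sanity check on the reported rates, as a Python sketch. It multiplies the printed GFlop/s by the printed time to recover the operation count the tester appears to have assumed, and compares that with leading-order real-flop counts for a complex LU factorization (about 8/3 N^3) and a complex LDL^T factorization (about 4/3 N^3). The flop formulas (complex multiply = 6 flops, complex add = 2 flops, lower-order terms dropped) and the K20X double-precision peak of 1.31 TFlop/s are assumptions, and this is inferred from the printed numbers only, not from the MAGMA source.

```python
# (N, seconds, reported GFlop/s) copied from the NRHS = 1 run in [1].
runs = [(1088, 0.03, 128.86), (5184, 0.29, 1299.12), (10304, 1.62, 1802.00)]

K20X_PEAK_GFLOPS = 1310.0  # assumed vendor double-precision peak


def leading_order_gflop(n, coeff):
    """Leading-order real flop count coeff * n^3, expressed in GFlop."""
    return coeff * n**3 / 1e9


checks = []
for n, sec, reported in runs:
    implied = reported * sec                   # GFlop the tester seems to assume
    lu = leading_order_gflop(n, 8.0 / 3.0)     # complex LU (ZGETRF-like)
    ldlt = leading_order_gflop(n, 4.0 / 3.0)   # complex LDL^T (ZHETRF-like)
    checks.append((n, implied / lu, ldlt / sec))

for n, ratio_to_lu, ldlt_rate in checks:
    print(f"N={n:6d}  implied/LU count = {ratio_to_lu:.2f}  "
          f"rate with LDL^T count = {ldlt_rate:7.1f} GFlop/s")
```

For the larger sizes, where the two-digit timings are least affected by rounding, the implied count agrees with the LU count to within a couple of percent. Under these assumptions the tester seems to be charging the factorization about twice the flops it performs, and the rates recomputed with the LDL^T count stay below the assumed peak.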

Thanks for any pointers!

Best,
mauro


[1]:
MAGMA 1.6.1 compiled for CUDA capability >= 3.0
CUDA runtime 5050, driver 5050. OpenMP threads 1. MKL 11.1.2, MKL threads 1.
ndevices 1
device 0: Tesla K20X, 732.0 MHz clock, 5759.6 MB memory, capability 3.5
Usage: ./testing_zhesv_nopiv_gpu [options] [-h|--help]

    N  NRHS   CPU GFlop/s (sec)   GPU GFlop/s (sec)   ||B - AX|| / N*||A||*||X||
================================================================================
 1088     1     ---   (  ---  )    128.86 (   0.03)   1.96e-18   ok
 2112     1     ---   (  ---  )    436.16 (   0.06)   1.60e-18   ok
 3136     1     ---   (  ---  )    797.69 (   0.10)   1.35e-18   ok
 4160     1     ---   (  ---  )   1087.38 (   0.18)   1.15e-18   ok
 5184     1     ---   (  ---  )   1299.12 (   0.29)   1.25e-18   ok
 6208     1     ---   (  ---  )   1464.39 (   0.44)   1.15e-18   ok
 7232     1     ---   (  ---  )   1583.17 (   0.64)   1.03e-18   ok
 8256     1     ---   (  ---  )   1673.85 (   0.90)   9.78e-19   ok
 9280     1     ---   (  ---  )   1744.64 (   1.22)   8.53e-19   ok
10304     1     ---   (  ---  )   1802.00 (   1.62)   8.74e-19   ok


[2]:
MAGMA 1.6.1 compiled for CUDA capability >= 3.0
CUDA runtime 5050, driver 5050. OpenMP threads 1. MKL 11.1.2, MKL threads 1.
ndevices 1
device 0: Tesla K20X, 732.0 MHz clock, 5759.6 MB memory, capability 3.5
Usage: ./testing_zhesv_nopiv_gpu [options] [-h|--help]

    N  NRHS   CPU GFlop/s (sec)   GPU GFlop/s (sec)   ||B - AX|| / N*||A||*||X||
================================================================================
 1088     2     ---   (  ---  )    128.60 (   0.03)   6.04e-04   failed
 2112     2     ---   (  ---  )    436.64 (   0.06)   3.08e-04   failed
 3136     2     ---   (  ---  )    799.97 (   0.10)   2.06e-04   failed
 4160     2     ---   (  ---  )   1086.68 (   0.18)   1.56e-04   failed
 5184     2     ---   (  ---  )   1300.81 (   0.29)   1.26e-04   failed
 6208     2     ---   (  ---  )   1462.96 (   0.44)   1.04e-04   failed
 7232     2     ---   (  ---  )   1584.15 (   0.64)   8.96e-05   failed
 8256     2     ---   (  ---  )   1672.28 (   0.90)   7.90e-05   failed
 9280     2     ---   (  ---  )   1744.53 (   1.22)   7.00e-05   failed
10304     2     ---   (  ---  )   1799.74 (   1.62)   6.29e-05   failed
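
The tester prints a single scaled residual for all right-hand sides together, so it cannot show which column is off. A small Python sketch of the same metric, ||B - AX|| / (N*||A||*||X||), evaluated one RHS column at a time might help narrow that down. The helper names, the toy 2x2 Hermitian matrix, and the choice of the 1-norm are assumptions for illustration; the norm used by the MAGMA tester may differ.

```python
def one_norm(M):
    """Maximum absolute column sum of a matrix stored as a list of rows."""
    return max(sum(abs(row[j]) for row in M) for j in range(len(M[0])))


def matmul(A, X):
    """Plain triple-loop matrix product for lists of rows."""
    return [[sum(A[i][k] * X[k][j] for k in range(len(X)))
             for j in range(len(X[0]))] for i in range(len(A))]


def scaled_residual(A, X, B):
    """||B - A X|| / (N * ||A|| * ||X||), here in the 1-norm."""
    n = len(A)
    AX = matmul(A, X)
    R = [[B[i][j] - AX[i][j] for j in range(len(B[0]))] for i in range(n)]
    return one_norm(R) / (n * one_norm(A) * one_norm(X))


def column(M, j):
    """Extract column j as an n-by-1 matrix."""
    return [[row[j]] for row in M]


def per_column_residuals(A, X, B):
    """Scaled residual evaluated separately for every right-hand side."""
    return [scaled_residual(A, column(X, j), column(B, j))
            for j in range(len(B[0]))]


# Toy Hermitian system with two right-hand sides.
A = [[2.0, 1j], [-1j, 3.0]]
X_true = [[1.0, 1.0], [1.0, -1.0]]
B = matmul(A, X_true)

# Pretend the solver returned a wrong second column.
X_bad = [[1.0, 1.0], [1.0, -0.5]]

res = per_column_residuals(A, X_bad, B)
print(res)
```

Run on the real solver output, a breakdown like this would show whether only the columns after the first are wrong, which would point at the multi-RHS solve rather than at the factorization.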
