Limitations on precision
Hello
I want to get eigenvalues/eigenvectors for a very large, dense matrix (n = 100k).
It seems like errors accumulate more as the matrix gets larger (|A-USU^H|).
Is this an inherent limitation of double-precision arithmetic, or can it be mitigated somehow with another iterative method?
Thank you
Re: Limitations on precision
Are you using MAGMA's testers to test these, e.g., testing/testing_zheevd?
Which specific routine are you using?
If using MAGMA's tester, can you share the complete input & output that is concerning you?
We generally check the relative backwards error,
|| A - U S U^H ||_1 / ( || A ||_1 N )
MAGMA's tester abbreviates that as |A-USU^H| in the output header, but actually computes the above quantity.
The absolute error || A - U S U^H ||_1 does grow with the matrix size, since more values are accumulated into the norm. E.g., if every element of a vector x has some small error tau, then the whole vector has a cumulative error of n*tau.
Mark
Re: Limitations on precision
Himgates3 wrote: ↑Mon Mar 02, 2020 11:08 am Are you using MAGMA's testers to test these, e.g., testing/testing_zheevd?
This is what I get with testing_dsyevd
Code:
% MAGMA 2.5.2 compiled for CUDA capability >= 6.0, 64-bit magma_int_t, 64-bit pointer.
% CUDA runtime 10020, driver 10020. OpenMP threads 32.
% device 0: Tesla P100-SXM2-16GB, 1480.5 MHz clock, 16280.9 MiB memory, capability 6.0
% device 1: Tesla P100-SXM2-16GB, 1480.5 MHz clock, 16280.9 MiB memory, capability 6.0
% device 2: Tesla P100-SXM2-16GB, 1480.5 MHz clock, 16280.9 MiB memory, capability 6.0
% device 3: Tesla P100-SXM2-16GB, 1480.5 MHz clock, 16280.9 MiB memory, capability 6.0
% Tue Mar 3 18:43:12 2020
% Usage: ./testing_dsyevd [options] [-h|--help]
% jobz = Vectors needed, uplo = Lower, ngpu = 4
% N CPU Time (sec) GPU Time (sec) |S-S_magma| |A-USU^H| |I-U^H U|
%============================================================================
1088 --- 3.2039 --- 2.06e-17 6.48e-17 ok
1088 --- 0.2568 --- 1.23e-17 6.68e-17 ok
1088 --- 0.2568 --- 6.84e-08 6.70e-06 failed
1088 --- 0.2593 --- 1.53e-17 6.51e-17 ok
1088 --- 0.2554 --- 1.44e-17 6.87e-17 ok
2112 --- 0.6574 --- 7.33e-18 6.04e-17 ok
2112 --- 0.6583 --- 1.59e-17 6.76e-17 ok
2112 --- 0.6598 --- 2.77e-18 6.42e-17 ok
2112 --- 0.7249 --- 4.79e-18 6.50e-17 ok
2112 --- 0.6582 --- 2.22e-18 6.14e-17 ok
3136 --- 1.3959 --- 2.54e-17 5.45e-17 ok
3136 --- 1.3857 --- 1.01e-17 5.70e-17 ok
3136 --- 1.3820 --- 2.37e-17 6.19e-17 ok
3136 --- 1.4300 --- 7.85e-18 5.51e-17 ok
3136 --- 1.3825 --- 3.57e-18 5.59e-17 ok
4160 --- 1.9709 --- 4.23e-09 1.04e-06 failed
4160 --- 1.9827 --- 1.12e-17 5.23e-17 ok
4160 --- 1.9705 --- 1.75e-17 5.57e-17 ok
4160 --- 1.9741 --- 2.12e-17 5.72e-17 ok
4160 --- 1.9651 --- 4.50e-18 5.36e-17 ok
5184 --- 3.0915 --- 1.88e-17 5.66e-17 ok
5184 --- 2.6725 --- 1.28e-17 5.83e-17 ok
5184 --- 2.6832 --- 1.85e-17 5.31e-17 ok
5184 --- 2.8734 --- 2.62e-09 2.13e-07 failed
5184 --- 2.6751 --- 7.31e-07 6.00e-05 failed
6208 --- 3.5111 --- 2.44e-08 2.99e-06 failed
6208 --- 3.5439 --- 5.87e-18 6.15e-17 ok
6208 --- 3.6768 --- 1.36e-17 5.46e-17 ok
6208 --- 3.7247 --- 1.86e-17 5.37e-17 ok
6208 --- 3.6468 --- 7.55e-08 3.33e-05 failed
7232 --- 4.9060 --- 1.08e-10 1.57e-08 failed
7232 --- 4.6172 --- 2.42e-17 5.82e-17 ok
7232 --- 4.5373 --- 1.54e-17 5.51e-17 ok
7232 --- 4.5184 --- 5.64e-18 5.55e-17 ok
7232 --- 4.5125 --- 1.32e-17 5.80e-17 ok
8256 --- 5.1735 --- 1.16e-17 6.23e-17 ok
8256 --- 5.4694 --- 3.05e-09 3.75e-07 failed
8256 --- 5.3396 --- 3.98e-10 2.29e-07 failed
8256 --- 5.7306 --- 4.29e-07 7.17e-05 failed
8256 --- 5.4754 --- 5.26e-10 1.59e-07 failed
9280 --- 6.4114 --- 3.92e-09 7.72e-07 failed
9280 --- 6.7893 --- 2.25e-08 2.22e-06 failed
9280 --- 6.1874 --- 2.92e-10 7.02e-08 failed
9280 --- 6.2186 --- 3.26e-08 1.51e-05 failed
9280 --- 6.4704 --- 4.46e-10 6.61e-08 failed
10304 --- 7.5922 --- 1.18e-07 4.28e-05 failed
10304 --- 7.3870 --- 3.73e-09 5.19e-07 failed
10304 --- 7.6400 --- 1.01e-07 2.91e-05 failed
10304 --- 7.2780 --- 4.08e-08 7.18e-06 failed
10304 --- 7.3946 --- 5.36e-10 1.48e-07 failed
30000 --- 66.6994 --- 2.76e-11 1.57e-08 failed
30000 --- 67.4788 --- 5.08e-18 6.51e-17 ok
30000 --- 63.7498 --- 7.73e-08 1.96e-05 failed
50000 --- 213.1231 --- 1.73e-12 4.78e-09 failed
50000 --- 210.9395 --- 1.36e-08 3.70e-06 failed
50000 --- 207.4701 --- 2.51e-10 9.76e-08 failed
70000 --- 488.2336 --- 7.23e-08 2.34e-05 failed
70000 --- 478.4960 --- 9.65e-09 2.84e-06 failed
It takes a few hours to check the error.
Why is lapackf77_dsyt21 so slow?
Also, I wonder whether there are any methods to refine the result (reduce the error) after running dsyevd.
Thank you
Re: Limitations on precision
Some of these errors seem to be large and inconsistent. This is what I get on one of our systems with V100 and Intel CPU.
These errors are what we expect in double precision. Did you by any chance modify the code, e.g., removing the scaling or changing the input matrices?
Here lapackf77_dsyt21 took 40 seconds. It is slower than the CPU dsyevd because of the way the norms are computed - if you look at the code, the computation is done through rank 1 and 2 updates. Your times seem to be quite larger. In this experiment I used MKL on the CPU. What BLAS/LAPACK are you using on the CPU?
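As a rough NumPy illustration (not the actual dsyt21 code): forming the residual through n rank-1 updates produces the same matrix as one blocked multiply, but the rank-1 loop only exercises BLAS-2-style kernels, which is why the check can take far longer than the solve itself.

```python
import numpy as np

n = 300
rng = np.random.default_rng(2)
A = rng.standard_normal((n, n))
A = (A + A.T) / 2
S, U = np.linalg.eigh(A)

# Blocked (BLAS-3) residual: essentially two matrix-matrix multiplies
R_gemm = A - (U * S) @ U.T

# dsyt21-style accumulation: subtract one rank-1 update per eigenpair
R_rank1 = A.copy()
for i in range(n):
    R_rank1 -= S[i] * np.outer(U[:, i], U[:, i])

# Same residual either way; only the kernel granularity differs.
print(np.allclose(R_gemm, R_rank1))
```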
Related to refinement, here are some relevant papers:
http://www.netlib.org/utk/people/JackDo ... sicedr.pdf
http://www.netlib.org/utk/people/JackDo ... values.pdf
Code:
[tomov@a04 testing]$ ./testing_dsyevd -JV --niter 5 -c -l -n 7000
% MAGMA 2.5.2 svn compiled for CUDA capability >= 7.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 9020, driver 10010. OpenMP threads 20. MKL 2017.0.1, MKL threads 20.
% device 0: Tesla V100-PCIE-16GB, 1380.0 MHz clock, 16130.5 MiB memory, capability 7.0
% Wed Mar 11 12:47:04 2020
% Usage: ./testing_dsyevd [options] [-h|--help]
% jobz = Vectors needed, uplo = Lower, ngpu = 1
% N CPU Time (sec) GPU Time (sec) |S-S_magma| |A-USU^H| |I-U^H U|
%============================================================================
7000 12.2292 4.6496 4.83e-19 4.08e-18 4.39e-17 ok
7000 12.2834 4.6451 1.11e-19 6.59e-18 4.20e-17 ok
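As a sketch of the general idea behind such refinement (a generic Rayleigh-quotient / inverse-iteration step, not the exact algorithm from the papers above): given an approximate eigenpair, one corrected solve typically drives the eigenvalue error back toward machine precision.

```python
import numpy as np

def refine_step(A, x):
    """One Rayleigh-quotient + inverse-iteration refinement step
    for an approximate eigenvector x of a symmetric matrix A."""
    lam = x @ A @ x / (x @ x)                          # Rayleigh quotient
    y = np.linalg.solve(A - lam * np.eye(len(x)), x)   # inverse iteration
    x = y / np.linalg.norm(y)
    return x @ A @ x / (x @ x), x

rng = np.random.default_rng(1)
n = 200
A = rng.standard_normal((n, n))
A = (A + A.T) / 2
S, U = np.linalg.eigh(A)

# Perturb an exact eigenvector to mimic an inaccurate solver result
x0 = U[:, 0] + 1e-6 * rng.standard_normal(n)
x0 /= np.linalg.norm(x0)

lam, x = refine_step(A, x0)
print(abs(lam - S[0]))  # eigenvalue error shrinks toward machine precision
```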
Re: Limitations on precision
Thank you for sharing your result, Mr. Tomov.
I did not alter the testing code (testing_dsyevd.cpp).
lapackf77_dsyt21 for a matrix with N=7000 does not take that long; I guess it is similar to yours. But with N=70k, I think it takes about 10 hours to finish.
I am using P100 GPUs in an IBM POWER8 system.
My MAGMA build uses the latest OpenBLAS, without IBM MASS or the IBM XL compiler; it is compiled with just gcc and gfortran.
Re: Limitations on precision
You may also want to try the 2-stage reduction algorithms, e.g.:
Code:
./testing_dsyevdx_2stage -JV --niter 2 -n 7000
These are much faster, especially for the large sizes that you target.
Maybe using multiple GPUs would also help (adding the "--ngpu 4" option).
Also, you can try ESSL. There is a make.inc example for that ("make.inc.power9-essl") that you may have to modify.