Can you be more specific about what you are doing? For instance:
- How big is the problem (rows/cols, number of nonzeros)?
- What format is your matrix (CSR, ...)?
- Are you including time to transfer the matrix to the GPU, or are you transferring the matrix before?
- What MAGMA functions are you calling?
- What model GPU are you using?
I suggest using magmaf_wtime(), omp_get_wtime(), or MPI_Wtime() instead of cpu_time(). Fortran's cpu_time() measures the processor (CPU) time used by the process (i.e., time the process spends working, similar to getrusage), not elapsed wall-clock time. Notably, cpu_time() will not reflect time spent computing on the GPU. We always use wall time.
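If magmaf_wtime() is not available (for example, in code that doesn't link MAGMA), the standard Fortran intrinsic system_clock can serve as a wall-clock timer. Below is a minimal sketch, assuming a Fortran 2008 compiler such as gfortran; the module name wall_clock and the function wall_seconds are hypothetical helpers, not part of MAGMA.

```fortran
! Hypothetical helper module (not part of MAGMA): a portable wall-clock
! timer built on the standard intrinsic system_clock.
module wall_clock
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: wall_seconds

contains

  ! Returns wall-clock time in seconds from an arbitrary, processor-dependent
  ! origin. Only the difference between two calls is meaningful.
  function wall_seconds() result(t)
    real(real64) :: t
    integer(int64) :: ticks, rate
    ! 64-bit arguments avoid early counter wraparound and, in gfortran,
    ! give a finer tick than default 32-bit integers.
    call system_clock(ticks, rate)
    ! system_clock sets rate to 0 when no clock is available.
    if (rate <= 0) error stop 'wall_seconds: no system clock available'
    t = real(ticks, real64) / real(rate, real64)
  end function wall_seconds

end module wall_clock
```

Usage: take `t0 = wall_seconds()` before the work and `elapsed = wall_seconds() - t0` after it. If the work is launched asynchronously on the GPU, the host must wait for it to finish before the second call, otherwise the wall time will also be too short.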
Here's a simple test (see code below). When timing sleep(1), cpu_time() measures almost no time because the process isn't working, but magmaf_wtime() measures the expected 1 second of elapsed wall time. When timing gemm() with 1 thread, cpu_time() and magmaf_wtime() measure similar times.
Code:
prompt> setenv OMP_NUM_THREADS 1
prompt> setenv VECLIB_MAXIMUM_THREADS 1
prompt> ./time
sleep(1)
cpu_time = 0.000139
magmaf_wtime = 1.003619
gemm()
cpu_time = 0.067450
magmaf_wtime = 0.067607
But when timing gemm() with 2 threads, cpu_time() adds up the time spent working in both threads, so it reports roughly double the elapsed wall-clock time that magmaf_wtime() measures.
Code:
prompt> setenv OMP_NUM_THREADS 2
prompt> setenv VECLIB_MAXIMUM_THREADS 2
prompt> ./time
sleep(1)
cpu_time = 0.000143
magmaf_wtime = 1.005181
gemm()
cpu_time = 0.081534
magmaf_wtime = 0.041313
Code:
program main
use magma
implicit none
double precision :: start, start2, t, t2
double precision :: A(1000,1000), B(1000,1000), C(1000,1000)
integer :: n
double precision :: alpha, beta
n = 1000
alpha = 1.0d0
beta = 2.0d0
! Initialize the matrices so dgemm reads defined values
A = 1.0d0
B = 1.0d0
C = 0.0d0
call cpu_time( start )
call magmaf_wtime( start2 )
print '(a)', 'sleep(1)'
call sleep(1)
call cpu_time( t )
call magmaf_wtime( t2 )
t = t - start
t2 = t2 - start2
print '(a,f8.6)', 'cpu_time = ', t
print '(a,f8.6)', 'magmaf_wtime = ', t2
print '()'
call cpu_time( start )
call magmaf_wtime( start2 )
print '(a)', 'gemm()'
call dgemm( "n", "n", n, n, n, alpha, A, n, B, n, beta, C, n )
call cpu_time( t )
call magmaf_wtime( t2 )
t = t - start
t2 = t2 - start2
print '(a,f8.6)', 'cpu_time = ', t
print '(a,f8.6)', 'magmaf_wtime = ', t2
print '()'
end
On macOS, compiled with:
Code:
gfortran -Wall -I /opt/magma/include -o time time.f90 -L /opt/magma/lib -Wl,-rpath,/opt/magma/lib -lmagma -framework Accelerate
-mark