The LAPACK forum has moved to https://github.com/Reference-LAPACK/lapack/discussions.

QR factorization : dgeqpf versus dgeqp3


Postby Bruno » Fri Mar 07, 2008 8:59 am

Hi,

the LAPACK Users' Guide says that "xGEQP3 is considerably faster than xGEQPF",
but I found that this is not always the case, and I am looking for an explanation. On my laptop
(Dell Latitude D620, with Linux Mandriva Spring 2007 as OS), using DGEQP3
+ DORGQR (to form the matrix Q explicitly) or DGEQPF + DORGQR
leads to the following results:


matrix size | DGEQP3 + DORGQR | DGEQPF + DORGQR
 600 x  400 |     0.24 s      |     0.19 s
 800 x  600 |     0.62 s      |     0.47 s
1000 x  900 |     1.40 s      |     1.46 s
1500 x 1000 |     3.45 s      |     4.10 s
1800 x 1400 |     6.42 s      |     9.40 s

(Note that the BLAS library used is the latest ATLAS, built on my machine.)

So yes, DGEQP3 becomes faster as the size increases,
and it is a little slower before that, but this does not resemble what I first hoped for
after reading the comment in the LAPACK Users' Guide. (For DGEQP3 I first make
a workspace-query call to get the optimal work size.) To check that I have not made
a mistake, I compared with different software:

1/ with Octave, which uses DGEQPF + DORGQR (and the same ATLAS library),
I got nearly the same results as those in the right column;
2/ with Matlab 7.5, which claims to use DGEQP3 (and comes with its own
optimised BLAS), I got: 0.21 s, 0.5 s, 1.29 s, 3.55 s, 7.44 s.

In brief, these results show the same tendency: DGEQPF seems a little
faster than DGEQP3 until the array size becomes large enough (but 800 x 600
is not such a small array, at least for a laptop), after which it is the other
way around; yet DGEQP3 does not appear to be "considerably faster" (maybe
that becomes true as the sizes grow larger and larger).
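As an aside, a comparison of this kind can be reproduced without writing a Fortran driver, for example with SciPy, whose scipy.linalg.qr is documented to call DGEQRF (plus DORGQR to form Q), and DGEQP3 when pivoting=True. This is only an illustration sketch, not the setup used for the timings above; the numbers will of course depend on the BLAS the installation links against:

```python
import time

import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
A = rng.standard_normal((800, 600))

# Pivoted QR: scipy.linalg.qr with pivoting=True calls LAPACK's dgeqp3
t0 = time.perf_counter()
Q, R, piv = linalg.qr(A, mode="economic", pivoting=True)
t_pivoted = time.perf_counter() - t0

# Unpivoted QR for comparison: this path uses dgeqrf (+ dorgqr for Q)
t0 = time.perf_counter()
Q2, R2 = linalg.qr(A, mode="economic")
t_plain = time.perf_counter() - t0

# Sanity check: A with its columns permuted by piv equals Q @ R
assert np.allclose(A[:, piv], Q @ R)
print(f"pivoted: {t_pivoted:.3f} s, unpivoted: {t_plain:.3f} s")
```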

So my questions:
- has anyone experienced the same behavior between
DGEQP3 and DGEQPF?
- and what is the explanation for these results?

Bruno
Posts: 9
Joined: Fri Mar 07, 2008 3:41 am

Postby Julien Langou » Fri Mar 07, 2008 7:50 pm

Hi Bruno,
I am not that much surprised.
Can you give us the time for DGEQRF for comparison?
(If you have time, DGEQR2 can be a good thing to time as well).
Best wishes,
Julien
Julien Langou
 
Posts: 835
Joined: Thu Dec 09, 2004 12:32 pm
Location: Denver, CO, USA

Generally accepted means for timing code

Postby melrobin » Sat Mar 08, 2008 1:11 am

This brings up a question. What is the most generally accepted way to time code? I've used the clock() functions which seem to work fine on Mac OS X, but do not work well at all in Visual Studio--thanks to Julie's help I was able to get the VS version to link.

In fact, the CLOCKS_PER_SEC parameter is a paltry 1000 in VS 2005 which means we can do no better than millisecond resolution--a lifetime in computers.

So, the question is: if I told you, for example, that DGETRF performed its magic in X seconds, what should I have done to survive challenges to my timing methodology?
melrobin
 
Posts: 8
Joined: Thu Apr 13, 2006 5:24 pm
Location: Sugar Land, Texas

Postby Bruno » Tue Mar 11, 2008 9:11 am

Julien Langou wrote:Hi Bruno,
I am not that much surprised.
Can you give us the time for DGEQRF for comparison?
(If you have time, DGEQR2 can be a good thing to time as well).
Best wishes,
Julien


Sorry, there is a mistake in my previous timings: I was not using the
same LAPACK/ATLAS for DGEQP3+DORGQR and DGEQPF+DORGQR
(my tests with the latter two were older, and I did not remember that I had
changed both LAPACK (3.0 -> 3.1) and ATLAS when I posted my first
message). Here are my new timings, for runs of DGEQP3,
DGEQPF, DGEQRF and DGEQR2 alone (that is, without a supplementary
call to DORGQR):

size      | DGEQP3  | DGEQPF  | DGEQRF  | DGEQR2
 600x 400 | 0.136 s | 0.148 s | 0.052 s | 0.144 s
 800x 600 | 0.380 s | 0.444 s | 0.128 s | 0.432 s
1000x 900 | 1.024 s | 1.500 s | 0.292 s | 1.470 s
1500x1000 | 3.648 s | 6.940 s | 0.936 s | 6.880 s
1800x1400 | 4.544 s | 9.117 s | 1.200 s | 9.030 s

So forget my previous conclusions: DGEQP3 is indeed faster than DGEQPF
(though not as "considerably faster" as DGEQRF is over DGEQR2).

Maybe that is quite normal, or maybe DGEQP3 needs a better BLAS library
than the one built with ATLAS.

Bruno

Re: Generally accepted means for timing code

Postby Bruno » Tue Mar 11, 2008 9:20 am

melrobin wrote:This brings up a question. What is the most generally accepted way to time code? I've used the clock() functions which seem to work fine on Mac OS X, but do not work well at all in Visual Studio--thanks to Julie's help I was able to get the VS version to link.

In fact, the CLOCKS_PER_SEC parameter is a paltry 1000 in VS 2005 which means we can do no better than millisecond resolution--a lifetime in computers.

So, the question is: if I told you, for example, that DGETRF performed its magic in X seconds, what should I have done to survive challenges to my timing methodology?


The timings I have given were obtained with getrusage (I work with Linux). It is provided
on Mac OS X, and I have heard of some emulation on Windows in
psapi.dll (I have never tried it, though).
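For illustration, the same getrusage-style CPU-time measurement can be sketched in Python through the standard resource module (POSIX only; the squared-sum loop below is just a placeholder workload, not anything from these tests):

```python
import resource

def cpu_seconds():
    """User + system CPU time consumed by this process, via getrusage."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_utime + usage.ru_stime

start = cpu_seconds()
total = sum(i * i for i in range(10**6))  # placeholder workload to time
elapsed = cpu_seconds() - start
print(f"CPU time: {elapsed:.4f} s")
```

Unlike wall-clock timers, this measures CPU time only, so it is insensitive to other processes competing for the machine.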

Bruno

Re: QR factorization : dgeqpf versus dgeqp3

Postby Fala » Mon Mar 31, 2008 4:55 am

Bruno wrote:Hi,

the LAPACK Users' Guide says that "xGEQP3 is considerably faster than xGEQPF",
but I found that this is not always the case, and I am looking for an explanation.


Greetings to all. I am new to LAPACK and also to this discussion group.

Can anyone please show me how the permutation matrix E in Matlab's QR factorisation can be calculated, i.e., the sequence of LAPACK calls needed to do this? Something like the following, as described in Matlab:

[Q,R,E] = QR(A) produces unitary Q, upper triangular R and a
permutation matrix E so that A*E = Q*R. The column permutation E is
chosen so that ABS(DIAG(R)) is decreasing.


I use JLapack (Java LAPACK), and I am sure the LAPACK sequence of function calls would be the same as in JLapack.

Any hint on how to compute the permutation matrix E would be much appreciated.
Fala
 
Posts: 2
Joined: Fri Mar 28, 2008 9:48 pm

Postby Julien Langou » Mon Mar 31, 2008 10:18 am

You do not need to compute the permutation matrix E yourself: it is
returned by the LAPACK routine DGEQP3 or DGEQPF in the array JPVT:
Code: Select all
*  JPVT    (input/output) INTEGER array, dimension (N)
*          On entry, if JPVT(J).ne.0, the J-th column of A is permuted
*          to the front of A*P (a leading column); if JPVT(J)=0,
*          the J-th column of A is a free column.
*          On exit, if JPVT(J)=K, then the J-th column of A*P was
*          the K-th column of A.
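To turn JPVT into Matlab's explicit E (so that A*E = Q*R): column J of E has a 1 in row JPVT(J) and zeros elsewhere. A small sketch using SciPy's pivoted QR, whose piv output is the 0-based equivalent of JPVT (an illustration only; any binding to DGEQP3 gives the same array):

```python
import numpy as np
from scipy import linalg

A = np.array([[1.0, 4.0,  7.0],
              [2.0, 5.0,  8.0],
              [3.0, 6.0, 10.0]])

# pivoting=True uses LAPACK's column-pivoted QR (dgeqp3)
Q, R, piv = linalg.qr(A, pivoting=True)

# Build the explicit permutation matrix E from the pivot index array:
# column j of E selects original column piv[j], so that A @ E == Q @ R
n = A.shape[1]
E = np.zeros((n, n))
E[piv, np.arange(n)] = 1.0

assert np.allclose(A @ E, Q @ R)
# |diag(R)| is non-increasing, as in Matlab's description of [Q,R,E] = qr(A)
d = np.abs(np.diag(R))
assert np.all(d[:-1] >= d[1:] - 1e-12)
```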
Julien Langou

Postby Fala » Mon Mar 31, 2008 1:43 pm

Julien Langou wrote:You do not need to compute the permutation matrix E yourself: it is
returned by the LAPACK routine DGEQP3 or DGEQPF in the array JPVT:
Code: Select all
*  JPVT    (input/output) INTEGER array, dimension (N)
*          On entry, if JPVT(J).ne.0, the J-th column of A is permuted
*          to the front of A*P (a leading column); if JPVT(J)=0,
*          the J-th column of A is a free column.
*          On exit, if JPVT(J)=K, then the J-th column of A*P was
*          the K-th column of A.


Julien,

Would you mind showing some sample code, please? As I mentioned in my previous message, LAPACK is new to me. Here is the JLapack description of the DGEQP3 function:

Code: Select all
*  Purpose
 *  =======
 *
 *  DGEQP3 computes a QR factorization with column pivoting of a
 *  matrix A:  A*P = Q*R  using Level 3 BLAS.
 *
 *  Arguments
 *  =========
 *
 *  M       (input) INTEGER
 *          The number of rows of the matrix A. M >= 0.
 *
 *  N       (input) INTEGER
 *          The number of columns of the matrix A.  N >= 0.
 *
 *  A       (input/output) DOUBLE PRECISION array, dimension (LDA,N)
 *          On entry, the M-by-N matrix A.
 *          On exit, the upper triangle of the array contains the
 *          min(M,N)-by-N upper trapezoidal matrix R; the elements below
 *          the diagonal, together with the array TAU, represent the
 *          orthogonal matrix Q as a product of min(M,N) elementary
 *          reflectors.
 *
 *  LDA     (input) INTEGER
 *          The leading dimension of the array A. LDA >= max(1,M).
 *
 *  JPVT    (input/output) INTEGER array, dimension (N)
 *          On entry, if JPVT(J).ne.0, the J-th column of A is permuted
 *          to the front of A*P (a leading column); if JPVT(J)=0,
 *          the J-th column of A is a free column.
 *          On exit, if JPVT(J)=K, then the J-th column of A*P was
 *          the K-th column of A.
 *
 *  TAU     (output) DOUBLE PRECISION array, dimension (min(M,N))
 *          The scalar factors of the elementary reflectors.
 *
 *  WORK    (workspace/output) DOUBLE PRECISION array, dimension (MAX(1,LWORK))
 *          On exit, if INFO=0, WORK(1) returns the optimal LWORK.
 *
 *  LWORK   (input) INTEGER
 *          The dimension of the array WORK. LWORK >= 3*N+1.
 *          For optimal performance LWORK >= 2*N+( N+1 )*NB, where NB
 *          is the optimal blocksize.
 *
 *          If LWORK = -1, then a workspace query is assumed; the routine
 *          only calculates the optimal size of the WORK array, returns
 *          this value as the first entry of the WORK array, and no error
 *          message related to LWORK is issued by XERBLA.
 *
 *  INFO    (output) INTEGER
 *          = 0: successful exit.
 *          < 0: if INFO = -i, the i-th argument had an illegal value.
 *
 *  Further Details
 *  ===============
 *
 *  The matrix Q is represented as a product of elementary reflectors
 *
 *     Q = H(1) H(2) . . . H(k), where k = min(m,n).
 *
 *  Each H(i) has the form
 *
 *     H(i) = I - tau * v * v'
 *
 *  where tau is a real/complex scalar, and v is a real/complex vector
 *  with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is stored on exit in
 *  A(i+1:m,i), and tau in TAU(i).


and the function & parameter list is shown below:



Code: Select all
static void    dgeqp3(int m, int n, double[] a, int _a_offset, int lda, int[] jpvt, int _jpvt_offset, double[] tau, int _tau_offset, double[] work, int _work_offset, int lwork, intW info)


Suppose that I have a matrix A: how would I call the function above on A so that the permutation parameter jpvt is computed?
Fala
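To sketch the calling sequence without a JLapack installation at hand, here is an illustration using SciPy's low-level LAPACK wrappers (scipy.linalg.lapack.dgeqp3 and dorgqr); the argument list of JLapack's dgeqp3 quoted above is the same modulo the explicit array offsets, so the translation should be mechanical. Note that the JPVT array returned by the Fortran routine uses 1-based indexing:

```python
import numpy as np
from scipy.linalg import lapack

rng = np.random.default_rng(1)
A = np.asfortranarray(rng.standard_normal((5, 3)))
m, n = A.shape

# dgeqp3: on exit, `qr_fact` holds R in its upper triangle and the
# Householder reflectors below it; `jpvt` holds the column permutation
# (1-based, Fortran convention); `tau` holds the reflector scalars
qr_fact, jpvt, tau, work, info = lapack.dgeqp3(A)
assert info == 0

R = np.triu(qr_fact[:n, :])                  # n x n upper triangular factor
Q, work, info = lapack.dorgqr(qr_fact, tau)  # form Q explicitly (m x n)
assert info == 0

perm = jpvt - 1                              # convert to 0-based indices
# A with its columns permuted equals Q @ R, i.e. A*P = Q*R
assert np.allclose(A[:, perm], Q @ R)
```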

