LAPACK/ScaLAPACK Development

by **ejspeiro** » Mon May 12, 2014 3:22 pm

Dear all,

I am using Valgrind's memcheck to supervise the memory manipulation of my code. Yet, it is reporting a very strange invalid access from the LAPACK.

The following code depicts a minimally working example, which can be compiled as follows:

Code: Select all: g++ -std=c++11 -c -g -Wall mwe.cc gfortran -o mwe mwe.o $(HOME)/Libraries/lapack-3.4.1/liblapack.a $(HOME)/Libraries/BLAS/libblas.a -lstdc++

The code creates a generator vector, its related Vandermonde matrix, and then performs a QR factorization of it. Once the factorization completes, we extract and print the Q matrix.

Code: Select all: #include <iostream> #include <iomanip> #include <cmath> #include <cstring> #define VERBOSE 1 // Should I verbose the execution? using namespace std; extern "C" { // External routines from the LAPACK: // LAPACK can be installed from either: // 1. http://www.netlib.org/lapack/ // 2. https://software.intel.com/en-us/non-commercial-software-development // Double precision General QR Factorization to compute the null-space: // Theory: http://www.netlib.org/lapack/lug/node69.html // http://www.netlib.no/netlib/lapack/double/dgeqrf.f void dgeqrf_(int *m, int *n, double *a, int *lda, double *tau, double *work, int *lwork, int *info); // dormqr_ overwrites the general real M-by-N matrix C with // // SIDE = 'L' SIDE = 'R' // TRANS = 'N': Q * C C * Q // TRANS = 'T': Q**T * C C * Q**T // // where Q is a real orthogonal matrix defined as the product of k // elementary reflectors // // Q = H(1) H(2) . . . H(k) // // as returned by DGEQRF. Q is of order M if SIDE = 'L' and of order N // if SIDE = 'R'. // http://www.netlib.no/netlib/lapack/double/dormqr.f void dormqr_(char *side, char *trans, int *m, int *n, int *k, double *a, int *lda, double *tau, double *c, int *ldc, double *work, int *lwork, int *info); } // Used to compute a 1D index (idx) from a 2D set of indexes. inline int idx(const int ii, const int offset, const int jj) { return ii*offset + jj; } // Generates an identity matrix: double* Identity(int num_rows, int num_cols, bool transpose) { double* II {}; // Output identity matrix. if (num_rows < 1 || num_cols < 1) { return nullptr; } try { II = new double[num_rows*num_cols]; memset (II, 0, sizeof(II[0])*num_rows*num_cols); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } if (transpose) { for (auto ii = 0; ii < num_rows; ii++) { for (auto jj = 0; jj < num_cols; jj++) { II[idx(jj,num_rows,ii)] = (ii == jj)? 1.0: 0.0; } } } else { for (auto ii = 0; ii < num_rows; ii++) { for (auto jj = 0; jj < num_cols; jj++) { II[idx(ii,num_cols,jj)] = (ii == jj)? 1.0: 0.0; } } } return II; } // Transposes a matrix: double* Transpose(double *AA, int rr, int cc) { double* AT {}; // Output transpose matrix. if (rr <= 0 || cc <= 0 || AA == nullptr) { return nullptr; } try { AT = new double[rr*cc]; memset(AT, 0, sizeof(AT[0])*(rr*cc)); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } for (auto ii = 0; ii < rr; ii++) { for (auto jj = 0; jj < cc; jj++) { AT[idx(jj,rr,ii)] = AA[idx(ii,cc,jj)]; } } return AT; } // Generates a Vandermonde matrix with a given generator vector gen: double* VandermondeOrDie(const double *gen, // Given generator vector. const int length_gen, // How long generator vector? const int num_rows, // How many rows? const int num_cols, // How many columns? const bool transpose) { // Should create transposed? double* TT {}; // Output Vandermonde matrix. if (length_gen < 1 || num_rows < 1 || num_cols < 1 || length_gen > num_cols) { return nullptr; } try { TT = new double[num_rows*num_cols]; memset (TT, 0, sizeof(TT[0])*num_rows*num_cols); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } if (transpose) { for (auto ii = 0; ii < num_rows; ii++) { for (auto jj = 0; jj < num_cols; jj++) { TT[idx(jj,num_rows,ii)] = pow(gen[jj], (double) ii); } } } else { for (auto ii = 0; ii < num_rows; ii++) { for (auto jj = 0; jj < num_cols; jj++) { TT[idx(ii,num_cols,jj)] = pow(gen[jj], (double) ii); } } } return TT; } int main () { // Input: int kk {8}; // Order of numerical accuracy of the operator. // Variables: char side {'L'}; // Side for overwriting Q. char trans {'N'}; // Should it be transposed? double* gg {}; // Spatial coordinates for the boundary points. double* AA {}; // Boundary Vandermonde to get the null-space. double* tau {}; // The scalar factors of elementary reflectors for QR. double* work {}; // Work array for dgels_ solver and dgeqrf_ QR factor. double* QQ {}; // Q matrix from qr(AA) containing AA's null-space. int info {0}; // Information regarding solution of the system. int num_bndy_approxs; // Required boundary points for uniform order accuracy. int AA_num_rows; // Number rows for the boundary Vandermonde matrices. int AA_num_cols; // Number cols for the boundary Vandermonde matrices. int AA_ld; // Leading dimension of the boundary Vandermonde matrix. int lwork {-1}; // Length of work array for the dgels_ solver. int QQ_ld; // Leading dimension of the Q = qr(AA) matrix. // Compute amount of boundary points required to achieve uniform accuracy: num_bndy_approxs = (int) (3.0*((double) kk)/2.0); // Generator vector for the next approximation: try { gg = new double[num_bndy_approxs]; memset(gg, 0, sizeof(gg[0])*num_bndy_approxs); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } gg[0] = -1.0/2.0; for (auto ii = 1; ii < num_bndy_approxs; ii++) { gg[ii] = gg[ii - 1] + 1.0; } // Before solving this and any other systems for the boundary matrices, we // must obtain the null-space from the first matrix. For this we will follow // recommendations given in: // // http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=5&t=4506 // // We will compute the QR Factorization of the transpose, as in the // following (MATLAB) pseudo-code: // // [Q,R] = qr(A'); % Full QR as defined in // % http://www.stanford.edu/class/ee263/notes/qr_matlab.pdf // null-space = Q(:, last (kk/2 - 1) columns of Q); // // However, given the nature of the Vandermonde matrices we will compute-- // we will compute dim_null of them--they will all posses the same // null-space. Therefore, we impose the convention of computing the // null-space of the first Vandermonde matrix only (for the west boundary). // Vandermonde matrix (using boundary coordinates as the generator): // We compute it NOT transposed, which is the ACTUAL transpose for // LAPACK's QR factor. AA_num_rows = kk + 1; AA_num_cols = num_bndy_approxs; AA = VandermondeOrDie(gg, num_bndy_approxs, AA_num_rows, AA_num_cols, false); if (AA == nullptr) { cerr << "ERROR constructing matrix at line " << __LINE__ + 2 << endl; cerr << "Exiting..." << endl; return EXIT_FAILURE; } // Prepare to factorize: allocate and inquire for the value of lwork: try { tau = new double[1]; memset(tau, 0, sizeof(tau[0])*1); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } try { work = new double[1]; memset(work, 0, sizeof(work[0])*1); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } AA_ld = AA_num_cols; info = 0; dgeqrf_(&AA_num_cols, &AA_num_rows, AA, &AA_ld, tau, work, &lwork, &info); lwork = (int) work[0]; // Once we know lwork, we can actually invoke the factorization: if (tau != nullptr) { delete[] tau; tau = nullptr; } try { tau = new double[AA_num_cols]; memset(tau, 0, sizeof(tau[0])*AA_num_cols); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } if (work != nullptr) { delete[] work; work = nullptr; } try { work = new double[lwork]; memset(work, 0, sizeof(work[0])*(lwork)); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } dgeqrf_(&AA_num_cols, &AA_num_rows, AA, &AA_ld, tau, work, &lwork, &info); if (!info) { if (VERBOSE) { cout << "QR factorization successfully completed!" << endl << endl; } } else { cerr << "Something went wrong factoring! info = " << info << endl; } // Generate the real matrix Q with orthonormal columns. This has to be done // separately since the actual output of dgeqrf_(AA) represents the // orthogonal matrix Q as a product of AA_num_cols elementary Householder // reflectors: QQ_ld = AA_num_cols; try { QQ = new double[QQ_ld*QQ_ld]; memset(QQ, 0, sizeof(QQ[0])*(QQ_ld*QQ_ld)); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } // QQ must be initialized to the identity matrix: for (auto ii = 0; ii < QQ_ld; ii++) { for (auto jj = 0; jj < QQ_ld; jj++) { if (ii == jj) { QQ[idx(ii,QQ_ld,jj)] = 1.0; } } } // To complete the recovery of the real QQ, we must re-inquire the new value // for lwork that is going to be used by the dormqr_ routine we will use. // Notice we leave the tau array alone... i.e. we only re-allocate the work // array. if (work != nullptr) { delete[] work; work = nullptr; } try { work = new double[1]; memset(work, 0, sizeof(work[0])*1); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } lwork = -1; dormqr_(&side, &trans, &AA_num_cols, &AA_num_cols, &AA_num_cols, AA, &AA_ld, tau, QQ, &QQ_ld, work, &lwork, &info); lwork = (int) work[0]; if (work != nullptr) { delete[] work; work = nullptr; } try { work = new double[lwork]; memset(work, 0, sizeof(work[0])*(lwork)); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } info = 0; // WARNING: We get the following warning from Valgrind's memcheck, when // invoking the following routine: // ==14545== Invalid read of size 8 // ==14545== at 0x40BA4B: dorm2r_ (dorm2r.f:272) // ==14545== by 0x4080F1: dormqr_ (dormqr.f:289) // ==14545== by 0x402450: main (divrcrs.cc:480) // ==14545== Address 0x5f79c78 is not stack'd, malloc'd or (recently) free'd // ==14545== // ==14545== Invalid write of size 8 // ==14545== at 0x40BA7A: dorm2r_ (dorm2r.f:273) // ==14545== by 0x4080F1: dormqr_ (dormqr.f:289) // ==14545== by 0x402450: main (divrcrs.cc:480) // ==14545== Address 0x5f79c78 is not stack'd, malloc'd or (recently) free'd // ==14545== // ==14545== Invalid write of size 8 // ==14545== at 0x40BB4F: dorm2r_ (dorm2r.f:276) // ==14545== by 0x4080F1: dormqr_ (dormqr.f:289) // ==14545== by 0x402450: main (divrcrs.cc:480) // ==14545== Address 0x5f79c78 is not stack'd, malloc'd or (recently) free'd // ==14545== // TODO: We should make a more profound debug, unless we can no longer // depend on the matrix QQ (within the code flow). dormqr_(&side, &trans, &AA_num_cols, &AA_num_cols, &AA_num_cols, AA, &AA_ld, tau, QQ, &QQ_ld, work, &lwork, &info); if (!info) { if (VERBOSE) { cout << "Q matrix successfully attained!" << endl << endl; cout << "Q = " << endl; for (auto ii = 0; ii < QQ_ld; ii++) { for (auto jj = 0; jj < QQ_ld; jj++) { cout << setw(12) << QQ[idx(ii,QQ_ld,jj)]; } cout << endl; } cout << endl; } } else { cerr << "Something went wrong building QQ! info = " << info << endl; } // Garbage collection :) if (gg != nullptr) { delete[] gg; gg = nullptr; } if (AA != nullptr) { delete[] AA; AA = nullptr; } if (tau != nullptr) { delete[] tau; tau = nullptr; } if (work != nullptr) { delete[] work; work = nullptr; } if (QQ != nullptr) { delete[] QQ; QQ = nullptr; } }

Output:

Code: Select all: ejspeiro@Eduardo-Alienware-14:~/Dropbox/mimops$ make run ./mwe QR factorization successfully completed! Q matrix successfully attained! Q = -0.288675 -0.288675 -0.288675 -0.288675 -0.288675 -0.288675 -0.288675 -0.288675 -0.288675 -0.288675 -0.288675 -0.288675 -0.459933 -0.376309 -0.292685 -0.209061 -0.125436 -0.0418121 0.0418121 0.125436 0.209061 0.292685 0.376309 0.459933 0.501828 0.228104 0.00912415 -0.155111 -0.2646 -0.319345 -0.319345 -0.2646 -0.155111 0.00912415 0.228104 0.501828 0.459933 -0.0418121 -0.292685 -0.348434 -0.26481 -0.0975616 0.0975616 0.26481 0.348434 0.292685 0.0418121 -0.459933 -0.368767 0.301718 0.368767 0.145272 -0.134097 -0.312893 -0.312893 -0.134097 0.145272 0.368767 0.301718 -0.368767 -0.261608 0.451869 0.166478 -0.229898 -0.348811 -0.15855 0.15855 0.348811 0.229898 -0.166478 -0.451869 0.261608 0.164197 -0.462738 0.164197 0.373176 0.0597081 -0.298541 -0.298541 0.0597081 0.373176 0.164197 -0.462738 0.164197 -0.0904791 0.370142 -0.412914 -0.136541 0.335595 0.230311 -0.230311 -0.335595 0.136541 0.412914 -0.370142 0.0904791 0.0430767 -0.23888 0.466011 -0.254544 -0.289789 0.274124 0.274124 -0.289789 -0.254544 0.466011 -0.23888 0.0430767 -0.00874036 0.0583 -0.146954 0.141249 0.0486822 -0.156543 -0.130333 0.566054 -0.653529 0.386018 -0.119774 0.0155708 0.0117399 -0.0941426 0.322481 -0.603006 0.637649 -0.321721 -0.0245471 0.104354 -0.0153304 -0.0339244 0.0199423 -0.00349546 0.0106414 -0.0748491 0.203312 -0.217426 -0.109592 0.578839 -0.66744 0.29945 0.0592101 -0.129478 0.0558632 -0.00853182

It seems to be running fine! Except that when I go to Valgrind's memcheck, I get this output:

Code: Select all: ==21810== HEAP SUMMARY: ==21810== in use at exit: 0 bytes in 0 blocks ==21810== total heap usage: 30 allocs, 30 frees, 35,757 bytes allocated ==21810== ==21810== All heap blocks were freed -- no leaks are possible ==21810== ==21810== ERROR SUMMARY: 6 errors from 3 contexts (suppressed: 0 from 0) ==21810== ==21810== 2 errors in context 1 of 3: ==21810== Invalid write of size 8 ==21810== at 0x40A0A1: dorm2r_ (dorm2r.f:276) ==21810== by 0x406B30: dormqr_ (dormqr.f:289) ==21810== by 0x4020C6: main (mwe.cc:399) ==21810== Address 0x5f79870 is not stack'd, malloc'd or (recently) free'd ==21810== ==21810== ==21810== 2 errors in context 2 of 3: ==21810== Invalid write of size 8 ==21810== at 0x409FCC: dorm2r_ (dorm2r.f:273) ==21810== by 0x406B30: dormqr_ (dormqr.f:289) ==21810== by 0x4020C6: main (mwe.cc:399) ==21810== Address 0x5f79870 is not stack'd, malloc'd or (recently) free'd ==21810== ==21810== ==21810== 2 errors in context 3 of 3: ==21810== Invalid read of size 8 ==21810== at 0x409F9D: dorm2r_ (dorm2r.f:272) ==21810== by 0x406B30: dormqr_ (dormqr.f:289) ==21810== by 0x4020C6: main (mwe.cc:399) ==21810== Address 0x5f79870 is not stack'd, malloc'd or (recently) free'd ==21810== ==21810== ERROR SUMMARY: 6 errors from 3 contexts (suppressed: 0 from 0)

Any hint? o.O Through DDD, I tracked the bug to the suggested line...

It seems that, the routine indexes the matrix at the very last value of it... namely, A(N,N). But what to do from C++11?

Feel free to suggest about my coding!

Thanks in advanced!

by **Julien Langou** » Mon May 12, 2014 3:51 pm

I have glanced at the code super fast. Best I can do. I see that you are using DORMQR on identity to generate Q. This is OK. Maybe a better way is to use DORGQR right away. This does not explain your problem. This is just feedback from parsing the code. Julien.

by **ejspeiro** » Mon May 12, 2014 4:01 pm

Dear Julien,

I will keep that into consideration.

Thanks for your efforts,

by **ejspeiro** » Wed May 14, 2014 4:22 pm

I have written a minimal working example, in where I try to implement this QR factorization, but now using DORGQR instead of DORMQR

I want to ask the following:

Julien Langou wrote:Maybe a better way is to use DORGQR right away.

What do you mean with "right away"? As far as what I could read regarding DORGQR, you still have to compute the factorization using dgeqrf_, so what would be the advantage of using DORGQR? My guess is that it has to do with the fact that with DORGQR, one does not have to allocate extra space for the Q matrix, nor initialize it to the identity matrix, whereas with DORMQR, one has to do this.

However, provided this is true, what if my system is an under-determined system? Where A is 9 by 12, but my Q matrix is 12 by 12?

My minimal working example fails, because of this:

Code: Select all: #include <iostream> #include <iomanip> #include <cmath> #include <cstring> #define VERBOSE 1 // Should I verbose the execution? using namespace std; extern "C" { // External routines from the LAPACK: // LAPACK can be installed from either: // 1. http://www.netlib.org/lapack/ // 2. https://software.intel.com/en-us/non-commercial-software-development // Double precision General QR Factorization to compute the null-space: // Theory: http://www.netlib.org/lapack/lug/node69.html // http://www.netlib.no/netlib/lapack/double/dgeqrf.f void dgeqrf_(int *m, int *n, double *a, int *lda, double *tau, double *work, int *lwork, int *info); // dormqr_ overwrites the general real M-by-N matrix C with // // SIDE = 'L' SIDE = 'R' // TRANS = 'N': Q * C C * Q // TRANS = 'T': Q**T * C C * Q**T // // where Q is a real orthogonal matrix defined as the product of k // elementary reflectors // // Q = H(1) H(2) . . . H(k) // // as returned by DGEQRF. Q is of order M if SIDE = 'L' and of order N // if SIDE = 'R'. // http://www.netlib.no/netlib/lapack/double/dormqr.f void dormqr_(char *side, char *trans, int *m, int *n, int *k, double *a, int *lda, double *tau, double *c, int *ldc, double *work, int *lwork, int *info); // dorgqr_ generates an M-by-N real matrix Q with orthonormal columns, // which is defined as the first N columns of a product of K elementary // reflectors of order M // // Q = H(1) H(2) . . . H(k) // // as returned by DGEQRF. void dorgqr_(int *m, int *n, int *k, double *a, int *lda, double *tau, double *work, int *lwork, int *info); } // Used to compute a 1D index (idx) from a 2D set of indexes. inline int idx(const int ii, const int offset, const int jj) { return ii*offset + jj; } // Generates an identity matrix: double* Identity(int num_rows, int num_cols, bool transpose) { double* II {}; // Output identity matrix. if (num_rows < 1 || num_cols < 1) { return nullptr; } try { II = new double[num_rows*num_cols]; memset (II, 0, sizeof(II[0])*num_rows*num_cols); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } if (transpose) { for (auto ii = 0; ii < num_rows; ii++) { for (auto jj = 0; jj < num_cols; jj++) { II[idx(jj,num_rows,ii)] = (ii == jj)? 1.0: 0.0; } } } else { for (auto ii = 0; ii < num_rows; ii++) { for (auto jj = 0; jj < num_cols; jj++) { II[idx(ii,num_cols,jj)] = (ii == jj)? 1.0: 0.0; } } } return II; } // Transposes a matrix: double* Transpose(double *AA, int rr, int cc) { double* AT {}; // Output transpose matrix. if (rr <= 0 || cc <= 0 || AA == nullptr) { return nullptr; } try { AT = new double[rr*cc]; memset(AT, 0, sizeof(AT[0])*(rr*cc)); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } for (auto ii = 0; ii < rr; ii++) { for (auto jj = 0; jj < cc; jj++) { AT[idx(jj,rr,ii)] = AA[idx(ii,cc,jj)]; } } return AT; } // Generates a Vandermonde matrix with a given generator vector gen: double* VandermondeOrDie(const double *gen, // Given generator vector. const int length_gen, // How long generator vector? const int num_rows, // How many rows? const int num_cols, // How many columns? const bool transpose) { // Should create transposed? double* TT {}; // Output Vandermonde matrix. if (length_gen < 1 || num_rows < 1 || num_cols < 1 || length_gen > num_cols) { return nullptr; } try { TT = new double[num_rows*num_cols]; memset (TT, 0, sizeof(TT[0])*num_rows*num_cols); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } if (transpose) { for (auto ii = 0; ii < num_rows; ii++) { for (auto jj = 0; jj < num_cols; jj++) { TT[idx(jj,num_rows,ii)] = pow(gen[jj], (double) ii); } } } else { for (auto ii = 0; ii < num_rows; ii++) { for (auto jj = 0; jj < num_cols; jj++) { TT[idx(ii,num_cols,jj)] = pow(gen[jj], (double) ii); } } } return TT; } int main () { // Input: int kk {8}; // Order of numerical accuracy of the operator. // Variables: double* gg {}; // Spatial coordinates for the boundary points. double* AA {}; // Boundary Vandermonde to get the null-space. double* tau {}; // The scalar factors of elementary reflectors for QR. double* work {}; // Work array for dgels_ solver and dgeqrf_ QR factor. int info {0}; // Information regarding solution of the system. int num_bndy_approxs; // Required boundary points for uniform order accuracy. int AA_num_rows; // Number rows for the boundary Vandermonde matrices. int AA_num_cols; // Number cols for the boundary Vandermonde matrices. int AA_ld; // Leading dimension of the boundary Vandermonde matrix. int lwork {-1}; // Length of work array for the dgels_ solver. // Compute amount of boundary points required to achieve uniform accuracy: num_bndy_approxs = (int) (3.0*((double) kk)/2.0); // Generator vector for the next approximation: try { gg = new double[num_bndy_approxs]; memset(gg, 0, sizeof(gg[0])*num_bndy_approxs); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } gg[0] = -1.0/2.0; for (auto ii = 1; ii < num_bndy_approxs; ii++) { gg[ii] = gg[ii - 1] + 1.0; } // Before solving this and any other systems for the boundary matrices, we // must obtain the null-space from the first matrix. For this we will follow // recommendations given in: // // http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=5&t=4506 // // We will compute the QR Factorization of the transpose, as in the // following (MATLAB) pseudo-code: // // [Q,R] = qr(A'); % Full QR as defined in // % http://www.stanford.edu/class/ee263/notes/qr_matlab.pdf // null-space = Q(:, last (kk/2 - 1) columns of Q); // // However, given the nature of the Vandermonde matrices we will compute-- // we will compute dim_null of them--they will all posses the same // null-space. Therefore, we impose the convention of computing the // null-space of the first Vandermonde matrix only (for the west boundary). // Vandermonde matrix (using boundary coordinates as the generator): // We compute it NOT transposed, which is the ACTUAL transpose for // LAPACK's QR factor. AA_num_rows = kk + 1; AA_num_cols = num_bndy_approxs; AA = VandermondeOrDie(gg, num_bndy_approxs, AA_num_rows, AA_num_cols, false); if (AA == nullptr) { cerr << "ERROR constructing matrix at line " << __LINE__ + 2 << endl; cerr << "Exiting..." << endl; return EXIT_FAILURE; } // Prepare to factorize: allocate and inquire for the value of lwork: try { tau = new double[1]; memset(tau, 0, sizeof(tau[0])*1); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } try { work = new double[1]; memset(work, 0, sizeof(work[0])*1); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } AA_ld = AA_num_cols; info = 0; dgeqrf_(&AA_num_cols, &AA_num_rows, AA, &AA_ld, tau, work, &lwork, &info); if (info == 0) { lwork = (int) work[0]; } else { cerr << "Could not get value for lwork on line " << __LINE__ - 2 << endl; cerr << "Exiting..." << endl; return EXIT_FAILURE; } // Once we know lwork, we can actually invoke the factorization: if (tau != nullptr) { delete[] tau; tau = nullptr; } try { tau = new double[AA_num_cols]; memset(tau, 0, sizeof(tau[0])*AA_num_cols); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } if (work != nullptr) { delete[] work; work = nullptr; } try { work = new double[lwork]; memset(work, 0, sizeof(work[0])*(lwork)); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } dgeqrf_(&AA_num_cols, &AA_num_rows, AA, &AA_ld, tau, work, &lwork, &info); if (!info) { if (VERBOSE) { cout << "QR factorization successfully completed!" << endl << endl; } } else { cerr << "Something went wrong factoring! info = " << info << endl; } // Generate the real matrix Q with orthonormal columns. This has to be done // separately since the actual output of dgeqrf_(AA) represents the // orthogonal matrix Q as a product of AA_num_cols elementary Householder // reflectors: // To complete the recovery of the real QQ, we must re-inquire the new value // for lwork that is going to be used by the dormqr_ routine we will use. // Notice we leave the tau array alone... i.e. we only re-allocate the work // array. // Do NOT reallocate the tau array. It contains info from the factorization. if (work != nullptr) { delete[] work; work = nullptr; } try { work = new double[1]; memset(work, 0, sizeof(work[0])*1); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } lwork = -1; dorgqr_(&AA_num_cols, &AA_num_cols, &AA_num_cols, AA, &AA_ld, tau, work, &lwork, &info); if (info == 0) { lwork = (int) work[0]; } else { cerr << "Could not get value for lwork on line " << __LINE__ - 2 << endl; cerr << "Exiting..." << endl; return EXIT_FAILURE; } if (work != nullptr) { delete[] work; work = nullptr; } try { work = new double[lwork]; memset(work, 0, sizeof(work[0])*(lwork)); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } dorgqr_(&AA_num_cols, &AA_num_cols, &AA_num_cols, AA, &AA_ld, tau, work, &lwork, &info); if (!info) { if (VERBOSE) { cout << "Q matrix successfully attained!" << endl << endl; cout << "Q = " << endl; for (auto ii = 0; ii < AA_num_cols; ii++) { for (auto jj = 0; jj < AA_num_cols; jj++) { cout << setw(12) << AA[idx(ii,AA_num_cols,jj)]; } cout << endl; } cout << endl; } } else { cerr << "Something went wrong building QQ! info = " << info << endl; } // Garbage collection :) if (gg != nullptr) { delete[] gg; gg = nullptr; } if (AA != nullptr) { delete[] AA; AA = nullptr; } if (tau != nullptr) { delete[] tau; tau = nullptr; } if (work != nullptr) { delete[] work; work = nullptr; } }

Thanks in advanced!

by **Julien Langou** » Thu May 15, 2014 3:51 pm

(1) If A is 9-by-12, then Q is 9-by-9 as well (and R is 9-by-12). I do not think your call to DORGQR is not valid. Needs to be: &AA_num_rows for the first three arguments.

(2) You can get Q by using DORMQR on the identity. This works. Two advantages of using DORGQR: (1) less flops, much less actually, and (2) you do not need to allocate another matrix and DORGQR writes into the A output from DGEQRF (if appropriate).

(3) "As far as what I could read regarding DORGQR, you still have to compute the factorization using dgeqrf_," => Sure. My "right away" meant no need to allocate a Q, initialize a Q. You can call DORQR right after DGEQRF.

Julien.

by **ejspeiro** » Thu May 15, 2014 3:55 pm

Dear Julien,

Fair enough

Thanks a lot! I will work on it and let everyone know

I hold a separate thread, regarding tracking bugs on a much larger code, using this chunk. I use the LAPACK extensively

http://www.linuxquestions.org/questions ... 175504890/

Thanks for all your help!

by **ejspeiro** » Fri May 16, 2014 12:47 pm

Well, there's a key detail I forgot to mention: The matrix I intend to factorize has to be transposed, since I intend to compute its null-space, as I have already explained on a previopus thread I started a while back: viewtopic.php?f=5&t=4506

Based on that, I think I'm almost there:

Here's the C++ minimal working example (mwe) that construct the transpose of the Vandermonde matrix and factorizes it:

Code: Select all: #include <iostream> #include <iomanip> #include <cmath> #include <cstring> #define VERBOSE 1 // Should I verbose the execution? using namespace std; extern "C" { // Double precision General QR Factorization to compute the null-space: // Theory: http://www.netlib.org/lapack/lug/node69.html // http://www.math.utah.edu/software/lapack/lapack-d/dgeqrf.html void dgeqrf_(int *m, // 1. Number of rows of the matrix. int *n, // 2. Number of columns of the matrix. double *a, // 3. The matrix. int *lda, // 4. Leading dimension of the matrix. double *tau, // 5. Scalar factors of the elementary reflectors. double *work, // 6. Workspace. info = 0, work(1) is optimal lwork. int *lwork, // 7. The dimension of work. lwork >= max(1,n). int *info); // 8. info = 0: successful exit. // dorgqr_ generates an M-by-N real matrix Q with orthonormal columns, // which is defined as the first N columns of a product of K elementary // reflectors of order M // // Q = H(1) H(2) . . . H(k) // // as returned by DGEQRF. // http://www.math.utah.edu/software/lapack/lapack-d/dorgqr.html void dorgqr_(int *m, // 1. The number of rows of the matrix Q. M >= 0. int *n, // 2. The number of cols of Q. M >= N >= 0. int *k, // 3. Number of elementary reflectors. double *a, // 4. Returned by DGEQRF first k cols of argument A. int *lda, // 5. First dimension of A. LDA >= max(1,M). double *tau, // 6. TAU(i) is the scalar of reflector H(i). double *work, // 7. Workspace. info = 0, work(1) is optimal lwork. int *lwork, // 8. The dimension of work. lwork >= max(1,n). int *info); // 9. info = 0: successful exit. } // Used to compute a 1D index (idx) from a 2D set of indexes. // WARNING: Not needed if we allocate the matrices so that they are contiguous // in memory. // TODO: Implement the MTK_DenseMatrix class with this constructor, so that we // do not have to use this routine. inline int idx(const int ii, const int offset, const int jj) { return ii*offset + jj; } // Def. In linear algebra, a VANDERMONDE MATRIX is a matrix with the terms of a // geometric progression in each row. This progression uses the terms of a // given generator vector. // Generates a Vandermonde matrix with a given generator vector gen. double* Vandermonde(const double *gen, // Given generator vector. const int gen_length, // Length of the generator vector. const int pro_length, // Length of the progression. const bool transpose) { // Should create its transpose? double* TT {}; // Output Vandermonde matrix. // Check for the integrity of the arguments: if (gen == nullptr || gen_length < 1 || pro_length < 1) { return nullptr; } try { TT = new double[gen_length*pro_length]; memset(TT, 0.0, sizeof(TT[0])*gen_length*pro_length); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 3 << endl; cerr << memory_allocation_exception.what() << endl; } if (!transpose) { for (auto ii = 0; ii < gen_length; ii++) { for (auto jj = 0; jj < pro_length; jj++) { TT[idx(ii,pro_length,jj)] = pow(gen[ii], (double) jj); } } } else { for (auto ii = 0; ii < pro_length; ii++) { for (auto jj = 0; jj < gen_length; jj++) { TT[idx(ii,gen_length,jj)] = pow(gen[jj], (double) ii); } } } return TT; } // Prints a dense matrix. bool MTK_DenseMatrix_Print(const double* aa, // The matrix. const int num_rows, // Number of rows. const int num_cols) { // Number of columns. if (aa == nullptr || num_rows < 1 || num_cols < 1) { cerr << "ERROR printing matrix of line " << __LINE__ << endl; cerr << "Exiting..." << endl; return false; } for (auto ii = 0; ii < num_rows; ii++) { for (auto jj = 0; jj < num_cols; jj++) { cout << setw(12) << aa[idx(ii,num_cols,jj)]; } cout << endl; } cout << endl; return true; } int main () { // Input: int kk {8}; // Order of numerical accuracy of the operator. // Variables: double* gg {}; // Spatial coordinates for the boundary points. double* AA {}; // Boundary Vandermonde to get the null-space. double* tau {}; // The scalar factors of elementary reflectors for QR. double* work {}; // Work array for dgels_ solver and dgeqrf_ QR factor. int info {0}; // Information regarding solution of the system. int num_bndy_approxs; // Required boundary points for uniform order accuracy. int AA_num_rows; // Number rows for the boundary Vandermonde matrices. int AA_num_cols; // Number cols for the boundary Vandermonde matrices. int AA_ld; // Leading dimension of the boundary Vandermonde matrix. int lwork {-1}; // Length of work array for the dgels_ solver. // Compute amount of boundary points required to achieve uniform accuracy: num_bndy_approxs = (int) (3.0*((double) kk)/2.0); // Generator vector for the next approximation: try { gg = new double[num_bndy_approxs]; memset(gg, 0, sizeof(gg[0])*num_bndy_approxs); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } gg[0] = -1.0/2.0; for (auto ii = 1; ii < num_bndy_approxs; ii++) { gg[ii] = gg[ii - 1] + 1.0; } if (VERBOSE) { cout << "gg =" << endl; for (auto ii = 0; ii < num_bndy_approxs; ii++) { cout << setw(12) << gg[ii]; } cout << endl << endl; } // Before solving this and any other systems for the boundary matrices, we // must obtain the null-space from the first matrix. For this we will follow // recommendations given in: // // http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=5&t=4506 // // We will compute the QR Factorization of the transpose, as in the // following (MATLAB) pseudo-code: // // [Q,R] = qr(A'); % Full QR as defined in // % http://www.stanford.edu/class/ee263/notes/qr_matlab.pdf // null-space = Q(:, last (kk/2 - 1) columns of Q); // // However, given the nature of the Vandermonde matrices we will compute-- // we will compute dim_null of them--they will all posses the same // null-space. Therefore, we impose the convention of computing the // null-space of the first Vandermonde matrix only (for the west boundary). // Vandermonde matrix (using boundary coordinates as the generator): // We compute it NOT transposed, which is the ACTUAL transpose for // LAPACK's QR factor. // Create the Vandermonde matrix for this iteration: AA = Vandermonde(gg, num_bndy_approxs, kk + 1, true); if (AA == nullptr) { cerr << "ERROR constructing matrix at line " << __LINE__ - 2 << endl; cerr << "Exiting..." << endl; return EXIT_FAILURE; } AA_num_rows = kk + 1; AA_num_cols = num_bndy_approxs; AA_ld = AA_num_cols; if (VERBOSE) { // Although it is generated transposed, we print it non-transposed: cout << "AA =" << endl; if (!MTK_DenseMatrix_Print(AA, AA_num_rows, AA_num_cols)) { return EXIT_FAILURE; } } // Prepare to factorize: allocate and inquire for the value of lwork: try { tau = new double[AA_num_cols]; memset(tau, 0, sizeof(tau[0])*AA_num_cols); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } try { work = new double[1]; memset(work, 0, sizeof(work[0])*1); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } info = 0; dgeqrf_(&AA_num_rows, &AA_num_cols, AA, &AA_ld, tau, work, &lwork, &info); if (info == 0) { lwork = (int) work[0]; } else { cerr << "Could not get value for lwork on line " << __LINE__ - 2 << endl; cerr << "Exiting..." << endl; return EXIT_FAILURE; } // Once we know lwork, we can actually invoke the factorization: if (work != nullptr) { delete[] work; work = nullptr; } try { work = new double[lwork]; memset(work, 0, sizeof(work[0])*(lwork)); } catch (bad_alloc &memory_allocation_exception) { cerr << "Memory allocation exception on line " << __LINE__ - 2 << endl; cerr << memory_allocation_exception.what() << endl; } dgeqrf_(&AA_num_cols, &AA_num_rows, AA, &AA_ld, tau, work, &lwork, &info); if (!info) { if (VERBOSE) { cout << "QR factorization successfully completed!" << endl << endl; } } else { cerr << "Something went wrong factoring! info = " << info << endl; } if (VERBOSE) { cout << "AA after QR (elementary reflectors of order M) =" << endl; for (auto ii = 0; ii < AA_num_rows; ii++) { for (auto jj = 0; jj < AA_num_cols; jj++) { cout << setw(12) << AA[idx(ii,AA_num_cols,jj)]; } cout << endl; } cout << endl; } // Garbage collection :) if (gg != nullptr) { delete[] gg; gg = nullptr; } if (AA != nullptr) { delete[] AA; AA = nullptr; } if (tau != nullptr) { delete[] tau; tau = nullptr; } if (work != nullptr) { delete[] work; work = nullptr; } }

It can be built as follows:

Code: Select all: ejspeiro@Eduardo-Alienware-14:~/Dropbox/mimops$ make g++ -std=c++11 -c -g -Wall mwe3.cc gfortran -o mwe3 mwe3.o ~/Libraries/lapack-3.4.1/liblapack.a ~/Libraries/BLAS/libblas.a -lstdc++

And outputs as follows:

Code: Select all: gg = -0.5 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 AA = 1 1 1 1 1 1 1 1 1 1 1 1 -0.5 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 0.25 0.25 2.25 6.25 12.25 20.25 30.25 42.25 56.25 72.25 90.25 110.25 -0.125 0.125 3.375 15.625 42.875 91.125 166.375 274.625 421.875 614.125 857.375 1157.62 0.0625 0.0625 5.0625 39.0625 150.062 410.062 915.062 1785.06 3164.06 5220.06 8145.06 12155.1 -0.03125 0.03125 7.59375 97.6562 525.219 1845.28 5032.84 11602.9 23730.5 44370.5 77378.1 127628 0.015625 0.015625 11.3906 244.141 1838.27 8303.77 27680.6 75418.9 177979 377150 735092 1.3401e+06 -0.0078125 0.0078125 17.0859 610.352 6433.93 37366.9 152244 490223 1.33484e+06 3.20577e+06 6.98337e+06 1.4071e+07 0.00390625 0.00390625 25.6289 1525.88 22518.8 168151 837339 3.18645e+06 1.00113e+07 2.72491e+07 6.6342e+07 1.47746e+08 QR factorization successfully completed! AA after QR (elementary reflectors of order M) = -3.4641 0.224009 0.224009 0.224009 0.224009 0.224009 0.224009 0.224009 0.224009 0.224009 0.224009 0.224009 -17.3205 11.9583 0.14895 0.0832741 0.0175979 -0.0480783 -0.113755 -0.179431 -0.245107 -0.310783 -0.376459 -0.442136 -127.883 119.583 36.5331 0.247348 0.33828 0.380356 0.373575 0.317938 0.213444 0.0600931 -0.142114 -0.393178 -1052.22 1150.98 547.996 -107.624 0.177247 0.0484448 -0.0874815 -0.18914 -0.215138 -0.124085 0.125414 0.574749 -9234.36 11061.4 6573.35 -2152.49 -306.814 0.303125 0.268976 0.0738031 -0.214325 -0.46264 -0.473674 0.0147402 -84356.3 107266 73000.9 -31001.8 -7670.35 840.952 -0.272228 -0.415334 -0.255461 0.126994 0.285204 -0.732966 -792070 1.05056e+06 784038 -391932 -129036 25228.6 2192.48 0.155459 -0.0739444 -0.00699654 0.408327 0.334625 -7.58693e+06 1.03823e+07 8.2843e+06-4.63725e+06-1.83164e+06 485536 76736.9 5356.1 -0.510053 -0.761635 0.0786269 -0.345146 -7.3775e+07 1.03411e+08 8.68011e+07-5.28129e+07-2.37088e+07 7.64813e+06 1.66263e+06 214244 12000.1 0.153881 0.286217 -0.143035

According to Valgrind's memcheck, this mwe is bug free

Code: Select all: (...) ==23068== HEAP SUMMARY: ==23068== in use at exit: 0 bytes in 0 blocks ==23068== total heap usage: 26 allocs, 26 frees, 32,285 bytes allocated ==23068== ==23068== All heap blocks were freed -- no leaks are possible ==23068== ==23068== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) ==23068== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

What is my goal? My goal is to get THIS Q:

Code: Select all: Q = -0.288675 -0.288675 -0.288675 -0.288675 -0.288675 -0.288675 -0.288675 -0.288675 -0.288675 -0.288675 -0.288675 -0.288675 -0.459933 -0.376309 -0.292685 -0.209061 -0.125436 -0.0418121 0.0418121 0.125436 0.209061 0.292685 0.376309 0.459933 0.501828 0.228104 0.00912415 -0.155111 -0.2646 -0.319345 -0.319345 -0.2646 -0.155111 0.00912415 0.228104 0.501828 0.459933 -0.0418121 -0.292685 -0.348434 -0.26481 -0.0975616 0.0975616 0.26481 0.348434 0.292685 0.0418121 -0.459933 -0.368767 0.301718 0.368767 0.145272 -0.134097 -0.312893 -0.312893 -0.134097 0.145272 0.368767 0.301718 -0.368767 -0.261608 0.451869 0.166478 -0.229898 -0.348811 -0.15855 0.15855 0.348811 0.229898 -0.166478 -0.451869 0.261608 0.164197 -0.462738 0.164197 0.373176 0.0597081 -0.298541 -0.298541 0.0597081 0.373176 0.164197 -0.462738 0.164197 -0.0904791 0.370142 -0.412914 -0.136541 0.335595 0.230311 -0.230311 -0.335595 0.136541 0.412914 -0.370142 0.0904791 0.0430767 -0.23888 0.466011 -0.254544 -0.289789 0.274124 0.274124 -0.289789 -0.254544 0.466011 -0.23888 0.0430767 -0.00874036 0.0583 -0.146954 0.141249 0.0486822 -0.156543 -0.130333 0.566054 -0.653529 0.386018 -0.119774 0.0155708 0.0117399 -0.0941426 0.322481 -0.603006 0.637649 -0.321721 -0.0245471 0.104354 -0.0153304 -0.0339244 0.0199423-0.00349546 0.0106414 -0.0748491 0.203312 -0.217426 -0.109592 0.578839 -0.66744 0.29945 0.0592101 -0.129478 0.0558632-0.00853182

Which is 12 by 12, because I need to factorize the tranpose of a 9 by 12 matrix A.

How in the world can I recover Q?

Well, so far we have discussed two options:

dorMqr
dorGqr

Using dorMqr, I get the Q I need, but I also get the memory error that originated this thread u.u'

I will keep trying... I will post the mwes to recover Q...

Thanks!

by **ejspeiro** » Mon May 19, 2014 11:40 am

Well, I think I have successfully solved this problem.

I do not know what the problem was... BUT I basically revisited the routine to construct the Vandermonde matrices, while keeping a thorough differentiation of the matrices and their transpose.

I am calling this thread as solved

If anybody is interested in how does the code looks like, please let me know.

The Q matrix (as discussed in previous threads), was used to attain the kernel. Once the kernel was attained, I had to scale it. Results look like this:

Thanks to everybody involved on this thread

\m/

LAPACK/ScaLAPACK Development

Valgrind invalid read/write when using the LAPACK from C++11

Valgrind invalid read/write when using the LAPACK from C++11

Re: Valgrind invalid read/write when using the LAPACK from C

Re: Valgrind invalid read/write when using the LAPACK from C

Re: Valgrind invalid read/write when using the LAPACK from C

Re: Valgrind invalid read/write when using the LAPACK from C

Re: Valgrind invalid read/write when using the LAPACK from C

Re: Valgrind invalid read/write when using the LAPACK from C

Re: Valgrind invalid read/write when using the LAPACK from C

Who is online