((J^T)J + uI)^(-1), where u is a constant, I is the identity matrix, and J is (m x n).
After going through this forum and other websites, I understand that there is a difference between how a matrix is stored in C (row-major) and in FORTRAN (column-major).
Since my main algorithm is in C, I am stuck with the following dilemma:
1. Store the data in the matrix in row-major order (the default in C), convert it into column-major order using a custom algorithm in C, pass it to the CLAPACK routines, and convert the result back into row-major order.
2. Store the data in the matrix in row-major order (the default in C), transpose it using some CLAPACK routine (what’s the routine to just transpose the matrix?), pass it to the required CLAPACK routines for the main operation, and then transpose the result (using CLAPACK) back into row-major order.
3. Store the data in the matrix in column-major order and pass it directly to the required CLAPACK routine for the main operation.
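As a rough illustration of method (3), the sketch below stores J in column-major order in a plain C array and forms (J^T)J + uI in the same layout, ready to be handed to a column-major solver. The macro CM and the function normal_matrix are hypothetical names introduced only for this example; no CLAPACK call is made here, and this is only one possible way to lay out the data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of CLAPACK): element (i, j) of a
   column-major matrix whose leading dimension (row count) is ld. */
#define CM(a, i, j, ld) ((a)[(size_t)(j) * (size_t)(ld) + (size_t)(i)])

/* Form A = J^T J + u I.
   J is m x n and A is n x n; both are stored column-major. */
void normal_matrix(const double *J, int m, int n, double u, double *A)
{
    assert(m > 0 && n > 0);
    for (int c = 0; c < n; ++c) {
        for (int r = 0; r < n; ++r) {
            double sum = 0.0;
            /* Columns r and c of J are each contiguous in memory,
               so this inner loop walks memory sequentially. */
            for (int k = 0; k < m; ++k)
                sum += CM(J, k, r, m) * CM(J, k, c, m);
            /* Add u only on the diagonal. */
            CM(A, r, c, n) = sum + (r == c ? u : 0.0);
        }
    }
}
```

Because (J^T)J + uI is symmetric, the resulting array holds the same values whether it is read as row-major or column-major.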
Methods (1) and (2) have some overhead involved in transposing the matrices before the main operation.
Method (3), however, doesn’t have any overhead involved in transposing the matrices, but I am slightly concerned about any overhead from the way the data is written to and read from memory. Normally, accessing memory sequentially is much quicker than jumping between non-adjacent addresses.
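For comparison, the conversion step that methods (1) and (2) rely on could look roughly like the following. The function name row_to_col_major is made up for this sketch, and it assumes an out-of-place copy into a separate buffer; it is meant only to show where the extra work comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy a row-major m x n matrix (src) into a
   column-major m x n buffer (dst). dst must not overlap src. */
void row_to_col_major(const double *src, int m, int n, double *dst)
{
    assert(m > 0 && n > 0);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            /* Reads from src are sequential; writes to dst jump by m
               elements, which is the strided access in question. */
            dst[(size_t)j * (size_t)m + (size_t)i] =
                src[(size_t)i * (size_t)n + (size_t)j];
        }
    }
}
```

The copy touches each of the m*n elements once, so its cost grows linearly with the size of the matrix.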
Which method would be the best in terms of speed?
I am developing the algorithm on a MacBook Pro (i7 2.3 GHz) using the Accelerate framework provided by Apple. Matrix J would have the following dimensions: (n) x (m), where n (rows) would be between 2 and 4 and m (columns) would be more than 150.
I would really appreciate it if anyone could advise me on this.
Saed

