LAPACK/ScaLAPACK Development

by **ganders** » Tue Aug 19, 2008 5:52 am

Dear all,

I already found a similiar topic

"Problems with results for SGELS for least squares in C++"

for determining a "straight least squares line":
I have a couple of 3D points (x_i, y_i, z_i) and I set up the following matrices like described in the topic mentioned above:

A = (1, x_i, y_i)
b = (z_i)

What I need are the coefficients (u, v, w) for the line z = v*x + w*y + u. Afaik, using DGELS these three coefficients are stored in the
output of b[0] = u, b[1] = v, b[2] = w, right? Thus, e.g. setting x1=0, y1=1 and x2=1, y2=1 I obtain two points laying on the straight least squares line.

I was checking my 3D points with the line obtained, and it looks "good", but cross-checking it with a different approach of calculations I also obtained
another straight line that is significantly different. Here is the code I use:

Code: Select all: #include <iostream> #include <iomanip> using namespace std; extern "C" void dgels_(char *trans, int *m, int *n, int *nrhs, double *a, int *lda, double *b, int *ldb, double *work, int *lwork, int *info); int main(int argv, char** argc) { char trans = 'N'; int m = 5; int n = 3; int nrhs = 1; int lda = m; // LDA >= max(1,M) int ldb = m; // LDB >= MAX(1,M,N) int lwork = 8; // LWORK >= max( 1, min(M, N) + max( min(M, N), NRHS ) ) int info; double work[8]; // max(1, LWORK) double a[15]; double b[5]; // for dlassq int tmp = m - n; int incx = 1; double scale; double sumsq; // first column a[0] = 1; a[1] = 1; a[2] = 1; a[3] = 1; a[4] = 1; // cartn_x column a[5] = 0.535; a[6] = 2.165; a[7] = 2.393; a[8] = 3.222; a[9] = 0.961; // cartn_y column a[10] = -0.503; a[11] = -0.420; a[12] = 2.527; a[13] = 3.152; a[14] = 4.609; // cartn_z column b[0] = -11.613; b[1] = -8.169; b[2] = -5.757; b[3] = -2.087; b[4] = 0.614; cout << "input A:" << endl; for(int i=0; i < m; i++) { for(int j=0; j < n*m; j = j + m) { cout << setprecision(5) << setw(10) << a[j + i]; } cout << ""<< endl; } cout << "------------------------------------------------------------\n"<< endl; dgels_(&trans,&m,&n,&nrhs,a,&lda,b,&ldb,work,&lwork,&info); if (info == 0) { cout << "output (after dgels): first column, second column, ..."<< endl; for(int i=0; i < m; i++) { cout << "i: " << i << " " << "no col: " << n << setprecision(5) << setw(8) << b[i] << endl; } } else cout << "Problem!!! info = " << info << "\n"; cout << "------------------------------------------------------------\n"<< endl; return 0; }

Output is:

b[0] = -9.9631
b[1] = 0.45284
b[2] = 1.9864

Is this the right approach of determining the straight line coordinates?
Thanks a lot in advance,

Gerd

by **Julien Langou** » Thu Aug 28, 2008 12:16 pm

Hello, I never answered this post, sorry about that. Here a quick answer. I hope to be able to answer more later during the week ... Your usage and understanding of the output LAPACK xDGELS is good. Now you need to understand what you are minimizing. Solving a linear least squares problem to find the equation of your line is one way of solving your problem. You are correctly doing it. However there are various choices. I think a good way to go for this problem is to solve a Total Least Squares problem and not a Linear Least Squares problem. To understand the difference, you can have a look at:

Code: Select all: I. Markovsky and S. Van Huffel, Overview of total least squares methods, Signal Processing, Volume 87, pages 2283-2302, 2007. ftp://ftp.esat.kuleuven.ac.be/pub/SISTA/markovsky/reports/05-34.pdf

in particular Figure 1 Page 3.
Best wishes,
Julien.

by **ganders** » Mon Sep 01, 2008 3:18 am

Hello Julien,
thanks for your reply and hint. I think methodically the total least squares approach might fit better to my 'problem': I need the least squares fit line actually for a simplification of helices and strands of proteins, representing them as vector each. Since helices are quite often bended I want to use the least squares approach for determining the representing vector. Due to the bending I guess I do not really not 'the' least squares line, 'any' of them might be fine. Nevertheless, I have seen that there is already a NETLIB implementation in FORTRAN for the total least squares method, but I haven't seen something similiar in LAPACK so far.
BTW: I found a similiar source code to my problem, where the author(s) used a method for determining the sum of squares errors (sse): DLASSQ. If this is the right one, I would like to use it temporarily for validiting my results and getting a better 'feeling' for them.
Any hint is welcome.

Thanks a lot in advance,

Gerd

LAPACK/ScaLAPACK Development

how to interprete output of xDGELS

how to interprete output of xDGELS

Re: how to interprete output of xDGELS

Re: how to interprete output of xDGELS

Who is online