The LAPACK forum has moved to https://github.com/Reference-LAPACK/lapack/discussions.

Performance gains in customizing SVD function?

Open discussion regarding features, bugs, issues, vendors, etc.

Performance gains in customizing SVD function?

Postby stevekuo » Mon May 03, 2010 4:42 pm

Hi all,

I am working on a project where we are executing an SVD around 40K times per second and we only need the last row of VT (with complex numbers). We don't need U or Sigma.

Given that our input matrix dimensions are pretty small (14x16 / 15x16) or less, does anyone have an intuitive feeling of the sort of gain that we could get by customizing the SVD function to only compute what we need?

Nobody on my side has a deep understanding of how VT gets calculated in the LAPACK SVD functions so it's difficult to get an order of magnitude / percentage guesstimate on how much time we could gain back by trying to do this (or hiring someone to do this).

Currently, our setup is running a 7x8 complex SVD in 58us average. Our pie in the sky dream would be to get this down by a factor of 10, but we'll take whatever gains we can get. I'm not sure how much of that 58us is overhead and how much is used to actually compute the output. My plan it to try and trim the fat off the SVD function and do some multi-threading to get as close as possible to the target.

Can anyone help shed light on this?

steve
stevekuo
 
Posts: 4
Joined: Wed Mar 31, 2010 7:41 pm

Re: Performance gains in customizing SVD function?

Postby Julien Langou » Sat May 08, 2010 9:37 pm

Hello Steve,

You can indeed gain a little by tweaking LAPACK if you only want the last row of VT.
To give you an order of idea, you are still going to be an O(n^3) algorithm.
I do not have a good count but say you can gain a factor of two or three in the number of flops there.
I am not sure it is straightforward to do with the LAPACK code the way it is now.

I am not sure either how much it is realistic to hope for gaining anything by multithreading the code.
The matrices are 7x8, 14x16, 15x16 ... There is not enough parallelism here to trade against moving the data out of your cache.

If I understand you well, you are looking for one vector in the nullspace of a 7x8 matrix.
That is a vector of size 8x1 such that A * v = 0. When the matrix is 14x16 do you look for two vectors or just one?
Are your matrix full rank? ( When the matrix is 14x16, is its rank 14? when the matrix is 7x8, is its rank 7? )
(Please answer because I might not understand well.)

In any case, you might consider a different approach than LAPACK and go with iterative methods. You might get your speedup of 10 there ...
Maybe.

Julien.
Julien Langou
 
Posts: 835
Joined: Thu Dec 09, 2004 12:32 pm
Location: Denver, CO, USA


Return to User Discussion

Who is online

Users browsing this forum: No registered users and 7 guests