Hi all,
I am working on a project where we are executing an SVD around 40K times per second and we only need the last row of VT (with complex numbers). We don't need U or Sigma.
Given that our input matrices are pretty small (14x16 or 15x16 at most), does anyone have a feel for the sort of speedup we could get by customizing the SVD function to compute only what we need?
Nobody on my side has a deep understanding of how VT gets calculated in the LAPACK SVD functions so it's difficult to get an order of magnitude / percentage guesstimate on how much time we could gain back by trying to do this (or hiring someone to do this).
Currently, our setup runs a 7x8 complex SVD in 58 us on average. Our pie-in-the-sky dream would be to bring this down by a factor of 10, but we'll take whatever gains we can get. I'm not sure how much of that 58 us is overhead and how much is spent actually computing the output. My plan is to try to trim the fat off the SVD function and do some multi-threading to get as close as possible to the target.
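To make the "only compute what we need" idea concrete, here is one simplification we have wondered about. It assumes the input is m x (m+1) with full row rank, which our 7x8 case would satisfy only if it really is full rank. Under that assumption the last row of VT is, up to a unit-modulus phase factor, the complex conjugate of the unit vector spanning the null space of the input, so in principle it could be found by elimination with no SVD at all. Below is a minimal sketch in plain Python, for illustration only. The name `last_vt_row` is hypothetical (ours, not a LAPACK routine), and plain elimination is less numerically robust than an SVD.

```python
import math


def last_vt_row(a, tol=1e-12):
    """Return conj(x), where x is a unit vector with a @ x = 0.

    Assumption: `a` is an m x (m+1) list of rows of complex numbers with
    full row rank, so its null space is one-dimensional. The result matches
    the last row of VT from a full SVD only up to a unit-modulus phase.
    """
    m, n = len(a), len(a[0])
    if n != m + 1:
        raise ValueError("this sketch only handles m x (m+1) input")
    w = [[complex(v) for v in row] for row in a]  # working copy

    # Gauss-Jordan elimination with partial pivoting.
    pivots = []
    row = 0
    for col in range(n):
        if row == m:
            break
        p = max(range(row, m), key=lambda r: abs(w[r][col]))
        if abs(w[p][col]) <= tol:
            continue  # no usable pivot here: this is the free column
        w[row], w[p] = w[p], w[row]
        piv = w[row][col]
        w[row] = [v / piv for v in w[row]]
        for r in range(m):
            if r != row:
                f = w[r][col]
                w[r] = [vr - f * vp for vr, vp in zip(w[r], w[row])]
        pivots.append(col)
        row += 1
    if len(pivots) != m:
        raise ValueError("input is (numerically) rank deficient")

    # Set the single free variable to 1 and back out the pivot variables.
    free = next(c for c in range(n) if c not in pivots)
    x = [0j] * n
    x[free] = 1 + 0j
    for r, c in enumerate(pivots):
        x[c] = -w[r][free]

    norm = math.sqrt(sum(abs(v) ** 2 for v in x))
    return [(v / norm).conjugate() for v in x]


if __name__ == "__main__":
    a = [[1, 0, 1j], [0, 1, 2]]
    print(last_vt_row(a))
```

This does not cover the 14x16 case: there the null space is two-dimensional, so the last row of VT is not pinned down by the null space alone.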
Can anyone help shed light on this?
steve

