The cosine-sine (CS) decomposition subroutines have a special parameter in their APIs, TRANS, which is treated as the layout (column major or row major) of the matrices in memory. For example,
      SUBROUTINE DORBDB( TRANS, SIGNS, M, P, Q, X11, LDX11, X12, LDX12,
     $                   X21, LDX21, X22, LDX22, THETA, PHI, TAUP1,
     $                   TAUP2, TAUQ1, TAUQ2, WORK, LWORK, INFO )
The LAPACKE interfaces have an additional parameter, matrix_order, that plays the same role:
lapack_int LAPACKE_dorbdb_work( int matrix_order, char trans, char signs,
                                lapack_int m, lapack_int p, lapack_int q,
                                double* x11, lapack_int ldx11, double* x12,
                                lapack_int ldx12, double* x21, lapack_int ldx21,
                                double* x22, lapack_int ldx22, double* theta,
                                double* phi, double* taup1, double* taup2,
                                double* tauq1, double* tauq2, double* work,
                                lapack_int lwork );
This mistake may mislead users. The intuitive workaround (setting the same value for both parameters) does not always work: with matrix_order=ROW_MAJOR, LAPACKE_dorbdb_work transposes the input data and then calls the Fortran interface, and the Fortran interface, given trans='T', transposes the data once again.
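The double transposition can be illustrated with a minimal, self-contained sketch. It does not call LAPACK or LAPACKE; the helper names transpose and double_transpose_restores_input are hypothetical and only model the two steps described above: one transposition standing in for the row-major wrapper, and a second one standing in for the effect of trans='T' in the Fortran routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of LAPACKE): out-of-place transpose of a
 * rows x cols matrix stored contiguously in row-major order. */
void transpose(const double *in, double *out, size_t rows, size_t cols)
{
    assert(in != out); /* out-of-place only */
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j) {
            out[j * rows + i] = in[i * cols + j];
        }
    }
}

/* Models the scenario under the stated assumptions: the first transpose
 * stands in for the row-major wrapper, the second for trans='T' in the
 * Fortran routine. Returns 1 if the two steps cancel each other out. */
int double_transpose_restores_input(void)
{
    const double a[6] = { 1, 2, 3, 4, 5, 6 }; /* 2 x 3, row major */
    double once[6];
    double twice[6];

    transpose(a, once, 2, 3);     /* wrapper step: now 3 x 2 */
    transpose(once, twice, 3, 2); /* trans='T' step: back to 2 x 3 */

    for (size_t k = 0; k < 6; ++k) {
        if (twice[k] != a[k]) {
            return 0;
        }
    }
    return 1;
}
```

In this model the two transpositions cancel, so the data ends up in its original arrangement rather than in the layout the caller intended to request by setting both parameters.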
The issue relates to all 12 subroutines:
LAPACKE_?orbdb (2 routines)
LAPACKE_?unbdb (2 routines)
LAPACKE_?orcsd (2 routines)
LAPACKE_?uncsd (2 routines)
LAPACKE_?bbcsd (4 routines)

