Progressive Optimization of Batched LU Factorization on GPUs