For each M x N coefficient matrix A of a, S3L_lu_factor computes the LU factorization using partial pivoting with row interchanges.
The factorization has the form A = P x L x U, where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if M > N), and U is upper triangular (upper trapezoidal if M < N). L and U are stored in A.
In general, S3L_lu_factor performs most efficiently when the array is distributed using the same block size along each axis.
S3L_lu_factor behaves somewhat differently for 3D arrays, however. In this case, it applies nodal LU factorization on each M x N coefficient matrix across the instance axis. This factorization is performed concurrently on all participating processes.
You must call S3L_lu_factor before calling any of the other LU routines. The S3L_lu_factor routine performs on the preallocated parallel array and returns a setup ID. You must supply this setup ID in subsequent LU calls, as long as you are working with the same set of factors.
Be sure to call S3L_deallocate_lu when you have finished working with a set of LU factors. See "S3l_lu_deallocate " for details.
The internal variable setup_id is required for communicating information between the factorization routine and the other LU routines. The application must not modify the contents of this variable.