For each M x M square coefficient matrix instance A of the parallel array a, S3L_lu_solve solves a distributed system of linear equations AX = B, using the LU factorization computed by S3L_lu_factor.
Throughout these descriptions, L^{-1} and U^{-1} denote the inverses of L and U, respectively.
A and B are corresponding instances within a and b, respectively. To solve AX = B, S3L_lu_solve performs forward elimination:
    Let UX = C. Then A = LU implies that AX = B is equivalent to LC = B, so
    C = L^{-1}B
followed by back substitution:
    X = U^{-1}C = U^{-1}(L^{-1}B)
To obtain this solution, the S3L_lu_solve routine performs the following steps:
    1. Applies L^{-1} to B.
    2. Applies U^{-1} to L^{-1}B.
Upon successful completion, each B is overwritten with the solution to AX = B.
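The two stages described above can be illustrated with a minimal serial sketch in plain C. It is not the S3L implementation: the function name lu_solve_sketch, the packed row-major storage of L and U in a single array, and the omission of pivoting are all assumptions made for illustration. Like S3L_lu_solve, it overwrites B with the solution.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of S3L): solve A X = B in place for one
 * M x M instance, given A = L U.
 *
 * Assumed layout: lu is a row-major M x M array holding U on and above
 * the diagonal and L strictly below it, with an implicit unit diagonal.
 * b is a row-major M x nrhs array; on return it holds X.
 * Pivoting is omitted for brevity.
 */
void lu_solve_sketch(size_t m, size_t nrhs, const double *lu, double *b)
{
    /* Forward elimination: C = L^{-1} B, overwriting b. */
    for (size_t i = 0; i < m; i++)
        for (size_t k = 0; k < i; k++)
            for (size_t j = 0; j < nrhs; j++)
                b[i * nrhs + j] -= lu[i * m + k] * b[k * nrhs + j];

    /* Back substitution: X = U^{-1} C, overwriting b. */
    for (size_t i = m; i-- > 0; ) {
        for (size_t k = i + 1; k < m; k++)
            for (size_t j = 0; j < nrhs; j++)
                b[i * nrhs + j] -= lu[i * m + k] * b[k * nrhs + j];
        for (size_t j = 0; j < nrhs; j++)
            b[i * nrhs + j] /= lu[i * m + i];
    }
}
```

For example, with A = [[4, 3], [6, 3]] the unpivoted factors pack to lu = {4, 3, 1.5, -1.5}; calling lu_solve_sketch(2, 1, lu, b) with b = {10, 12} leaves b = {1, 2}.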
In general, S3L_lu_solve performs most efficiently when the array is distributed using the same block size along each axis.
S3L_lu_solve behaves somewhat differently for 3D arrays, however. In this case, the nodal solve is applied on each of the 2D systems AX=B across the instance axis of a and is performed concurrently on all participating processes.
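As a rough illustration of the per-instance behavior, the following plain C sketch applies an independent 2D solve to each instance in turn. The serial loop only stands in for the concurrent execution described above. The function names, the contiguous storage with the instance axis as the slowest-varying index, the single right-hand side vector per instance, and the absence of pivoting are assumptions for this example, not properties of S3L.

```c
#include <stddef.h>

/* Hypothetical helper: solve one M x M system in place from packed
 * row-major LU factors (unit-diagonal L below U), without pivoting. */
void solve_instance(size_t m, const double *lu, double *b)
{
    for (size_t i = 0; i < m; i++)          /* forward elimination */
        for (size_t k = 0; k < i; k++)
            b[i] -= lu[i * m + k] * b[k];

    for (size_t i = m; i-- > 0; ) {         /* back substitution */
        for (size_t k = i + 1; k < m; k++)
            b[i] -= lu[i * m + k] * b[k];
        b[i] /= lu[i * m + i];
    }
}

/* Apply the 2D solve to every instance. Instance t occupies
 * lu[t*m*m .. (t+1)*m*m - 1] and b[t*m .. (t+1)*m - 1]. */
void solve_all_instances(size_t ninst, size_t m, const double *lu, double *b)
{
    for (size_t t = 0; t < ninst; t++)
        solve_instance(m, lu + t * m * m, b + t * m);
}
```

Because each instance reads and writes only its own slice of lu and b, the iterations are independent, which is what allows them to proceed concurrently.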
The input parallel arrays a and b must be distinct.
The internal variable setup_id is required for communicating information between the factorization routine and the other LU routines. The application must not modify the contents of this variable.
The C and Fortran syntax for S3L_lu_solve is shown below.
    #include <s3l/s3l-c.h>
    #include <s3l/s3l_errno-c.h>
    int
    S3L_lu_solve(b, a, setup_id)
        S3L_array_t    b
        S3L_array_t    a
        int            setup_id
    include 's3l/s3l-f.h'
    include 's3l/s3l_errno-f.h'
    subroutine
    S3L_lu_solve(b, a, setup_id, ier)
        integer*8      b
        integer*8      a
        integer*4      setup_id
        integer*4      ier
b - Parallel array of the same type (real or complex) and precision as a. Must be distinct from a. The instance axes of b must match those of a in order of declaration and extents. The rows and columns of each B must be counted by axes row_axis and col_axis, respectively (from the S3L_lu_factor call). For the two-dimensional case, if b consists of only one right-hand side vector, you can represent b as a vector (an array of rank 1) or as an array of rank 2 with the number of columns set to 1 and the elements counted by axis row_axis.
a - Parallel array that was factored by S3L_lu_factor, where each matrix instance A is a dense M x M square matrix. Supply the same value a that was used in S3L_lu_factor.
setup_id - Scalar integer variable. Use the value returned by the corresponding S3L_lu_factor call for this argument.
This function uses the following arguments for output:
b - Upon successful completion, each matrix instance B is overwritten with the solution to AX = B.
ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.
On success, S3L_lu_solve returns S3L_SUCCESS.
S3L_lu_solve performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.
The following conditions will cause the function to terminate and return the associated error code:
S3L_ERR_ARG_NULL - Invalid array. b must be preallocated, and a must be the same array that was supplied to S3L_lu_factor.
S3L_ERR_ARG_RANK - Invalid rank. For cases where rank >= 3, rank(b) must equal rank(a). For the two-dimensional case, rank(b) must be either 1 or 2.
S3L_ERR_ARG_DTYPE - Invalid data type; must be real or complex (single- or double-precision).
S3L_ERR_ARG_BLKSIZE - Invalid block size; must be >= 1.
S3L_ERR_MATCH_EXTENTS - Extents of a and b are mismatched along the row or instance axis.
S3L_ERR_MATCH_DTYPE - Unmatched data type between a and b.
S3L_ERR_ARRNOTSQ - Invalid matrix size; each coefficient matrix must be square.
S3L_ERR_ARG_SETUP - Invalid setup_id value. It does not match the value returned by S3L_lu_factor.
../examples/s3l/lu/lu.c
../examples/s3l/lu/ex_lu1.c
../examples/s3l/lu/ex_lu2.c
../examples/s3l/lu-f/lu.f
../examples/s3l/lu-f/ex_lu1.f
S3L_lu_deallocate(3)
S3L_lu_factor(3)
S3L_lu_invert(3)