Sun S3L provides 18 matrix multiplication routines that compute one or more instances of matrix products. For each instance, these routines perform the operations listed in Table 8-5.
In these descriptions, $A^T$ and $A^H$ denote the transpose and the Hermitian (conjugate) transpose of A, respectively. $B^T$ and $B^H$ are defined the same way for B.
| Routine | Operation | Data Type |
|---|---|---|
| S3L_mat_mult | $C = C + AB$ | real or complex |
| S3L_mat_mult_noadd | $C = AB$ | real or complex |
| S3L_mat_mult_addto | $C = D + AB$ | real or complex |
| S3L_mat_mult_t1 | $C = C + A^T B$ | real or complex |
| S3L_mat_mult_t1_noadd | $C = A^T B$ | real or complex |
| S3L_mat_mult_t1_addto | $C = D + A^T B$ | real or complex |
| S3L_mat_mult_h1 | $C = C + A^H B$ | complex only |
| S3L_mat_mult_h1_noadd | $C = A^H B$ | complex only |
| S3L_mat_mult_h1_addto | $C = D + A^H B$ | complex only |
| S3L_mat_mult_t2 | $C = C + A B^T$ | real or complex |
| S3L_mat_mult_t2_noadd | $C = A B^T$ | real or complex |
| S3L_mat_mult_t2_addto | $C = D + A B^T$ | real or complex |
| S3L_mat_mult_h2 | $C = C + A B^H$ | complex only |
| S3L_mat_mult_h2_noadd | $C = A B^H$ | complex only |
| S3L_mat_mult_h2_addto | $C = D + A B^H$ | complex only |
| S3L_mat_mult_t1_t2 | $C = C + A^T B^T$ | real or complex |
| S3L_mat_mult_t1_t2_noadd | $C = A^T B^T$ | real or complex |
| S3L_mat_mult_t1_t2_addto | $C = D + A^T B^T$ | real or complex |
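For reference, the operations in Table 8-5 can be written out for a single instance as a serial triple loop. The C sketch below is not the S3L implementation, which operates on distributed parallel arrays through array handles; the names `ref_mat_mult`, `op_t`, and `acc_mode_t`, and the use of plain row-major C arrays, are hypothetical choices made only for illustration. `OP_T` plays the role of the `_t1`/`_t2` variants, `OP_H` the role of the `_h1`/`_h2` variants, and the three modes mirror the no-suffix, `_noadd`, and `_addto` routines.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical operand transforms: none, transpose (T), Hermitian (H). */
typedef enum { OP_N, OP_T, OP_H } op_t;

/* Hypothetical modes mirroring the routine-name suffixes:
   MODE_ACC   -> C = C + op(A) op(B)   (no suffix)
   MODE_NOADD -> C = op(A) op(B)       (_noadd)
   MODE_ADDTO -> C = D + op(A) op(B)   (_addto) */
typedef enum { MODE_ACC, MODE_NOADD, MODE_ADDTO } acc_mode_t;

/* Element (i, j) of op(X), where X is stored row-major with ld columns. */
static double complex op_elem(const double complex *X, size_t ld, op_t op,
                              size_t i, size_t j)
{
    switch (op) {
    case OP_T: return X[j * ld + i];
    case OP_H: return conj(X[j * ld + i]);
    default:   return X[i * ld + j];
    }
}

/* One instance: op(A) is p x q, op(B) is q x r, C and D are p x r.
   A is stored p x q (OP_N) or q x p (OP_T/OP_H);
   B is stored q x r (OP_N) or r x q (OP_T/OP_H).
   D is read only in MODE_ADDTO and may be NULL otherwise. */
void ref_mat_mult(acc_mode_t mode, op_t opA, op_t opB,
                  size_t p, size_t q, size_t r,
                  const double complex *A, const double complex *B,
                  const double complex *D, double complex *C)
{
    size_t lda = (opA == OP_N) ? q : p;   /* columns of A as stored */
    size_t ldb = (opB == OP_N) ? r : q;   /* columns of B as stored */

    for (size_t i = 0; i < p; i++) {
        for (size_t j = 0; j < r; j++) {
            double complex sum = 0;
            for (size_t k = 0; k < q; k++)
                sum += op_elem(A, lda, opA, i, k) * op_elem(B, ldb, opB, k, j);

            if (mode == MODE_ACC)
                C[i * r + j] += sum;                 /* C = C + op(A)op(B) */
            else if (mode == MODE_ADDTO)
                C[i * r + j] = D[i * r + j] + sum;   /* C = D + op(A)op(B) */
            else
                C[i * r + j] = sum;                  /* C = op(A)op(B) */
        }
    }
}
```

In this serial sketch D may alias C, because each element of D is read before the corresponding element of C is written; that property of the sketch says nothing about how S3L itself handles aliasing.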
The algorithm used depends on the axis lengths of the variables supplied.
For calls that transpose neither A nor B, the variables conform correctly when their row_axis and col_axis lengths are as shown in Table 8-6.
Table 8-6 Recommended row_axis and col_axis Values When Matrix A and Matrix B Are Not Transposed
| Variable | row_axis Length | col_axis Length |
|---|---|---|
| A | p | q |
| B | q | r |
| C | p | r |
| D | p | r |
For calls that transpose matrix A ($A^T$ or $A^H$), the variables conform correctly when their row_axis and col_axis lengths are as shown in Table 8-7.
Table 8-7 Recommended row_axis and col_axis Values When Matrix A Is Transposed
| Variable | row_axis Length | col_axis Length |
|---|---|---|
| A | q | p |
| B | q | r |
| C | p | r |
| D | p | r |
For calls that transpose matrix B ($B^T$ or $B^H$), the variables conform correctly when their row_axis and col_axis lengths are as shown in Table 8-8.
Table 8-8 Recommended row_axis and col_axis Values When Matrix B Is Transposed
| Variable | row_axis Length | col_axis Length |
|---|---|---|
| A | p | q |
| B | r | q |
| C | p | r |
| D | p | r |
For calls that transpose both A and B ($A^T B^T$), the variables conform correctly when their row_axis and col_axis lengths are as shown in Table 8-9.
Table 8-9 Recommended row_axis and col_axis Values When Both Matrix A and Matrix B Are Transposed
| Variable | row_axis Length | col_axis Length |
|---|---|---|
| A | q | p |
| B | r | q |
| C | p | r |
| D | p | r |
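Tables 8-6 through 8-9 reduce to one rule: after any transpose is applied, op(A) must be p × q, op(B) must be q × r, and C (and D, if present) must be p × r. The C sketch below encodes that rule for a single instance. The names `dims_t` and `shapes_conform` are hypothetical, and the function only illustrates the tables; it is not S3L's internal validation, which presumably reports such mismatches through S3L_ERR_MATCH_EXTENTS.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor for one embedded matrix: its length along
   row_axis (rows) and along col_axis (columns). */
typedef struct { size_t rows, cols; } dims_t;

/* Returns true if A, B, C (and D, when non-NULL) conform for
   C = [C or D +] op(A) op(B), following Tables 8-6 through 8-9.
   trans_a is true for the t1/h1 variants, trans_b for t2/h2. */
bool shapes_conform(dims_t a, dims_t b, dims_t c, const dims_t *d,
                    bool trans_a, bool trans_b)
{
    /* op(A) is p x q: A is stored p x q, or q x p when transposed. */
    size_t p  = trans_a ? a.cols : a.rows;
    size_t qa = trans_a ? a.rows : a.cols;

    /* op(B) is q x r: B is stored q x r, or r x q when transposed. */
    size_t qb = trans_b ? b.cols : b.rows;
    size_t r  = trans_b ? b.rows : b.cols;

    if (qa != qb)
        return false;                          /* inner lengths must agree */
    if (c.rows != p || c.cols != r)
        return false;                          /* C must be p x r */
    if (d != NULL && (d->rows != p || d->cols != r))
        return false;                          /* D must match C */
    return true;
}
```

For example, with p = 3, q = 4, r = 5, a t1 call expects A to be stored as 4 × 3 while B stays 4 × 5, exactly as Table 8-7 lists.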
The algorithm is numerically stable.
The C and Fortran syntax for the S3L_mat_mult routines is shown below.
```c
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>

int S3L_mat_mult(C, A, B, row_axis, col_axis)
int S3L_mat_mult_noadd(C, A, B, row_axis, col_axis)
int S3L_mat_mult_addto(C, A, B, D, row_axis, col_axis)
int S3L_mat_mult_t1(C, A, B, row_axis, col_axis)
int S3L_mat_mult_t1_noadd(C, A, B, row_axis, col_axis)
int S3L_mat_mult_t1_addto(C, A, B, D, row_axis, col_axis)
int S3L_mat_mult_h1(C, A, B, row_axis, col_axis)
int S3L_mat_mult_h1_noadd(C, A, B, row_axis, col_axis)
int S3L_mat_mult_h1_addto(C, A, B, D, row_axis, col_axis)
int S3L_mat_mult_t2(C, A, B, row_axis, col_axis)
int S3L_mat_mult_t2_noadd(C, A, B, row_axis, col_axis)
int S3L_mat_mult_t2_addto(C, A, B, D, row_axis, col_axis)
int S3L_mat_mult_h2(C, A, B, row_axis, col_axis)
int S3L_mat_mult_h2_noadd(C, A, B, row_axis, col_axis)
int S3L_mat_mult_h2_addto(C, A, B, D, row_axis, col_axis)
int S3L_mat_mult_t1_t2(C, A, B, row_axis, col_axis)
int S3L_mat_mult_t1_t2_noadd(C, A, B, row_axis, col_axis)
int S3L_mat_mult_t1_t2_addto(C, A, B, D, row_axis, col_axis)

S3L_array_t C
S3L_array_t A
S3L_array_t B
S3L_array_t D
int         row_axis
int         col_axis
```
```fortran
include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'

subroutine S3L_mat_mult(C, A, B, row_axis, col_axis, ier)
subroutine S3L_mat_mult_noadd(C, A, B, row_axis, col_axis, ier)
subroutine S3L_mat_mult_addto(C, A, B, D, row_axis, col_axis, ier)
subroutine S3L_mat_mult_t1(C, A, B, row_axis, col_axis, ier)
subroutine S3L_mat_mult_t1_noadd(C, A, B, row_axis, col_axis, ier)
subroutine S3L_mat_mult_t1_addto(C, A, B, D, row_axis, col_axis, ier)
subroutine S3L_mat_mult_h1(C, A, B, row_axis, col_axis, ier)
subroutine S3L_mat_mult_h1_noadd(C, A, B, row_axis, col_axis, ier)
subroutine S3L_mat_mult_h1_addto(C, A, B, D, row_axis, col_axis, ier)
subroutine S3L_mat_mult_t2(C, A, B, row_axis, col_axis, ier)
subroutine S3L_mat_mult_t2_noadd(C, A, B, row_axis, col_axis, ier)
subroutine S3L_mat_mult_t2_addto(C, A, B, D, row_axis, col_axis, ier)
subroutine S3L_mat_mult_h2(C, A, B, row_axis, col_axis, ier)
subroutine S3L_mat_mult_h2_noadd(C, A, B, row_axis, col_axis, ier)
subroutine S3L_mat_mult_h2_addto(C, A, B, D, row_axis, col_axis, ier)
subroutine S3L_mat_mult_t1_t2(C, A, B, row_axis, col_axis, ier)
subroutine S3L_mat_mult_t1_t2_noadd(C, A, B, row_axis, col_axis, ier)
subroutine S3L_mat_mult_t1_t2_addto(C, A, B, D, row_axis, col_axis, ier)

integer*8 C
integer*8 A
integer*8 B
integer*8 D
integer*4 row_axis
integer*4 col_axis
integer*4 ier
```
These functions accept the following arguments as input:
C - Array handle for an S3L parallel array of rank >= 2. C is the destination array for all matrix multiplication operations (see the description of C under the output arguments). Some of these operations also use C as an input argument, adding the contents of C to their respective matrix multiplication products. These are the operations in Table 8-5 of the form C = C + ..., that is, the routines whose names end in neither _noadd nor _addto.
A - Array handle for an S3L parallel array of the same rank as C and B. This array contains one or more instances of the left-hand factor array A, each defined by axes row_axis (which counts the rows) and col_axis (which counts the columns). When neither A nor B is transposed, axis col_axis of A must have the same length as axis row_axis of B; for the transposed variants, the required lengths are given in Tables 8-7 through 8-9. The contents of A are not changed during execution.
B - Array handle for an S3L parallel array of the same rank as C and A. This array contains one or more instances of the right-hand factor array B, defined by axes row_axis (which counts the rows) and col_axis (which counts the columns). The contents of B are not changed during execution.
D - Array handle for an S3L parallel array of the same shape as C. This argument is used only by the routines whose names end in _addto. It contains one or more instances of the array D that is added to the matrix product, each defined by axes row_axis (which counts the rows) and col_axis (which counts the columns). The contents of D are not changed during execution unless D and C are the same variable.
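row_axis - The axis of C, A, and B that counts the rows of the embedded array or arrays. For C calls, it must be nonnegative and less than the rank of C; for Fortran calls, it must be at least 1 and no greater than the rank of C.
col_axis - The axis of C, A, and B that counts the columns of the embedded array or arrays. For C calls, it must be nonnegative and less than the rank of C; for Fortran calls, it must be at least 1 and no greater than the rank of C.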
Note: The argument D can be identical to the argument C in all matrix multiply _addto routines except S3L_mat_mult_t1_t2_addto.
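Because each array argument can hold several matrix instances, it may help to see how row_axis and col_axis pick out one embedded matrix. One reading of the argument descriptions is that these two axes index the rows and columns of each instance, while any remaining axis enumerates the instances. The C sketch below illustrates that reading for a rank-3 array stored in C row-major order, using C-style 0-based axis numbers. The function `elem_offset`, the storage order, and the assumption that row_axis and col_axis are distinct are all hypothetical; real S3L arrays are distributed and accessed through handles, not raw offsets.

```c
#include <stddef.h>

/* Linear offset of element (row i, column j) of instance t in a rank-3
   array with extents ext[0..2], stored in C row-major order.
   row_axis and col_axis are 0-based and assumed distinct; the one
   remaining axis is treated as the instance axis. */
size_t elem_offset(const size_t ext[3], int row_axis, int col_axis,
                   size_t i, size_t j, size_t t)
{
    int inst_axis = 3 - row_axis - col_axis;  /* axis not used for rows/cols */
    size_t idx[3];

    idx[row_axis]  = i;   /* row index lives on row_axis */
    idx[col_axis]  = j;   /* column index lives on col_axis */
    idx[inst_axis] = t;   /* instance index lives on the remaining axis */

    /* Standard row-major linearization of a 3-D index. */
    return (idx[0] * ext[1] + idx[1]) * ext[2] + idx[2];
}
```

With extents {2, 3, 4}, row_axis = 1, and col_axis = 2, this reading describes two 3 × 4 matrices stacked along axis 0.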
These functions use the following arguments for output:
C - Array handle for an S3L parallel array, which is the destination array for all matrix multiplication operations. Upon successful completion, each array instance within C is overwritten by the result of the matrix multiplication call.
ier (Fortran only) - When called from a Fortran program, these functions return error status in ier.
On success, the S3L_mat_mult routines return S3L_SUCCESS.
The S3L_mat_mult routines perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause these functions to terminate and return the associated error code:
S3L_ERR_MATCH_RANK - The parallel arrays do not have the same rank.
S3L_ERR_MATCH_EXTENTS - The lengths of corresponding axes do not match.
S3L_ERR_MATCH_DTYPE - The arguments are not all of the same data type and precision.
S3L_ERR_ARG_AXISNUM - row_axis and/or col_axis contains a bad axis number. For C program calls, each of these parameters must be >= 0 and less than the rank of C. For Fortran calls, they must be >= 1 and <= the rank of C.
S3L_ERR_CONJ_INVAL - Conjugation was requested, but data supplied was not of type S3L_complex_t or S3L_dcomplex_t.
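The axis-numbering rule behind S3L_ERR_ARG_AXISNUM and the data-type rule behind S3L_ERR_CONJ_INVAL can be expressed as a small check. In the C sketch below, `chk_t`, `elem_t`, and `check_args` are hypothetical local stand-ins; the real error codes are defined in s3l_errno-c.h, and their numeric values are not assumed here. The `conjugate` flag corresponds to the _h1 and _h2 variants.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for two of the documented error conditions. */
typedef enum { CHK_OK, CHK_AXISNUM, CHK_CONJ } chk_t;

/* Hypothetical element-type tags (real/double, complex/double complex). */
typedef enum { T_REAL, T_DREAL, T_COMPLEX, T_DCOMPLEX } elem_t;

/* C callers number axes 0 .. rank-1; Fortran callers number them 1 .. rank.
   Conjugation (the _h1 and _h2 variants) requires complex data. */
chk_t check_args(int row_axis, int col_axis, int rank, bool fortran,
                 bool conjugate, elem_t type)
{
    int lo = fortran ? 1 : 0;
    int hi = fortran ? rank : rank - 1;

    if (row_axis < lo || row_axis > hi || col_axis < lo || col_axis > hi)
        return CHK_AXISNUM;   /* analogous to S3L_ERR_ARG_AXISNUM */

    if (conjugate && type != T_COMPLEX && type != T_DCOMPLEX)
        return CHK_CONJ;      /* analogous to S3L_ERR_CONJ_INVAL */

    return CHK_OK;
}
```

The same pair of axes is therefore written (0, 1) in a C call and (1, 2) in a Fortran call on a rank-2 array.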
Examples:
../examples/s3l/dense_matrix_ops/matmult.c
../examples/s3l/dense_matrix_ops-f/matmult.f
Related functions: S3L_inner_prod(3), S3L_2_norm(3), S3L_outer_prod(3), S3L_mat_vec_mult(3)