Multiple-Instance 2-norm - The multiple-instance 2-norm routine, S3L_2_norm, computes one or more instances of the 2-norm of a vector. The single-instance 2-norm routine, S3L_gbl_2_norm, computes the global 2-norm of a parallel array.
For each instance z of z, the multiple-instance routine S3L_2_norm performs the operation shown in Table 8-1.
Table 8-1 S3L Multiple-Instance 2-norm Operations
Operation | Data Type |
---|---|
z = (x^T x)^(1/2) = ||x||_2 | real |
z = (x^H x)^(1/2) = ||x||_2 | complex |
Upon successful completion, S3L_2_norm overwrites each element of z with the 2-norm of the corresponding vector in x.
Single-Instance 2-norm - The single-instance routine S3L_gbl_2_norm performs the operations shown in Table 8-2.
Table 8-2 S3L Single-Instance 2-norm Operations
Operation | Data Type |
---|---|
a = (x^T x)^(1/2) = ||x||_2 | real |
a = (x^H x)^(1/2) = ||x||_2 | complex |
Upon successful completion, S3L_gbl_2_norm overwrites a with the global 2-norm of x.
The C and Fortran syntax for S3L_2_norm and S3L_gbl_2_norm are shown below.
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_2_norm(z, x, x_vector_axis)
S3L_gbl_2_norm(a, x)
    S3L_array_t a
    S3L_array_t z
    S3L_array_t x
    int x_vector_axis
include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_2_norm(z, x, x_vector_axis, ier)
S3L_gbl_2_norm(a, x, ier)
    integer*8 a
    integer*8 z
    integer*8 x
    integer*4 x_vector_axis
    integer*4 ier
x - Array handle for an S3L parallel array. For calls to S3L_2_norm (multiple-instance routine), x must represent a parallel array of rank >= 2, with at least one nonlocal instance axis. The variable x contains one or more instances of the vector x whose 2-norm will be computed.
For calls to S3L_gbl_2_norm (single-instance routine), x must represent a parallel array of rank >= 1, with any instance axes declared to have length 1.
x_vector_axis - Scalar variable. Identifies the axis of x along which the vectors lie.
These functions use the following argument for output:
z - Array handle for the S3L parallel array that will contain the results of the multiple-instance 2-norm routine. The rank of z must be one less than that of x. The axes of z must match the instance axes of x in length and order of declaration. Thus, each vector x in x corresponds to a single destination value z in z.
a - Pointer to a scalar variable. Destination for the single-instance 2-norm routine.
ier (Fortran only) - When called from a Fortran program, these functions return error status in ier.
On success, S3L_2_norm and S3L_gbl_2_norm return S3L_SUCCESS.
S3L_2_norm and S3L_gbl_2_norm perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the functions to terminate and return the associated error code.
S3L_ERR_ARG_RANK - x has invalid rank.
S3L_ERR_ARG_AXISNUM - (S3L_2_norm only) x_vector_axis is a bad axis number. For C program calls, this parameter must be >= 0 and less than the rank of x. For Fortran program calls, it must be >= 1 and not exceed the rank of x.
S3L_ERR_MATCH_RANK - z does not have a rank of one less than that of x.
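The axis-number ranges given for S3L_ERR_ARG_AXISNUM differ between the two language interfaces. The helpers below are hypothetical (they are not part of S3L) and simply restate those ranges as checks, assuming the axis and rank are plain integers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helpers, not part of S3L: they mirror the documented ranges. */

/* C/C++ interface: axes are zero-based, so valid values are 0 .. rank-1. */
bool axis_valid_c(int axis, int rank)
{
    return axis >= 0 && axis < rank;
}

/* F77/F90 interface: axes are one-based, so valid values are 1 .. rank. */
bool axis_valid_fortran(int axis, int rank)
{
    return axis >= 1 && axis <= rank;
}

/* Convert a one-based Fortran axis number to its zero-based C equivalent. */
int axis_fortran_to_c(int axis)
{
    assert(axis >= 1);
    return axis - 1;
}
```

For a rank-2 array, axis 2 is therefore valid from Fortran but out of range from C, where the same axis is numbered 1.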
../examples/s3l/dense_matrix_ops/norm2.c ../examples/s3l/dense_matrix_ops-f/norm2.f
S3L_inner_prod(3) S3L_outer_prod(3) S3L_mat_vec_mult(3) S3L_mat_mult(3)
Multiple-Instance Inner Product - Sun S3L provides six multiple-instance inner product routines, all of which compute one or more instances of the inner product of two vectors embedded in two parallel arrays. The operations performed by the multiple-instance inner product routines are shown in Table 8-3.
Table 8-3 S3L Multiple-Instance Inner Product Operations
Routine | Operation | Data Type |
---|---|---|
S3L_inner_prod | z = z + x^T y | real or complex |
S3L_inner_prod_noadd | z = x^T y | real or complex |
S3L_inner_prod_addto | z = u + x^T y | real or complex |
S3L_inner_prod_c1 | z = z + x^H y | complex only |
S3L_inner_prod_c1_noadd | z = x^H y | complex only |
S3L_inner_prod_c1_addto | z = u + x^H y | complex only |
For these multiple-instance operations, array x contains one or more instances of the first vector in each inner-product pair, x. Likewise, array y contains one or more instances of the second vector in each pair, y.
The array arguments x, y, and so forth actually represent array handles that describe S3L parallel arrays. For convenience, however, this discussion ignores that distinction and refers to them as if they were the arrays themselves.
x and y must be at least rank 1 arrays, must be of the same rank, and their corresponding axes must have the same extents. Additionally, x and y must both be distributed arrays--that is, each must have at least one axis that is nonlocal.
Array z, which stores the results of the multiple-instance inner product operations, must be of rank one less than that of x and y. Its axes must match the instance axes of x and y in length and order of declaration and it must also have at least one axis that is nonlocal. This means each vector pair in x and y corresponds to a single destination value in z.
For S3L_inner_prod and S3L_inner_prod_c1, z is also used as the source for a set of values, which are added to the inner products of the corresponding x and y vector pairs.
Finally, x, y, and z must match in data type and precision.
Two scalar integer variables, x_vector_axis and y_vector_axis, specify the axes of x and y along which the constituent vectors in each vector pair lie.
When specifying values for x_vector_axis and y_vector_axis, keep in mind that Sun S3L functions employ zero-based array indexing when they are called via the C/C++ interface and one-based indexing when called via the F77/F90 interface.
The array handle u describes an S3L parallel array that is used by S3L_inner_prod_addto and S3L_inner_prod_c1_addto. These routines add the values contained in u to the inner products of the corresponding x and y vector pairs.
Upon successful completion of S3L_inner_prod or S3L_inner_prod_c1, the inner product of each vector pair x and y in x and y, respectively, is added to the corresponding value in z.
Upon successful completion of S3L_inner_prod_noadd or S3L_inner_prod_c1_noadd, the inner product of each vector pair x and y in x and y, respectively, overwrites the corresponding value in z.
Upon successful completion of S3L_inner_prod_addto or S3L_inner_prod_c1_addto, the inner product of each vector pair x and y in x and y respectively, is added to the corresponding value in u, and each resulting sum overwrites the corresponding value in z.
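The difference between the plain, _noadd, and _addto variants can be illustrated with a serial model. This sketch is not an S3L call: the ip_mode enum and ref_inner_prod are hypothetical names, the data is real, and x and y are assumed to be ordinary row-major arrays in which each row holds one vector.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mode selector for this sketch; S3L uses separate routines. */
typedef enum { IP_ADD, IP_NOADD, IP_ADDTO } ip_mode;

/*
 * Serial model of the multiple-instance inner product for real data.
 * x and y are row-major arrays of shape [n_inst][n_elem]; each row is one
 * vector, so the vector axis is 1 in the zero-based C convention.
 * u is read only in IP_ADDTO mode and may be NULL otherwise.
 */
void ref_inner_prod(double *z, const double *x, const double *y,
                    const double *u, size_t n_inst, size_t n_elem,
                    ip_mode mode)
{
    assert(mode != IP_ADDTO || u != NULL);
    for (size_t k = 0; k < n_inst; k++) {
        double dot = 0.0;
        for (size_t i = 0; i < n_elem; i++)
            dot += x[k * n_elem + i] * y[k * n_elem + i];

        switch (mode) {
        case IP_ADD:   z[k] += dot;        break;  /* z = z + x^T y */
        case IP_NOADD: z[k]  = dot;        break;  /* z = x^T y     */
        case IP_ADDTO: z[k]  = u[k] + dot; break;  /* z = u + x^T y */
        }
    }
}
```

Only IP_ADD reads the previous contents of z, so z must hold meaningful values before the call in that mode.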
If the instance axes of x and y--that is, the axes along which the inner product will be taken--each contains only a single vector, either declare the axes to have an extent of 1 or use the comparable single-instance inner product routine, as described below.
Single-Instance Inner Product - Sun S3L also provides six single-instance inner product routines, all of which compute the inner product over all the axes of two parallel arrays. The operations performed by the single-instance inner product routines are shown in Table 8-4.
Table 8-4 S3L Single-Instance Inner Product Operations
Routine | Operation | Data Type |
---|---|---|
S3L_gbl_inner_prod | a = a + x^T y | real or complex |
S3L_gbl_inner_prod_noadd | a = x^T y | real or complex |
S3L_gbl_inner_prod_addto | a = b + x^T y | real or complex |
S3L_gbl_inner_prod_c1 | a = a + x^H y | complex only |
S3L_gbl_inner_prod_c1_noadd | a = x^H y | complex only |
S3L_gbl_inner_prod_c1_addto | a = b + x^H y | complex only |
For these single-instance functions, x and y are S3L parallel arrays of rank 1 or greater and with the same data type and precision.
a is a pointer to a scalar variable of the same data type as x and y. This variable stores the results of the single-instance inner product operations.
For S3L_gbl_inner_prod and S3L_gbl_inner_prod_c1, a is also used as the source for a set of values, which are added to the inner product of x and y.
b is also a pointer to a scalar variable of the same data type as x and y. It contains a set of values that S3L_gbl_inner_prod_addto and S3L_gbl_inner_prod_c1_addto add to the inner product of x and y.
Upon successful completion of S3L_gbl_inner_prod or S3L_gbl_inner_prod_c1, the global inner product of x and y is added to a.
Upon successful completion of S3L_gbl_inner_prod_noadd or S3L_gbl_inner_prod_c1_noadd, the global inner product of x and y overwrites a.
Upon successful completion of S3L_gbl_inner_prod_addto or S3L_gbl_inner_prod_c1_addto, the global inner product of x and y is added to b, and the resulting sum overwrites a.
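For complex data, the _c1 variants conjugate the first operand before multiplying. A minimal serial sketch of the six single-instance operations follows; it uses C99 complex arithmetic on ordinary arrays rather than S3L parallel arrays, and all function names in it are hypothetical.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/*
 * Serial model, not an S3L call: inner product over n elements.
 * A nonzero conj_x selects x^H y (the _c1 variants); zero selects x^T y.
 */
double complex ref_gbl_dot(const double complex *x, const double complex *y,
                           size_t n, int conj_x)
{
    assert(x != NULL && y != NULL);
    double complex sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (conj_x ? conj(x[i]) : x[i]) * y[i];
    return sum;
}

/* a = a + dot: models S3L_gbl_inner_prod and S3L_gbl_inner_prod_c1. */
void ref_gbl_add(double complex *a, const double complex *x,
                 const double complex *y, size_t n, int conj_x)
{
    *a += ref_gbl_dot(x, y, n, conj_x);
}

/* a = dot: models the _noadd variants. */
void ref_gbl_noadd(double complex *a, const double complex *x,
                   const double complex *y, size_t n, int conj_x)
{
    *a = ref_gbl_dot(x, y, n, conj_x);
}

/* a = b + dot: models the _addto variants; b is left unchanged. */
void ref_gbl_addto(double complex *a, const double complex *x,
                   const double complex *y, const double complex *b,
                   size_t n, int conj_x)
{
    *a = *b + ref_gbl_dot(x, y, n, conj_x);
}
```

With real data the conjugate has no effect, which is consistent with the _c1 routines being listed as complex only.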
Array variables must not overlap.
The C and Fortran syntax for S3L_inner_prod and S3L_gbl_inner_prod are shown below.
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_inner_prod(z, x, y, x_vector_axis, y_vector_axis)
S3L_inner_prod_noadd(z, x, y, x_vector_axis, y_vector_axis)
S3L_inner_prod_addto(z, x, y, u, x_vector_axis, y_vector_axis)
S3L_inner_prod_c1(z, x, y, x_vector_axis, y_vector_axis)
S3L_inner_prod_c1_noadd(z, x, y, x_vector_axis, y_vector_axis)
S3L_inner_prod_c1_addto(z, x, y, u, x_vector_axis, y_vector_axis)
S3L_gbl_inner_prod(a, x, y)
S3L_gbl_inner_prod_noadd(a, x, y)
S3L_gbl_inner_prod_addto(a, x, y, b)
S3L_gbl_inner_prod_c1(a, x, y)
S3L_gbl_inner_prod_c1_noadd(a, x, y)
S3L_gbl_inner_prod_c1_addto(a, x, y, b)
    S3L_array_t z
    S3L_array_t x
    S3L_array_t y
    S3L_array_t u
    S3L_array_t a
    S3L_array_t b
    int x_vector_axis
    int y_vector_axis
include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_inner_prod(z, x, y, x_vector_axis, y_vector_axis, ier)
S3L_inner_prod_noadd(z, x, y, x_vector_axis, y_vector_axis, ier)
S3L_inner_prod_addto(z, x, y, u, x_vector_axis, y_vector_axis, ier)
S3L_inner_prod_c1(z, x, y, x_vector_axis, y_vector_axis, ier)
S3L_inner_prod_c1_noadd(z, x, y, x_vector_axis, y_vector_axis, ier)
S3L_inner_prod_c1_addto(z, x, y, u, x_vector_axis, y_vector_axis, ier)
S3L_gbl_inner_prod(a, x, y, ier)
S3L_gbl_inner_prod_noadd(a, x, y, ier)
S3L_gbl_inner_prod_addto(a, x, y, b, ier)
S3L_gbl_inner_prod_c1(a, x, y, ier)
S3L_gbl_inner_prod_c1_noadd(a, x, y, ier)
S3L_gbl_inner_prod_c1_addto(a, x, y, b, ier)
    integer*8 z
    integer*8 x
    integer*8 y
    integer*8 u
    integer*8 a
    integer*8 b
    integer*4 x_vector_axis
    integer*4 y_vector_axis
    integer*4 ier
z - Array handle for an S3L parallel array, which S3L_inner_prod and S3L_inner_prod_c1 use as a source of values to be added to the inner products of the corresponding x and y vector pairs. z is also used for output; see the Output section for details.
x - Array handle for an S3L parallel array that contains the first vector in each vector pair for which an inner product will be computed.
y - Array handle for an S3L parallel array that contains the second vector in each vector pair for which an inner product will be computed.
u - Array handle for an S3L parallel array whose rank is one less than that of x and y. S3L_inner_prod_addto and S3L_inner_prod_c1_addto add the contents of u to the inner products of the corresponding vector pairs of x and y.
a - Pointer to a scalar variable, which S3L_gbl_inner_prod and S3L_gbl_inner_prod_c1 use as source of values to be added to the inner product of x and y. a is also used for output; see the Output section for details.
b - Pointer to a scalar variable, which S3L_gbl_inner_prod_addto and S3L_gbl_inner_prod_c1_addto use as source of values to be added to the inner product of x and y.
x_vector_axis - Scalar variable. Identifies the axis of x along which the vectors lie.
y_vector_axis - Scalar variable. Identifies the axis of y along which the vectors lie.
These functions use the following arguments for output:
z - Array handle for the S3L parallel array that will contain the results of the multiple-instance inner product routines.
a - Pointer to a scalar variable, which is the destination for the single-instance inner product routines.
ier (Fortran only) - When called from a Fortran program, these functions return error status in ier.
On success, S3L_inner_prod and S3L_gbl_inner_prod return S3L_SUCCESS.
S3L_inner_prod and S3L_gbl_inner_prod perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the function to terminate and return the associated error code:
S3L_ERR_MATCH_RANK - x and y do not have the same rank.
S3L_ERR_MATCH_EXTENTS - Axes of x and y do not have the same extents.
S3L_ERR_MATCH_DTYPE - The arguments are not all of the same data type and precision.
S3L_ERR_CONJ_INVAL - Conjugation was requested, but data supplied was not of type S3L_complex_t or S3L_dcomplex_t.
../examples/s3l/dense_matrix_ops/inner_prod.c ../examples/s3l/dense_matrix_ops-f/inner_prod.f
S3L_2_norm(3) S3L_outer_prod(3) S3L_mat_vec_mult(3) S3L_mat_mult(3)
Sun S3L provides 18 matrix multiplication routines that compute one or more instances of matrix products. For each instance, these routines perform the operations listed in Table 8-5.
In these descriptions, A^T and A^H denote the transpose and the Hermitian (conjugate) transpose of A, respectively.
Routine | Operation | Data Type |
---|---|---|
S3L_mat_mult | C = C + A B | real or complex |
S3L_mat_mult_noadd | C = A B | real or complex |
S3L_mat_mult_addto | C = D + A B | real or complex |
S3L_mat_mult_t1 | C = C + A^T B | real or complex |
S3L_mat_mult_t1_noadd | C = A^T B | real or complex |
S3L_mat_mult_t1_addto | C = D + A^T B | real or complex |
S3L_mat_mult_h1 | C = C + A^H B | complex only |
S3L_mat_mult_h1_noadd | C = A^H B | complex only |
S3L_mat_mult_h1_addto | C = D + A^H B | complex only |
S3L_mat_mult_t2 | C = C + A B^T | real or complex |
S3L_mat_mult_t2_noadd | C = A B^T | real or complex |
S3L_mat_mult_t2_addto | C = D + A B^T | real or complex |
S3L_mat_mult_h2 | C = C + A B^H | complex only |
S3L_mat_mult_h2_noadd | C = A B^H | complex only |
S3L_mat_mult_h2_addto | C = D + A B^H | complex only |
S3L_mat_mult_t1_t2 | C = C + A^T B^T | real or complex |
S3L_mat_mult_t1_t2_noadd | C = A^T B^T | real or complex |
S3L_mat_mult_t1_t2_addto | C = D + A^T B^T | real or complex |
The algorithm used depends on the axis lengths of the variables supplied.
For calls that do not transpose either matrix A or B, the variables conform correctly with the axis lengths for row_axis and col_axis shown in Table 8-6.
Table 8-6 Recommended row_axis and col_axis Values When Matrix A and Matrix B Are Not Transposed
Variable | row_axis Length | col_axis Length |
---|---|---|
A | p | q |
B | q | r |
C | p | r |
D | p | r |
For calls that transpose the matrix A (AT), the variables conform correctly with the axis lengths for row_axis and col_axis shown in Table 8-7.
Table 8-7 Recommended row_axis and col_axis Values When Matrix A Is Transposed
Variable | row_axis Length | col_axis Length |
---|---|---|
A | q | p |
B | q | r |
C | p | r |
D | p | r |
For calls that transpose the matrix B (BT), the variables conform correctly with the axis lengths for row_axis and col_axis shown in Table 8-8.
Table 8-8 Recommended row_axis and col_axis Values When Matrix B Is Transposed
Variable | row_axis Length | col_axis Length |
---|---|---|
A | p | q |
B | r | q |
C | p | r |
D | p | r |
For calls that transpose both A and B (ATBT), the variables conform correctly with the axis lengths for row_axis and col_axis shown in Table 8-9.
Table 8-9 Recommended row_axis and col_axis Values When Both Matrix A and Matrix B Are Transposed
Variable | row_axis Length | col_axis Length |
---|---|---|
A | q | p |
B | r | q |
C | p | r |
D | p | r |
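The four conformance tables follow one rule: after any requested transposes are applied, the left factor is p by q, the right factor is q by r, and C (and D, which has the same shape as C) is p by r. The helper below is a hypothetical sketch of that rule, not an S3L routine; mat_shape and mat_mult_conforms are names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Extents of one embedded matrix along row_axis and col_axis. */
typedef struct { size_t rows, cols; } mat_shape;

/*
 * Hypothetical helper, not part of S3L: checks that A, B, and C conform
 * for C = op(A) op(B), where op transposes its argument when the matching
 * flag is set. On success it stores p, q, and r as used in the tables.
 */
bool mat_mult_conforms(mat_shape a, mat_shape b, mat_shape c,
                       bool trans_a, bool trans_b,
                       size_t *p, size_t *q, size_t *r)
{
    assert(p != NULL && q != NULL && r != NULL);

    size_t a_rows = trans_a ? a.cols : a.rows;   /* p */
    size_t a_cols = trans_a ? a.rows : a.cols;   /* q */
    size_t b_rows = trans_b ? b.cols : b.rows;   /* q */
    size_t b_cols = trans_b ? b.rows : b.cols;   /* r */

    if (a_cols != b_rows || c.rows != a_rows || c.cols != b_cols)
        return false;

    *p = a_rows;
    *q = a_cols;
    *r = b_cols;
    return true;
}
```

For example, a 3 by 2 array A and a 3 by 4 array B conform only when A is transposed, giving p = 2, q = 3, and r = 4.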
The algorithm is numerically stable.
The C and Fortran syntax for S3L_mat_mult are shown below.
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_mat_mult(C, A, B, row_axis, col_axis)
S3L_mat_mult_noadd(C, A, B, row_axis, col_axis)
S3L_mat_mult_addto(C, A, B, D, row_axis, col_axis)
S3L_mat_mult_t1(C, A, B, row_axis, col_axis)
S3L_mat_mult_t1_noadd(C, A, B, row_axis, col_axis)
S3L_mat_mult_t1_addto(C, A, B, D, row_axis, col_axis)
S3L_mat_mult_h1(C, A, B, row_axis, col_axis)
S3L_mat_mult_h1_noadd(C, A, B, row_axis, col_axis)
S3L_mat_mult_h1_addto(C, A, B, D, row_axis, col_axis)
S3L_mat_mult_t2(C, A, B, row_axis, col_axis)
S3L_mat_mult_t2_noadd(C, A, B, row_axis, col_axis)
S3L_mat_mult_t2_addto(C, A, B, D, row_axis, col_axis)
S3L_mat_mult_h2(C, A, B, row_axis, col_axis)
S3L_mat_mult_h2_noadd(C, A, B, row_axis, col_axis)
S3L_mat_mult_h2_addto(C, A, B, D, row_axis, col_axis)
S3L_mat_mult_t1_t2(C, A, B, row_axis, col_axis)
S3L_mat_mult_t1_t2_noadd(C, A, B, row_axis, col_axis)
S3L_mat_mult_t1_t2_addto(C, A, B, D, row_axis, col_axis)
    S3L_array_t C
    S3L_array_t A
    S3L_array_t B
    S3L_array_t D
    int row_axis
    int col_axis
include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_mat_mult(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_noadd(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_addto(C, A, B, D, row_axis, col_axis, ier)
S3L_mat_mult_t1(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_t1_noadd(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_t1_addto(C, A, B, D, row_axis, col_axis, ier)
S3L_mat_mult_h1(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_h1_noadd(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_h1_addto(C, A, B, D, row_axis, col_axis, ier)
S3L_mat_mult_t2(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_t2_noadd(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_t2_addto(C, A, B, D, row_axis, col_axis, ier)
S3L_mat_mult_h2(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_h2_noadd(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_h2_addto(C, A, B, D, row_axis, col_axis, ier)
S3L_mat_mult_t1_t2(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_t1_t2_noadd(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_t1_t2_addto(C, A, B, D, row_axis, col_axis, ier)
    integer*8 C
    integer*8 A
    integer*8 B
    integer*8 D
    integer*4 row_axis
    integer*4 col_axis
    integer*4 ier
C - Array handle for an S3L parallel array of rank >= 2. C is the destination array for all matrix multiplication operations (as discussed in the Output section). Some of these operations also use C as an input argument, adding the contents of C to their respective matrix multiplication products. The operations shown in Table 8-5 that include some variation of C + AB belong to this class.
A - Array handle for an S3L parallel array of the same rank as C and B. This array contains one or more instances of the left-hand factor array A, defined by axes row_axis (which counts the rows) and col_axis (which counts the columns). Axis col_axis of A must have the same length as axis row_axis of B. The contents of A are not changed during execution.
B - Array handle for an S3L parallel array of the same rank as C and A. This array contains one or more instances of the right-hand factor array B, defined by axes row_axis (which counts the rows) and col_axis (which counts the columns). The contents of B are not changed during execution.
D - Parallel array of the same shape as C. This argument is used only in the calls whose names end in _addto. It contains one or more instances of the array D that is to be added to the array product, defined by axes row_axis (which counts the rows) and col_axis (which counts the columns). The contents of D are not changed during execution, unless D and C are the same variable.
row_axis - The axis of C, A, and B that counts the rows of the embedded array or arrays. Must be nonnegative and less than the rank of C.
col_axis - The axis of C, A, and B that counts the columns of the embedded array or arrays. Must be nonnegative and less than the rank of C.
Note: The argument D can be identical to the argument C in all matrix multiply _addto routines except _t1_t2_addto.
These functions use the following arguments for output:
C - Array handle for an S3L parallel array, which is a destination array for all matrix multiplication operations. Upon successful completion, each array instance within C is overwritten by the result of the array multiplication call.
ier (Fortran only) - When called from a Fortran program, these functions return error status in ier.
On success, the S3L_mat_mult routines return S3L_SUCCESS.
The S3L_mat_mult routines perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause these functions to terminate and return the associated error code:
S3L_ERR_MATCH_RANK - The parallel arrays do not have the same rank.
S3L_ERR_MATCH_EXTENTS - The lengths of corresponding axes do not match.
S3L_ERR_MATCH_DTYPE - The arguments are not all of the same data type and precision.
S3L_ERR_ARG_AXISNUM - row_axis and/or col_axis contains a bad axis number. For C program calls, each of these parameters must be >= 0 and less than the rank of C. For Fortran calls, they must be >= 1 and <= the rank of C.
S3L_ERR_CONJ_INVAL - Conjugation was requested, but data supplied was not of type S3L_complex_t or S3L_dcomplex_t.
../examples/s3l/dense_matrix_ops/matmult.c ../examples/s3l/dense_matrix_ops-f/matmult.f
S3L_inner_prod(3) S3L_2_norm(3) S3L_outer_prod(3) S3L_mat_vec_mult(3)
Sun S3L provides six matrix vector multiplication routines, which compute one or more instances of a matrix vector product. For each instance, these routines perform the operations listed in Table 8-10.
In these descriptions, conj[A] denotes the conjugate of A.
Routine | Operation | Data Type |
---|---|---|
S3L_mat_vec_mult | y = y + Ax | real or complex |
S3L_mat_vec_mult_noadd | y = Ax | real or complex |
S3L_mat_vec_mult_addto | y = v + Ax | real or complex |
S3L_mat_vec_mult_c1 | y = y + conj[A]x | complex only |
S3L_mat_vec_mult_c1_noadd | y = conj[A]x | complex only |
S3L_mat_vec_mult_c1_addto | y = v + conj[A]x | complex only |
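The six operations differ only in whether A is conjugated and in what is added to the product. The serial sketch below models a single instance with C99 complex arithmetic on ordinary arrays; ref_mat_vec is a hypothetical name, not an S3L routine, and the row-major layout is an assumption made for the example.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/*
 * Serial model, not an S3L call: y = v + op(A) x for one instance, where
 * A is a row-major m-by-n matrix and op(A) is conj[A] when conj_a is
 * nonzero. Passing v == y gives y = y + op(A) x; passing v == NULL gives
 * y = op(A) x.
 */
void ref_mat_vec(double complex *y, const double complex *A,
                 const double complex *x, const double complex *v,
                 size_t m, size_t n, int conj_a)
{
    assert(y != NULL && A != NULL && x != NULL);
    for (size_t i = 0; i < m; i++) {
        double complex acc = (v != NULL) ? v[i] : 0.0;
        for (size_t j = 0; j < n; j++) {
            double complex aij = A[i * n + j];
            acc += (conj_a ? conj(aij) : aij) * x[j];
        }
        y[i] = acc;   /* y[i] is written only after v[i] has been read */
    }
}
```

Because each y[i] is stored after the matching v[i] is read, this model gives the same result when v and y are the same array.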
The C and Fortran syntax for S3L_mat_vec_mult are shown below.
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_mat_vec_mult(y, A, x, y_vector_axis, row_axis, col_axis, x_vector_axis)
S3L_mat_vec_mult_noadd(y, A, x, y_vector_axis, row_axis, col_axis, x_vector_axis)
S3L_mat_vec_mult_addto(y, A, x, v, y_vector_axis, row_axis, col_axis, x_vector_axis)
S3L_mat_vec_mult_c1(y, A, x, y_vector_axis, row_axis, col_axis, x_vector_axis)
S3L_mat_vec_mult_c1_noadd(y, A, x, y_vector_axis, row_axis, col_axis, x_vector_axis)
S3L_mat_vec_mult_c1_addto(y, A, x, v, y_vector_axis, row_axis, col_axis, x_vector_axis)
    S3L_array_t y
    S3L_array_t A
    S3L_array_t x
    S3L_array_t v
    int y_vector_axis
    int row_axis
    int col_axis
    int x_vector_axis
include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_mat_vec_mult(y, A, x, y_vector_axis, row_axis, col_axis, x_vector_axis, ier)
S3L_mat_vec_mult_noadd(y, A, x, y_vector_axis, row_axis, col_axis, x_vector_axis, ier)
S3L_mat_vec_mult_addto(y, A, x, v, y_vector_axis, row_axis, col_axis, x_vector_axis, ier)
S3L_mat_vec_mult_c1(y, A, x, y_vector_axis, row_axis, col_axis, x_vector_axis, ier)
S3L_mat_vec_mult_c1_noadd(y, A, x, y_vector_axis, row_axis, col_axis, x_vector_axis, ier)
S3L_mat_vec_mult_c1_addto(y, A, x, v, y_vector_axis, row_axis, col_axis, x_vector_axis, ier)
    integer*8 y
    integer*8 A
    integer*8 x
    integer*8 v
    integer*4 y_vector_axis
    integer*4 row_axis
    integer*4 col_axis
    integer*4 x_vector_axis
    integer*4 ier
y - Array handle for an S3L parallel array of rank >= 1. Two matrix vector multiplication routines, S3L_mat_vec_mult and S3L_mat_vec_mult_c1, add the contents of this array to the product Ax. All matrix vector multiplication routines use y as the destination array, as described in the Output section.
A - Array handle for an S3L parallel array of rank one greater than that of y. It contains one or more instances of the matrix A, defined by axes row_axis (which counts the rows) and col_axis (which counts the columns).
The remaining axes must match the instance axes of y in length and order of declaration. Thus, each matrix in A corresponds to a vector in y. The contents of A are not changed during execution.
x - Array handle for an S3L parallel array of the same rank as y. It contains one or more instances of x, the vector that will be multiplied by the matrix A, embedded along axis x_vector_axis.
Axis x_vector_axis of x must have the same length as axis col_axis of A. The remaining axes of x must match the instance axes of y in length and order of declaration. Thus, each vector in x corresponds to a vector in y. The contents of x are not changed during execution.
v - Array handle for an S3L parallel array of the same rank and shape as y. This argument is used only in the S3L_mat_vec_mult_addto and S3L_mat_vec_mult_c1_addto calls. It contains one or more instances of the vector v, which will be added to the matrix vector product, embedded along axis y_vector_axis. The contents of v are not changed during execution, unless v is the same variable as y.
Note: For S3L_mat_vec_mult_addto and S3L_mat_vec_mult_c1_addto, the argument v can be identical to the argument y.
y_vector_axis - Scalar integer variable that specifies the axis of y and v along which the elements of the embedded vectors lie. For C/C++ programs, this argument must be nonnegative and less than the rank of y. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of y.
row_axis - Scalar integer variable. It counts the rows of the embedded matrix or matrices. For C/C++ programs, this argument must be nonnegative and less than the rank of A. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of A.
col_axis - Scalar integer variable that counts the columns of the embedded matrix or matrices. For C/C++ programs, this argument must be nonnegative and less than the rank of A. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of A.
x_vector_axis - Scalar integer variable that specifies the axis of x along which the elements of the embedded vectors lie. For C/C++ programs, this argument must be nonnegative and less than the rank of x. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of x.
These functions use the following arguments for output:
y - Array handle for an S3L array of rank >= 1. This array contains one or more instances of the destination vector y embedded along the axis y_vector_axis. This axis must have the same length as axis row_axis of A. Upon completion, each vector instance is overwritten by the result of the matrix vector multiplication call.
ier (Fortran only) - When called from a Fortran program, these functions return error status in ier.
On success, the S3L_mat_vec_mult routines return S3L_SUCCESS.
The S3L_mat_vec_mult routines perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause these functions to terminate and return the associated error code:
S3L_ERR_MATCH_RANK - The parallel arrays do not have the same rank.
S3L_ERR_MATCH_EXTENTS - The lengths of corresponding axes do not match.
S3L_ERR_MATCH_DTYPE - The arguments are not all of the same data type and precision.
S3L_ERR_ARG_AXISNUM - row_axis and/or col_axis contains a bad axis number. For C/C++ program calls, each of these parameters must be nonnegative and less than the rank of A. For F77/F90 calls, they must be greater than zero and less than or equal to the rank of A.
S3L_ERR_CONJ_INVAL - Conjugation was requested, but the data supplied was not of type S3L_complex_t or S3L_dcomplex_t.
../examples/s3l/dense_matrix_ops/matvec_mult.c ../examples/s3l/dense_matrix_ops-f/matvec_mult.f
S3L_inner_prod(3) S3L_2_norm(3) S3L_outer_prod(3) S3L_mat_mult(3)
Sun S3L provides six outer product routines, which compute one or more instances of an outer product of two vectors. For each instance, the outer product routines perform the operations listed in Table 8-11.
In these descriptions, y^T and y^H denote the transpose and the Hermitian (conjugate) transpose of y, respectively.
Routine | Operation | Data Type |
---|---|---|
S3L_outer_prod | A = A + x y^T | real or complex |
S3L_outer_prod_noadd | A = x y^T | real or complex |
S3L_outer_prod_addto | A = B + x y^T | real or complex |
S3L_outer_prod_c2 | A = A + x y^H | complex only |
S3L_outer_prod_c2_noadd | A = x y^H | complex only |
S3L_outer_prod_c2_addto | A = B + x y^H | complex only |
In elementwise notation, for each instance S3L_outer_prod computes
A(i,j) = A(i,j) + x(i) * y(j)
and S3L_outer_prod_c2 computes
A(i,j) = A(i,j) + x(i) * conj[y(j)]
where conj[y(j)] denotes the conjugate of y(j).
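The elementwise formulas translate directly into a double loop. The sketch below is a serial model of one instance, not an S3L call; ref_outer_prod is a hypothetical name, and the row-major layout and the use of a NULL pointer to select the _noadd behavior are assumptions made for the example.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/*
 * Serial model, not an S3L call: A = B + x y^T (or x y^H when conj_y is
 * nonzero) for one instance. A and B are row-major m-by-n; x has m
 * elements and y has n. Passing B == A accumulates into A; passing
 * B == NULL overwrites A with the bare outer product.
 */
void ref_outer_prod(double complex *A, const double complex *x,
                    const double complex *y, const double complex *B,
                    size_t m, size_t n, int conj_y)
{
    assert(A != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            double complex base = (B != NULL) ? B[i * n + j] : 0.0;
            double complex yj = conj_y ? conj(y[j]) : y[j];
            A[i * n + j] = base + x[i] * yj;   /* A(i,j) = B(i,j) + x(i) y(j) */
        }
    }
}
```

The length of x sets the number of rows of A and the length of y sets the number of columns, which is the same relationship the argument descriptions state for row_axis and col_axis.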
The C and Fortran syntax for S3L_outer_prod are shown below.
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_outer_prod(A, x, y, row_axis, col_axis, x_vector_axis, y_vector_axis)
S3L_outer_prod_noadd(A, x, y, row_axis, col_axis, x_vector_axis, y_vector_axis)
S3L_outer_prod_addto(A, x, y, B, row_axis, col_axis, x_vector_axis, y_vector_axis)
S3L_outer_prod_c2(A, x, y, row_axis, col_axis, x_vector_axis, y_vector_axis)
S3L_outer_prod_c2_noadd(A, x, y, row_axis, col_axis, x_vector_axis, y_vector_axis)
S3L_outer_prod_c2_addto(A, x, y, B, row_axis, col_axis, x_vector_axis, y_vector_axis)
    S3L_array_t A
    S3L_array_t x
    S3L_array_t y
    S3L_array_t B
    int row_axis
    int col_axis
    int x_vector_axis
    int y_vector_axis
include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_outer_prod(A, x, y, row_axis, col_axis, x_vector_axis, y_vector_axis, ier)
S3L_outer_prod_noadd(A, x, y, row_axis, col_axis, x_vector_axis, y_vector_axis, ier)
S3L_outer_prod_addto(A, x, y, B, row_axis, col_axis, x_vector_axis, y_vector_axis, ier)
S3L_outer_prod_c2(A, x, y, row_axis, col_axis, x_vector_axis, y_vector_axis, ier)
S3L_outer_prod_c2_noadd(A, x, y, row_axis, col_axis, x_vector_axis, y_vector_axis, ier)
S3L_outer_prod_c2_addto(A, x, y, B, row_axis, col_axis, x_vector_axis, y_vector_axis, ier)
    integer*8 A
    integer*8 x
    integer*8 y
    integer*8 B
    integer*4 row_axis
    integer*4 col_axis
    integer*4 x_vector_axis
    integer*4 y_vector_axis
    integer*4 ier
A - Array handle for an S3L parallel array of rank greater than or equal to 2. Two S3L outer product routines, S3L_outer_prod and S3L_outer_prod_c2, add the contents of this array to the outer product of x and y. All outer product routines use A as the destination array, as described in the Output section.
x - Array handle for an S3L parallel array of rank one less than that of A. It contains one or more instances of the first source vector, x, embedded along axis x_vector_axis.
Axis x_vector_axis of x must have the same length as axis row_axis of A. The remaining axes of x must match the instance axes of A in length and order of declaration. Thus, each vector in x corresponds to a matrix in A.
y - Array handle for an S3L parallel array of rank one less than that of A. It contains one or more instances of the second source vector, y, embedded along axis y_vector_axis.
Axis y_vector_axis of y must have the same length as axis col_axis of A. The remaining axes of y must match the instance axes of A in length and order of declaration. Thus, each vector in y corresponds to a matrix in A.
Note: The argument y can be identical to the argument x.
B - Parallel array of the same shape as A. It contains one or more embedded matrices B defined by axes row_axis (which counts the rows) and col_axis (which counts the columns). The remaining axes must match the instance axes of A in length and order of declaration. Thus, each matrix in B corresponds to a matrix in A.
This argument is used only in the S3L_outer_prod_addto and S3L_outer_prod_c2_addto calls, which add each outer product to the corresponding matrix within B and place the result in the corresponding matrix within A. The contents of B are not changed by the operation (unless B and A are the same variable).
Note: For S3L_outer_prod_addto and S3L_outer_prod_c2_addto, the argument B can be identical to the argument A.
row_axis - Scalar integer variable. The axis of A and B that counts the rows of the embedded matrix or matrices. For C/C++ programs, this argument must be nonnegative and less than the rank of A. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of A.
col_axis - Scalar integer variable. The axis of A and B that counts the columns of the embedded matrix or matrices. For C/C++ programs, this argument must be nonnegative and less than the rank of A. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of A.
x_vector_axis - Scalar integer variable that specifies the axis of x along which the elements of the embedded vectors lie. For C/C++ programs, this argument must be nonnegative and less than the rank of x. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of x.
y_vector_axis - Scalar integer variable that specifies the axis of y along which the elements of the embedded vectors lie. For C/C++ programs, this argument must be nonnegative and less than the rank of y. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of y.
These functions use the following arguments for output:
A - Array handle for an S3L parallel array of rank greater than or equal to 2, which contains one or more instances of the destination matrix A, defined by axes row_axis (which counts the rows) and col_axis (which counts the columns). Upon successful completion, each matrix instance is overwritten by the result of the outer product call.
ier (Fortran only) - When called from a Fortran program, these functions return error status in ier.
On success, the S3L_outer_prod routines return S3L_SUCCESS.
The S3L_outer_prod routines perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause these functions to terminate and return the associated error code:
S3L_ERR_MATCH_RANK - The parallel arrays do not have the same rank.
S3L_ERR_MATCH_EXTENTS - The lengths of corresponding axes do not match.
S3L_ERR_MATCH_DTYPE - The arguments are not all of the same data type and precision.
S3L_ERR_ARG_AXISNUM - row_axis and/or col_axis contains a bad axis number. For C/C++ program calls, each of these parameters must be nonnegative and less than the rank of A. For F77/F90 calls, they must be greater than zero and less than or equal to the rank of A.
S3L_ERR_CONJ_INVAL - Conjugation was requested, but the data supplied was not of type S3L_complex_t or S3L_dcomplex_t.
S3L_ERR_ARG_RANK - Rank of A is less than 2.
../examples/s3l/dense_matrix_ops/outer_prod.c ../examples/s3l/dense_matrix_ops-f/outer_prod.f
S3L_inner_prod(3) S3L_2_norm(3) S3L_mat_vec_mult(3) S3L_mat_mult(3)