Sun S3L 3.0 Programming and Reference Guide

Chapter 8 Sun S3L Core Library Functions

This chapter describes the set of computational functions, which form the core of the scientific subroutine library. These descriptions are organized as follows:

Dense Matrix Routines
- S3L_2_norm - See "S3L_2_norm and S3L_gbl_2_norm"
- S3L_inner_prod - See "S3L_inner_prod and S3L_gbl_inner_prod"
- S3L_mat_mult - See "S3L_mat_mult "
- S3L_mat_vec_mult - See "S3L_mat_vec_mult "
- S3L_outer_prod - See "S3L_outer_prod "

Sparse Matrix Routines
- S3L_declare_sparse - See "S3L_declare_sparse "
- S3L_free_sparse - See "S3L_free_sparse "
- S3L_rand_sparse - See "S3L_rand_sparse "
- S3L_matvec_sparse - See "S3L_matvec_sparse "
- S3L_read_sparse - See "S3L_read_sparse "
- S3L_print_sparse - See "S3L_print_sparse "

Gaussian Elimination for Dense Systems
- S3L_lu_factor - See "S3L_lu_factor "
- S3L_lu_invert - See "S3L_lu_invert "
- S3L_lu_solve - See "S3L_lu_solve "
- S3L_lu_deallocate - See "S3L_lu_deallocate "

Fast Fourier Transforms
- S3L_fft - See "S3L_fft "
- S3L_fft_detailed - See "S3L_fft_detailed "
- S3L_ifft - See "S3L_ifft "
- S3L_rc_fft - See "S3L_rc_fft and S3L_cr_fft "
- S3L_cr_fft - See "S3L_rc_fft and S3L_cr_fft "
- S3L_fft_setup - See "S3L_fft_setup "
- S3L_rc_fft_setup - See "S3L_rc_fft_setup "
- S3L_fft_free_setup - See "S3L_fft_free_setup "
- S3L_rc_fft_free_setup - See "S3L_rc_fft_free_setup "

Structured Solvers
- S3L_gen_band_factor - See "S3L_gen_band_factor "
- S3L_gen_band_free_factors - See "S3L_gen_band_free_factors "
- S3L_gen_band_solve - See "S3L_gen_band_solve "
- S3L_gen_trid_factor - See "S3L_gen_trid_factor "
- S3L_gen_trid_free_factors - See "S3L_gen_trid_free_factors "
- S3L_gen_trid_solve - See "S3L_gen_trid_solve "

Dense Symmetric Eigenvalue Solve
- S3L_sym_eigen - See "S3L_sym_eigen "

Parallel Random Number Generators
- S3L_setup_rand_fib - See "S3L_setup_rand_fib "
- S3L_free_rand_fib - See "S3L_free_rand_fib"
- S3L_rand_fib - See "S3L_rand_fib "
- S3L_rand_lcg - See "S3L_rand_lcg "

Least Squares Solver
- S3L_gen_lsq - See "S3L_gen_lsq "

Dense Singular Value Decomposition
- S3L_gen_svd - See "S3L_gen_svd"

Iterative Solver
- S3L_gen_iter_solve - See "S3L_gen_iter_solve "

Auto-correlation
- S3L_acorr_setup - See "S3L_acorr_setup "
- S3L_acorr_free_setup - See "S3L_acorr_free_setup "
- S3L_acorr - See "S3L_acorr"

Convolution/Deconvolution
- S3L_conv_setup - See "S3L_conv_setup "
- S3L_conv_free_setup - See "S3L_conv_free_setup "
- S3L_conv - See "S3L_conv "
- S3L_deconv_setup - See "S3L_deconv_setup "
- S3L_deconv_free_setup - See "S3L_deconv_free_setup "
- S3L_deconv - See "S3L_deconv "

Multidimensional Sort and Grade
- S3L_grade_up - See "S3L_grade_down, S3L_grade_up, S3L_grade_down_detailed, S3L_grade_up_detailed "
- S3L_grade_down - See "S3L_grade_down, S3L_grade_up, S3L_grade_down_detailed, S3L_grade_up_detailed "
- S3L_grade_up_detailed - See "S3L_grade_down, S3L_grade_up, S3L_grade_down_detailed, S3L_grade_up_detailed "
- S3L_grade_down_detailed - See "S3L_grade_down, S3L_grade_up, S3L_grade_down_detailed, S3L_grade_up_detailed "
- S3L_sort - See "S3L_sort, S3L_sort_up, S3L_sort_down, S3L_sort_detailed_up, S3L_sort_detailed_down "
- S3L_sort_up - See "S3L_sort, S3L_sort_up, S3L_sort_down, S3L_sort_detailed_up, S3L_sort_detailed_down "
- S3L_sort_down - See "S3L_sort, S3L_sort_up, S3L_sort_down, S3L_sort_detailed_up, S3L_sort_detailed_down "
- S3L_sort_detailed_up - See "S3L_sort, S3L_sort_up, S3L_sort_down, S3L_sort_detailed_up, S3L_sort_detailed_down "
- S3L_sort_detailed_down - See "S3L_sort, S3L_sort_up, S3L_sort_down, S3L_sort_detailed_up, S3L_sort_detailed_down "

Parallel Transpose
- S3L_trans - See "S3L_trans "

Dense Matrix Routines

`S3L_2_norm` and `S3L_gbl_2_norm`

Description

Multiple-Instance 2-norm - The multiple-instance 2-norm routine, S3L_2_norm, computes one or more instances of the 2-norm of a vector. The single-instance 2-norm routine, S3L_gbl_2_norm, computes the global 2-norm of a parallel array.

For each instance z of z, the multiple-instance routine S3L_2_norm performs the operation shown in Table 8-1.

Table 8-1 S3L Multiple-Instance 2-norm Operations


Operation	Data Type
z = (x^Tx)^(1/2) = ||x||_2	real
z = (x^Hx)^(1/2) = ||x||_2	complex

Upon successful completion, S3L_2_norm overwrites each element of z with the 2-norm of the corresponding vector in x.

Single-Instance 2-norm - The single-instance routine S3L_gbl_2_norm performs the operations shown in Table 8-2.

Table 8-2 S3L Single-Instance 2-norm Operations


Operation	Data Type
a = (x^Tx)^(1/2) = ||x||_2	real
a = (x^Hx)^(1/2) = ||x||_2	complex

Upon successful completion, S3L_gbl_2_norm overwrites a with the global 2-norm of x.

Syntax

The C and Fortran syntax for S3L_2_norm and S3L_gbl_2_norm is shown below.

C/C++ Syntax

Example 8-1

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_2_norm(z, x, x_vector_axis)
S3L_gbl_2_norm(a, x)
    S3L_array_t        a
    S3L_array_t        z
    S3L_array_t        x
    int                x_vector_axis

F77/F90 Syntax

Example 8-2

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_2_norm(z, x, x_vector_axis, ier)
S3L_gbl_2_norm(a, x, ier)
    integer*8          a
    integer*8          z
    integer*8          x
    integer*4          x_vector_axis
    integer*4          ier

Input

x - Array handle for an S3L parallel array. For calls to S3L_2_norm (multiple-instance routine), x must represent a parallel array of rank >= 2, with at least one nonlocal instance axis. The variable x contains one or more instances of the vector x whose 2-norm will be computed.
For calls to S3L_gbl_2_norm (single-instance routine), x must represent a parallel array of rank >= 1, with any instance axes declared to have length 1.
x_vector_axis - Scalar variable. Identifies the axis of x along which the vectors lie.

Output

These functions use the following arguments for output:

z - Array handle for the S3L parallel array that will contain the results of the multiple-instance 2-norm routine. The rank of z must be one less than that of x. The axes of z must match the instance axes of x in length and order of declaration. Thus, each vector x in x corresponds to a single destination value z in z.
a - Pointer to a scalar variable. Destination for the single-instance 2-norm routine.
ier (Fortran only) - When called from a Fortran program, these functions return error status in ier.

Error Handling

On success, S3L_2_norm and S3L_gbl_2_norm return S3L_SUCCESS.

S3L_2_norm and S3L_gbl_2_norm perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause the functions to terminate and return the associated error code.

S3L_ERR_ARG_RANK - x has invalid rank.

S3L_ERR_ARG_AXISNUM - (S3L_2_norm only) x_vector_axis is a bad axis number. For C program calls, this parameter must be >= 0 and less than the rank of x. For Fortran program calls, it must be >= 1 and not exceed the rank of x.

S3L_ERR_MATCH_RANK - z does not have a rank of one less than that of x.

Examples

../examples/s3l/dense_matrix_ops/norm2.c
../examples/s3l/dense_matrix_ops-f/norm2.f

Related Functions

S3L_inner_prod(3)
S3L_outer_prod(3)
S3L_mat_vec_mult(3)
S3L_mat_mult(3)

`S3L_inner_prod` and `S3L_gbl_inner_prod`

Description

Multiple-Instance Inner Product - Sun S3L provides six multiple-instance inner product routines, all of which compute one or more instances of the inner product of two vectors embedded in two parallel arrays. The operations performed by the multiple-instance inner product routines are shown in Table 8-3.

Table 8-3 S3L Multiple-Instance Inner Product Operations


Routine	Operation	Data Type
`S3L_inner_prod`	`z = z + x^Ty`	real or complex
`S3L_inner_prod_noadd`	`z = x^Ty`	real or complex
`S3L_inner_prod_addto`	`z = u + x^Ty`	real or complex
`S3L_inner_prod_c1`	`z = z + x^Hy`	complex only
`S3L_inner_prod_c1_noadd`	`z = x^Hy`	complex only
`S3L_inner_prod_c1_addto`	`z = u + x^Hy`	complex only

For these multiple-instance operations, array x contains one or more instances of the first vector in each inner-product pair, x. Likewise, array y contains one or more instances of the second vector in each pair, y.

Note -

The array arguments x, y, and so forth actually represent array handles that describe S3L parallel arrays. For convenience, however, this discussion ignores that distinction and refers to them as if they were the arrays themselves.

x and y must be at least rank 1 arrays, must be of the same rank, and their corresponding axes must have the same extents. Additionally, x and y must both be distributed arrays--that is, each must have at least one axis that is nonlocal.

Array z, which stores the results of the multiple-instance inner product operations, must be of rank one less than that of x and y. Its axes must match the instance axes of x and y in length and order of declaration and it must also have at least one axis that is nonlocal. This means each vector pair in x and y corresponds to a single destination value in z.

For S3L_inner_prod and S3L_inner_prod_c1, z is also used as the source for a set of values, which are added to the inner products of the corresponding x and y vector pairs.

Finally, x, y, and z must match in data type and precision.

Two scalar integer variables, x_vector_axis and y_vector_axis, specify the axes of x and y along which the constituent vectors in each vector pair lie.

Note -

When specifying values for x_vector_axis and y_vector_axis, keep in mind that Sun S3L functions employ zero-based array indexing when they are called via the C/C++ interface and one-based indexing when called via the F77/F90 interface.
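
The indexing difference can be sketched with a small C helper that checks an axis number under either convention and converts it to zero-based form. The enum and function names are hypothetical and are not part of the Sun S3L interface; the range rule is the one given for S3L_ERR_ARG_AXISNUM (zero up to rank minus one for C/C++, one up to rank for F77/F90).

```c
#include <assert.h>

/* Offset of the first axis number under each calling convention. */
enum axis_base { AXIS_C = 0, AXIS_FORTRAN = 1 };

/* Returns the zero-based axis number, or -1 if `axis` is out of range
 * for an array of the given rank under the given convention. */
int normalize_axis(int axis, int rank, enum axis_base base)
{
    assert(rank >= 1);
    int zero_based = axis - (int)base;
    if (zero_based < 0 || zero_based >= rank)
        return -1;
    return zero_based;
}
```

For a rank 2 array, axis 2 is therefore valid from Fortran (it maps to zero-based axis 1) but out of range from C.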

The array handle u describes an S3L parallel array that is used by S3L_inner_prod_addto and S3L_inner_prod_c1_addto. These routines add the values contained in u to the inner products of the corresponding x and y vector pairs.

Upon successful completion of S3L_inner_prod or S3L_inner_prod_c1, the inner product of each vector pair x and y in x and y, respectively, is added to the corresponding value in z.

Upon successful completion of S3L_inner_prod_noadd or S3L_inner_prod_c1_noadd, the inner product of each vector pair x and y in x and y, respectively, overwrites the corresponding value in z.

Upon successful completion of S3L_inner_prod_addto or S3L_inner_prod_c1_addto, the inner product of each vector pair x and y in x and y respectively, is added to the corresponding value in u, and each resulting sum overwrites the corresponding value in z.
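
The three update forms can be illustrated with a serial C sketch over real data, in which x and y are row-major arrays holding one vector per row and z (and u) hold one value per instance. This shows only the arithmetic of Table 8-3; it does not use Sun S3L, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of row i of x with row i of y (both ninst-by-len, row-major). */
static double row_dot(const double *x, const double *y, size_t i, size_t len)
{
    double sum = 0.0;
    for (size_t j = 0; j < len; j++)
        sum += x[i * len + j] * y[i * len + j];
    return sum;
}

/* z = z + x^T y: each inner product is accumulated into z. */
void inner_prod_add(double *z, const double *x, const double *y,
                    size_t ninst, size_t len)
{
    assert(z != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < ninst; i++)
        z[i] += row_dot(x, y, i, len);
}

/* z = x^T y: each inner product overwrites z. */
void inner_prod_noadd(double *z, const double *x, const double *y,
                      size_t ninst, size_t len)
{
    assert(z != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < ninst; i++)
        z[i] = row_dot(x, y, i, len);
}

/* z = u + x^T y: u is read, the sum overwrites z. */
void inner_prod_addto(double *z, const double *x, const double *y,
                      const double *u, size_t ninst, size_t len)
{
    assert(z != NULL && x != NULL && y != NULL && u != NULL);
    for (size_t i = 0; i < ninst; i++)
        z[i] = u[i] + row_dot(x, y, i, len);
}
```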

Note -

If the instance axes of x and y--that is, the axes along which the inner product will be taken--each contains only a single vector, either declare the axes to have an extent of 1 or use the comparable single-instance inner product routine, as described below.

Single-Instance Inner Product - Sun S3L also provides six single-instance inner product routines, all of which compute the inner product over all the axes of two parallel arrays. The operations performed by the single-instance inner product routines are shown in Table 8-4.

Table 8-4 S3L Single-Instance Inner Product Operations


Routine	Operation	Data Type
`S3L_gbl_inner_prod`	`a = a + x^Ty`	real or complex
`S3L_gbl_inner_prod_noadd`	`a = x^Ty`	real or complex
`S3L_gbl_inner_prod_addto`	`a = b + x^Ty`	real or complex
`S3L_gbl_inner_prod_c1`	`a = a + x^Hy`	complex only
`S3L_gbl_inner_prod_c1_noadd`	`a = x^Hy`	complex only
`S3L_gbl_inner_prod_c1_addto`	`a = b + x^Hy`	complex only

For these single-instance functions, x and y are S3L parallel arrays of rank 1 or greater and with the same data type and precision.

a is a pointer to a scalar variable of the same data type as x and y. This variable stores the results of the single-instance inner product operations.

For S3L_gbl_inner_prod and S3L_gbl_inner_prod_c1, a is also used as the source for a set of values, which are added to the inner product of x and y.

b is also a pointer to a scalar variable of the same data type as x and y. It contains a set of values that S3L_gbl_inner_prod_addto and S3L_gbl_inner_prod_c1_addto add to the inner product of x and y.

Upon successful completion of S3L_gbl_inner_prod or S3L_gbl_inner_prod_c1, the global inner product of x and y is added to a.

Upon successful completion of S3L_gbl_inner_prod_noadd or S3L_gbl_inner_prod_c1_noadd, the global inner product of x and y overwrites a.

Upon successful completion of S3L_gbl_inner_prod_addto or S3L_gbl_inner_prod_c1_addto, the global inner product of x and y is added to b, and the resulting sum overwrites a.
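
The difference between the plain and the _c1 forms is whether the elements of x are conjugated before they are multiplied. The following serial C99 sketch shows a = a + x^Ty and a = b + x^Hy over flat complex arrays; it is an illustration of the arithmetic in Table 8-4 only, and the function names are hypothetical rather than Sun S3L entry points.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* a = a + x^T y over all n elements (no conjugation). */
void gbl_inner_prod_sketch(double complex *a, const double complex *x,
                           const double complex *y, size_t n)
{
    assert(a != NULL && x != NULL && y != NULL);
    double complex sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    *a += sum;
}

/* a = b + x^H y: each element of x is conjugated before multiplying. */
void gbl_inner_prod_c1_addto_sketch(double complex *a, const double complex *x,
                                    const double complex *y,
                                    const double complex *b, size_t n)
{
    assert(a != NULL && x != NULL && y != NULL && b != NULL);
    double complex sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += conj(x[i]) * y[i];
    *a = *b + sum;
}
```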

Note -

Array variables must not overlap.

Syntax

The C and Fortran syntax for S3L_inner_prod and S3L_gbl_inner_prod is shown below.

C/C++ Syntax

Example 8-3

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_inner_prod(z, x, y, x_vector_axis, y_vector_axis)
S3L_inner_prod_noadd(z, x, y, x_vector_axis, y_vector_axis)
S3L_inner_prod_addto(z, x, y, u, x_vector_axis, y_vector_axis)
S3L_inner_prod_c1(z, x, y, x_vector_axis, y_vector_axis)
S3L_inner_prod_c1_noadd(z, x, y, x_vector_axis, y_vector_axis)
S3L_inner_prod_c1_addto(z, x, y, u, x_vector_axis, y_vector_axis)
S3L_gbl_inner_prod(a, x, y)
S3L_gbl_inner_prod_noadd(a, x, y)
S3L_gbl_inner_prod_addto(a, x, y, b)
S3L_gbl_inner_prod_c1(a, x, y)
S3L_gbl_inner_prod_c1_noadd(a, x, y)
S3L_gbl_inner_prod_c1_addto(a, x, y, b)
    S3L_array_t        z
    S3L_array_t        x
    S3L_array_t        y
    S3L_array_t        u
    S3L_array_t        a
    S3L_array_t        b
    int                x_vector_axis
    int                y_vector_axis

F77/F90 Syntax

Example 8-4

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_inner_prod(z, x, y, x_vector_axis, y_vector_axis, ier)
S3L_inner_prod_noadd(z, x, y, x_vector_axis, y_vector_axis, ier)
S3L_inner_prod_addto(z, x, y, u, x_vector_axis, y_vector_axis, ier)
S3L_inner_prod_c1(z, x, y, x_vector_axis, y_vector_axis, ier)
S3L_inner_prod_c1_noadd(z, x, y, x_vector_axis, y_vector_axis, ier)
S3L_inner_prod_c1_addto(z, x, y, u, x_vector_axis, y_vector_axis, ier)
S3L_gbl_inner_prod(a, x, y, ier)
S3L_gbl_inner_prod_noadd(a, x, y, ier)
S3L_gbl_inner_prod_addto(a, x, y, b, ier)
S3L_gbl_inner_prod_c1(a, x, y, ier)
S3L_gbl_inner_prod_c1_noadd(a, x, y, ier)
S3L_gbl_inner_prod_c1_addto(a, x, y, b, ier)
    integer*8          z
    integer*8          x
    integer*8          y
    integer*8          u
    integer*8          a
    integer*8          b
    integer*4          x_vector_axis
    integer*4          y_vector_axis
    integer*4          ier

Input

z - Array handle for an S3L parallel array, which S3L_inner_prod and S3L_inner_prod_c1 use as a source of values to be added to the inner products of the corresponding x and y vector pairs. z is also used for output; see the Output section for details.
x - Array handle for an S3L parallel array that contains the first vector in each vector pair for which an inner product will be computed.
y - Array handle for an S3L parallel array that contains the second vector in each vector pair for which an inner product will be computed.
u - Array handle for an S3L parallel array whose rank is one less than that of x and y. S3L_inner_prod_addto and S3L_inner_prod_c1_addto add the contents of u to the inner products of the corresponding vector pairs of x and y.
a - Pointer to a scalar variable, which S3L_gbl_inner_prod and S3L_gbl_inner_prod_c1 use as source of values to be added to the inner product of x and y. a is also used for output; see the Output section for details.
b - Pointer to a scalar variable, which S3L_gbl_inner_prod_addto and S3L_gbl_inner_prod_c1_addto use as source of values to be added to the inner product of x and y.
x_vector_axis - Scalar variable. Identifies the axis of x along which the vectors lie.
y_vector_axis - Scalar variable. Identifies the axis of y along which the vectors lie.

Output

These functions use the following arguments for output:

z - Array handle for the S3L parallel array that will contain the results of the multiple-instance inner product routines.
a - Pointer to a scalar variable, which is the destination for the single-instance inner product routines.
ier (Fortran only) - When called from a Fortran program, these functions return error status in ier.

Error Handling

On success, S3L_inner_prod and S3L_gbl_inner_prod return S3L_SUCCESS.

S3L_inner_prod and S3L_gbl_inner_prod perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause the function to terminate and return the associated error code:

S3L_ERR_MATCH_RANK - x and y do not have the same rank.

S3L_ERR_MATCH_EXTENTS - Axes of x and y do not have the same extents.

S3L_ERR_MATCH_DTYPE - The arguments are not all of the same data type and precision.

S3L_ERR_CONJ_INVAL - Conjugation was requested, but data supplied was not of type S3L_complex_t or S3L_dcomplex_t.

Examples

../examples/s3l/dense_matrix_ops/inner_prod.c
../examples/s3l/dense_matrix_ops-f/inner_prod.f

Related Functions

S3L_2_norm(3)
S3L_outer_prod(3)
S3L_mat_vec_mult(3)
S3L_mat_mult(3)

`S3L_mat_mult`

Description

Sun S3L provides 18 matrix multiplication routines that compute one or more instances of matrix products. For each instance, these routines perform the operations listed in Table 8-5.

Note -

In these descriptions, A^T and A^H denote A transpose and A Hermitian, respectively.

Table 8-5 S3L Matrix Multiplication Operations


Routine	Operation	Data Type
`S3L_mat_mult`	`C = C + AB`	real or complex
`S3L_mat_mult_noadd`	`C = AB`	real or complex
`S3L_mat_mult_addto`	`C = D + AB`	real or complex
`S3L_mat_mult_t1`	`C = C + A^TB`	real or complex
`S3L_mat_mult_t1_noadd`	`C = A^TB`	real or complex
`S3L_mat_mult_t1_addto`	`C = D + A^TB`	real or complex
`S3L_mat_mult_h1`	`C = C + A^HB`	complex only
`S3L_mat_mult_h1_noadd`	`C = A^HB`	complex only
`S3L_mat_mult_h1_addto`	`C = D + A^HB`	complex only
`S3L_mat_mult_t2`	`C = C + AB^T`	real or complex
`S3L_mat_mult_t2_noadd`	`C = AB^T`	real or complex
`S3L_mat_mult_t2_addto`	`C = D + AB^T`	real or complex
`S3L_mat_mult_h2`	`C = C + AB^H`	complex only
`S3L_mat_mult_h2_noadd`	`C = AB^H`	complex only
`S3L_mat_mult_h2_addto`	`C = D + AB^H`	complex only
`S3L_mat_mult_t1_t2`	`C = C + A^TB^T`	real or complex
`S3L_mat_mult_t1_t2_noadd`	`C = A^TB^T`	real or complex
`S3L_mat_mult_t1_t2_addto`	`C = D + A^TB^T`	real or complex

The algorithm used depends on the axis lengths of the variables supplied.
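
The real-valued operations in Table 8-5 all have the form C = D + op(A) op(B), where op is either the identity or the transpose and D is C itself, absent, or a separate array. The following serial C sketch expresses that single form for one matrix instance in row-major storage. It is an illustration of the arithmetic only, not the parallel algorithm Sun S3L uses, and mat_mult_sketch and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Element (i,j) of op(M), where M is stored row-major with `cols`
 * columns and op is the transpose when `trans` is nonzero. */
static double elem(const double *m, size_t cols, int trans, size_t i, size_t j)
{
    return trans ? m[j * cols + i] : m[i * cols + j];
}

/* C = D + op(A) op(B), where op(A) is p-by-q and op(B) is q-by-r.
 * D == C gives the accumulating form (C = C + ...); D == NULL gives
 * the _noadd form. A and B must not overlap C. */
void mat_mult_sketch(double *C, const double *A, const double *B,
                     const double *D, size_t p, size_t q, size_t r,
                     int trans_a, int trans_b)
{
    assert(C != NULL && A != NULL && B != NULL);
    size_t a_cols = trans_a ? p : q;  /* stored A is q-by-p when transposed */
    size_t b_cols = trans_b ? q : r;  /* stored B is r-by-q when transposed */
    for (size_t i = 0; i < p; i++) {
        for (size_t j = 0; j < r; j++) {
            double sum = D ? D[i * r + j] : 0.0;
            for (size_t k = 0; k < q; k++)
                sum += elem(A, a_cols, trans_a, i, k) *
                       elem(B, b_cols, trans_b, k, j);
            C[i * r + j] = sum;
        }
    }
}
```

Under these assumptions, a call with trans_a set and D equal to NULL corresponds to the arithmetic of the _t1_noadd row, and a call with both flags set and a separate D corresponds to the _t1_t2_addto row.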

For calls that do not transpose either matrix A or B, the variables conform correctly with the axis lengths for row_axis and col_axis shown in Table 8-6.

Table 8-6 Recommended row_axis and col_axis Values When Matrix A and Matrix B Are Not Transposed


Variable	`row_axis` Length	`col_axis` Length
A	p	q
B	q	r
C	p	r
D	p	r

For calls that transpose the matrix A (A^T), the variables conform correctly with the axis lengths for row_axis and col_axis shown in Table 8-7.

Table 8-7 Recommended row_axis and col_axis Values When Matrix A Is Transposed


Variable	`row_axis` Length	`col_axis` Length
A	q	p
B	q	r
C	p	r
D	p	r

For calls that transpose the matrix B (B^T), the variables conform correctly with the axis lengths for row_axis and col_axis shown in Table 8-8.

Table 8-8 Recommended row_axis and col_axis Values When Matrix B Is Transposed


Variable	`row_axis` Length	`col_axis` Length
A	p	q
B	r	q
C	p	r
D	p	r

For calls that transpose both A and B (A^TB^T), the variables conform correctly with the axis lengths for row_axis and col_axis shown in Table 8-9.

Table 8-9 Recommended row_axis and col_axis Values When Both Matrix A and Matrix B Are Transposed


Variable	`row_axis` Length	`col_axis` Length
`A`	q	p
`B`	r	q
`C`	p	r
`D`	p	r

The algorithm is numerically stable.
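
The four conformance cases reduce to one rule: after any transposition is applied, A must be p x q, B must be q x r, and C (and D, where used) must be p x r. The following C sketch checks that rule from the stored row_axis and col_axis lengths. The struct and function names are hypothetical and are not part of the Sun S3L interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Lengths of a matrix along row_axis and col_axis, as stored. */
struct shape { size_t rows; size_t cols; };

/* Returns true when C = op(A) op(B) is conformable, where op is the
 * transpose for an operand whose trans flag is set. */
bool mat_mult_conforms(struct shape a, struct shape b, struct shape c,
                       bool trans_a, bool trans_b)
{
    size_t p   = trans_a ? a.cols : a.rows;  /* rows of op(A)    */
    size_t q_a = trans_a ? a.rows : a.cols;  /* columns of op(A) */
    size_t q_b = trans_b ? b.cols : b.rows;  /* rows of op(B)    */
    size_t r   = trans_b ? b.rows : b.cols;  /* columns of op(B) */
    return q_a == q_b && c.rows == p && c.cols == r;
}

/* Debug-build guard: aborts if the shapes do not conform. */
void require_conforming(struct shape a, struct shape b, struct shape c,
                        bool trans_a, bool trans_b)
{
    assert(mat_mult_conforms(a, b, c, trans_a, trans_b));
}
```

With p = 2, q = 3, and r = 4, for example, a transposed B must be stored as 4 x 3 while A stays 2 x 3 and C stays 2 x 4.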

Syntax

The C and Fortran syntax for the S3L_mat_mult routines is shown below.

C/C++ Syntax

Example 8-5

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_mat_mult(C, A, B, row_axis, col_axis)
S3L_mat_mult_noadd(C, A, B, row_axis, col_axis)
S3L_mat_mult_addto(C, A, B, D, row_axis, col_axis)
S3L_mat_mult_t1(C, A, B, row_axis, col_axis)
S3L_mat_mult_t1_noadd(C, A, B, row_axis, col_axis)
S3L_mat_mult_t1_addto(C, A, B, D, row_axis, col_axis)
S3L_mat_mult_h1(C, A, B, row_axis, col_axis)
S3L_mat_mult_h1_noadd(C, A, B, row_axis, col_axis)
S3L_mat_mult_h1_addto(C, A, B, D, row_axis, col_axis)
S3L_mat_mult_t2(C, A, B, row_axis, col_axis)
S3L_mat_mult_t2_noadd(C, A, B, row_axis, col_axis)
S3L_mat_mult_t2_addto(C, A, B, D, row_axis, col_axis)
S3L_mat_mult_h2(C, A, B, row_axis, col_axis)
S3L_mat_mult_h2_noadd(C, A, B, row_axis, col_axis)
S3L_mat_mult_h2_addto(C, A, B, D, row_axis, col_axis)
S3L_mat_mult_t1_t2(C, A, B, row_axis, col_axis)
S3L_mat_mult_t1_t2_noadd(C, A, B, row_axis, col_axis)
S3L_mat_mult_t1_t2_addto(C, A, B, D, row_axis, col_axis)
    S3L_array_t        C
    S3L_array_t        A
    S3L_array_t        B
    S3L_array_t        D
    int                row_axis
    int                col_axis

F77/F90 Syntax

Example 8-6

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_mat_mult(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_noadd(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_addto(C, A, B, D, row_axis, col_axis, ier)
S3L_mat_mult_t1(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_t1_noadd(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_t1_addto(C, A, B, D, row_axis, col_axis, ier)
S3L_mat_mult_h1(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_h1_noadd(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_h1_addto(C, A, B, D, row_axis, col_axis, ier)
S3L_mat_mult_t2(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_t2_noadd(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_t2_addto(C, A, B, D, row_axis, col_axis, ier)
S3L_mat_mult_h2(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_h2_noadd(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_h2_addto(C, A, B, D, row_axis, col_axis, ier)
S3L_mat_mult_t1_t2(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_t1_t2_noadd(C, A, B, row_axis, col_axis, ier)
S3L_mat_mult_t1_t2_addto(C, A, B, D, row_axis, col_axis, ier)
    integer*8          C
    integer*8          A
    integer*8          B
    integer*8          D
    integer*4          row_axis
    integer*4          col_axis
    integer*4          ier

Input

C - Array handle for an S3L parallel array of rank >= 2. C is the destination array for all matrix multiplication operations (as discussed in the Output section). Some of these operations also use C as an input argument, adding the contents of C to their respective matrix multiplication products. The operations shown in Table 8-5 that include some variation of C + AB belong to this class.
A - Array handle for an S3L parallel array of the same rank as C and B. This array contains one or more instances of the left-hand factor array A, defined by axes row_axis (which counts the rows) and col_axis (which counts the columns). Axis col_axis of A must have the same length as axis row_axis of B. The contents of A are not changed during execution.
B - Array handle for an S3L parallel array of the same rank as C and A. This array contains one or more instances of the right-hand factor array B, defined by axes row_axis (which counts the rows) and col_axis (which counts the columns). The contents of B are not changed during execution.
D - Parallel array of the same shape as C. This argument is used only in the calls whose names end in _addto. It contains one or more instances of the array D that is to be added to the array product, defined by axes row_axis (which counts the rows) and col_axis (which counts the columns). The contents of D are not changed during execution, unless D and C are the same variable.
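row_axis - The axis of C, A, and B that counts the rows of the embedded array or arrays. For C/C++ programs, it must be nonnegative and less than the rank of C. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of C.
col_axis - The axis of C, A, and B that counts the columns of the embedded array or arrays. For C/C++ programs, it must be nonnegative and less than the rank of C. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of C.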

Note: The argument D can be identical to the argument C in all matrix multiply _addto routines except _t1_t2_addto.

Output

These functions use the following arguments for output:

C - Array handle for an S3L parallel array, which is a destination array for all matrix multiplication operations. Upon successful completion, each array instance within C is overwritten by the result of the array multiplication call.

ier (Fortran only) - When called from a Fortran program, these functions return error status in ier.

Error Handling

On success, the S3L_mat_mult routines return S3L_SUCCESS.

The S3L_mat_mult routines perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause these functions to terminate and return the associated error code:

S3L_ERR_MATCH_RANK - The parallel arrays do not have the same rank.

S3L_ERR_MATCH_EXTENTS - The lengths of corresponding axes do not match.

S3L_ERR_MATCH_DTYPE - The arguments are not all of the same data type and precision.

S3L_ERR_ARG_AXISNUM - row_axis and/or col_axis contains a bad axis number. For C program calls, each of these parameters must be >= 0 and less than the rank of C. For Fortran calls, they must be >= 1 and <= the rank of C.

S3L_ERR_CONJ_INVAL - Conjugation was requested, but data supplied was not of type S3L_complex_t or S3L_dcomplex_t.

Examples

../examples/s3l/dense_matrix_ops/matmult.c
../examples/s3l/dense_matrix_ops-f/matmult.f

Related Functions

S3L_inner_prod(3)
S3L_2_norm(3)
S3L_outer_prod(3)
S3L_mat_vec_mult(3)

`S3L_mat_vec_mult`

Description

Sun S3L provides six matrix vector multiplication routines, which compute one or more instances of a matrix vector product. For each instance, these routines perform the operations listed in Table 8-10.

Note -

In these descriptions, conj[A] denotes the conjugate of A.

Table 8-10 S3L Matrix Vector Multiplication Operations


Routine	Operation	Data Type
`S3L_mat_vec_mult`	y = y + Ax	real or complex
`S3L_mat_vec_mult_noadd`	y = Ax	real or complex
`S3L_mat_vec_mult_addto`	y = v + Ax	real or complex
`S3L_mat_vec_mult_c1`	y = y + conj[A]x	complex only
`S3L_mat_vec_mult_c1_noadd`	y = conj[A]x	complex only
`S3L_mat_vec_mult_c1_addto`	y = v + conj[A]x	complex only
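
The operations in Table 8-10 share the form y = v + op(A)x, where op is either the identity or element-wise conjugation and v is y itself, absent, or a separate vector. The following serial C99 sketch shows that form for a single m x n matrix instance in row-major storage. It illustrates the arithmetic only, and mat_vec_mult_sketch and its parameters are hypothetical names rather than Sun S3L entry points.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* y = v + op(A) x for one m-by-n row-major matrix A, where op is
 * element-wise conjugation when conj_a is nonzero. v == y gives the
 * accumulating form; v == NULL gives the _noadd form. */
void mat_vec_mult_sketch(double complex *y, const double complex *A,
                         const double complex *x, const double complex *v,
                         size_t m, size_t n, int conj_a)
{
    assert(y != NULL && A != NULL && x != NULL);
    for (size_t i = 0; i < m; i++) {
        double complex sum = v ? v[i] : 0.0;
        for (size_t j = 0; j < n; j++) {
            double complex a = A[i * n + j];
            sum += (conj_a ? conj(a) : a) * x[j];
        }
        y[i] = sum;
    }
}
```

Note that x has n elements (the col_axis length of A) and y has m elements (the row_axis length of A), which matches the axis-length requirements given for x_vector_axis and y_vector_axis in the argument descriptions.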

Syntax

The C and Fortran syntax for the S3L_mat_vec_mult routines is shown below.

C/C++ Syntax

Example 8-7

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_mat_vec_mult(y, A, x, y_vector_axis, row_axis, col_axis, x_vector_axis)
S3L_mat_vec_mult_noadd(y, A, x, y_vector_axis, row_axis, col_axis, x_vector_axis)
S3L_mat_vec_mult_addto(y, A, x, v, y_vector_axis, row_axis, col_axis,
x_vector_axis)
S3L_mat_vec_mult_c1(y, A, x, y_vector_axis, row_axis, col_axis, x_vector_axis)
S3L_mat_vec_mult_c1_noadd(y, A, x, y_vector_axis, row_axis, col_axis,
x_vector_axis)
S3L_mat_vec_mult_c1_addto(y, A, x, v, y_vector_axis, row_axis, col_axis,
x_vector_axis)
    S3L_array_t        y
    S3L_array_t        A
    S3L_array_t        x
    S3L_array_t        v
    int                y_vector_axis
    int                row_axis
    int                col_axis
    int                x_vector_axis

F77/F90 Syntax

Example 8-8

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_mat_vec_mult(y, A, x, y_vector_axis, row_axis, col_axis, x_vector_axis,
ier)
S3L_mat_vec_mult_noadd(y, A, x, y_vector_axis, row_axis, col_axis, x_vector_axis,
ier)
S3L_mat_vec_mult_addto(y, A, x, v, y_vector_axis, row_axis, col_axis,
x_vector_axis, ier)
S3L_mat_vec_mult_c1(y, A, x, y_vector_axis, row_axis, col_axis, x_vector_axis,
ier)
S3L_mat_vec_mult_c1_noadd(y, A, x, y_vector_axis, row_axis, col_axis,
x_vector_axis, ier)
S3L_mat_vec_mult_c1_addto(y, A, x, v, y_vector_axis, row_axis, col_axis,
x_vector_axis, ier)
    integer*8          y
    integer*8          A
    integer*8          x
    integer*8          v
    integer*4          y_vector_axis
    integer*4          row_axis
    integer*4          col_axis
    integer*4          x_vector_axis
    integer*4          ier

Input

y - Array handle for an S3L parallel array of rank >= 1. Two matrix vector multiplication routines, S3L_mat_vec_mult and S3L_mat_vec_mult_c1, add the contents of this array to the product Ax. All matrix vector multiplication routines use y as the destination array, as described in the Output section.
A - Array handle for an S3L parallel array of rank one greater than that of y. It contains one or more instances of the matrix A, defined by axes row_axis (which counts the rows) and col_axis (which counts the columns).

The remaining axes must match the instance axes of y in length and order of declaration. Thus, each matrix in A corresponds to a vector in y. The contents of A are not changed during execution.
x - Array handle for an S3L parallel array of the same rank as y. It contains one or more instances of x, the vector that will be multiplied by the matrix A, embedded along axis x_vector_axis.

Axis x_vector_axis of x must have the same length as axis col_axis of A. The remaining axes of x must match the instance axes of y in length and order of declaration. Thus, each vector in x corresponds to a vector in y. The contents of x are not changed during execution.
v - Array handle for an S3L parallel array of the same rank and shape as y. This argument is used only in the S3L_mat_vec_mult_addto and S3L_mat_vec_mult_c1_addto calls. It contains one or more instances of the vector v, which will be added to the matrix vector product, embedded along axis y_vector_axis. The contents of v are not changed during execution, unless v is the same variable as y.

Note: For S3L_mat_vec_mult_addto and S3L_mat_vec_mult_c1_addto, the argument v can be identical to the argument y.
y_vector_axis - Scalar integer variable that specifies the axis of y and v along which the elements of the embedded vectors lie. For C/C++ programs, this argument must be nonnegative and less than the rank of y. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of y.
row_axis - Scalar integer variable. It counts the rows of the embedded matrix or matrices. For C/C++ programs, this argument must be nonnegative and less than the rank of A. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of A.
col_axis - Scalar integer variable that counts the columns of the embedded matrix or matrices. For C/C++ programs, this argument must be nonnegative and less than the rank of A. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of A.
x_vector_axis - Scalar integer variable that specifies the axis of x along which the elements of the embedded vectors lie. For C/C++ programs, this argument must be nonnegative and less than the rank of x. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of x.

Output

These functions use the following arguments for output:

y - Array handle for an S3L array of rank >= 1. This array contains one or more instances of the destination vector y embedded along the axis y_vector_axis. This axis must have the same length as axis row_axis of A. Upon completion, each vector instance is overwritten by the result of the matrix vector multiplication call.

ier (Fortran only) - When called from a Fortran program, these functions return error status in ier.

Error Handling

On success, the S3L_mat_vec_mult routines return S3L_SUCCESS.

The S3L_mat_vec_mult routines perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause these functions to terminate and return the associated error code:

S3L_ERR_MATCH_RANK - The parallel arrays do not have the same rank.

S3L_ERR_MATCH_EXTENTS - The lengths of corresponding axes do not match.

S3L_ERR_MATCH_DTYPE - The arguments are not all of the same data type and precision.

S3L_ERR_ARG_AXISNUM - row_axis and/or col_axis contains a bad axis number. For C/C++ program calls, each of these parameters must be nonnegative and less than the rank of A. For F77/F90 calls, they must be greater than zero and less than or equal to the rank of A.

S3L_ERR_CONJ_INVAL - Conjugation was requested, but the data supplied was not of type S3L_complex_t or S3L_dcomplex_t.

Examples

../examples/s3l/dense_matrix_ops/matvec_mult.c
../examples/s3l/dense_matrix_ops-f/matvec_mult.f

Related Functions

S3L_inner_prod(3)
S3L_2_norm(3)
S3L_outer_prod(3)
S3L_mat_mult(3)

`S3L_outer_prod`

Description

Sun S3L provides six outer product routines which compute one or more instances of an outer product of two vectors. For each instance, the outer product routines perform the operations listed in Table 8-11.

Note -

In these descriptions, y^T and y^H denote y transpose and y Hermitian, respectively.

Table 8-11 S3L Outer Product Operations


Routine	Operation	Data Type
`S3L_outer_prod`	`A = A + xy^T`	real or complex
`S3L_outer_prod_noadd`	`A = xy^T`	real or complex
`S3L_outer_prod_addto`	`A = B + xy^T`	real or complex
`S3L_outer_prod_c2`	`A = A + xy^H`	complex only
`S3L_outer_prod_c2_noadd`	`A = xy^H`	complex only
`S3L_outer_prod_c2_addto`	`A = B + xy^H`	complex only

In elementwise notation, for each instance S3L_outer_prod computes

A(i,j) = A(i,j) + x(i) * y(j)

and S3L_outer_prod_c2 computes

A(i,j) = A(i,j) + x(i) * conj[y(j)]

where conj[y(j)] denotes the conjugate of y(j).
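The two elementwise formulas above can be sketched as serial loops over ordinary C arrays. This is only an illustration of the arithmetic for a single m-by-n instance; the function names outer_prod_ref and outer_prod_c2_ref are hypothetical, the arrays are plain row-major C arrays rather than S3L parallel arrays, and nothing here reflects how S3L distributes or computes the product internally.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* A(i,j) = A(i,j) + x(i) * y(j) for one m-by-n instance.
 * A is a plain row-major array here, not an S3L parallel array. */
void outer_prod_ref(double *A, const double *x, const double *y,
                    size_t m, size_t n)
{
    assert(A != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++)
            A[i * n + j] += x[i] * y[j];
}

/* A(i,j) = A(i,j) + x(i) * conj[y(j)], the _c2 form (complex only). */
void outer_prod_c2_ref(double complex *A, const double complex *x,
                       const double complex *y, size_t m, size_t n)
{
    assert(A != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++)
            A[i * n + j] += x[i] * conj(y[j]);
}
```

Zeroing A before the call gives the behavior of the _noadd forms, and copying B into A first gives the behavior of the _addto forms.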

Syntax

The C and Fortran syntax for S3L_outer_prod are shown below.

C/C++ Syntax

Example 8-9

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_outer_prod(A, x, y, row_axis, col_axis, x_vector_axis, y_vector_axis)
S3L_outer_prod_noadd(A, x, y, row_axis, col_axis, x_vector_axis, y_vector_axis)
S3L_outer_prod_addto(A, x, y, B, row_axis, col_axis, x_vector_axis,
y_vector_axis)
S3L_outer_prod_c2(A, x, y, row_axis, col_axis, x_vector_axis, y_vector_axis)
S3L_outer_prod_c2_noadd(A, x, y, row_axis, col_axis, x_vector_axis,
y_vector_axis)
S3L_outer_prod_c2_addto(A, x, y, B, row_axis, col_axis, x_vector_axis,
y_vector_axis)
    S3L_array_t        A
    S3L_array_t        x
    S3L_array_t        y
    S3L_array_t        B
    int                row_axis
    int                col_axis
    int                x_vector_axis
    int                y_vector_axis

F77/F90 Syntax

Example 8-10

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_outer_prod(A, x, y, row_axis, col_axis, x_vector_axis, y_vector_axis,
ier)
S3L_outer_prod_noadd(A, x, y, row_axis, col_axis, x_vector_axis, y_vector_axis,
ier)
S3L_outer_prod_addto(A, x, y, B, row_axis, col_axis, x_vector_axis,
y_vector_axis, ier)
S3L_outer_prod_c2(A, x, y, row_axis, col_axis, x_vector_axis, y_vector_axis,
ier)
S3L_outer_prod_c2_noadd(A, x, y, row_axis, col_axis, x_vector_axis,
y_vector_axis, ier)
S3L_outer_prod_c2_addto(A, x, y, B, row_axis, col_axis, x_vector_axis,
y_vector_axis, ier)
    integer*8          A
    integer*8          x
    integer*8          y
    integer*8          B
    integer*4          row_axis
    integer*4          col_axis
    integer*4          x_vector_axis
    integer*4          y_vector_axis
    integer*4          ier

Input

A - Array handle for an S3L parallel array of rank greater than or equal to 2. Two S3L outer product routines, S3L_outer_prod and S3L_outer_prod_c2, add the contents of this array to the product of xy. All outer product routines use A as the destination array, as described in the Output section.
x - Array handle for an S3L parallel array of rank one less than that of A. It contains one or more instances of the first source vector, x, embedded along axis x_vector_axis.
Axis x_vector_axis of x must have the same length as axis row_axis of A. The remaining axes of x must match the instance axes of A in length and order of declaration. Thus, each vector in x corresponds to a vector in A.
y - Array handle for an S3L parallel array of rank one less than that of A. It contains one or more instances of the second source vector, y, embedded along axis y_vector_axis.
Axis y_vector_axis of y must have the same length as axis col_axis of A. The remaining axes of y must match the instance axes of A in length and order of declaration. Thus, each vector in y corresponds to a vector in A.
Note: The argument y can be identical to the argument x.
B - Parallel array of the same shape as A. It contains one or more embedded matrices B defined by axes row_axis (which counts the rows) and col_axis (which counts the columns). The remaining axes must match the instance axes of A in length and order of declaration. Thus, each matrix in B corresponds to a matrix in A.
This argument is used only in the S3L_outer_prod_addto and S3L_outer_prod_c2_addto calls, which add each outer product to the corresponding matrix within B and place the result in the corresponding matrix within A. The contents of B are not changed by the operation (unless B and A are the same variable).
Note: For S3L_outer_prod_addto and S3L_outer_prod_c2_addto, the argument B can be identical to the argument A.
row_axis - Scalar integer variable. The axis of A and B that counts the rows of the embedded matrix or matrices. For C/C++ programs, this argument must be nonnegative and less than the rank of A. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of A.
col_axis - Scalar integer variable. The axis of A and B that counts the columns of the embedded matrix or matrices. For C/C++ programs, this argument must be nonnegative and less than the rank of A. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of A.

x_vector_axis - Scalar integer variable that specifies the axis of x along which the elements of the embedded vectors lie. For C/C++ programs, this argument must be nonnegative and less than the rank of x. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of x.

y_vector_axis - Scalar integer variable that specifies the axis of y along which the elements of the embedded vectors lie. For C/C++ programs, this argument must be nonnegative and less than the rank of y. For F77/F90 programs, it must be greater than zero and less than or equal to the rank of y.
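The axis-number rules stated for the axis arguments differ only in the index base. A minimal sketch of the two range checks follows; the helper names axis_ok_c and axis_ok_fortran are hypothetical and are not part of S3L, which performs its own checking and reports S3L_ERR_ARG_AXISNUM.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when axis is a valid C/C++ (zero-based) axis number. */
bool axis_ok_c(int axis, int rank)
{
    assert(rank >= 1);
    return axis >= 0 && axis < rank;
}

/* Returns true when axis is a valid F77/F90 (one-based) axis number. */
bool axis_ok_fortran(int axis, int rank)
{
    assert(rank >= 1);
    return axis > 0 && axis <= rank;
}
```

For a rank 2 array, the C check accepts axes 0 and 1, while the Fortran check accepts axes 1 and 2.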

Output

These functions use the following arguments for output:

A - Array handle for an S3L parallel array of rank greater than or equal to 2, which contains one or more instances of the destination matrix A, defined by axes row_axis (which counts the rows) and col_axis (which counts the columns). Upon successful completion, each matrix instance is overwritten by the result of the outer product call.

ier (Fortran only) - When called from a Fortran program, these functions return error status in ier.

Error Handling

On success, the S3L_outer_prod routines return S3L_SUCCESS.

The S3L_outer_prod routines perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause these functions to terminate and return the associated error code:

S3L_ERR_MATCH_RANK - The parallel arrays do not have the same rank.

S3L_ERR_MATCH_EXTENTS - The lengths of corresponding axes do not match.

S3L_ERR_MATCH_DTYPE - The arguments are not all of the same data type and precision.

S3L_ERR_ARG_AXISNUM - row_axis and/or col_axis contains a bad axis number. For C/C++ program calls, each of these parameters must be nonnegative and less than the rank of A. For F77/F90 calls, they must be greater than zero and less than or equal to the rank of A.

S3L_ERR_CONJ_INVAL - Conjugation was requested, but the data supplied was not of type S3L_complex_t or S3L_dcomplex_t.

S3L_ERR_ARG_RANK - Rank of A is less than 2.

Examples

../examples/s3l/dense_matrix_ops/outer_prod.c
../examples/s3l/dense_matrix_ops-f/outer_prod.f

Related Functions

S3L_inner_prod(3)
S3L_2_norm(3)
S3L_mat_vec_mult(3)
S3L_mat_mult(3)

Sparse Matrix Operations

`S3L_declare_sparse`

Description

S3L_declare_sparse creates an internal S3L array handle that describes a sparse matrix. The sparse matrix may be represented in either the Coordinate format or the Compressed Sparse Row (CSR) format. Upon successful completion, S3L_declare_sparse returns an S3L array handle in A that describes the distributed sparse matrix.

The Coordinate format consists of three arrays: a, r, and c. Array a stores the nonzero elements of the sparse matrix in any order. r and c are integer arrays that hold the corresponding row and column indices of the sparse matrix, respectively.

The contents of r, c, and a are supplied by the arguments row, col, and val, respectively. row, col, and val are all rank 1 parallel arrays.

The CSR format stores the sparse matrix in arrays ia, ja, and a. As with the Coordinate format, array a stores the nonzero elements of the matrix. ja, an integer array, contains the column indices of the nonzeros as stored in the array a. ia, also an integer array, contains pointers to the beginning of each row in arrays a and ja.

The ia, ja, and a arrays take their contents from the row, col, and val arguments, respectively. As with the Coordinate format, row, col, and val are all rank 1 parallel arrays.

Note -

Because row, col, and val are copied to working arrays, they can be deallocated immediately following the S3L_declare_sparse call.

S3L_declare_sparse assumes that the row and column indices of the sparse matrix are stored using zero-based indexing when called by C or C++ applications and one-based indexing when called by F77 or F90 applications. See "S3L_read_sparse " for a discussion of S3L_read_sparse.
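To make the relationship between the two formats concrete, the following sketch converts zero-based Coordinate arrays into CSR arrays in plain serial C. It is an illustration under stated assumptions, not S3L code: the function name coo_to_csr is hypothetical, the data are ordinary C arrays of double rather than S3L parallel arrays, and the order of entries within a row simply follows their order in the input.

```c
#include <assert.h>
#include <stdlib.h>

/* Convert zero-based Coordinate arrays (r, c, a) with nnz entries into
 * CSR arrays (ia, ja, a_csr) for an m-row matrix.
 * ia must hold m+1 ints; ja and a_csr must hold nnz entries.
 * Returns 0 on success, -1 if a row index is out of range or
 * memory allocation fails. */
int coo_to_csr(int m, int nnz, const int *r, const int *c, const double *a,
               int *ia, int *ja, double *a_csr)
{
    assert(m > 0 && nnz >= 0);

    for (int i = 0; i <= m; i++)
        ia[i] = 0;
    /* Count the nonzeros in each row. */
    for (int k = 0; k < nnz; k++) {
        if (r[k] < 0 || r[k] >= m)
            return -1;
        ia[r[k] + 1]++;
    }
    /* Prefix sum: ia[i] becomes the offset where row i starts. */
    for (int i = 0; i < m; i++)
        ia[i + 1] += ia[i];

    /* next[i] is the next free slot in row i. */
    int *next = malloc((size_t)m * sizeof *next);
    if (next == NULL)
        return -1;
    for (int i = 0; i < m; i++)
        next[i] = ia[i];
    for (int k = 0; k < nnz; k++) {
        int dst = next[r[k]]++;
        ja[dst] = c[k];
        a_csr[dst] = a[k];
    }
    free(next);
    return 0;
}
```

The resulting ia array has m+1 entries, which matches the length required of the row argument when spfmt is S3L_SPARSE_CSR.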

Syntax

The C and Fortran syntax for S3L_declare_sparse are noted next.

C/C++ Syntax

Example 8-11

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_declare_sparse(A, spfmt, m, n, row, col, val)
    S3L_array_t             *A
    S3L_sparse_storage_t    spfmt
    int                     m
    int                     n
    S3L_array_t             row
    S3L_array_t             col
    S3L_array_t             val

F77/F90 Syntax

Example 8-12

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_declare_sparse(A, spfmt, m, n, row, col, val, ier)
    integer*8               A
    integer*8               spfmt
    integer*4               m
    integer*4               n
    integer*8               row
    integer*8               col
    integer*8               val
    integer*4               ier

Input

spfmt - Indicates the sparse storage format used for representing the sparse matrix. Use S3L_SPARSE_COO to specify the Coordinate format and S3L_SPARSE_CSR for the Compressed Sparse Row format.
m - Indicates the total number of rows in the sparse matrix.
n - Indicates the total number of columns in the sparse matrix.
row - Integer parallel array of rank 1. Its length and content can vary, depending on the sparse storage format used.
- S3L_SPARSE_COO - row is of the same size as arrays col and val and contains row indices of the nonzero elements in array val.
- S3L_SPARSE_CSR - row is of size m+1 and contains pointers to the beginning of each row in arrays col and val.
col - Integer global array of rank 1 with the same length as array val. It contains column indices of the corresponding elements stored in array val.
val - Parallel array of rank 1, containing the nonzero elements of the sparse matrix. For S3L_SPARSE_COO, nonzero elements can be stored in any order. For S3L_SPARSE_CSR, they should be stored row by row, from the first row to the last. The length of val for both S3L_SPARSE_COO and S3L_SPARSE_CSR is nnz, the total number of nonzero elements in the sparse matrix. The data type of array elements can be real or complex (single- or double-precision).

Output

This function uses the following arguments for output:

A - Upon return, A contains an S3L internal array handle for the global general sparse matrix. This handle can be used in subsequent calls to other S3L sparse array functions.

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_declare_sparse returns S3L_SUCCESS.

The S3L_declare_sparse routine performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause this function to terminate and return the associated error code:

S3L_ERR_SPARSE_FORMAT - Invalid storage format. It must be either S3L_SPARSE_COO or S3L_SPARSE_CSR.

S3L_ERR_ARG_EXTENTS - Invalid m or n. Each must be > 0.

S3L_ERR_ARG_NULL - Invalid arrays row, col, or val. They must all be preallocated S3L arrays.

S3L_ERR_MATCH_RANK - Ranks of arrays row, col, and val are mismatched. They all must be rank 1 arrays.

S3L_ERR_MATCH_DTYPE - Arrays row and col data types do not match. They must be of type S3L_integer.

S3L_ERR_MATCH_EXTENTS - The lengths of arrays row, col, and val are mismatched. For S3L_SPARSE_COO, they all must be of the same size. For S3L_SPARSE_CSR, the length of array col must be equal to that of array val and array row must be of size m+1.
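The length rules listed under S3L_ERR_MATCH_EXTENTS can be checked before the call. The sketch below restates those rules in plain C; the enum sparse_fmt and the function sparse_lengths_ok are hypothetical stand-ins introduced for illustration and are not S3L definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the S3L storage format constants. */
typedef enum { FMT_COO, FMT_CSR } sparse_fmt;

/* Returns true when the lengths of row, col, and val are consistent
 * with the rules listed under S3L_ERR_MATCH_EXTENTS. */
bool sparse_lengths_ok(sparse_fmt fmt, int m,
                       int row_len, int col_len, int val_len)
{
    assert(m > 0);
    if (col_len != val_len)
        return false;
    if (fmt == FMT_COO)
        return row_len == val_len;
    return row_len == m + 1;   /* FMT_CSR */
}
```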

Examples

../examples/s3l/sparse/ex_sparse2.c
../examples/s3l/dense_matrix_ops-f/outer_prod.f

Related Functions

S3L_matvec_sparse(3)
S3L_rand_sparse(3)
S3L_read_sparse(3)

`S3L_free_sparse`

Description

S3L_free_sparse deallocates the memory reserved for a sparse matrix and the associated array handle.

Syntax

The C and Fortran syntax for S3L_free_sparse are shown below.

C/C++ Syntax

Example 8-13

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_free_sparse(A)
    S3L_array_t             *A

F77/F90 Syntax

Example 8-14

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_free_sparse(A, ier)
    integer*8          A
    integer*4          ier

Input

S3L_free_sparse accepts the following argument as input:

A - Handle for the parallel S3L array that was allocated via a previous call to S3L_declare_sparse, S3L_read_sparse, or S3L_rand_sparse.

Output

S3L_free_sparse uses the following argument for output:

ier (Fortran only) - When called from a Fortran program, S3L_free_sparse returns error status in ier.

Error Handling

On success, S3L_free_sparse returns S3L_SUCCESS.

On error, the following error code may be returned:

S3L_ERR_ARG_ARRAY - A is a NULL pointer (C/C++) or 0 (F77/F90).

Examples

../examples/s3l/sparse/ex_sparse.c
../examples/s3l/sparse/ex_sparse2.c
../examples/s3l/iter/ex_iter.c
../examples/s3l/sparse-f/ex_sparse.f
../examples/s3l/iter-f/ex_iter.f

Related Functions

S3L_declare_sparse(3)
S3L_read_sparse(3)
S3L_rand_sparse(3)

`S3L_rand_sparse`

Description

S3L_rand_sparse creates a random sparse matrix with random sparsity pattern in either the Coordinate format or the Compressed Sparse Row format. Upon successful completion, it returns an S3L array handle in A representing this random sparse matrix.

Syntax

The C and Fortran syntax for S3L_rand_sparse are shown below.

C/C++ Syntax

Example 8-15

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_rand_sparse(A, spfmt, stype, m, n, density, type, seed)
    S3L_array_t               *A
    S3L_sparse_storage_t      spfmt
    sparse_rand_t             stype
    int                       m
    int                       n
    real4                     density
    S3L_data_type             type
    int                       seed

F77/F90 Syntax

Example 8-16

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_rand_sparse(A, spfmt, stype, m, n, density, type, seed, ier)
    integer*8               A
    integer*4               spfmt
    integer*4               stype
    integer*4               m
    integer*4               n
    real*4                  density
    integer*4               type
    integer*4               seed
    integer*4               ier

Input

spfmt - Indicates the sparse storage format used for representing the sparse matrix. Use S3L_SPARSE_COO to specify the Coordinate format and S3L_SPARSE_CSR for the Compressed Sparse Row format.
stype - A character string that specifies the type of random pattern to be used, as follows:
- S3L_SPARSE_RAND - A random pattern.
- S3L_SPARSE_DRND - A random pattern with guaranteed nonzero diagonal.
- S3L_SPARSE_SRND - A random symmetric sparse array.
- S3L_SPARSE_DSRN - A random symmetric sparse array with guaranteed nonzero diagonal.
m - Indicates the total number of rows in the sparse matrix.
n - Indicates the total number of columns in the sparse matrix.
density - Positive parameter less than or equal to 1.0, which suggests the approximate density of the array. For example, if density = 0.1, approximately 10% of the array elements will have nonzero values.
type - The type of the sparse array, which must be one of: S3L_integer, S3L_float, S3L_double, S3L_complex, or S3L_dcomplex.
seed - An integer that is used internally to initialize the random number generators. It affects both the pattern and the values of the array elements. The results are independent of the number of processes on which the function is invoked.
Note: The number of nonzero elements generated will depend primarily on the combination of the density argument value and the array extents given by m and n. The following guidelines provide additional detail:
- Usually, the number of nonzero elements will approximately equal m*n*density.
- The behavior of the algorithm may cause the actual number of nonzero elements to be somewhat smaller than m*n*density.
- Regardless of the value supplied for the density argument, the number of nonzero elements will always be >= m.
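For planning purposes, the guidelines above can be turned into a rough estimate of the nonzero count. The function below is a hypothetical helper, not part of S3L, and it does not reproduce the library's generation algorithm: it only combines the m*n*density target with the stated lower bound of m.

```c
#include <assert.h>

/* Rough planning estimate only: the count S3L actually generates may
 * be somewhat smaller than m*n*density, but never smaller than m. */
long estimate_nnz(int m, int n, double density)
{
    assert(m > 0 && n > 0);
    assert(density > 0.0 && density <= 1.0);
    long target = (long)((double)m * (double)n * density);
    return target < m ? (long)m : target;
}
```

For example, a 10 x 10 array with density = 0.01 has a target of 1 nonzero element, so the lower bound of m = 10 applies instead.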

Output

This function uses the following arguments for output:

A - On return, contains an S3L internal array handle for the distributed random sparse matrix. The handle can be used in subsequent calls to some other S3L sparse array functions.

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_rand_sparse returns S3L_SUCCESS.

The S3L_rand_sparse routine performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause this function to terminate and return the associated error code:

S3L_ERR_SPARSE_FORMAT - Invalid storage format. It must be either S3L_SPARSE_COO or S3L_SPARSE_CSR.

S3L_ERR_ARG_EXTENTS - Invalid m or n. Each must be > 0.

S3L_ERR_DENSITY - Invalid density value. It must be 0.0 < density <= 1.0.

S3L_ERR_ARG_OP - Invalid random pattern. It must be one of: S3L_SPARSE_RAND, S3L_SPARSE_DRND, S3L_SPARSE_SRND, or S3L_SPARSE_DSRN.

S3L_ERR_ARRNOTSQ - Invalid matrix size. When stype does not equal S3L_SPARSE_RAND, m must equal n.

Examples

../examples/s3l/iter/ex_iter.c
../examples/s3l/iter-f/ex_iter.f

Related Functions

S3L_declare_sparse(3)
S3L_matvec_sparse(3)
S3L_read_sparse(3)

`S3L_matvec_sparse`

Description

S3L_matvec_sparse computes the product of a global general sparse matrix with a global dense vector. The sparse matrix is described by the S3L array handle A. The global dense vector is described by the S3L array handle x. The result is stored in the global dense vector described by the S3L array handle y.

The array handle A is produced by a prior call to one of the following routines:

S3L_declare_sparse

S3L_read_sparse

S3L_rand_sparse
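As a point of reference for what the routine computes, the product y = Ax can be written as a short serial loop over CSR data. This is a sketch on ordinary C arrays with zero-based indices; the name csr_matvec is hypothetical, and the code says nothing about how S3L distributes the matrix or performs the multiplication in parallel.

```c
#include <assert.h>

/* y = A*x for an m-row matrix held in zero-based CSR arrays. */
void csr_matvec(int m, const int *ia, const int *ja, const double *a,
                const double *x, double *y)
{
    assert(m > 0);
    for (int i = 0; i < m; i++) {
        double sum = 0.0;
        for (int k = ia[i]; k < ia[i + 1]; k++)
            sum += a[k] * x[ja[k]];
        y[i] = sum;
    }
}
```

The loop reads one element of x for each column index in ja and writes one element of y for each row, which is why the length of x must equal the number of columns and the length of y the number of rows.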

Syntax

The C and Fortran syntax for S3L_matvec_sparse are shown below.

C/C++ Syntax

Example 8-17

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_matvec_sparse(y, A, x)
    S3L_array_t               y
    S3L_array_t               A
    S3L_array_t               x

F77/F90 Syntax

Example 8-18

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_matvec_sparse(y, A, x, ier)
    integer*8               y
    integer*8               A
    integer*8               x
    integer*4               ier

Input

A - S3L array handle for the global general sparse matrix
x - Global array of rank 1, with the same data type and precision as A and y and with a length equal to the number of columns in the sparse matrix.

Output

This function uses the following arguments for output:

y - Global array of rank 1, with the same data type and precision as A and x and with a length equal to the number of rows in the sparse matrix. Upon completion, y contains the product of the sparse matrix A and x.
ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_matvec_sparse returns S3L_SUCCESS.

The S3L_matvec_sparse routine performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause this function to terminate and return the associated error code:

S3L_ERR_ARG_NULL - Invalid array x or y or sparse matrix A. x and y must be preallocated S3L arrays and A must be a preallocated sparse matrix.

S3L_ERR_ARG_RANK - Invalid rank for arrays x and y. They must be rank 1 arrays.

S3L_ERR_MATCH_RANK - The ranks of x and y do not match.

S3L_ERR_MATCH_DTYPE - Arrays x, y, and A do not have the same data type.

S3L_ERR_MATCH_EXTENTS - The lengths of x and y are mismatched with the size of sparse matrix A. The length of x must be equal to the number of columns in A and the length of y must be equal to the number of rows in A.

Examples

../examples/s3l/sparse/ex_sparse.c
../examples/s3l/sparse-f/ex_sparse.f
../examples/s3l/iter/ex_iter.c
../examples/s3l/iter-f/ex_iter.f

Related Functions

S3L_declare_sparse(3)
S3L_read_sparse(3)
S3L_rand_sparse(3)

`S3L_read_sparse`

Description

S3L_read_sparse reads sparse matrix data from an ASCII file and distributes the data to all participating processes. Upon successful completion, S3L_read_sparse returns an S3L array handle in A that represents the distributed sparse matrix.

S3L_read_sparse supports the following sparse matrix storage formats:

S3L_SPARSE_COO - Coordinate format.

S3L_SPARSE_CSR - Compressed Sparse Row format.

These two formats are described below.

`S3L_SPARSE_COO` - Coordinate Format

S3L_SPARSE_COO files consist of three sections, which are illustrated below and described immediately after.

% <comments>
%
%
m       n       nnz
i1      j1      a(i1, j1)
i2      j2      a(i2, j2)
i3      j3      a(i3, j3)
i4      j4      a(i4, j4)
    :       :       :
innz    jnnz    a(innz, jnnz)

The first section can be used for comments. It consists of one or more lines, each of which begins with the percent "%" character.

The second section consists of a single line containing three integers, shown above as m, n, and nnz. m and n indicate the number of rows and columns of the matrix, respectively, and nnz indicates the total number of nonzero values in the matrix.

The third section lists all nonzero values in the matrix, one value per line. The first two entries on a line are the row and column indices for that value and the third entry is the value itself.

Note -

S3L_read_sparse assumes that row and column indices are stored using zero-based indexing when called by C or C++ applications and one-based indexing when called by F77 or F90 applications.

This is illustrated by the following 4x6 sample matrix.

 3.14       0          0        20.04        0          0
 0         27          0          0         -0.6        0
 0          0         -0.01       0          0          0
-0.031      0          0          0.08       0        314.0

This sample matrix could be stored in an S3L_SPARSE_COO file in the following manner.

% Example: 4x6 sparse matrix in an S3L_SPARSE_COO file,
% row-major order, zero-based indexing:
%
%
4       6       8
0       0       3.140e+00
0       3       2.004e+01
1       1       2.700e+01
1       4      -6.000e-01
2       2      -1.000e-02
3       0      -3.100e-02
3       3       8.000e-02
3       5       3.140e+02

The layout used for this example is row-major, but any order is supported, including random. The next two examples show this same 4x6 matrix stored in two S3L_SPARSE_COO files, both in random order. The first example illustrates zero-based indexing and the second one-based indexing.

% Example: 4x6 sparse matrix in an S3L_SPARSE_COO file,
% random order, zero-based indexing:
%
%
4       6       8
3       5       3.140e+02
1       1       2.700e+01
0       3       2.004e+01
3       3       8.000e-02
2       2      -1.000e-02
0       0       3.140e+00
1       4      -6.000e-01
3       0      -3.100e-02

% Example: 4x6 sparse matrix in an S3L_SPARSE_COO file,
% random order, one-based indexing:
%
%
4       6       8
4       4       8.000e-02
2       2       2.700e+01
1       1       3.140e+00
4       1      -3.100e-02
3       3      -1.000e-02
4       6       3.140e+02
1       4       2.004e+01
2       5      -6.000e-01
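The three-section Coordinate layout described above is simple enough to parse by hand, which can be useful for checking a file before passing it to S3L_read_sparse. The following is a minimal sketch for real-valued data only; the function parse_coo_text is hypothetical, it works on text already loaded into memory, and it makes no claim to match the parsing rules S3L itself applies (for example, it tolerates blank lines).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse Coordinate-format text: '%' comment lines, an "m n nnz" line,
 * then up to max_nnz "row col value" lines (real values only).
 * Indices are stored exactly as written (no base conversion).
 * Returns the number of entries read, or -1 on a format error. */
int parse_coo_text(const char *text, int *m, int *n, int max_nnz,
                   int *row, int *col, double *val)
{
    int nnz = -1, count = 0;
    assert(text != NULL);

    while (*text != '\0') {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        char line[256];

        if (len >= sizeof line)
            return -1;
        memcpy(line, text, len);
        line[len] = '\0';
        text += len + (eol ? 1 : 0);

        /* Skip comment lines and blank lines. */
        if (line[0] == '%' || line[strspn(line, " \t\r")] == '\0')
            continue;

        if (nnz < 0) {                       /* size line */
            if (sscanf(line, "%d %d %d", m, n, &nnz) != 3 || nnz < 0)
                return -1;
        } else if (count < nnz && count < max_nnz) {
            if (sscanf(line, "%d %d %lf",
                       &row[count], &col[count], &val[count]) != 3)
                return -1;
            count++;
        }
    }
    /* Succeed only when exactly nnz entries were found. */
    return (nnz >= 0 && count == nnz) ? count : -1;
}
```

Because the indices are stored as written, the caller still has to know whether the file uses zero-based or one-based indexing.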

MatrixMarket Notes

Under S3L_SPARSE_COO format, S3L_read_sparse can also read data supplied in either of two Coordinate formats distributed by MatrixMarket (http://gams.nist.gov/MatrixMarket/). The two supported MatrixMarket formats are real general and complex general.

MatrixMarket files always use one-based indexing. Consequently, they can only be used directly by Fortran programs, which also implement one-based indexing. For a C or C++ program to use a MatrixMarket file, it must call the F77 application program interface. The program example ex_sparse.c illustrates an F77 call from a C program. See the Examples section for the path to this sample program.

`S3L_SPARSE_CSR` - Compressed Sparse Row Format

The S3L_SPARSE_CSR files also consist of three sections. The first two sections are the same as in S3L_SPARSE_COO files. The third section stores the sparse matrix in the arrays a, ja, and ia. As with S3L_SPARSE_COO, array a stores the nnz elements of the matrix. ja, an integer array, contains the column indices of the nonzeros and ia, also an integer array, contains pointers to the beginning of each row in arrays a and ja.

For example, the same 4x6 sparse matrix used in previous examples could be stored under S3L_SPARSE_CSR in the manner shown below (using zero-based indexing).

% Example: 4x6 sparse matrix in an S3L_SPARSE_CSR file,
% zero-based indexing:
%
%
4       6       8
0    2    4    5    8
0    3    4    1    2    0    5    3
3.140000   20.040000   -0.600000   27.000000
-0.010000   -0.031000   314.000000   0.080000
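One way to check a CSR file by eye is to expand its three arrays back into a dense matrix. The sketch below does this for zero-based ia, ja, and a arrays held in ordinary C arrays; the name csr_to_dense is hypothetical and the routine is not part of S3L.

```c
#include <assert.h>

/* Expand zero-based CSR arrays into a row-major m-by-n dense array. */
void csr_to_dense(int m, int n, const int *ia, const int *ja,
                  const double *a, double *dense)
{
    assert(m > 0 && n > 0);
    for (int i = 0; i < m * n; i++)
        dense[i] = 0.0;
    for (int i = 0; i < m; i++)
        for (int k = ia[i]; k < ia[i + 1]; k++)
            dense[i * n + ja[k]] = a[k];
}
```

Applied to the 4x6 sample matrix, with ia = {0, 2, 4, 5, 8} and ja = {0, 3, 4, 1, 2, 0, 5, 3}, the second stored value lands in row 0, column 3, which is the 20.04 entry of the matrix.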

Syntax

The C and Fortran syntax for S3L_read_sparse are shown below.

C/C++ Syntax

Example 8-19

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_read_sparse(A, spfmt, m, n, nnz, type, fname, dfmt)
    S3L_array_t               *A
    S3L_sparse_storage_t      spfmt
    int                       m
    int                       n
    int                       nnz
    S3L_data_type             type
    char                      *fname
    char                      *dfmt

F77/F90 Syntax

Example 8-20

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_read_sparse(A, spfmt, m, n, nnz, type, fname, dfmt, ier)
    integer*8               A
    integer*4               spfmt
    integer*4               m
    integer*4               n
    integer*4               nnz
    integer*4               type
    character*1             fname
    character*1             dfmt
    integer*4               ier

Input

spfmt - Specifies the sparse storage format used for representing the sparse matrix. The supported formats are S3L_SPARSE_COO and S3L_SPARSE_CSR.
m - Indicates the total number of rows in the sparse matrix.
n - Indicates the total number of columns in the sparse matrix.
nnz - Indicates the total number of nonzero elements in the sparse matrix.
type - The type of the sparse array, which must be one of: S3L_float, S3L_double, S3L_complex, or S3L_dcomplex.
fname - Scalar character variable that names the ASCII file containing the sparse matrix data.
dfmt - Specifies the format of the data to be read from the data file. The supported format is ASCII.

Output

This function uses the following arguments for output:

A - S3L internal array handle for the global general sparse matrix output.

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_read_sparse returns S3L_SUCCESS.

The S3L_read_sparse routine performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause this function to terminate and return the associated error code:

S3L_ERR_ARG_EXTENTS - Invalid m, n, or nnz. These arguments must all be > 0.

S3L_ERR_SPARSE_FORMAT - Invalid storage format. It must be either S3L_SPARSE_COO or S3L_SPARSE_CSR.

S3L_ERR_ARG_DTYPE - Invalid data type. It must be S3L_float, S3L_double, S3L_complex, or S3L_dcomplex.

S3L_ERR_IO_FILENAME - Invalid file name.

S3L_ERR_IO_FORMAT - Invalid data file format. The error could be either of the following:
- The dfmt value supplied was not 'ascii'.
- An unsupported MatrixMarket format was supplied. When a MatrixMarket file is used, the first line of its comment section must contain either the words 'real general' or 'complex general'.

S3L_ERR_FILE_OPEN - Failed to open the data file; the file either does not exist or the name is specified incorrectly.

S3L_ERR_EOF - The input data ends before expected.

Examples

../examples/s3l/sparse/ex_sparse.c
../examples/s3l/sparse-f/ex_sparse.f

Related Functions

S3L_declare_sparse(3)
S3L_matvec_sparse(3)
S3L_rand_sparse(3)

`S3L_print_sparse`

Description

S3L_print_sparse prints all nonzero values of a global general sparse matrix and their corresponding row and column indices to standard output.

For example, the following 4x6 sample matrix

 3.14       0          0        20.04        0          0
 0         27          0          0         -0.6        0
 0          0         -0.01       0          0          0
-0.031      0          0          0.08       0        314.0

could be printed by a C program in the following manner.

4       6       8
0       0       3.140000
0       3       20.040000
1       1       27.000000
1       4      -0.600000
2       2      -0.010000
3       0      -0.031000
3       3       0.080000
3       5       314.000000

Note that, for C-language applications, zero-based indices are used. When S3L_print_sparse is called from a Fortran program, one-based indices are used, as shown below.

4       6       8
1       1       3.140000
1       4       20.040000
2       2       27.000000
2       5      -0.600000
3       3      -0.010000
4       1      -0.031000
4       4       0.080000
4       6       314.000000
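
The two listings differ only in the index base. The following Python sketch is not S3L code. It rebuilds such a listing from the dense sample matrix, assuming that the first line holds the row count, column count, and number of nonzeros, and that each later line holds a row index, a column index, and a value with six decimal places. The name coo_lines, its index_base parameter, and the tab separator are hypothetical choices made for illustration.

```python
def coo_lines(dense, index_base=0):
    """Return the lines of a triplet listing for a dense row-major matrix.

    index_base=0 mimics a C-style listing, index_base=1 a Fortran-style one.
    """
    m = len(dense)
    n = len(dense[0])
    # Collect (row, col, value) for every nonzero entry, in row-major order.
    triplets = [
        (i, j, v)
        for i, row in enumerate(dense)
        for j, v in enumerate(row)
        if v != 0
    ]
    # Header line: number of rows, number of columns, number of nonzeros.
    lines = ["%d\t%d\t%d" % (m, n, len(triplets))]
    for i, j, v in triplets:
        lines.append("%d\t%d\t%f" % (i + index_base, j + index_base, v))
    return lines


sample = [
    [3.14, 0, 0, 20.04, 0, 0],
    [0, 27, 0, 0, -0.6, 0],
    [0, 0, -0.01, 0, 0, 0],
    [-0.031, 0, 0, 0.08, 0, 314.0],
]

if __name__ == "__main__":
    print("\n".join(coo_lines(sample, index_base=0)))  # zero-based, as in C
    print("\n".join(coo_lines(sample, index_base=1)))  # one-based, as in Fortran
```

Only the index columns change between the two calls; the header line and the values are the same.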

Syntax

The C and Fortran syntax for S3L_print_sparse are shown below.

C/C++ Syntax

Example 8-21

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_print_sparse(A)
    S3L_array_t        A

F77/F90 Syntax

Example 8-22

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_print_sparse(A, ier)
    integer*8          A
    integer*4          ier

Input

A - S3L internal array handle for the global general sparse matrix that is produced by a prior call to one of the following sparse routines:
- S3L_declare_sparse
- S3L_read_sparse
- S3L_rand_sparse

Output

S3L_print_sparse uses the following argument for output:

ier (Fortran only) - When called from a Fortran program, S3L_print_sparse returns error status in ier.

Error Handling

On success, S3L_print_sparse returns S3L_SUCCESS.

The S3L_print_sparse routine performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

On error, it returns the following code.

S3L_ERR_ARG_NULL - The value specified for A is invalid; no such S3L sparse matrix has been defined.

Examples

../examples/s3l/sparse/ex_sparse.c
../examples/s3l/sparse/ex_sparse2.c
../examples/s3l/sparse-f/ex_sparse.f

Related Functions

S3L_declare_sparse(3)
S3L_read_sparse(3)
S3L_rand_sparse(3)

Gaussian Elimination for Dense Systems

`S3L_lu_factor`

Description

For each M x N coefficient matrix A of a, S3L_lu_factor computes the LU factorization using partial pivoting with row interchanges.

The factorization has the form A = P x L x U, where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if M > N), and U is upper triangular (upper trapezoidal if M < N). L and U are stored in A.
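
As an illustration of this form, the following Python sketch factors a single small M x N matrix with partial pivoting and stores L and U in one array. It is a serial, textbook version and makes no claim about how S3L distributes or orders the work. The names lu_factor, reconstruct, and perm are hypothetical.

```python
def lu_factor(a):
    """Factor an M x N matrix (list of row lists) as A = P * L * U.

    Returns (lu, perm): lu packs L below the diagonal (unit diagonal implied)
    and U on and above it; perm[i] is the original row now stored in row i.
    """
    m, n = len(a), len(a[0])
    lu = [[float(v) for v in row] for row in a]
    perm = list(range(m))
    for k in range(min(m, n)):
        # Partial pivoting: pick the largest-magnitude entry in column k.
        p = max(range(k, m), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            continue  # nothing to eliminate; U has a zero on its diagonal
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]          # row interchange
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, m):
            lu[i][k] /= lu[k][k]                 # multiplier, stored as L[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def reconstruct(lu, perm):
    """Rebuild A from the packed factors, undoing the row interchanges."""
    m, n = len(lu), len(lu[0])
    r = min(m, n)
    a = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            s = 0.0
            for k in range(min(i, j, r - 1) + 1):
                l_ik = 1.0 if i == k else lu[i][k]   # unit diagonal of L
                s += l_ik * lu[k][j]
            a[perm[i]][j] = s
    return a


if __name__ == "__main__":
    lu, perm = lu_factor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    print(perm)
    print(reconstruct(lu, perm))
```

The 3 x 2 input exercises the lower trapezoidal case (M > N) mentioned above.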

In general, S3L_lu_factor performs most efficiently when the array is distributed using the same block size along each axis.

S3L_lu_factor behaves somewhat differently for 3D arrays, however. In this case, it applies nodal LU factorization on each M x N coefficient matrix across the instance axis. This factorization is performed concurrently on all participating processes.

You must call S3L_lu_factor before calling any of the other LU routines. The S3L_lu_factor routine operates on the preallocated parallel array and returns a setup ID. You must supply this setup ID in subsequent LU calls, as long as you are working with the same set of factors.

Be sure to call S3L_lu_deallocate when you have finished working with a set of LU factors. See "S3L_lu_deallocate" for details.

The internal variable setup_id is required for communicating information between the factorization routine and the other LU routines. The application must not modify the contents of this variable.

Syntax

The C and Fortran syntax for S3L_lu_factor are shown below.

C/C++ Syntax

Example 8-23

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_lu_factor(a, row_axis, col_axis, setup_id)
    S3L_array_t               a
    int                       row_axis
    int                       col_axis
    int                       *setup_id

F77/F90 Syntax

Example 8-24

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_lu_factor(a, row_axis, col_axis, setup_id, ier)
    integer*8               a
    integer*4               row_axis
    integer*4               col_axis
    integer*4               setup_id
    integer*4               ier

Input

a - Parallel array of rank greater than or equal to 2. This array contains one or more instances of a coefficient matrix A to be factored. Each A is assumed to be dense with dimensions M x N with rows counted by axis row_axis and columns counted by axis col_axis.
row_axis - Scalar integer variable. Identifies the axis of a that counts the rows of each matrix A. For C program calls, row_axis must be >= 0 and less than the rank of a; for Fortran program calls, it must be >= 1 and not exceed the rank of a. In addition, row_axis and col_axis must not be equal.
col_axis - Scalar integer variable. Identifies the axis of a that counts the columns of each matrix A. For C program calls, col_axis must be >= 0 and less than the rank of a; for Fortran program calls, it must be >= 1 and not exceed the rank of a. In addition, row_axis and col_axis must not be equal.

Output

This function uses the following arguments for output:

a - Upon successful completion, each matrix instance A is overwritten with data giving the corresponding LU factors.

setup_id - Scalar integer variable returned by S3L_lu_factor. It can be used when calling other LU routines to reference the LU-factored array.

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_lu_factor returns S3L_SUCCESS.

S3L_lu_factor performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.

The following conditions will cause the function to terminate and return the associated error code:

S3L_ERR_ARG_RANK - Invalid rank; must be >= 2.

S3L_ERR_ARG_BLKSIZE - Invalid blocksize; must be >= 1.

S3L_ERR_ARG_DTYPE - Invalid data type. It must be real or complex (single- or double-precision).

S3L_ERR_ARG_NULL - Invalid array. a must be preallocated.

S3L_ERR_ARG_AXISNUM - row_axis or col_axis is invalid. This condition can be caused by either an out-of-range axis number (see row_axis and col_axis argument definitions) or row_axis equal to col_axis.

S3L_ERR_FACTOR_SING - A singular factor U is returned. If it is used by S3L_lu_solve, division by zero will occur.

Examples

../examples/s3l/lu/lu.c
../examples/s3l/lu/ex_lu1.c
../examples/s3l/lu/ex_lu2.c
../examples/s3l/lu-f/lu.f
../examples/s3l/lu-f/ex_lu1.f

Related Functions

S3L_lu_deallocate(3)
S3L_lu_invert(3)
S3L_lu_solve(3)

`S3L_lu_invert`

Description

S3L_lu_invert uses the LU factorization generated by S3L_lu_factor to compute the inverse of each square (M x M) matrix instance A of the parallel array a. This is done by inverting U and then solving the system A^-1L = U^-1 for A^-1, where A^-1 and U^-1 denote the inverse of A and U, respectively.
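
The two steps can be sketched in Python for one small matrix. This sketch assumes that no row interchanges were needed (P is the identity), so that A = L x U exactly. With pivoting, the column permutation implied by P would also have to be applied. The function names are hypothetical and the code is serial.

```python
def invert_upper(u):
    """Invert an upper-triangular matrix by back substitution, column by column."""
    n = len(u)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / u[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(u[i][k] * inv[k][j] for k in range(i + 1, j + 1))
            inv[i][j] = -s / u[i][i]
    return inv


def solve_x_times_unit_lower(l, rhs):
    """Solve X * L = rhs for X, where L is unit lower triangular."""
    n = len(l)
    x = [row[:] for row in rhs]
    # Column j of X*L is X[:, j] + sum over k > j of X[:, k] * L[k][j],
    # so the columns are resolved from right to left.
    for j in range(n - 2, -1, -1):
        for k in range(j + 1, n):
            for i in range(n):
                x[i][j] -= x[i][k] * l[k][j]
    return x


def invert_from_lu(l, u):
    """Return A^-1 for A = L * U (assumes no pivoting)."""
    return solve_x_times_unit_lower(l, invert_upper(u))


if __name__ == "__main__":
    l = [[1.0, 0.0], [0.5, 1.0]]
    u = [[4.0, 2.0], [0.0, 3.0]]   # L * U = [[4, 2], [2, 4]]
    print(invert_from_lu(l, u))
```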

In general, S3L_lu_invert performs most efficiently when the array is distributed using the same block size along each axis.

For arrays with rank > 2, the nodal inversion is applied on each of the 2D slices of a across the instance axis and is performed concurrently on all participating processes.

The internal variable setup_id is required for communicating information between the factorization routine and the other LU routines. The application must not modify the contents of this variable.

Syntax

The C and Fortran syntax for S3L_lu_invert are shown below.

C/C++ Syntax

Example 8-25

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_lu_invert(a, setup_id)
    S3L_array_t               a
    int                       setup_id

F77/F90 Syntax

Example 8-26

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_lu_invert(a, setup_id, ier)
    integer*8               a
    integer*4               setup_id
    integer*4               ier

Input

a - Parallel array that was factored by S3L_lu_factor, where each matrix instance A is a dense M x M square matrix. Supply the same value a that was used in S3L_lu_factor.
setup_id - Scalar integer variable. Use the value returned by the corresponding S3L_lu_factor call for this argument.

Output

This function uses the following arguments for output:

a - Upon successful completion, each matrix instance A is overwritten with its inverse.

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_lu_invert returns S3L_SUCCESS.

S3L_lu_invert performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.

The following conditions will cause the function to terminate and return the associated error code:

S3L_ERR_ARG_NULL - Invalid array; must be the same value returned by S3L_lu_factor.

S3L_ERR_ARG_SETUP - Invalid setup_id.

S3L_ERR_FACTOR_SING - a contains singular factors; its inverse could not be computed.

Examples

../examples/s3l/lu/lu.c
../examples/s3l/lu/ex_lu1.c
../examples/s3l/lu/ex_lu2.c
../examples/s3l/lu-f/lu.f
../examples/s3l/lu-f/ex_lu1.f

Related Functions

S3L_lu_deallocate(3)
S3L_lu_factor(3)
S3L_lu_solve(3)

`S3L_lu_solve`

Description

For each square coefficient matrix A of a, S3L_lu_solve solves a system of distributed linear equations AX = B, with a general M x M square matrix instance A, using the LU factorization computed by S3L_lu_factor.

Note -

Throughout these descriptions, L^-1 and U^-1 denote the inverse of L and U, respectively.

A and B are corresponding instances within a and b, respectively. To solve AX = B, S3L_lu_solve performs forward elimination:

Let UX = C
A = LU implies that AX = B is equivalent to C = L^-1B

followed by back substitution:

X = U^-1C = U^-1(L^-1B)

To obtain this solution, the S3L_lu_solve routine performs the following steps:

Applies L^-1 to B.

Applies U^-1 to L^-1B.

Upon successful completion, each B is overwritten with the solution to AX = B.
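
The forward elimination and back substitution steps can be sketched in Python for a single right-hand side vector. The sketch assumes the factors L and U are available as separate matrices and that no row interchanges were applied. The function names are hypothetical and nothing here reflects the parallel implementation.

```python
def forward_eliminate(l, b):
    """Solve L*c = b for c, with L unit lower triangular (c = L^-1 b)."""
    c = list(b)
    for i in range(len(c)):
        for k in range(i):
            c[i] -= l[i][k] * c[k]
    return c


def back_substitute(u, c):
    """Solve U*x = c for x, with U upper triangular (x = U^-1 c)."""
    n = len(c)
    x = list(c)
    for i in range(n - 1, -1, -1):
        for k in range(i + 1, n):
            x[i] -= u[i][k] * x[k]
        x[i] /= u[i][i]   # a zero here is the singular-U case
    return x


def lu_solve(l, u, b):
    """Return x with (L*U) x = b: apply L^-1, then U^-1."""
    return back_substitute(u, forward_eliminate(l, b))


if __name__ == "__main__":
    l = [[1.0, 0.0], [0.5, 1.0]]
    u = [[4.0, 2.0], [0.0, 3.0]]        # L * U = [[4, 2], [2, 4]]
    print(lu_solve(l, u, [10.0, 14.0]))
```

The division by the diagonal of U in back_substitute is where a singular factor would cause division by zero.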

In general, S3L_lu_solve performs most efficiently when the array is distributed using the same block size along each axis.

S3L_lu_solve behaves somewhat differently for 3D arrays, however. In this case, the nodal solve is applied on each of the 2D systems AX=B across the instance axis of a and is performed concurrently on all participating processes.

The input parallel arrays a and b must be distinct.

The internal variable setup_id is required for communicating information between the factorization routine and the other LU routines. The application must not modify the contents of this variable.

Syntax

The C and Fortran syntax for S3L_lu_solve are shown below.

C/C++ Syntax

Example 8-27

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_lu_solve(b, a, setup_id)
    S3L_array_t               b
    S3L_array_t               a
    int                       setup_id

F77/F90 Syntax

Example 8-28

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_lu_solve(b, a, setup_id, ier)
    integer*8               b
    integer*8               a
    integer*4               setup_id
    integer*4               ier

Input

b - Parallel array of the same type (real or complex) and precision as a. Must be distinct from a. The instance axes of b must match those of a in order of declaration and extents. The rows and columns of each B must be counted by axes row_axis and col_axis, respectively (from the S3L_lu_factor call). For the two-dimensional case, if b consists of only one right-hand side vector, you can represent b as a vector (an array of rank 1) or as an array of rank 2 with the number of columns set to 1 and the elements counted by axis row_axis.
a - Parallel array that was factored by S3L_lu_factor, where each matrix instance A is a dense M x M square matrix. Supply the same value a that was used in S3L_lu_factor.
setup_id - Scalar integer variable. Use the value returned by the corresponding S3L_lu_factor call for this argument.

Output

This function uses the following arguments for output:

b - Upon successful completion, each matrix instance B is overwritten with the solution to AX = B.

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_lu_solve returns S3L_SUCCESS.

S3L_lu_solve performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.

The following conditions will cause the function to terminate and return the associated error code:

S3L_ERR_ARG_NULL - Invalid array. b must be preallocated and the same value returned by S3L_lu_factor must be supplied in a.

S3L_ERR_ARG_RANK - Invalid rank. For cases where rank >= 3, rank(b) must equal rank(a). For the two-dimensional case, rank(b) must be either 1 or 2.

S3L_ERR_ARG_DTYPE - Invalid data type; must be real or complex (single- or double-precision).

S3L_ERR_ARG_BLKSIZE - Invalid block size; must be >= 1.

S3L_ERR_MATCH_EXTENTS - Extents of a and b are mismatched along the row or instance axis.

S3L_ERR_MATCH_DTYPE - Unmatched data type between a and b.

S3L_ERR_ARRNOTSQ - Invalid matrix size; each coefficient matrix must be square.

S3L_ERR_ARG_SETUP - Invalid setup_id value. It does not match the value returned by S3L_lu_factor.

Examples

../examples/s3l/lu/lu.c
../examples/s3l/lu/ex_lu1.c
../examples/s3l/lu/ex_lu2.c
../examples/s3l/lu-f/lu.f
../examples/s3l/lu-f/ex_lu1.f

Related Functions

S3L_lu_deallocate(3)
S3L_lu_factor(3)
S3L_lu_invert(3)

`S3L_lu_deallocate`

Description

S3L_lu_deallocate invalidates the specified setup ID, which deallocates the memory that has been set aside for the S3L_lu_factor routine associated with that ID. Attempts to use a deallocated setup ID will result in errors.

When you finish working with a set of factors, be sure to use S3L_lu_deallocate to free up the associated memory. Repeated calls to S3L_lu_factor without deallocation can cause you to run out of memory.

Syntax

The C and Fortran syntax for S3L_lu_deallocate are shown below.

C/C++ Syntax

Example 8-29

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_lu_deallocate(setup_id)
    int                setup_id

F77/F90 Syntax

Example 8-30

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_lu_deallocate(setup_id, ier)
    integer*4          setup_id
    integer*4          ier

Input

setup_id - Scalar integer variable. Use the value returned by the corresponding S3L_lu_factor call for this argument.

Output

This function uses the following argument for output:

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_lu_deallocate returns S3L_SUCCESS.

The following condition will cause the function to terminate and return the associated error code.

S3L_ERR_ARG_SETUP - Invalid setup_id value. It does not match the value returned by S3L_lu_factor.

Examples

../examples/s3l/lu/lu.c
../examples/s3l/lu/ex_lu1.c
../examples/s3l/lu/ex_lu2.c
../examples/s3l/lu-f/lu.f
../examples/s3l/lu-f/ex_lu1.f

Related Functions

S3L_lu_factor(3)
S3L_lu_solve(3)
S3L_lu_invert(3)

Fast Fourier Transforms

`S3L_fft`

Description

S3L_fft performs a simple FFT on the complex parallel array a. The same FFT operation is performed along all axes of the array.

Both power-of-two and arbitrary radix FFTs are supported. The 1D parallel FFT can be used for sizes that are a multiple of the square of the number of processes. The 2D and 3D FFTs can be used for arbitrary sizes and distributions.

The S3L_fft routine computes a multidimensional transform by performing a one-dimensional transform along each axis in turn.
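
The axis-by-axis approach can be checked on a small array. The Python sketch below uses a direct O(n^2) DFT rather than a fast algorithm, and compares a 2D transform built from 1D transforms with the 2D definition evaluated directly. All function names are hypothetical and the code is serial.

```python
import cmath


def dft_1d(x, sign=-1):
    """Unscaled 1D DFT; sign=-1 is the forward transform, +1 the inverse."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


def dft_2d(a, sign=-1):
    """2D DFT built from 1D transforms along each axis in turn."""
    rows = [dft_1d(row, sign) for row in a]            # transform along axis 1
    cols = [dft_1d(col, sign) for col in zip(*rows)]   # transform along axis 0
    return [list(r) for r in zip(*cols)]               # transpose back


def dft_2d_direct(a, sign=-1):
    """2D DFT evaluated straight from its double-sum definition."""
    m, n = len(a), len(a[0])
    return [
        [
            sum(
                a[p][q] * cmath.exp(sign * 2j * cmath.pi * (p * u / m + q * v / n))
                for p in range(m)
                for q in range(n)
            )
            for v in range(n)
        ]
        for u in range(m)
    ]


if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0], [0.0, -1.0, 4.0]]
    print(dft_2d(a))
    print(dft_2d_direct(a))
```

Both functions return the same values up to rounding, which is why a multidimensional transform can be assembled from one-dimensional ones.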

The sign of the twiddle factor exponents determines the direction of an FFT. Twiddle factors with a negative exponent imply a forward transform, and twiddle factors with positive exponents are used for an inverse transform.

For the 2D FFT, a more efficient transpose algorithm will be used if the blocksizes along each dimension are equal to the extents divided by the number of processes, resulting in significant performance improvements.

S3L_fft (and S3L_ifft) can only be used for complex and double complex data types. To compute a real-data forward FFT, use S3L_rc_fft. This performs a forward FFT on the real data, yielding packed representation of the complex results. To compute the corresponding inverse FFT, use S3L_cr_fft, which will perform an inverse FFT on the complex data, overwriting the original real array with real-valued results of the inverse FFT.

The floating-point precision of the result always matches that of the input.

Note -

S3L_fft and S3L_ifft do not perform any scaling. Consequently, when a forward FFT is followed by an inverse FFT, the original data will be scaled by the product of the extents of the array.
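
The effect of the missing scaling can be seen with a direct, unscaled DFT in Python. This is a sketch of the mathematics only, not of S3L: the dft function is hypothetical, and the sign argument stands for the sign of the twiddle factor exponent (negative for forward, positive for inverse). For a 1D array the product of the extents is simply its length.

```python
import cmath


def dft(x, sign):
    """Unscaled DFT of a sequence; sign is the sign of the exponent."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


x = [1.0, 2.0, 0.0, -1.0]

# Forward transform followed by inverse transform, neither one scaled.
roundtrip = dft(dft(x, -1), +1)

# The round trip returns len(x) times the input, so the caller divides
# by the product of the extents to recover the original data.
rescaled = [v / len(x) for v in roundtrip]

if __name__ == "__main__":
    print([round(v.real, 6) for v in roundtrip])
    print([round(v.real, 6) for v in rescaled])
```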

Syntax

The C and Fortran syntax for S3L_fft are shown below.

C/C++ Syntax

Example 8-31

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_fft(a, setup_id)
    S3L_array_t        a
    int                setup_id

F77/F90 Syntax

Example 8-32

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_fft(a, setup_id, ier)
    integer*8          a
    integer*4          setup_id
    integer*4          ier

Input

a - Parallel array that is to be transformed. Its rank, extents, and type must be the same as the parallel array (a) supplied in the S3L_fft_setup call.
setup_id - Scalar integer variable. Use the value returned by the S3L_fft_setup call for this argument.

Output

This function uses the following arguments for output:

a - The input array a is overwritten with the result of the FFT.
ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_fft returns S3L_SUCCESS.

S3L_fft performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.

The following conditions will cause the function to terminate and return the associated error code.

S3L_ERR_FFT_RANKGT3 - The rank of the array a is larger than 3.

S3L_ERR_ARG_NCOMPLEX - Array a is not complex.

S3L_ERR_FFT_EXTSQPROCS - Array a is 1D but its extent is not divisible by the square of the number of processes.

S3L_ERR_ARG_SETUP - The setup_id supplied is not valid.

Examples

../examples/s3l/fft/fft.c
../examples/s3l/fft/ex_fft1.c
../examples/s3l/fft/ex_fft2.c
../examples/s3l/fft-f/fft.f

Related Functions

S3L_fft_setup(3)
S3L_fft_free_setup(3)
S3L_ifft(3)
S3L_fft_detailed(3)
S3L_cr_fft(3)
S3L_rc_fft(3)
S3L_rc_fft_setup(3)

`S3L_fft_detailed`

Description

S3L_fft_detailed computes the in-place forward or inverse FFT along a specified axis of a complex or double complex parallel array, a. FFT direction and axis are specified by the arguments iflag and axis, respectively. Both power-of-two and arbitrary radix FFTs are supported. Upon completion, a is overwritten with the FFT result.

A 1D parallel FFT can be used for array sizes that are a multiple of the square of the number of processes. Higher dimensionality FFTs can be used for arbitrary sizes and distributions.

For the 2D FFT, a more efficient transpose algorithm is employed when the blocksizes along each dimension are equal to the extents divided by the number of processes. This yields significant performance benefits.

S3L_fft_detailed can only be used for complex and double complex data types. To compute a real-data forward FFT, use S3L_rc_fft. This performs a forward FFT on the real data, yielding packed representation of the complex results. To compute the corresponding inverse FFT, use S3L_cr_fft, which will perform an inverse FFT on the complex data, overwriting the original real array with real-valued results of the inverse FFT.

The floating-point precision of the result always matches that of the input.

Note -

S3L_fft_detailed and S3L_ifft do not perform any scaling. Consequently, when a forward FFT is followed by an inverse FFT, the original data will be scaled by the product of the extents of the array.

Syntax

The C and Fortran syntax for S3L_fft_detailed are shown below.

C/C++ Syntax

Example 8-33

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_fft_detailed(a, setup_id, iflag, axis)
    S3L_array_t        a
    int                setup_id
    int                iflag
    int                axis

F77/F90 Syntax

Example 8-34

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_fft_detailed(a, setup_id, iflag, axis, ier)
    integer*8          a
    integer*4          setup_id
    integer*4          iflag
    integer*4          axis
    integer*4          ier

Input

a - Parallel array that is to be transformed. Its rank, extents, and type must be the same as the parallel array (a) supplied in the S3L_fft_setup call.
setup_id - Scalar integer variable. Use the value returned by the S3L_fft_setup call for this argument.
iflag - Determines the transform direction. Set iflag to 1 for forward FFT; set to -1 for inverse FFT.
axis - Determines the axis along which the FFT is to be computed.

Output

This function uses the following arguments for output:

a - The input array a is overwritten with the result of the FFT.

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_fft_detailed returns S3L_SUCCESS.

S3L_fft_detailed performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.

The following conditions will cause the function to terminate and return the associated error code.

S3L_ERR_ARG_NCOMPLEX - Array a is not complex.

S3L_ERR_FFT_EXTSQPROCS - Array a is 1D but its extent is not divisible by the square of the number of processes.

S3L_ERR_ARG_SETUP - The setup_id supplied is not valid.

S3L_ERR_FFT_INVIFLAG - The iflag argument is invalid.

Examples

../examples/s3l/fft/fft.c
../examples/s3l/fft/ex_fft1.c
../examples/s3l/fft/ex_fft2.c
../examples/s3l/fft-f/fft.f

Related Functions

S3L_fft_setup(3)
S3L_fft_free_setup(3)
S3L_ifft(3)
S3L_fft(3)
S3L_cr_fft(3)
S3L_rc_fft(3)
S3L_rc_fft_setup(3)

`S3L_ifft`

Description

Run S3L_ifft to compute the inverse FFT of the complex or double complex parallel array a. Use the setup ID returned by S3L_fft_setup to specify the array of interest.

Both power-of-two and arbitrary radix FFTs are supported. The 1D parallel FFT can be used for sizes that are a multiple of the square of the number of processes; the 2D and 3D FFTs can be used for arbitrary sizes and distributions.

Upon completion, a is overwritten with the result. The floating-point precision of the result always matches that of the input.

For the 2D FFT, if the blocksizes along each dimension are equal to the extents divided by the number of processes, a more efficient transpose algorithm is employed, which yields significant performance improvements.

S3L_ifft can only be used for complex and double complex data types. To compute a real-data forward FFT, use S3L_rc_fft. This performs a forward FFT on the real data, yielding packed representation of the complex results. To compute the corresponding inverse FFT, use S3L_cr_fft, which will perform an inverse FFT on the complex data, overwriting the original real array with real-valued results of the inverse FFT.

Note -

S3L_fft and S3L_ifft do not perform any scaling. Consequently, when a forward FFT is followed by an inverse FFT, the original data will be scaled by the product of the extents of the array.

Syntax

The C and Fortran syntax for S3L_ifft are shown below.

C/C++ Syntax

Example 8-35

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_ifft(a, setup_id)
    S3L_array_t        a
    int                setup_id

F77/F90 Syntax

Example 8-36

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_ifft(a, setup_id, ier)
    integer*8          a
    integer*4          setup_id
    integer*4          ier

Input

a - S3L array handle for a parallel array that will be transformed. Its rank, extents, and type must be the same as the parallel array (a) supplied in the S3L_fft_setup call.
setup_id - Scalar integer variable. Use the value returned by the S3L_fft_setup call for this argument.

Output

This function uses the following arguments for output:

a - The input array a is overwritten with the result of the FFT.
ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_ifft returns S3L_SUCCESS.

S3L_ifft performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.

The following conditions will cause the function to terminate and return the associated error code.

S3L_ERR_FFT_RANKGT3 - The rank of the array a is larger than 3.

S3L_ERR_ARG_NCOMPLEX - Array a is not complex.

S3L_ERR_FFT_EXTSQPROCS - Array a is 1D but its extent is not divisible by the square of the number of processes.

S3L_ERR_ARG_SETUP - The setup_id supplied is not valid.

Examples

../examples/s3l/fft/fft.c
../examples/s3l/fft-f/fft.f

Related Functions

S3L_fft_setup(3)
S3L_fft_free_setup(3)
S3L_fft_detailed(3)

`S3L_rc_fft` and `S3L_cr_fft`

Description

S3L_rc_fft and S3L_cr_fft are used for computing the Fast Fourier Transform of real 1D, 2D, or 3D arrays. S3L_rc_fft performs a forward FFT of a real array and S3L_cr_fft performs the inverse FFT of a complex array with certain symmetry properties. The result of S3L_cr_fft is real.

S3L_rc_fft accepts as input a real (single- or double-precision) parallel array and, upon successful completion, overwrites the contents of the real array with the complex Discrete Fourier Transform (DFT) of the data in a packed format.

S3L_cr_fft accepts as input a real array, which contains the packed representation of a complex array.

S3L_rc_fft and S3L_cr_fft have been optimized for cases where the arrays are distributed only along their last dimension. They also work, however, for any CYCLIC(n) array layout.

For the 2D FFT, a more efficient transposition algorithm is used when the blocksizes along each dimension are equal to the extents divided by the number of processors. This arrangement can result in significantly higher performance.

The algorithms used are non-standard extensions of the Cooley-Tukey factorization and the Chinese Remainder Theorem. Both power-of-two and arbitrary radix FFTs are supported.

The nodal FFTs upon which the parallel FFT is based are mixed radix with prime factors 2, 3, 5, 7, 11, and 13. The parallel FFT will be more efficient when the size of the array is a product of powers of these factors. When the size of an array cannot be factored into these prime factors, a slower DFT is used for the remainder.

Supported Array Sizes

One Dimension: The array size must be divisible by 4 x p², where p is the number of processors.

Two Dimensions: Each of the array lengths must be divisible by 2 x p, where p is the number of processors.

Three Dimensions: The first dimension must be even and must have a length of at least 4. The second and third dimensions must be divisible by 2 x p, where p is the number of processors.
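
These divisibility rules can be collected into a small pre-flight check. The Python function below is a hypothetical helper, not part of S3L. It encodes only the three rules stated above and additionally assumes that all extents are positive.

```python
def rc_fft_size_ok(extents, p):
    """Check the stated size rules for a real-to-complex FFT.

    extents: tuple of 1, 2, or 3 array lengths; p: number of processes.
    """
    if p < 1 or not 1 <= len(extents) <= 3:
        return False
    if any(n <= 0 for n in extents):
        return False
    if len(extents) == 1:
        # One dimension: divisible by 4 x p^2.
        return extents[0] % (4 * p * p) == 0
    if len(extents) == 2:
        # Two dimensions: each length divisible by 2 x p.
        return all(n % (2 * p) == 0 for n in extents)
    # Three dimensions: first even and at least 4, others divisible by 2 x p.
    first, second, third = extents
    return (
        first % 2 == 0
        and first >= 4
        and second % (2 * p) == 0
        and third % (2 * p) == 0
    )


if __name__ == "__main__":
    print(rc_fft_size_ok((64,), 4))       # 64 is divisible by 4 * 4**2
    print(rc_fft_size_ok((8, 10), 2))     # 10 is not divisible by 2 * 2
    print(rc_fft_size_ok((6, 8, 8), 4))   # first is even and >= 4
```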

Scaling

The real-to-complex and complex-to-real S3L parallel FFTs do not include scaling of the data. Consequently, for a forward 1D real-to-complex FFT of a vector of length n, followed by an inverse 1D complex-to-real FFT of the result, the original vector is multiplied by n/2.

If the data fits in a single process, a 1D real-to-complex FFT of a vector of length n, followed by a 1D complex-to-real FFT results in the original vector being scaled by n.

For a real-to-complex FFT of a 2D real array of size n x m, followed by a complex-to-real FFT, the original array is scaled by n x m.

Similarly, a real-to-complex FFT applied to a 3D real array of size n x m x k, followed by a complex-to-real FFT, results in the original array being scaled by n x m x k.

Complex Data Packed Representation

1D Real-to-Complex Periodic Fourier Transforms: The periodic Fourier Transform of a real sequence x[i], i=0,...,N-1 is Hermitian (exhibits conjugate symmetry around its middle point).

If X[i],i=0,...,N-1 are the complex values of the Fourier Transform, then

Example 8-37

 X[i] = conj(X[N-i]), i=1,...,N-1       (eq. 1)

Consider for example the real sequence:

Example 8-38

   x = 0, 1, 2, 3, 4, 5, 6, 7

Its Fourier Transform is:

Example 8-39

   X =

   28.0000
   -4.0000 + 9.6569i
   -4.0000 + 4.0000i
   -4.0000 + 1.6569i
   -4.0000
   -4.0000 - 1.6569i
   -4.0000 - 4.0000i
   -4.0000 - 9.6569i

As you can see:

Example 8-40

   X[1] = conj(X[7])
   X[2] = conj(X[6])
   X[3] = conj(X[5])
   X[4] = conj(X[4]) (i.e., X[4] is real)
   X[5] = conj(X[3])
   X[6] = conj(X[2])
   X[7] = conj(X[1])

Because of the Hermitian symmetry, only N/2+1 = 5 values of the complex sequence X need to be calculated and stored. The rest can be computed from (1).

Note that X[0] and X[N/2] are real valued so they can be grouped together as one complex number. In fact S3L stores the sequence X as:

Example 8-41

   X[0]    X[N/2]
   X[1]
   X[2]
   ...
   X[N/2-1]

   or

   X =
   28.0000 - 4.0000i
   -4.0000 + 9.6569i
   -4.0000 + 4.0000i
   -4.0000 + 1.6569i

The first line of this example is a single complex number whose real part holds X[0] and whose imaginary part holds X[N/2].

To summarize, in S3L, the Fourier Transform of a real-valued sequence of length N (where N is even), is stored as a real sequence of length N. This is equivalent to a complex sequence of length N/2.
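
The packing scheme can be sketched in Python. The code below assumes the input sequence 0, 1, ..., 7, which is the sequence whose DFT matches the eight transform values listed earlier. It uses a direct DFT, packs X[0] and X[N/2] into the first slot, and then rebuilds the full spectrum from the Hermitian symmetry. The function names are hypothetical, and the in-memory ordering that S3L uses across processes is not modeled.

```python
import cmath


def dft(x):
    """Unscaled forward DFT (negative twiddle exponent)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


def pack_hermitian(full):
    """Pack a Hermitian spectrum of even length N into N/2 complex values."""
    half = len(full) // 2
    # X[0] and X[N/2] are real, so they share the first complex slot.
    packed = [complex(full[0].real, full[half].real)]
    packed.extend(full[1:half])          # X[1] .. X[N/2-1]
    return packed


def unpack_hermitian(packed):
    """Rebuild all N values using X[i] = conj(X[N-i])."""
    half = len(packed)
    n = 2 * half
    full = [0j] * n
    full[0] = complex(packed[0].real, 0.0)
    full[half] = complex(packed[0].imag, 0.0)
    for i in range(1, half):
        full[i] = packed[i]
        full[n - i] = packed[i].conjugate()
    return full


if __name__ == "__main__":
    spectrum = dft([0, 1, 2, 3, 4, 5, 6, 7])
    packed = pack_hermitian(spectrum)
    for value in packed:
        print("%8.4f %+8.4fi" % (value.real, value.imag))
```

Four complex values (eight real numbers) are enough to recover all eight transform values, which is why the result fits in the original real array of length N.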

2D Fourier Transform: The method used for 2D FFTs is similar to that used for 1D FFTs. When transforming each of the array columns, only half of the data is stored.

3D Real to Hermitian FFT: As with the 1D and 2D FFTs, no extra storage is required for the 3D FFT of real data, since advantage is taken of all possible symmetries. For an array a(M,N,K), the result is packed in a complex array b(M/2,N,K). Hermitian symmetries exist along the planes a(0,:,:) and a(M/2,:,:) and along dimension 1.

See the rc_fft.c and rc_fft.f program examples for illustrations of these concepts. The paths for these online examples are provided at the end of this section.

Syntax

The C and Fortran syntax for S3L_rc_fft and S3L_cr_fft are shown below.

C/C++ Syntax

Example 8-42

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_rc_fft(a, setup_id)
S3L_cr_fft(a, setup_id)
    S3L_array_t        a
    int                setup_id

F77/F90 Syntax

Example 8-43

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_rc_fft(a, setup_id, ier)
S3L_cr_fft(a, setup_id, ier)
    integer*8          a
    integer*4          setup_id
    integer*4          ier

Input

a - S3L array handle for a parallel real array. For S3L_rc_fft, the contents of a are real values. For S3L_cr_fft, they are the packed representation of a complex array. Upon successful completion, both routines overwrite a with the results of the forward or inverse FFT. See the Output section for a discussion of the use of a for output.
setup_id - Scalar integer variable. Use the value returned by the S3L_rc_fft_setup call for this argument.

Output

These functions use the following arguments for output:

a - S3L array handle for a parallel real array. Upon successful completion, S3L_rc_fft overwrites a with the packed representation of the complex result of the forward FFT. S3L_cr_fft overwrites a with the real result of the inverse FFT.
ier (Fortran only) - When called from a Fortran program, these functions return error status in ier.

Error Handling

On success, S3L_rc_fft and S3L_cr_fft return S3L_SUCCESS.

The following condition will cause these functions to terminate and return the associated error code.

S3L_ERR_ARG_SETUP - The setup_id supplied is not valid.

Examples

../examples/s3l/rc_fft/rc_fft.c
../examples/s3l/rc_fft-f/rc_fft.f

Related Functions

S3L_rc_fft_setup(3)
S3L_rc_fft_free_setup(3)

`S3L_fft_setup`

Description

A call to S3L_fft_setup is the first step in executing Sun S3L Fast Fourier Transforms. You supply it with the parallel array (a) that is to be transformed. It returns a setup value in setup_id, which you use in subsequent calls to other S3L FFT routines.

When calling S3L_fft_setup, you may supply arbitrary values in a; the setup routine neither examines nor modifies the contents of this parallel array. It simply uses its size and type to create the setup object.

The setup ID computed by the S3L_fft_setup call can be used for any parallel arrays that have the same rank, extents, and type as the a argument supplied in the S3L_fft_setup call--but only for such parallel arrays. If a transform is to be performed on two parallel arrays, a and b, identical in rank, extents, and type, then one call to the setup routine suffices, even if transforms are performed on different axes of the two parallel arrays. But if a and b differ in rank, extents, or type, a separate setup call is required for each.

You may have more than one setup ID active at a time; that is, you may call the setup routine more than once before deallocating any setup IDs. For this reason, be careful that you specify the correct setup ID for calls to S3L_fft, S3L_ifft, S3L_fft_detailed, and S3L_fft_free_setup.

The time required to compute the contents of an FFT setup_id structure is substantially longer than the time required to actually perform an FFT. For this reason, and because it is common to perform FFTs on many parallel variables with the same rank, extents, and type, Sun S3L keeps the setup phase and transform phases distinct.

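When the setup is no longer needed (for example, when a and any other arrays that share the setup will not be transformed again), call S3L_fft_free_setup to deallocate the FFT setup_id.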

Syntax

The C and Fortran syntax for S3L_fft_setup is shown below.

C/C++ Syntax

Example 8-44

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_fft_setup(a, setup_id)
    S3L_array_t        a
    int                *setup_id

F77/F90 Syntax

Example 8-45

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_fft_setup(a, setup_id, ier)
    integer*8          a
    integer*4          setup_id
    integer*4          ier

Input

a - S3L array handle for a parallel array that will be the subject of subsequent transform operations.

Output

This function uses the following arguments for output:

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.
setup_id - On output, it contains an integer value that can be used in subsequent calls to S3L_fft, S3L_ifft, S3L_fft_detailed, and S3L_fft_free_setup.

Error Handling

On success, S3L_fft_setup returns S3L_SUCCESS.

S3L_fft_setup performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

The following conditions will cause S3L_fft_setup to terminate and return the associated error code:

S3L_ERR_FFT_RANKGT3 - The rank of array a is larger than 3.

S3L_ERR_ARG_NCOMPLEX - a is not of type S3L_complex or S3L_double_complex.

S3L_ERR_FFT_EXTSQPROCS - a is a 1D array, but its extent is not a multiple of the square of the number of processes over which it was defined.
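Two of these conditions depend only on the shape of the array, so a caller can test them before requesting a setup. The following C sketch is one way to do that; fft_setup_precheck and its return codes are hypothetical names introduced here and are not part of S3L. It does not check the data type of the array.

```c
#include <assert.h>

enum { FFT_PRECHECK_OK = 0, FFT_PRECHECK_RANK = 1, FFT_PRECHECK_EXTENT = 2 };

/*
 * Caller-side mirror of two documented S3L_fft_setup conditions.
 * rank: number of axes; extents: length of each axis; np: process count.
 */
int fft_setup_precheck(int rank, const long *extents, long np)
{
    assert(rank >= 1 && np >= 1);

    if (rank > 3)
        return FFT_PRECHECK_RANK;      /* rank larger than 3 */

    /* A 1D extent must be a multiple of the square of np. */
    if (rank == 1 && extents[0] % (np * np) != 0)
        return FFT_PRECHECK_EXTENT;    /* the EXTSQPROCS condition */

    return FFT_PRECHECK_OK;
}
```

For instance, with four processes a 1D array of 1024 elements passes the check (1024 is a multiple of 16), while one of 1000 elements does not.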

Examples

../examples/s3l/fft/fft.c
../examples/s3l/fft/ex_fft1.c
../examples/s3l/fft/ex_fft2.c
../examples/s3l/fft-f/fft.f
../examples/s3l/fft-f/ex_fft1.f

Related Functions

S3L_fft(3)
S3L_fft_free_setup(3)
S3L_ifft(3)
S3L_fft_detailed(3)

`S3L_rc_fft_setup`

Description

S3L_rc_fft_setup allocates a real-to-complex FFT setup that includes the twiddle factors necessary for the computation and other internal structures. This setup depends only on the dimensions of the array whose FFT needs to be computed, and can be used both for the forward (real-to-complex) and inverse (complex-to-real) FFTs. Therefore, to compute multiple real-to-complex or complex-to-real Fourier transforms of different arrays whose extents are the same, the S3L_rc_fft_setup function has to be called only once.

Syntax

The C and Fortran syntax for S3L_rc_fft_setup is shown below.

C/C++ Syntax

Example 8-46

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_rc_fft_setup(a, setup_id)
    S3L_array_t        a
    int                *setup_id

F77/F90 Syntax

Example 8-47

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_rc_fft_setup(a, setup_id, ier)
    integer*8          a
    integer*4          setup_id
    integer*4          ier

Input

a - S3L array handle for a parallel array that will be the subject of subsequent transform operations.

Output

This function uses the following arguments for output:

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

setup_id - On output, it contains an integer value that can be used in subsequent calls to S3L_rc_fft, S3L_cr_fft, and S3L_rc_fft_free_setup.

Error Handling

On success, S3L_rc_fft_setup returns S3L_SUCCESS.

The following conditions will cause S3L_rc_fft_setup to terminate and return the associated error code:

S3L_ERR_ARG_RANK - The rank of array a is not 1, 2, or 3.

S3L_ERR_ARG_NREAL - The data type of a is not real.

S3L_ERR_ARG_NEVEN - Some of the extents of a are not even.

S3L_ERR_ARG_EXTENTS - The extents of a are not correct for the rank of a and the number of processors over which a is distributed. This relationship is summarized below:
- If a is 1D, its length must be divisible by 4*sqr(np), where sqr(np) is the square of np, the number of processes over which a is distributed.
- If a is 2D, its extents must both be divisible by 2*np.
- If a is 3D, its first extent must be even and its last two extents must both be divisible by 2*np.
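The extent rules above can be collected into a single caller-side test. The C sketch below is illustrative only: rc_fft_extents_ok is a hypothetical helper, not an S3L routine, and it assumes that sqr(np) means np squared, in line with the wording used for S3L_fft_setup.

```c
#include <assert.h>

/*
 * Return 1 if the extents satisfy the documented S3L_rc_fft_setup rules
 * for the given rank and process count np, and 0 otherwise.
 */
int rc_fft_extents_ok(int rank, const long *ext, long np)
{
    int i;
    assert(np >= 1);

    if (rank < 1 || rank > 3)
        return 0;                          /* rank must be 1, 2, or 3 */

    for (i = 0; i < rank; i++)
        if (ext[i] <= 0 || ext[i] % 2 != 0)
            return 0;                      /* every extent must be even */

    switch (rank) {
    case 1:                                /* 1D: divisible by 4*np*np */
        return ext[0] % (4 * np * np) == 0;
    case 2:                                /* 2D: both divisible by 2*np */
        return ext[0] % (2 * np) == 0 && ext[1] % (2 * np) == 0;
    default:                               /* 3D: last two divisible by 2*np */
        return ext[1] % (2 * np) == 0 && ext[2] % (2 * np) == 0;
    }
}
```

With np = 2, a 1D length of 64 is accepted (64 is divisible by 16) and a length of 24 is rejected; a 3D array of extents 6 x 8 x 8 is accepted because 6 is even and 8 is divisible by 4.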

Examples

../examples/s3l/rc_fft/rc_fft.c
../examples/s3l/rc_fft-f/rc_fft.f

Related Functions

S3L_rc_fft(3)
S3L_cr_fft(3)
S3L_rc_fft_free_setup(3)

`S3L_fft_free_setup`

Description

S3L_fft_free_setup deallocates internal memory associated with setup_id by a previous call to S3L_fft_setup.

Syntax

The C and Fortran syntax for S3L_fft_free_setup is shown below.

C/C++ Syntax

Example 8-48

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_fft_free_setup(setup_id)
    int                setup_id

F77/F90 Syntax

Example 8-49

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_fft_free_setup(setup_id, ier)
    integer*4          setup_id
    integer*4          ier

Input

setup_id - Scalar integer variable. Use the value returned by the S3L_fft_setup call for this argument.

Output

This function uses the following argument for output:

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_fft_free_setup returns S3L_SUCCESS.

The following condition will cause S3L_fft_free_setup to terminate and return the associated error code:

S3L_ERR_ARG_SETUP - The setup_id supplied does not correspond to a valid FFT setup.

Examples

../examples/s3l/fft/fft.c
../examples/s3l/fft/ex_fft1.c
../examples/s3l/fft/ex_fft2.c
../examples/s3l/fft-f/fft.f
../examples/s3l/fft-f/ex_fft1.f

Related Functions

S3L_fft_setup(3)
S3L_fft(3)
S3L_ifft(3)
S3L_fft_detailed(3)

`S3L_rc_fft_free_setup`

Description

S3L_rc_fft_free_setup deallocates internal memory associated with setup_id by a previous call to S3L_rc_fft_setup.

Syntax

The C and Fortran syntax for S3L_rc_fft_free_setup is shown below.

C/C++ Syntax

Example 8-50

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_rc_fft_free_setup(setup_id)
    int                setup_id

F77/F90 Syntax

Example 8-51

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_rc_fft_free_setup(setup_id, ier)
    integer*4          setup_id
    integer*4          ier

Input

setup_id - Scalar integer variable. Use the value returned by the S3L_rc_fft_setup call for this argument.

Output

This function uses the following argument for output:

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_rc_fft_free_setup returns S3L_SUCCESS.

The following condition will cause S3L_rc_fft_free_setup to terminate and return the associated error code:

S3L_ERR_ARG_SETUP - The setup_id supplied does not correspond to a valid S3L_rc_fft_setup.

Examples

../examples/s3l/rc_fft/rc_fft.c
../examples/s3l/rc_fft-f/rc_fft.f

Related Functions

S3L_rc_fft_setup(3)
S3L_rc_fft(3)

Structured Solvers

`S3L_gen_band_factor`

Description

S3L_gen_band_factor performs the LU factorization of an n x n general banded array with lower bandwidth bl and upper bandwidth bu. The non-zero diagonals of the array should be stored in an S3L array a of size [2*bl+2*bu+1,n].

In the more general case, a can be a multidimensional array, where axis_r and axis_d denote the array axes whose extents are 2*bl+2*bu+1 and n respectively. The format of the array a is described in the following example:

Example:

Consider a 7 x 7 (n=7) banded array with bl = 1, bu = 2. c is the main diagonal, b is the first superdiagonal and a the second. d is the first subdiagonal. The contents of the composite array a used as input to S3L_gen_band_factor should have the following organization:

Example 8-52

 *   *   *   *   *   *   *
 *   *   *   *   *   *   *
 *   *   *   *   *   *   *
 *   *  a0  a1  a2  a3  a4
 *  b0  b1  b2  b3  b4  b5
c0  c1  c2  c3  c4  c5  c6
d0  d1  d2  d3  d4  d5   *

Note that items denoted by '*' are not referenced.
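The layout in Example 8-52 is consistent with placing entry (i,j) of the banded matrix in row (bl + 2*bu) + i - j, column j of the composite array, using zero-based indices. That mapping is inferred from the picture rather than stated in this guide, so treat the following C sketch as an illustration under that assumption. It works on ordinary local row-major arrays, not on S3L parallel arrays, and the names band_rows and pack_band are hypothetical.

```c
#include <assert.h>

/* Number of rows in the composite array: 2*bl + 2*bu + 1. */
int band_rows(int bl, int bu)
{
    return 2 * bl + 2 * bu + 1;
}

/*
 * Copy the band of a dense n x n row-major matrix into the composite
 * array ab, which has band_rows(bl, bu) rows and n columns (row-major).
 * Slots that correspond to '*' in the picture are left untouched.
 */
void pack_band(const double *dense, int n, int bl, int bu, double *ab)
{
    int i, j;
    int diag_row = bl + 2 * bu;     /* row that holds the main diagonal */
    assert(n > 0 && bl >= 0 && bu >= 0);

    for (j = 0; j < n; j++) {
        int lo = (j - bu < 0) ? 0 : j - bu;          /* first in-band row */
        int hi = (j + bl > n - 1) ? n - 1 : j + bl;  /* last in-band row */
        for (i = lo; i <= hi; i++)
            ab[(diag_row + i - j) * n + j] = dense[i * n + j];
    }
}
```

For n = 7, bl = 1, and bu = 2 this places the main diagonal in row 5, the first superdiagonal in row 4 starting at column 1, the second superdiagonal in row 3 starting at column 2, and the subdiagonal in row 6 ending at column 5, as in the example.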

If a is two-dimensional, S3L_gen_band_factor is more efficient when axis_r is the first axis, axis_d is the second axis, and array a is block-distributed along the second axis. For C programs, the indices of the first and second axes are 0 and 1, respectively. For Fortran programs, the corresponding indices are 1 and 2.

If a has more than two dimensions, S3L_gen_band_factor is most efficient when axes axis_r and axis_d of a are local (that is, are not distributed).

Syntax

The C and Fortran syntax for S3L_gen_band_factor is shown below.

C/C++ Syntax

Example 8-53

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_gen_band_factor(a, bl, bu, factors, axis_r, axis_d)
    S3L_array_t       a
    int               bl
    int               bu
    int               *factors
    int               axis_r
    int               axis_d

F77/F90 Syntax

Example 8-54

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_gen_band_factor(a, bl, bu, factors, axis_r, axis_d, ier)
    integer*8         a
    integer*4         bl
    integer*4         bu
    integer*4         factors
    integer*4         axis_r
    integer*4         axis_d
    integer*4         ier

Input

a - S3L array handle for a real or complex parallel array of size [1+2*bl+2*bu,n].
bl - Lower bandwidth of a.
bu - Upper bandwidth of a.
axis_r - Specifies the row axis along which factorization will occur.
axis_d - Specifies the column axis along which factorization will occur.

Output

This function uses the following arguments for output:

a - Upon successful completion, S3L_gen_band_factor stores the factorization results in a.

factors - Pointer to an internal structure that holds the factorization.

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_gen_band_factor returns S3L_SUCCESS.

S3L_gen_band_factor performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause the function to terminate and return the associated error code:

S3L_ERR_ARG_DTYPE - The type of a is not one of: real, double, complex or double complex.

S3L_ERR_INDX_INVALID - bl or bu value is invalid for either of the following reasons:
- Less than 0 (C/C++) or less than 1 (F77/F90).
- Greater than the extent of a along axis_d.

S3L_ERR_ARG_EXTENTS - The extent of a along axis axis_r is not equal to 2*bl+2*bu+1.

S3L_ERR_ARRTOOSMALL - The extents of a along axis axis_d are such that the block size in a block distribution is less than bu + bl + 1.

S3L_ERR_ARG_AXISNUM - An axis argument is invalid for one of the following reasons:
- It is less than 0 (C/C++) or less than 1 (F77/F90).
- It is greater than the rank of the referenced array.
- axis_d is equal to axis_r.

S3L_ERR_BAND_FFAIL - The factorization could not be completed.

Examples

../examples/s3l/band/ex_band.c
../examples/s3l/band-f/ex_band.f

Related Functions

S3L_gen_band_solve(3)
S3L_gen_band_free_factors(3)

`S3L_gen_band_free_factors`

Description

S3L_gen_band_free_factors frees internal memory associated with a banded matrix factorization.

Syntax

The C and Fortran syntax for S3L_gen_band_free_factors is shown below.

C/C++ Syntax

Example 8-55

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_gen_band_free_factors(factors)
    int               *factors

F77/F90 Syntax

Example 8-56

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_gen_band_free_factors(factors, ier)
    integer*4         factors
    integer*4          ier

Input

factors - Pointer to the internal structure that will be freed.

Output

This function uses the following argument for output:

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_gen_band_free_factors returns S3L_SUCCESS.

The following condition will cause S3L_gen_band_free_factors to terminate and return the associated error code:

S3L_ERR_ARG_SETUP - The factors value is invalid.

Examples

../examples/s3l/band/ex_band.c
../examples/s3l/band-f/ex_band.f

Related Functions

S3L_gen_band_solve(3)
S3L_gen_band_factor(3)

`S3L_gen_band_solve`

Description

S3L_gen_band_solve solves a banded system whose factorization has been computed by a prior call to S3L_gen_band_factor.

The factored banded matrix is stored in array a, whose dimensions are (2*bu + 2*bl + 1) x n. The right-hand side is stored in array b, whose dimensions are n x nrhs.

If a and b have more than two dimensions, axis_r and axis_d refer to those axes of a whose extents are 2*bu + 2*bl + 1 and n, respectively. Likewise, axis_row and axis_col refer to the axes of b with extents n and nrhs.

Array Layout Guidelines

Two-Dimensional Arrays: If a and b are two-dimensional, S3L_gen_band_solve is more efficient when axis_r = 0, axis_d = 1, array a is block distributed along axis 1, axis_row = 0, axis_col = 1 and array b is block distributed along axis 0.

Note that the values cited in the previous paragraph apply to programs using the C/C++ interface--that is, they assume zero-based array indexing. When S3L_gen_band_solve is called from F77 or F90 applications, these values must be increased by one. Therefore, when a and b are two-dimensional and S3L_gen_band_solve is called by a Fortran program, the solver is more efficient when axis_r = 1, axis_d = 2, array a is block distributed along axis 2, axis_row = 1, axis_col = 2 and array b is block distributed along axis 1.

When a and b are two-dimensional and nrhs is greater than 1, the size of a must be such that n is divisible by the number of processors.

Arrays With More Than Two Dimensions: If a and b have more than two dimensions, S3L_gen_band_solve is more efficient when axes axis_r and axis_d of a and axes axis_row and axis_col of b are local (not distributed).

Syntax

The C and Fortran syntax for S3L_gen_band_solve is shown below.

C/C++ Syntax

Example 8-57

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_gen_band_solve(a, bl, bu, factors, axis_r, axis_d, b, axis_row,
axis_col)
    S3L_array_t       a
    int               bl
    int               bu
    int               *factors
    int               axis_r
    int               axis_d
    S3L_array_t       b
    int               axis_row
    int               axis_col

F77/F90 Syntax

Example 8-58

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_gen_band_solve(a, bl, bu, factors, axis_r, axis_d, b, axis_row,
axis_col, ier)
    integer*8         a
    integer*4         bl
    integer*4         bu
    integer*4         factors
    integer*4         axis_r
    integer*4         axis_d
    integer*8         b
    integer*4         axis_row
    integer*4         axis_col
    integer*4         ier

Input

a - S3L array handle for a real or complex parallel array of size [1+2*bl+2*bu,n].
bl - Lower bandwidth of a.
bu - Upper bandwidth of a.
factors - Pointer to an internal structure that holds the factorization results.
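axis_r - Specifies the axis of array a whose extent is 2*bl+2*bu+1.
axis_d - Specifies the axis of array a whose extent is n.
b - S3L array handle containing the right-hand side of the matrix equation ax=b.
axis_row - Specifies the axis of array b whose extent is n.
axis_col - Specifies the axis of array b whose extent is nrhs.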

Output

This function uses the following arguments for output:

b - On output, b is overwritten by the solution to the matrix equation ax=b.

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_gen_band_solve returns S3L_SUCCESS.

S3L_gen_band_solve performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause the function to terminate and return the associated error code:

S3L_ERR_ARG_DTYPE - The type of a is not one of: real, double, complex or double complex.

S3L_ERR_INDX_INVALID - bl or bu value is invalid for either of the following reasons:
- It is less than 0 (C/C++) or less than 1 (F77/F90).
- It is greater than the extent of a along axis_d.

S3L_ERR_ARG_EXTENTS - The extent of a along axis axis_r is not equal to 2*bl+2*bu+1.

S3L_ERR_ARRTOOSMALL - The extents of a along axis axis_d are such that the block size in a block distribution is less than bu + bl + 1.

S3L_ERR_ARG_AXISNUM - An axis argument is invalid for one of the following reasons:
- It is less than 0 (C/C++) or less than 1 (F77/F90).
- It is greater than the rank of the referenced array.
- axis_d is equal to axis_r.

S3L_ERR_MATCH_RANK - The rank of a is not the same as that of b.

S3L_ERR_ARG_SETUP - The factors value does not correspond to a valid setup.

S3L_ERR_MATCH_EXTENTS - The extents of a along axis_d do not equal the extents of b along axis_row or some of the other extents of a and b do not match.

Examples

../examples/s3l/band/ex_band.c
../examples/s3l/band-f/ex_band.f

Related Functions

S3L_gen_band_factor(3)
S3L_gen_band_free_factors(3)

`S3L_gen_trid_factor`

Description

S3L_gen_trid_factor factors a tridiagonal matrix, whose diagonal is stored in vector D. The first superdiagonal is stored in U, and the first subdiagonal in L.

On return, the integer factors contains a pointer to an internal setup structure that holds the factorization. Subsequent calls to S3L_gen_trid_solve use the value in factors to access the factorization results.

The contents of the vectors D, U, and L may be altered. These altered vectors, along with the factors parameter, have to be passed to a subsequent call to S3L_gen_trid_solve to produce the solution to a tridiagonal system.

D, U, and L must have the same extents and type. If they are one-dimensional, all three must be of length n. The first n-1 entries of U contain the elements of the superdiagonal. The last n-1 entries of L contain the elements of the first subdiagonal. The last element of U and the first element of L are not referenced and can be initialized arbitrarily.
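The storage convention for one-dimensional D, U, and L can be made concrete with a matrix-vector product written against it. The C sketch below uses ordinary local arrays rather than S3L parallel arrays, and trid_matvec is a hypothetical name introduced only for this illustration. Note that it never reads L[0] or U[n-1], matching the statement that those two elements are not referenced.

```c
#include <assert.h>

/*
 * Compute y = T*x for the n x n tridiagonal matrix T whose diagonal is D,
 * whose superdiagonal is U[0..n-2], and whose subdiagonal is L[1..n-1].
 */
void trid_matvec(const double *D, const double *U, const double *L,
                 const double *x, double *y, int n)
{
    int i;
    assert(n >= 1);

    for (i = 0; i < n; i++) {
        y[i] = D[i] * x[i];
        if (i > 0)
            y[i] += L[i] * x[i - 1];    /* L[i] is entry (i, i-1) */
        if (i < n - 1)
            y[i] += U[i] * x[i + 1];    /* U[i] is entry (i, i+1) */
    }
}
```

Because the unused slots are skipped, they can hold any value without affecting the result.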

If D, U and L have more than one dimension, axis_d is the axis along which the multidimensional arrays are factored. If they are one-dimensional, axis_d must be 0 in C/C++ programs and 1 in F77/F90 programs.

S3L_gen_trid_factor is based on the ScaLAPACK routines pxdttrf, where x is single, double, complex, or double complex. It does no pivoting; consequently, the matrix has to be positive definite for the factorization to be stable.

For one-dimensional arrays, the routine is more efficient when D, U, and L are block distributed. For multiple dimensions, the routine is more efficient when axis_d is a local axis.

Syntax

The C and Fortran syntax for S3L_gen_trid_factor is shown below.

C/C++ Syntax

Example 8-59

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_gen_trid_factor(D, U, L, factors, axis_d)
    S3L_array_t       D
    S3L_array_t       U
    S3L_array_t       L
    int               *factors
    int               axis_d

F77/F90 Syntax

Example 8-60

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_gen_trid_factor(D, U, L, factors, axis_d, ier)
    integer*8         D
    integer*8         U
    integer*8         L
    integer*4         factors
    integer*4         axis_d
    integer*4          ier

Input

D - Vector containing the diagonal for the matrix being factored.
U - Vector containing the first upper diagonal for the matrix being factored.
L - Vector containing the first lower diagonal for the matrix being factored.
axis_d - When D, U, and L are one-dimensional, axis_d must be 0 (C/C++ programs) or 1 (F77/F90 programs). For multidimensional arrays, axis_d specifies the axis along which the arrays are factored.

Output

This function uses the following arguments for output:

D - On output, D is overwritten with the partial result of the factorization.
U - On output, U is overwritten with the partial result of the factorization.
L - On output, L is overwritten with the partial result of the factorization.
factors - Upon completion, factors points to the internal data structure containing the factorization results.
ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_gen_trid_factor returns S3L_SUCCESS.

S3L_gen_trid_factor performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause the function to terminate and return the associated error code:

S3L_ERR_MATCH_DTYPE - The arrays are not the same data type.

S3L_ERR_MATCH_RANK - The arrays do not have the same rank.

S3L_ERR_MATCH_EXTENTS - The arrays do not have the same extents.

S3L_ERR_ARG_DTYPE - The array type cannot be operated on by the routine (that is, it is integer or long long).

S3L_ERR_ARRTOOSMALL - The array extent is too small, making the length of the main diagonal less than two times the number of processes.

S3L_ERR_ARG_AXISNUM - An axis argument is invalid; that is, it is either:
- Less than 0 (C/C++) or less than 1 (F77/F90).
- Greater than the rank of the referenced array.

S3L_ERR_FACTOR_FAIL - The tridiagonal matrix could not be factored for some reason. For example, it might not be diagonally dominant.

Examples

../examples/s3l/trid/ex_trid.c
../examples/s3l/trid-f/ex_trid.f

Related Functions

S3L_gen_trid_solve(3)
S3L_gen_trid_free_factors(3)

`S3L_gen_trid_free_factors`

Description

S3L_gen_trid_free_factors frees the internal memory setup that was reserved by a prior call to S3L_gen_trid_factor. The factors argument contains the value returned by the earlier S3L_gen_trid_factor call.

Syntax

The C and Fortran syntax for S3L_gen_trid_free_factors is shown below.

C/C++ Syntax

Example 8-61

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_gen_trid_free_factors(factors)
    int               *factors

F77/F90 Syntax

Example 8-62

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_gen_trid_free_factors(factors, ier)
    integer*4         factors
    integer*4          ier

Input

factors - Pointer to the internal structure that will be freed.

Output

This function uses the following argument for output:

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_gen_trid_free_factors returns S3L_SUCCESS.

The following condition will cause S3L_gen_trid_free_factors to terminate and return the associated error code:

S3L_ERR_ARG_SETUP - The factors value is invalid.

Examples

../examples/s3l/trid/ex_trid.c
../examples/s3l/trid-f/ex_trid.f

Related Functions

S3L_gen_trid_solve(3)
S3L_gen_trid_factor(3)

`S3L_gen_trid_solve`

Description

S3L_gen_trid_solve solves a tridiagonal system that has been previously factored via a call to S3L_gen_trid_factor.

If D, U, and L are of length n, B (the right-hand side of the tridiagonal system) must be of size n x nrhs. If D, U, and L are multidimensional, axis_d is the axis along which the system is solved. The rank of B must be one greater than the rank of D, U, and L.

If the rank of B is greater than 2, row_b and col_b specify the axes whose dimensions are n and nrhs, respectively. The extents of all other axes must be the same as the corresponding axes of D, U, and L.

When computing multiple tridiagonal systems in which only the right-hand-side matrix changes, the factorization routine S3L_gen_trid_factor need only be called once, before the first call to S3L_gen_trid_solve. Then, S3L_gen_trid_solve can be called repeatedly without calling S3L_gen_trid_factor again.
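The factor-once, solve-many pattern can be seen in a serial analogue. The C sketch below is not the parallel ScaLAPACK-based algorithm that S3L uses; it is a plain LU factorization without pivoting on local arrays, and the names trid_factor_serial and trid_solve_serial are hypothetical. It does share two properties described in this section: D and L are overwritten by the factorization, and the altered vectors are then passed to every subsequent solve.

```c
#include <assert.h>

/*
 * In-place LU factorization without pivoting. D, U, L follow the storage
 * convention of this section (L[0] and U[n-1] are not referenced).
 * Returns 0 on success and -1 if a zero pivot is met.
 */
int trid_factor_serial(double *D, const double *U, double *L, int n)
{
    int i;
    assert(n >= 1);

    for (i = 1; i < n; i++) {
        if (D[i - 1] == 0.0)
            return -1;
        L[i] /= D[i - 1];            /* multiplier for row i */
        D[i] -= L[i] * U[i - 1];     /* updated pivot */
    }
    return (D[n - 1] == 0.0) ? -1 : 0;
}

/* Overwrite b with the solution, using the already factored D, U, L. */
void trid_solve_serial(const double *D, const double *U, const double *L,
                       double *b, int n)
{
    int i;
    assert(n >= 1);

    for (i = 1; i < n; i++)          /* forward substitution */
        b[i] -= L[i] * b[i - 1];

    b[n - 1] /= D[n - 1];            /* back substitution */
    for (i = n - 2; i >= 0; i--)
        b[i] = (b[i] - U[i] * b[i + 1]) / D[i];
}
```

A caller would invoke trid_factor_serial once and then call trid_solve_serial for each new right-hand side, reusing the same factored vectors each time.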

Syntax

The C and Fortran syntax for S3L_gen_trid_solve is shown below.

C/C++ Syntax

Example 8-63

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_gen_trid_solve(D, U, L, factors, B, axis_d, row_b, col_b)
    S3L_array_t       D
    S3L_array_t       U
    S3L_array_t       L
    int               *factors
    S3L_array_t       B
    int               axis_d
    int               row_b
    int               col_b

F77/F90 Syntax

Example 8-64

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_gen_trid_solve(D, U, L, factors, B, axis_d, row_b, col_b, ier)
    integer*8         D
    integer*8         U
    integer*8         L
    integer*4         factors
    integer*8         B
    integer*4         axis_d
    integer*4         row_b
    integer*4         col_b
    integer*4          ier

Input

D - Vector containing the diagonal of the factored matrix, as altered by S3L_gen_trid_factor.
U - Vector containing the first superdiagonal of the factored matrix, as altered by S3L_gen_trid_factor.
L - Vector containing the first subdiagonal of the factored matrix, as altered by S3L_gen_trid_factor.
factors - Pointer to an internal structure that holds the factorization results.
B - The right-hand side of the tridiagonal system to be solved.
axis_d - When D, U, and L are one-dimensional, axis_d must be 0 (C/C++ programs) or 1 (F77/F90 programs). For multidimensional arrays, axis_d specifies the axis along which factorization was carried out.
row_b - Indicates the row axis of the right-hand side array, B. The value of row_b depends on the following:
- When B is two-dimensional and its sides are n x nrhs, row_b is 0 (C/C++) or 1 (F77/F90).
- When B is two-dimensional and its sides are nrhs x n, row_b is 1 (C/C++) or 2 (F77/F90).
- When B has more than two dimensions, row_b identifies the side of B with an extent of n. For C/C++ programs, the row_b value is zero-based and for F77/F90 programs, it is one-based.

col_b - Indicates the column axis of the right-hand side array, B, that has an extent of nrhs. The value of col_b is determined as follows:
- When B is two-dimensional and its sides are n x nrhs, col_b is 1 (C/C++) or 2 (F77/F90).
- When B is two-dimensional and its sides are nrhs x n, col_b is 0 (C/C++) or 1 (F77/F90).
- When B has more than two dimensions, col_b identifies the side of B with an extent of nrhs. For C/C++ programs, the col_b value is zero-based and for F77/F90 programs, it is one-based.

Output

This function uses the following arguments for output:

B - On output, B is overwritten with the solution to the tridiagonal system.

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_gen_trid_solve returns S3L_SUCCESS.

S3L_gen_trid_solve performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause the function to terminate and return the associated error code.

S3L_ERR_MATCH_DTYPE - The arrays are not the same data type.

S3L_ERR_MATCH_RANK - The arrays do not have compatible rank.

S3L_ERR_MATCH_EXTENTS - The arrays do not have compatible extents.

S3L_ERR_ARG_DTYPE - The array type cannot be operated on by the routine (that is, it is integer or long long).

S3L_ERR_ARRTOOSMALL - The array extent is too small, making the length of the main diagonal less than two times the number of processes.

S3L_ERR_ARG_AXISNUM - An axis argument is invalid for one of the following reasons:
- It is less than 0 (C/C++) or less than 1 (F77/F90).
- It is greater than the rank of the referenced array.
- row_b is equal to col_b.

S3L_ERR_ARG_SETUP - The factors value does not correspond to a valid setup.

Examples

../examples/s3l/trid/ex_trid.c
../examples/s3l/trid-f/ex_trid.f

Related Functions

S3L_gen_trid_factor(3)
S3L_gen_trid_free_factors(3)

Dense Symmetric Eigenvalue Solver

`S3L_sym_eigen`

Description

S3L_sym_eigen finds selected eigenvalues and, optionally, eigenvectors of Hermitian matrices. The eigenvalues and eigenvectors can be selected by specifying a range of values or a range of indices for the desired eigenvalues/vectors.

Syntax

The C and Fortran syntax for S3L_sym_eigen is shown below.

C/C++ Syntax

Example 8-65

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_sym_eigen(A, axis1, axis2, E, V, J, job, range, limits, tolerances)
    S3L_array_t       A
    int               axis1
    int               axis2
    S3L_array_t       E
    S3L_array_t       V
    S3L_array_t       J
    int               job
    int               range
    void              *limits
    void              *tolerances

F77/F90 Syntax

Example 8-66

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_sym_eigen(A, axis1, axis2, E, V, J, job, range, limits, tolerances,
ier)
    integer*8         A
    integer*4         axis1
    integer*4         axis2
    integer*8         E
    integer*8         V
    integer*8         J
    integer*4         job
    integer*4         range
    <type_lim>        limits(2)
    <type_tol>        tolerances(2)
    integer*4         ier

where <type_lim> is either integer*4 or real*4 and <type_tol> is either real*4 or real*8.

Input

A - S3L array handle describing a real or complex parallel array. On entry, A contains one or more two-dimensional Hermitian matrices, b, each of which is assumed to be dense and square. The axes of b are identified by the arguments axis1 and axis2.

axis1 - Integer variable denoting the axis of A that contains the rows of each Hermitian matrix, b.

axis2 - Integer variable denoting the axis of A that contains the columns of each Hermitian matrix, b. axis2 must be greater than axis1.

job - Integer variable indicating whether or not eigenvectors are to be computed. A value of 0 indicates that only eigenvalues are desired. Otherwise, both eigenvalues and eigenvectors are calculated.

range - Integer variable indicating the range of eigenvalues to be computed, as follows:
- 0 - Return all eigenvalues.
- 1 - Compute all eigenvalues within the specified interval.
- 2 - Return a range of eigenvalue indices (when eigenvalues are sorted in ascending order).

limits - Defines the eigenvalue interval when the value of range is 1 or 2. Specifically, when range equals:
- 0 - limits is not used.
- 1 - limits must be a real vector of length 2. Its values bracket the interval in which eigenvalues are requested; that is, all eigenvalues in the interval [limits(1), limits(2)] will be found.
- 2 - limits must be an integer vector of length 2. For eigenvalues sorted in ascending order, the eigenvalues with indices limits(1) through limits(2) will be found.

tolerances - Real vector of length 2. Its precision must match that of A. That is, if A is of type S3L_float or S3L_complex, tolerances must be single-precision. If A is of type S3L_double or S3L_double_complex, tolerances must be double-precision.

tolerances(1) gives the absolute error tolerance for the eigenvalues. If tolerances(1) is less than or equal to zero, the value eps * norm(b) will be used in its place, where eps is the machine tolerance and norm(b) is the 1-norm of the tridiagonal matrix obtained by reducing b to tridiagonal form.

tolerances(2) controls the reorthogonalization of eigenvectors. Eigenvectors corresponding to eigenvalues that are within tolerances(2) * norm(b) of each other will be reorthogonalized. If tolerances(2) is less than or equal to zero, the value 1.0e-03 will be used in its place.
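The default-substitution rules for the two tolerances can be summarized in a short C sketch. This is not S3L code: the helper name effective_tolerances and its arguments are hypothetical, and eps and norm_b are assumed to be supplied by the caller rather than computed from b.

```c
#include <assert.h>

/* Hypothetical helper, not part of S3L: resolves the tolerances that
 * would be in effect after the default-substitution rules are applied.
 * tol[0] plays the role of tolerances(1), tol[1] of tolerances(2).
 * eps (machine tolerance) and norm_b (1-norm of the reduced
 * tridiagonal matrix) are assumed inputs. */
void effective_tolerances(const double tol[2], double eps,
                          double norm_b, double out[2])
{
    assert(eps > 0.0 && norm_b >= 0.0);
    /* Absolute eigenvalue tolerance: nonpositive selects eps * norm(b). */
    out[0] = (tol[0] <= 0.0) ? eps * norm_b : tol[0];
    /* Reorthogonalization tolerance: nonpositive selects 1.0e-03. */
    out[1] = (tol[1] <= 0.0) ? 1.0e-03 : tol[1];
}
```

Passing zero for both entries therefore requests the defaults for both tolerances.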

Output

This function uses the following arguments for output:

A - Upon exit, the contents of A are destroyed.

E - S3L array handle describing a real parallel array with rank(E) = rank(A) -1. axis1 of E must have the same extent as axis1 of A. The remaining axes are instance axes matching those of A in order of declaration and extents. Thus, each vector f within E corresponds to a matrix b within A.
On return, each f contains the eigenvalues of the corresponding matrix b.

V - S3L array handle describing a parallel array with the same rank, extents, and data type as A. For each instance matrix b within A, there is a corresponding two-dimensional array, w, within V. axis1 denotes the axis of V that contains the rows of w; axis2 denotes the axis of V that contains the columns of w.

On return, each column of w will contain an eigenvector of the corresponding matrix b.

J - S3L array handle describing an integer parallel array with rank(J) = rank(A) - 1. axis1 of J should have an extent of 2. The remaining axes are instance axes matching those of A in order of declaration and extents. Thus, J will contain vectors of length 2 corresponding to the matrices b embedded within A.

On return, the first element of each vector will contain the number of eigenvalues found. The second element of each vector will contain the number of eigenvectors found.

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_sym_eigen returns S3L_SUCCESS.

S3L_sym_eigen performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause the function to terminate and return the associated error code.

S3L_ERR_ARG_AXISNUM - Invalid value of axis1 or axis2.

S3L_ERR_MATCH_RANK - Ranks of the parallel arrays do not match.

S3L_ERR_ARRNOTSQ - The two-dimensional arrays in A are not square.

S3L_ERR_MATCH_EXTENTS - The extents of the parallel arrays do not match.

S3L_ERR_MATCH_DTYPE - The arguments are not all of the same data type and precision.

S3L_ERR_ARG_RANGE_INV - Invalid value used for range or limits.

S3L_ERR_ARG_NULL - Value of range is 1 or 2 but limits is a NULL pointer (C/C++) or 0 (F77/F90).

Examples

../examples/s3l/eigen/eigen.c
../examples/s3l/eigen-f/eigen.f

Parallel Random Number Generators

`S3L_setup_rand_fib`

Description

S3L_setup_rand_fib initializes the Lagged-Fibonacci random number generator's (LFG's) state table with the fixed parameters: l = 17, k = 5, m = 32.

The state table is initialized in a manner that ensures that the random numbers generated for each node are from a different period of the LFG. A Linear Multiplicative Generator (LMG) is used to initialize the noncritical elements of the state table.

Use S3L_free_rand_fib to deallocate an LFG setup.

Syntax

The C and Fortran syntax for S3L_setup_rand_fib are shown below.

C/C++ Syntax

Example 8-67

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_setup_rand_fib(setup_id, seed)
    int                *setup_id
    int                seed

F77/F90 Syntax

Example 8-68

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_setup_rand_fib(setup_id, seed, ier)
    integer*4          setup_id
    integer*4          seed
    integer*4          ier

Input

setup_id - Integer index used to access the state table associated with a particular LFG.
seed - An integer value used to initialize the LMG that initializes the noncritical elements of the LFG's state table.

Output

This function uses the following argument for output:

setup_id - On output, setup_id contains an index that can be used as input to S3L_rand_fib.

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_setup_rand_fib returns S3L_SUCCESS.

Examples

../examples/s3l/rand_fib/rand_fib.c
../examples/s3l/rand_fib-f/rand_fib.f

Related Functions

S3L_free_rand_fib(3)
S3L_rand_fib(3)

`S3L_free_rand_fib`

Description

S3L_free_rand_fib frees the state table associated with a particular Lagged-Fibonacci random number generator (LFG).

Syntax

The C and Fortran syntax for S3L_free_rand_fib are shown below.

C/C++ Syntax

Example 8-69

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_free_rand_fib(setup_id)
    int                *setup_id

F77/F90 Syntax

Example 8-70

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_free_rand_fib(setup_id, ier)
    integer*4          setup_id
    integer*4          ier

Input

setup_id - Integer index that has been initialized by a call to S3L_setup_rand_fib and is used to identify a particular LFG setup.

Output

This function uses the following argument for output:

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_free_rand_fib returns S3L_SUCCESS.

On error, the following error code may be returned:

S3L_ERR_ARG_SETUP - The setup_id value does not correspond to a valid setup.

Examples

../examples/s3l/rand_fib/rand_fib.c
../examples/s3l/rand_fib-f/rand_fib.f

Related Functions

S3L_rand_fib(3)
S3L_setup_rand_fib(3)

`S3L_rand_fib`

Description

S3L_rand_fib initializes a parallel array using a Lagged-Fibonacci random number generator (LFG). The LFG's parameters are fixed to l = 17, k = 5, and m = 32.

Random numbers are produced by the following iterative equation:

x[n] = (x[n-l] + x[n-k]) mod 2^m
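As an illustration of the recurrence with l = 17, k = 5, and m = 32, the following C sketch implements a single-stream additive lagged-Fibonacci generator. It is not the S3L implementation: the type lfg_state, the functions lfg_seed, lfg_next, lfg_to_int31, and lfg_to_unit, and the multiplier 69069 used to fill the table are all assumptions made for this example, and the per-node seeding that S3L performs is not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define LFG_L 17   /* long lag  (l) */
#define LFG_K 5    /* short lag (k) */

/* Illustrative state table: a circular buffer of the last l values. */
typedef struct {
    uint32_t x[LFG_L];
    int pos;            /* index of x[n-l], the oldest value */
} lfg_state;

/* Fill the table with a simple linear multiplicative generator.
 * The multiplier is an assumption chosen for illustration. */
void lfg_seed(lfg_state *s, uint32_t seed)
{
    uint32_t v = seed | 1u;        /* force an odd starting value */
    for (int i = 0; i < LFG_L; i++) {
        v *= 69069u;               /* wraps mod 2^32; v stays odd */
        s->x[i] = v;
    }
    s->pos = 0;
}

/* x[n] = (x[n-l] + x[n-k]) mod 2^m, with m = 32 through uint32_t wraparound. */
uint32_t lfg_next(lfg_state *s)
{
    int i_l = s->pos;                             /* slot holding x[n-l] */
    int i_k = (s->pos + LFG_L - LFG_K) % LFG_L;   /* slot holding x[n-k] */
    uint32_t v = s->x[i_l] + s->x[i_k];
    s->x[i_l] = v;                                /* x[n] overwrites x[n-l] */
    s->pos = (s->pos + 1) % LFG_L;
    return v;
}

/* Map a raw value to a nonnegative integer in 0 .. 2^31-1. */
uint32_t lfg_to_int31(uint32_t v) { return v >> 1; }

/* Map a raw value to a real number in [0, 1). */
double lfg_to_unit(uint32_t v) { return v / 4294967296.0; }
```

Because each new value depends only on the state table, two generators seeded identically produce identical sequences.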

The result of S3L_rand_fib depends on how the parallel array a is distributed.

When the parallel array is of type integer, its elements are filled with nonnegative integers in the range 0 . . . 2³¹ -1. When the parallel array is single- or double-precision real, its elements are filled with random nonnegative numbers in the range 0 . . . 1. For complex arrays, the real and imaginary parts are initialized to random real numbers.

Syntax

The C and Fortran syntax for S3L_rand_fib are shown below.

C/C++ Syntax

Example 8-71

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_rand_fib(a, setup_id)
    S3L_array_t        a
    int                setup_id

F77/F90 Syntax

Example 8-72

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_rand_fib(a, setup_id, ier)
    integer*8          a
    integer*4          setup_id
    integer*4          ier

Input

a - S3L array handle that describes the parallel array to be initialized by the LFG.
setup_id - Integer index used to access the state table associated with the array referenced by a.

Output

This function uses the following argument for output:

a - On output, a is a randomly initialized array.

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_rand_fib returns S3L_SUCCESS.

S3L_rand_fib checks the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following condition will cause the function to terminate and return the associated error code.

S3L_ERR_ARG_SETUP - The setup_id value does not correspond to a valid setup.

Examples

../examples/s3l/rand_fib/rand_fib.c
../examples/s3l/rand_fib-f/rand_fib.f

Related Functions

S3L_free_rand_fib(3)
S3L_setup_rand_fib(3)

`S3L_rand_lcg`

Description

S3L_rand_lcg initializes a parallel array a, using a Linear Congruential random number generator (LCG). It produces random numbers that are independent of the distribution of the parallel array.

Arrays of type S3L_integer (integer*4) are initialized to random integers in the range 0 . . . 2³¹-1. Arrays of type S3L_long_integer are initialized with integers in the range 0 . . . 2⁶³-1. Arrays of type S3L_float or S3L_double are initialized in the range 0 . . . 1. The real and imaginary parts of S3L_complex and S3L_double_complex arrays are also initialized in the range 0 . . . 1.

The random numbers are initialized by an internal iterative equation of the type:

x[n] = a*x[n-1] + c

Syntax

The C and Fortran syntax for S3L_rand_lcg are shown below.

C/C++ Syntax

Example 8-73

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_rand_lcg(a, iseed)
    S3L_array_t        a
    int                iseed

F77/F90 Syntax

Example 8-74

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_rand_lcg(a, iseed, ier)
    integer*8          a
    integer*4          iseed
    integer*4          ier

Input

a - S3L array handle that describes the parallel array to be initialized by the LCG.
iseed - An integer. If positive, this value is used as the initial seed for the LCG. If zero or negative, the call to S3L_rand_lcg produces a sequence of random numbers, which is a continuation of a sequence generated in a previous call to S3L_rand_lcg.
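The recurrence and the iseed convention can be sketched in serial C as follows. This is an illustration only: the function lcg_fill is hypothetical, the constants are widely published 32-bit LCG parameters assumed for the example (the values S3L uses internally are not given in this manual), and the distribution-independent parallel fill is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants; not taken from S3L. */
#define LCG_A 1664525u
#define LCG_C 1013904223u

static uint32_t lcg_state = 1u;   /* persists between calls */

/* x[n] = a*x[n-1] + c, reduced mod 2^32 by unsigned wraparound.
 * A positive iseed restarts the sequence; zero or a negative value
 * continues the sequence left by the previous call. */
void lcg_fill(uint32_t *out, int n, int iseed)
{
    if (iseed > 0)
        lcg_state = (uint32_t)iseed;
    for (int i = 0; i < n; i++) {
        lcg_state = LCG_A * lcg_state + LCG_C;
        out[i] = lcg_state >> 1;      /* keep values in 0 .. 2^31-1 */
    }
}
```

Filling four elements in one call with a positive seed gives the same values as filling two elements with that seed and two more with iseed set to 0.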

Output

This function uses the following argument for output:

a - On output, a is a randomly initialized array.
ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_rand_lcg returns S3L_SUCCESS.

S3L_rand_lcg checks the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following condition will cause the function to terminate and return the associated error code.

S3L_ERR_ARG_RANK - Invalid rank of a.

Examples

../examples/s3l/rand_lcg/rand_lcg.c
../examples/s3l/rand_lcg-f/rand_lcg.f

Related Functions

S3L_free_rand_fib(3)
S3L_setup_rand_fib(3)

Least Squares Solver

`S3L_gen_lsq`

Description

If m >= n, S3L_gen_lsq finds the least squares solution of an overdetermined system. That is, it solves the least squares problem:

Example 8-75

minimize || B - A*X ||

On output, the first n rows of B hold the least squares solution X.

If m < n, S3L_gen_lsq finds the minimum norm solution of an underdetermined system:

Example 8-76

A * X = B(1:m,:)

On output, B holds the minimum norm solution X.
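For intuition about the overdetermined case (m >= n), the following serial C sketch solves a small least squares problem with n = 2 through the normal equations (A^T A) x = A^T b. The normal-equations approach is chosen here only for brevity; this manual does not state which algorithm S3L_gen_lsq uses, and the function name lsq_2col is hypothetical.

```c
#include <assert.h>

/* Solve min ||b - A*x||_2 for an m x 2 row-major matrix A (m >= 2)
 * through the normal equations (A^T A) x = A^T b.
 * Returns 0 on success, or -1 if A^T A is singular. */
int lsq_2col(const double *A, const double *b, int m, double x[2])
{
    double s00 = 0.0, s01 = 0.0, s11 = 0.0;   /* entries of A^T A */
    double t0 = 0.0, t1 = 0.0;                /* entries of A^T b */
    assert(m >= 2);
    for (int i = 0; i < m; i++) {
        double a0 = A[2 * i], a1 = A[2 * i + 1];
        s00 += a0 * a0;
        s01 += a0 * a1;
        s11 += a1 * a1;
        t0 += a0 * b[i];
        t1 += a1 * b[i];
    }
    double det = s00 * s11 - s01 * s01;
    if (det == 0.0)
        return -1;
    /* Apply the inverse of the symmetric 2 x 2 matrix A^T A. */
    x[0] = ( s11 * t0 - s01 * t1) / det;
    x[1] = (-s01 * t0 + s00 * t1) / det;
    return 0;
}
```

With rows (1,0), (1,1), (1,2) and right-hand side (1,3,5), the data lie exactly on the line 1 + 2t, so the residual of the solution is zero.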

Syntax

The C and Fortran syntax for S3L_gen_lsq are shown below.

C/C++ Syntax

Example 8-77

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_gen_lsq(A, B, axis1, axis2)
    S3L_array_t        A
    S3L_array_t        B
    int                axis1
    int                axis2

F77/F90 Syntax

Example 8-78

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_gen_lsq(A, B, axis1, axis2, ier)
    integer*8          A
    integer*8          B
    integer*4          axis1
    integer*4          axis2
    integer*4          ier

Input

A - S3L array handle that describes a parallel array of dimensions m x n. On output, its contents may be destroyed.
B - S3L array handle that describes a parallel array of dimensions max(m,n) x nrhs. On output, its contents may be destroyed.
axis1 - If A and B have more than two dimensions, axis1 denotes the dimension of A with extent m. Otherwise, it must be 0 for C/C++ programs or 1 for F77/F90 programs.
axis2 - If A and B have more than two dimensions, axis2 denotes the dimension of A with extent n. Otherwise, it must be 1 for C/C++ programs or 2 for F77/F90 programs.

Output

This function uses the following argument for output:

B - On output, B is overwritten by the result of the least squares problem.
ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_gen_lsq returns S3L_SUCCESS.

S3L_gen_lsq checks the validity of the array arguments. If an array argument is found to be corrupted or invalid, an error code is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause the function to terminate and return the associated error code.

S3L_ERR_ARG_AXISNUM - An axis argument is invalid; that is, it is either:
- Less than 0 (C/C++) or less than 1 (F77/F90).
- Greater than the rank of the referenced array.
- axis1 is equal to axis2.

S3L_ERR_MATCH_DTYPE - The array arguments are not all of the same type, as required.
S3L_ERR_MATCH_RANK - Corresponding ranks of the array arguments do not match.

S3L_ERR_MATCH_EXTENTS - The extents of the arrays are not compatible.

S3L_ERR_ARG_DTYPE - The array arguments are not of type float, double, complex, or double-precision complex.

Examples

../examples/s3l/lsq/ex_lsq.c
../examples/s3l/lsq-f/ex_lsq.f

Dense Singular Value Decomposition

`S3L_gen_svd`

Description

S3L_gen_svd computes the singular values of a parallel array A and, optionally, the right and/or left singular vectors. On exit, S contains the singular values. If requested, U and V contain the left and right singular vectors, respectively.

If A, U, and V are two-dimensional arrays, S3L_gen_svd is more efficient when A, U and V are allocated on the same process grid and the same block size is used along both axes. When A, U, and V have more than two dimensions, S3L_gen_svd is more efficient when axis_r, axis_c and axis_s are local (that is, are not distributed).

Syntax

The C and Fortran syntax for S3L_gen_svd are shown below.

C/C++ Syntax

Example 8-79

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_gen_svd(A, U, S, V, jobu, jobv, axis_r, axis_c, axis_s)
    S3L_array_t        A
    S3L_array_t        U
    S3L_array_t        S
    S3L_array_t        V
    char               jobu
    char               jobv
    int                axis_r
    int                axis_c
    int                axis_s

F77/F90 Syntax

Example 8-80

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_gen_svd(A, U, S, V, jobu, jobv, axis_r, axis_c, axis_s, ier)
    integer*8          A
    integer*8          U
    integer*8          S
    integer*8          V
    character*1       jobu
    character*1       jobv
    integer*4         axis_r
    integer*4         axis_c
    integer*4         axis_s
    integer*4         ier

Input

A - S3L array handle describing a parallel array of type S3L_double or S3L_float. In the 2D case, A is an m x n array. If A has more than two dimensions, axis_r and axis_c correspond to the axes of A whose extents are m and n, respectively.
U - If jobu = V, U is a parallel array of dimensions m x min(m,n). Otherwise, U is not referenced. If U has more than two dimensions, axis_r and axis_c correspond to the axes of U whose extents are m and min(m,n), respectively. On output, U is overwritten with the left singular vectors (see the Output section).
S - S3L array handle describing a parallel array (vector) of length min(m,n). If S is multidimensional, axis_s corresponds to the axis of S whose extent is min(m,n).
V - If jobv = V, this is an S3L array handle describing a parallel array of dimensions min(m,n) x n. Otherwise, V is not referenced. If V has more than two dimensions, axis_r and axis_c correspond to the axes of V whose extents are min(m,n) and n, respectively. On output, V is overwritten with the right singular vectors (see the Output section).
jobu - Specifies options for computing all or part of the matrix U, as follows:
- jobu = V - The first min(m,n) columns of U (the left singular vectors) are returned in the array U.
- jobu = N - No columns of U (no left singular vectors) are computed.
jobv - Specifies options for computing all or part of the matrix V, as follows:
- jobv = V - The first min(m,n) rows of V (the right singular vectors) are returned in the array V.
- jobv = N - No rows of V (no right singular vectors) are computed.
axis_r - This is the axis of arrays A, U, and V such that the extent of array A along axis_r is m, the extent of array U along axis_r is m, and the extent of array V along axis_r is min(m,n).
axis_c - This is the axis of arrays A, U, and V such that the extent of array A along axis_c is n, the extent of array U along axis_c is min(m,n), and the extent of array V along axis_c is n.
axis_s - This is the axis of array S along which the length is equal to min(m,n).

Output

This function uses the following arguments for output:

U - On output, U is overwritten with the left singular vectors.
S - On output, S is overwritten with the singular values.
V - On output, V is overwritten with the right singular vectors.
ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_gen_svd returns S3L_SUCCESS.

S3L_gen_svd performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause the function to terminate and return the associated error code:

S3L_ERR_ARG_AXISNUM - An axis argument is invalid; that is, it is either:
- Less than 0 (C/C++) or less than 1 (F77/F90).
- Greater than the rank of the referenced array.
- axis_r is equal to axis_c.

S3L_ERR_MATCH_DTYPE - The arrays are not the same data type.

S3L_ERR_MATCH_RANK - The arrays are not the same rank.

S3L_ERR_MATCH_EXTENTS - The extents of the arrays are not compatible.

S3L_ERR_ARG_DTYPE - The data types of the array arguments are not float or double.

S3L_ERR_ARG_OP - jobv is not one of V or N.

S3L_ERR_SVD_FAIL - The SVD algorithm failed to converge.

Examples

../examples/s3l/svd/ex_svd.c
../examples/s3l/svd-f/ex_svd.f

Iterative Solver

`S3L_gen_iter_solve`

Description

Given a general square sparse matrix A and a right-hand side vector b, S3L_gen_iter_solve solves the linear system of equations Ax = b, using an iterative algorithm, with or without preconditioning.

The first three arguments to S3L_gen_iter_solve are S3L internal array handles that describe the global general sparse matrix A, the rank 1 global array b, and the rank 1 global array x.

The sparse matrix A is produced by a prior call to one of the following sparse routines:

- S3L_declare_sparse
- S3L_read_sparse
- S3L_rand_sparse

The global rank 1 arrays, b and x, have the same data type and precision as the sparse matrix A and both have a length equal to the order of A.

Two local rank 1 arrays, iparm and rparm, provide user control over various aspects of S3L_gen_iter_solve behavior, including:

- Choice of algorithm to be used.
- Type of preconditioner to use on A.
- Flags to select the initial guess to the solution.
- Maximum number of iterations to be taken by the solver.
- If the restarted GMRES algorithm is chosen, selection of the size of the Krylov subspace.
- Tolerance values to be used by the stopping criterion.
- If the Richardson algorithm is chosen, selection of the scaling factor to be used.

iparm is an integer array and rparm is a real array. The options supported by these arguments are described in the subsections titled "Algorithm," "Preconditioning," "Convergence/Divergence Criteria," "Initial Guess," "Maximum Iterations," "Krylov Subspace," "Stopping Criterion Tolerance," and "Richardson Scaling Factor." The "Iteration Termination" subsection identifies the conditions under which S3L_gen_iter_solve will terminate an operation.

Note - iparm and rparm must be preallocated and initialized before S3L_gen_iter_solve is called. To enable the default condition for any parameter, set it to 0. Otherwise, initialize it with the appropriate parameter value, as described in the following subsections.

Algorithm

S3L_gen_iter_solve attempts to solve Ax = b using one of the following iterative solution algorithms. The choice of algorithm is determined by the value supplied for the parameter iparm[S3L_iter_solver]. The options available for this parameter are listed and described in Table 8-12.

Table 8-12 iparm[S3L_iter_solver] Options


Option	Description
`S3L_bcgs`	BiConjugate Gradient Stabilized (Bi-CGSTAB)
`S3L_cgs`	Conjugate Gradient Squared (CGS)
`S3L_cg`	Conjugate Gradient (CG)
`S3L_cr`	Conjugate Residuals (CR)
`S3L_gmres`	Generalized Minimum Residual (GMRES) - default
`S3L_qmr`	Quasi-Minimal Residual (QMR)
`S3L_richardson`	Richardson method

Preconditioning

S3L_gen_iter_solve implements left preconditioning. That is, preconditioning is applied to the linear system Ax = b by solving the equivalent system

Example 8-81

Q^-1 A x = Q^-1 b

where Q is the preconditioner and Q^-1 denotes the inverse of Q. The supported preconditioners are listed in Table 8-13.

Table 8-13 iparm[S3L_iter_pc] Options


Option	Description
`S3L_none`	No preconditioning will be done (default).
`S3L_jacobi`	Point Jacobi preconditioner will be used.
`S3L_ilu`	Use a simplified ILU(0), the incomplete LU factorization of level zero, as the preconditioner. This preconditioner modifies only diagonal nonzero elements of the matrix.
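To make left preconditioning concrete, the following C sketch applies a point Jacobi preconditioner, Q = diag(A), to a small dense system. It is an illustration under stated assumptions, not S3L code: S3L operates on distributed sparse matrices, whereas this example uses a dense row-major array, and the function name jacobi_left_precondition is hypothetical.

```c
#include <assert.h>

/* Apply point Jacobi left preconditioning to a dense n x n row-major
 * system: with Q = diag(A), row i of A and entry b[i] are divided by
 * A[i][i], which forms Q^-1 A and Q^-1 b in place.
 * Returns 0 on success, or -1 when a zero diagonal entry is found
 * (the situation S3L reports as S3L_ERR_JACOBI_ZRDIAG). */
int jacobi_left_precondition(double *A, double *b, int n)
{
    /* Check every diagonal entry first so a failure leaves the data untouched. */
    for (int i = 0; i < n; i++)
        if (A[i * n + i] == 0.0)
            return -1;
    for (int i = 0; i < n; i++) {
        double d = A[i * n + i];
        for (int j = 0; j < n; j++)
            A[i * n + j] /= d;
        b[i] /= d;
    }
    return 0;
}
```

After the call, the preconditioned matrix has a unit diagonal, and the solution x of the scaled system is the same as that of the original system.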

Convergence/Divergence Criteria

The iparm[S3L_iter_conv] parameter selects the criterion to be used for stopping computation. Currently, the single valid option for this parameter is S3L_r0, which selects the default criterion for both convergence and divergence. The convergence criterion is satisfied when:

err = ||rj||_2 / ||r0||_2 < epsilon

and the divergence criterion is met when

err = ||rj||_2 / ||r0||_2 > 10000.0

where:

- rj and r0 are the residuals obtained at iterations j and 0.
- ||.||_2 is the 2-norm.
- epsilon is the desired convergence tolerance stored in rparm[S3L_iter_tol].
- 10000.0 is the divergence tolerance, which is set internally in the solver.
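The two tests can be expressed as a small C helper. This is a sketch of the criterion as described above, not the solver's code: the enum iter_status and the function check_residual are hypothetical names, and the residual 2-norms are assumed to have been computed by the caller.

```c
#include <assert.h>

enum iter_status { ITER_CONTINUE, ITER_CONVERGED, ITER_DIVERGED };

#define DIVERGENCE_TOL 10000.0   /* fixed divergence tolerance */

/* Classify the current iterate from the residual 2-norms.
 * norm_rj is ||rj||_2, norm_r0 is ||r0||_2 (must be positive), and
 * epsilon is the convergence tolerance. The ratio is stored in *err. */
enum iter_status check_residual(double norm_rj, double norm_r0,
                                double epsilon, double *err)
{
    assert(norm_r0 > 0.0);
    *err = norm_rj / norm_r0;
    if (*err < epsilon)
        return ITER_CONVERGED;
    if (*err > DIVERGENCE_TOL)
        return ITER_DIVERGED;
    return ITER_CONTINUE;
}
```

Because the ratio is relative to the initial residual, the test is insensitive to a uniform scaling of the system.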

Initial Guess

The parameter iparm[S3L_iter_init] determines the contents of the initial guess to the solution of the linear system as follows:

0 - Applies zero as the initial guess. This is the default.

1 - Applies the value contained in array x as the initial guess. For this case, the user must initialize x before calling S3L_gen_iter_solve.

Maximum Iterations

On input, the iparm[S3L_iter_maxiter] parameter specifies the maximum number of iterations to be taken by the solver. Set to 0 to select the default, which is 10000.

On output, iparm[S3L_iter_maxiter] contains the total number of iterations taken by the solver at the time of termination.

Krylov Subspace

If the restarted GMRES algorithm is selected, iparm[S3L_iter_kspace] specifies the size of the Krylov subspace to be used. The default is 30.

Stopping Criterion Tolerance

On input, rparm[S3L_iter_tol] specifies the tolerance value to be used by the stopping criterion. Its default is 10^-8.

On output, rparm[S3L_iter_tol] contains the computed error, err, according to the convergence criteria. See the iparm[S3L_iter_conv] description for details.

Richardson Scaling Factor

If the Richardson method is selected, rparm[S3L_rich_scale] specifies the scaling factor to be used. The default value is 1.0.
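The role of a scaling factor in a Richardson iteration can be sketched in C as follows. The sketch assumes the textbook form x <- x + scale * (b - A*x); this manual does not spell out the exact update used by S3L, and the dense storage and the function name richardson_step are choices made for this example only.

```c
#include <assert.h>

/* One Richardson step on a dense n x n row-major system:
 *   x <- x + scale * (b - A*x)
 * r is caller-provided scratch space of length n. */
void richardson_step(const double *A, const double *b, double *x,
                     double *r, int n, double scale)
{
    /* Form the residual with the old x before updating any entry. */
    for (int i = 0; i < n; i++) {
        double ax = 0.0;
        for (int j = 0; j < n; j++)
            ax += A[i * n + j] * x[j];
        r[i] = b[i] - ax;
    }
    for (int i = 0; i < n; i++)
        x[i] += scale * r[i];
}
```

For a diagonal matrix with entries 2 and 4, a scale of 0.25 contracts the error in every step, whereas a scale of 1.0 would make the iteration diverge, which is why the factor is exposed as a parameter.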

Iteration Termination

S3L_gen_iter_solve terminates the iteration when one of the following conditions is met.

- The computation has satisfied the convergence criterion.
- The computation has diverged.
- An algorithmic breakdown has occurred.
- The number of iterations has exceeded the supplied value.

Syntax

The C and Fortran syntax for S3L_gen_iter_solve are shown below.

C/C++ Syntax

Example 8-82

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_gen_iter_solve(A, b, x, iparm, rparm)
    S3L_array_t        A
    S3L_array_t        b
    S3L_array_t        x
    int                *iparm
    <type>            *rparm

F77/F90 Syntax

Example 8-83

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_gen_iter_solve(A, b, x, iparm, rparm, ier)
    integer*8          A
    integer*8          b
    integer*8          x
    integer*4         iparm(*)
    <type>            rparm(*)
    integer*4         ier

where <type> is float or double for C/C++ and real*4 or real*8 for F77/F90.

Input

A - S3L internal array handle for the global general sparse matrix. It is produced by a prior call to one of the following sparse routines:
- S3L_declare_sparse
- S3L_read_sparse
- S3L_rand_sparse

b - Global array of rank 1, with the same data type and precision as A and x and a length equal to the order of the sparse matrix. b contains the right-hand side vector of the linear problem.

x - Global array of rank 1, with the same data type and precision as A and b and a length equal to the order of the sparse matrix. On input, x contains the initial guess for the solution to the linear system. Upon successful completion, x contains the converged solution (see the Output section).

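iparm - Integer local array of rank 1 and length s3l_iter_iparm_size, where:
- iparm[S3L_iter_solver] - Specifies the iterative algorithm to be used. Set it to 0 to use the default solver, GMRES. See the Description section for details.
- iparm[S3L_iter_pc] - Specifies the preconditioner to be used. Set it to 0 to use the default option, S3L_none.
- iparm[S3L_iter_conv] - Selects the criterion to be used for stopping the computation. Currently, the only valid option is S3L_r0.
- iparm[S3L_iter_init] - Selects the initial guess: 0 (the default) applies zero; 1 applies the value contained in x.
- iparm[S3L_iter_maxiter] - Specifies the maximum number of iterations. Set it to 0 to use the default, 10000.
- iparm[S3L_iter_kspace] - Specifies the size of the Krylov subspace when the restarted GMRES algorithm is selected. The default is 30.

rparm - Real local array of rank 1, where:
- rparm[S3L_iter_tol] - Specifies the tolerance to be used by the stopping criterion. The default is 10^-8.
- rparm[S3L_rich_scale] - Specifies the scaling factor to be used when the Richardson method is selected. The default is 1.0.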

Output

This function uses the following arguments for output:

x - Upon successful completion, x contains the converged solution. If the computation breaks down or diverges, x will contain the solution produced by the most recent iteration.

iparm[S3L_iter_maxiter] - On output, contains the total number of iterations taken by the solver at the time of termination.

rparm[S3L_iter_tol] - On output, contains the computed error, err, according to the convergence criteria. See the iparm[S3L_iter_conv] description for details.

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_gen_iter_solve returns S3L_SUCCESS.

S3L_gen_iter_solve performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

On error, it returns one of the following codes, which are organized by error type.

Input Errors

S3L_ERR_ARG_NULL - Invalid array x or b or sparse matrix A. x and b must be preallocated S3L arrays, and A must be a preallocated S3L sparse matrix.

S3L_ERR_ARRNOTSQ - Invalid matrix size. Matrix A must be square.

S3L_ERR_ARG_RANK - Invalid rank for arrays x and b. Both must be rank 1 arrays.

S3L_ERR_MATCH_DTYPE - x, b, and A do not have the same data type.

S3L_ERR_MATCH_EXTENTS - The lengths of x and b do not match the size of sparse matrix A. Both must be equal to the order of A.

S3L_ERR_PARAM_INVALID - Invalid input for iparm or rparm. Both must be preallocated and initialized with the predefined values described in the Description section or set to 0 for the default value.

Computational Errors

S3L_ERR_ILU_ZRPVT - Encountered a zero pivot in the course of ILU preconditioning.

S3L_ERR_JACOBI_ZRDIAG - Encountered a zero diagonal in the course of Jacobi preconditioning.

S3L_ERR_DIVERGE - Computation has diverged.

S3L_ERR_ITER_BRKDWN - A breakdown has occurred.

S3L_ERR_MAXITER - The number of iterations has exceeded the value supplied in iparm[S3L_iter_maxiter].

Examples

../examples/s3l/iter/ex_iter.c
../examples/s3l/iter-f/ex_iter.f

Related Functions

S3L_declare_sparse(3)
S3L_read_sparse(3)
S3L_rand_sparse(3)

Autocorrelation

`S3L_acorr_setup`

Description

S3L_acorr_setup sets up the initial conditions necessary for computation of the autocorrelation C = acorr(A). It returns an integer setup value that can be used by subsequent calls to S3L_acorr and S3L_acorr_free_setup.

Syntax

The C and Fortran syntax for S3L_acorr_setup are shown below.

C/C++ Syntax

Example 8-84

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_acorr_setup(a, c, setup_id)
    S3L_array_t        a
    S3L_array_t        c
    int                *setup_id

F77/F90 Syntax

Example 8-85

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_acorr_setup(a, c, setup_id, ier)
    integer*8          a
    integer*8          c
    integer*4         setup_id
    integer*4         ier

Input

a - S3L internal array handle for the parallel 1D or 2D array of real or complex type whose autocorrelation is to be computed.
c - S3L internal array handle for the parallel 1D or 2D array of the same type as a, used to store the result of the autocorrelation. Its extent along each axis must be at least two times the corresponding extent of a, minus 1.
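The extent requirement on c follows from the number of distinct lags: a vector of length n has lags from -(n-1) through n-1, or 2n - 1 values in total. The serial C sketch below computes a direct autocorrelation of a real 1D vector to show this. It is an illustration only: the function name acorr_1d and the lag ordering in c are assumptions, and this manual does not state how S3L orders the lags or which algorithm it uses.

```c
#include <assert.h>

/* Direct O(n^2) autocorrelation of a real vector a of length n.
 * c must have room for at least 2*n - 1 values; c[lag + n - 1] holds
 * the sum over i of a[i] * a[i + lag] for lag = -(n-1) .. (n-1).
 * This lag ordering is an assumption made for the example. */
void acorr_1d(const double *a, int n, double *c)
{
    assert(n >= 1);
    for (int lag = -(n - 1); lag <= n - 1; lag++) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            int j = i + lag;
            if (j >= 0 && j < n)     /* skip terms that fall outside a */
                sum += a[i] * a[j];
        }
        c[lag + n - 1] = sum;
    }
}
```

For a = (1, 2, 3) the five outputs are symmetric about the zero-lag term, which equals the sum of squares of a.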

Output

This function uses the following arguments for output:

setup_id - Integer value returned by this function. Use this value for the setup_id argument in subsequent calls to S3L_acorr and S3L_acorr_free_setup.

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_acorr_setup returns S3L_SUCCESS.

S3L_acorr_setup performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions cause the function to terminate and return one of the following codes:

S3L_ERR_ARG_DTYPE - The data type of one of the array arguments is invalid. It must be one of:
- S3L_float
- S3L_double
- S3L_complex
- S3L_double_complex

S3L_ERR_MATCH_DTYPE - The array arguments are not all of the same type.

S3L_ERR_MATCH_RANK - The array arguments are not all of the same rank.

S3L_ERR_ARG_RANK - The rank of one of the array arguments is not 1 or 2.

S3L_ERR_ARG_EXTENTS - The extents of c are less than two times the corresponding extents of a minus 1.

Examples

../examples/s3l/acorr/ex_acorr.c
../examples/s3l/acorr-f/ex_acorr.f

Related Functions

S3L_acorr(3)
S3L_acorr_free_setup(3)

`S3L_acorr_free_setup`

Description

S3L_acorr_free_setup invalidates the ID specified by the setup_id argument. This deallocates the internal memory that was reserved for the autocorrelation computation associated with that ID.

Syntax

The C and Fortran syntax for S3L_acorr_free_setup are shown below.

C/C++ Syntax

Example 8-86

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_acorr_free_setup(setup_id)
    int                *setup_id

F77/F90 Syntax

Example 8-87

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_acorr_free_setup(setup_id, ier)
    integer*4          setup_id
    integer*4          ier

Input

setup_id - Valid autocorrelation setup ID as returned from a previous call to S3L_acorr_setup.

Output

This function uses the following argument for output:

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_acorr_free_setup returns S3L_SUCCESS.

In addition, the following condition causes the function to terminate and return the associated code:

S3L_ERR_ARG_SETUP - Invalid setup_id value.

Examples

../examples/s3l/acorr/ex_acorr.c
../examples/s3l/acorr-f/ex_acorr.f

Related Functions

S3L_acorr(3)
S3L_acorr_setup(3)

`S3L_acorr`

Description

S3L_acorr computes the 1D or 2D autocorrelation of a signal represented by the parallel array described by S3L array handle a. The result is stored in the parallel array described by the S3L array handle c.

a and c are of the same real or complex type.

For the 1D case, if a is of length ma, the result of the autocorrelation will be of length 2*ma-1. In the 2D case, if a is of size [ma,na], the result of the autocorrelation is of size [2*ma-1,2*na-1].

The size of c has to be at least equal to the size of the autocorrelation for each case, as described above. If it is larger, the excess elements of c will contain zero or non-significant entries.

The result of the autocorrelation of a is stored in wrap-around order along each dimension. If the extent of c along a given axis is lc, the autocorrelation at zero lag is stored in c(0), the autocorrelation at lag 1 in c(1), and so forth. The autocorrelation at lag -1 is stored in c(lc-1), the autocorrelation at lag -2 is stored in c(lc-2), and so forth.
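
The wrap-around layout can be illustrated with a small serial C sketch. It is not S3L code: it assumes a real 1D signal held in a plain C array, and it assumes the definition r(k) = sum over n of a(n+k)*a(n). The names acorr_wraparound and acorr_at_lag are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>

/* Direct (non-FFT) autocorrelation of a real signal a[0..ma-1], stored in
 * wrap-around order in c[0..lc-1]. Requires lc >= 2*ma - 1.
 * Assumed definition: r(k) = sum over n of a[n + k] * a[n]. */
void acorr_wraparound(const double *a, int ma, double *c, int lc)
{
    assert(ma > 0 && lc >= 2 * ma - 1);

    for (int i = 0; i < lc; i++)
        c[i] = 0.0;                 /* excess elements are left at zero */

    for (int k = 0; k < ma; k++) {
        double r = 0.0;
        for (int n = 0; n + k < ma; n++)
            r += a[n + k] * a[n];
        c[k] = r;                   /* lag +k */
        if (k > 0)
            c[lc - k] = r;          /* lag -k, equal to lag +k for real input */
    }
}

/* Read the value at a signed lag from an array in wrap-around order. */
double acorr_at_lag(const double *c, int lc, int lag)
{
    return c[lag >= 0 ? lag : lc + lag];
}
```

For a = {1, 2, 3} and lc = 8, this sketch fills c with {14, 8, 3, 0, 0, 0, 3, 8}: lags 0 through 2 occupy c(0) through c(2), and lags -1 and -2 occupy c(7) and c(6).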

Side Effects

Following calculation of the autocorrelation of a, a may be destroyed, since it is used internally as auxiliary storage. If its contents will be reused after autocorrelation is performed, first copy it to a temporary array.

Note -

S3L_acorr is most efficient when all arrays have the same length and when this length is one that can be computed efficiently via S3L_fft or S3L_rc_fft. See "S3L_fft " and "S3L_rc_fft and S3L_cr_fft " for more information about execution efficiency.

Restriction

The dimensions of array c must be such that a 1D or 2D complex-to-complex FFT or real-to-complex FFT can be computed.

Syntax

The C and Fortran syntax for S3L_acorr are shown below.

C/C++ Syntax

Example 8-88

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_acorr(a, c, setup_id)
    S3L_array_t       a
    S3L_array_t       c
    int               setup_id

F77/F90 Syntax

Example 8-89

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_acorr(a, c, setup_id, ier)
    integer*8         a
    integer*8         c
    integer*4         setup_id
    integer*4         ier

Input

a - S3L internal array handle for the parallel array upon which the autocorrelation will be performed. a is of size ma (1D case) or ma x na (2D case).
setup_id - Integer value returned by a previous call to S3L_acorr_setup.

Output

This function uses the following arguments for output:

c - S3L internal array handle for the parallel array that contains the results of the autocorrelation. Its length must be at least 2*ma-1 (1D case) or 2*ma-1 x 2*na-1 (2D case).
ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_acorr returns S3L_SUCCESS.

In addition, the following conditions cause the function to terminate and return one of the following codes:

S3L_ERR_ARG_DTYPE - The data type of one of the array arguments is invalid. It must be one of:
- S3L_float
- S3L_double
- S3L_complex
- S3L_double_complex

S3L_ERR_MATCH_DTYPE - The array arguments are not of the same data type.

S3L_ERR_MATCH_RANK - The array arguments are not of the same rank.

S3L_ERR_ARG_RANK - The rank of one of the array arguments is not 1 or 2 as required.

S3L_ERR_ARG_EXTENTS - The extents of c are smaller than 2*ma-1 (1D case) or 2*ma-1 x 2*na-1 (2D case).

In addition, since S3L_fft or S3L_rc_fft is used internally to compute the autocorrelation, if the dimensions of c are not suitable for S3L_fft or S3L_rc_fft, an error code indicating this unsuitability is returned. For more details, refer to the man pages for S3L_fft and S3L_rc_fft.

Examples

../examples/s3l/acorr/ex_acorr.c
../examples/s3l/acorr-f/ex_acorr.f

Related Functions

S3L_acorr_setup(3)
S3L_acorr_free_setup(3)

Convolution/Deconvolution

`S3L_conv_setup`

Description

S3L_conv_setup sets up the initial conditions necessary for computation of the convolution C = A conv B. It returns an integer setup value that can be used by a subsequent call to S3L_conv.

S3L array handles A, B, and C each describe a parallel array that can be either one- or two-dimensional. The extent of C along each axis must be greater than or equal to the sum of the corresponding extents of A and B, minus 1.

Syntax

The C and Fortran syntax for S3L_conv_setup are shown below.

C/C++ Syntax

Example 8-90

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_conv_setup(A, B, C, setup_id)
    S3L_array_t       A
    S3L_array_t       B
    S3L_array_t       C
    int               *setup_id

F77/F90 Syntax

Example 8-91

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_conv_setup(A, B, C, setup_id, ier)
    integer*8         A
    integer*8         B
    integer*8         C
    integer*4         setup_id
    integer*4         ier

Input

A - S3L array handle describing a parallel array of size ma (1D case) or ma x na (2D case). A contains the input signal that will be convolved.
B - S3L array handle describing a parallel array that contains the convolution filter.
C - S3L array handle describing a parallel array in which the convolved signal is stored. Its length must be at least ma+mb-1 (1D case) or ma+mb-1 x na+nb-1 (2D case).

Output

This function uses the following arguments for output:

setup_id - Integer value returned by this function. Use this value for the setup_id argument in subsequent calls to S3L_conv and S3L_conv_free_setup.
ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_conv_setup returns S3L_SUCCESS.

S3L_conv_setup performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions cause the function to terminate and return one of the following codes:

S3L_ERR_ARG_RANK - The rank of one of the array arguments is not 1 or 2.

S3L_ERR_MATCH_RANK - The array arguments are not all of the same rank.

S3L_ERR_MATCH_DTYPE - The array arguments are not all of the same type.

S3L_ERR_ARG_EXTENTS - The extents of C are less than the sum of the corresponding extents of A and B minus 1.

Examples

../examples/s3l/conv/ex_conv.c
../examples/s3l/conv-f/ex_conv.f

Related Functions

S3L_conv(3)
S3L_conv_free_setup(3)

`S3L_conv_free_setup`

Description

S3L_conv_free_setup invalidates the ID specified by the setup_id argument. This deallocates the internal memory that was reserved for the convolution computation represented by that ID.

Syntax

The C and Fortran syntax for S3L_conv_free_setup are shown below.

C/C++ Syntax

Example 8-92

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_conv_free_setup(setup_id)
    int               *setup_id

F77/F90 Syntax

Example 8-93

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_conv_free_setup(setup_id, ier)
    integer*4         setup_id
    integer*4         ier

Input

setup_id - Integer value returned by a previous call to S3L_conv_setup.

Output

This function uses the following argument for output:

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_conv_free_setup returns S3L_SUCCESS.

In addition, the following condition causes the function to terminate and return the associated code:

S3L_ERR_ARG_SETUP - Invalid setup_id value.

Examples

../examples/s3l/conv/ex_conv.c
../examples/s3l/conv-f/ex_conv.f

Related Functions

S3L_conv(3)
S3L_conv_setup(3)

`S3L_conv`

Description

S3L_conv computes the 1D or 2D convolution of a signal represented by a parallel array using a filter contained in a second parallel array. The result is stored in a third parallel array. These parallel arrays are described by the S3L array handles: a (signal), b (filter), and c (result). All three arrays are of the same real or complex type.

For the 1D case, if the signal a is of length ma and the filter b of length mb, the result of the convolution, c, will be of length ma + mb - 1. In the 2D case, if the signal is of size [ma,na] and the filter is of size [mb,nb], the result of the convolution is of size [ma+mb-1,na+nb-1].
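
The 1D size relation can be checked against a direct serial convolution. The following C sketch is an illustration only, not the S3L implementation (which is FFT based and operates on parallel arrays). The function name conv1d and the choice to zero any excess elements of c are assumptions made for this example.

```c
#include <assert.h>

/* Direct 1D convolution: c[k] = sum over i of a[i] * b[k - i].
 * c must hold at least ma + mb - 1 elements; this sketch zeroes any excess. */
void conv1d(const double *a, int ma, const double *b, int mb,
            double *c, int lc)
{
    assert(ma > 0 && mb > 0 && lc >= ma + mb - 1);

    for (int k = 0; k < lc; k++)
        c[k] = 0.0;

    /* Each product a[i] * b[j] contributes to output element i + j. */
    for (int i = 0; i < ma; i++)
        for (int j = 0; j < mb; j++)
            c[i + j] += a[i] * b[j];
}
```

With a = {1, 2, 3} (ma = 3) and b = {1, 1} (mb = 2), the result has ma + mb - 1 = 4 elements: {1, 3, 5, 3}.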

Side Effects

Because a and b are used internally for auxiliary storage, they may be destroyed after the convolution calculation is complete. If the contents of a and b must be used after the convolution, they should first be copied to temporary arrays.

Note -

S3L_conv is most efficient when all arrays have the same length and when this length can be computed efficiently via S3L_fft or S3L_rc_fft. See "S3L_fft " and "S3L_rc_fft and S3L_cr_fft " for additional information.

Restriction

The dimensions of the array c must be such that the 1D or 2D complex-to-complex FFT or real-to-complex FFT can be computed.

Syntax

The C and Fortran syntax for S3L_conv are shown below.

C/C++ Syntax

Example 8-94

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_conv(a, b, c, setup_id)
    S3L_array_t       a
    S3L_array_t       b
    S3L_array_t       c
    int               *setup_id

F77/F90 Syntax

Example 8-95

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_conv(a, b, c, setup_id, ier)
    integer*8         a
    integer*8         b
    integer*8         c
    integer*4         setup_id
    integer*4         ier

Input

a - S3L array handle describing a parallel array of size ma (1D case) or ma x na (2D case). a is the input signal that will be convolved.
b - S3L array handle describing the parallel array that contains the filter.
setup_id - Valid convolution setup ID as returned from a previous call to S3L_conv_setup.

Output

This function uses the following arguments for output:

c - S3L array handle describing a parallel array containing the convolved signal. Its length must be at least ma+mb-1 (1D case) or ma+mb-1 x na+nb-1 (2D case).
ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_conv returns S3L_SUCCESS.

S3L_conv performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions cause the function to terminate and return one of the following codes:

S3L_ERR_MATCH_DTYPE - a, b, and c do not have the same data type.

S3L_ERR_MATCH_RANK - a, b, and c do not have the same rank.

S3L_ERR_ARG_RANK - The rank of an array argument is larger than 2.

S3L_ERR_ARG_DTYPE - The data type of one of the array arguments is invalid. It must be one of:
- S3L_float
- S3L_double
- S3L_complex
- S3L_double_complex

S3L_ERR_ARG_EXTENTS - The extents of c are smaller than the sum of the corresponding extents of a and b minus 1.

Examples

../examples/s3l/conv/ex_conv.c
../examples/s3l/conv-f/ex_conv.f

Related Functions

S3L_conv_setup(3)
S3L_conv_free_setup(3)

`S3L_deconv_setup`

Description

S3L_deconv_setup sets up the initial conditions required for computing the deconvolution of A with B. It returns an integer setup value that can be used by subsequent calls to S3L_deconv or S3L_deconv_free_setup.

Syntax

The C and Fortran syntax for S3L_deconv_setup are shown below.

C/C++ Syntax

Example 8-96

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_deconv_setup(A, B, C, setup_id)
    S3L_array_t       A
    S3L_array_t       B
    S3L_array_t       C
    int               *setup_id

F77/F90 Syntax

Example 8-97

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_deconv_setup(A, B, C, setup_id, ier)
    integer*8         A
    integer*8         B
    integer*8         C
    integer*4         setup_id
    integer*4         ier

Input

A - S3L internal array handle for the parallel array that contains the input signal to be deconvolved.
B - S3L internal array handle for the parallel array that contains the vector to be deconvolved out of A.
C - S3L internal array handle for the parallel array that will store the deconvolved signal.

Output

This function uses the following arguments for output:

setup_id - Integer value returned by this function. Use this value for the setup_id argument in subsequent calls to S3L_deconv and S3L_deconv_free_setup.
ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_deconv_setup returns S3L_SUCCESS.

S3L_deconv_setup performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions cause the function to terminate and return one of the following codes:

S3L_ERR_ARG_RANK - The rank of one of the array arguments is not 1 or 2.

S3L_ERR_MATCH_RANK - The array arguments are not all of the same rank.

S3L_ERR_MATCH_DTYPE - The array arguments are not all of the same type.

S3L_ERR_ARG_EXTENTS - The extents of C are less than the corresponding extents ext(A) - ext(B) + 1, or the extents of A are less than the corresponding extents of B.

Examples

../examples/s3l/deconv/ex_deconv.c
../examples/s3l/deconv-f/ex_deconv.f

Related Functions

S3L_deconv(3)
S3L_deconv_free_setup(3)

`S3L_deconv_free_setup`

Description

S3L_deconv_free_setup invalidates the ID specified by the setup_id argument. This deallocates internal memory that was reserved for the deconvolution computation represented by that ID.

Syntax

The C and Fortran syntax for S3L_deconv_free_setup are shown below.

C/C++ Syntax

Example 8-98

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_deconv_free_setup(setup_id)
    int               *setup_id

F77/F90 Syntax

Example 8-99

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_deconv_free_setup(setup_id, ier)
    integer*4         setup_id
    integer*4         ier

Input

setup_id - Integer value returned by a previous call to S3L_deconv_setup.

Output

This function uses the following argument for output:

ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_deconv_free_setup returns S3L_SUCCESS.

In addition, the following condition causes the function to terminate and return the associated code:

S3L_ERR_ARG_SETUP - Invalid setup_id value.

Examples

../examples/s3l/deconv/ex_deconv.c
../examples/s3l/deconv-f/ex_deconv.f

Related Functions

S3L_deconv(3)
S3L_deconv_setup(3)

`S3L_deconv`

Description

If a can be expressed as the convolution of an unknown vector c with b, S3L_deconv deconvolves the vector b out of a. The result, which is returned in c, is such that conv(c,b)=a.

In the general case, c will only represent the quotient of the polynomial division of a by b.

The remainder of that division can be obtained by explicitly convolving c with b and subtracting the result from a.

If ma, mb, and mc are the lengths of a, b, and c respectively, ma must be at least equal to mb. The length of c will be such that mc+mb-1=ma or, equivalently, mc=ma-mb+1.
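
The quotient and remainder relationship can be sketched with serial polynomial long division. This C example is an illustration under stated assumptions, not the S3L algorithm: it treats index 0 as the leading coefficient, requires b[0] to be nonzero, works on plain C arrays, and does not apply the FFT-related scaling that S3L_deconv applies to its results. The name deconv1d is hypothetical.

```c
#include <assert.h>

/* Polynomial long division of a (length ma) by b (length mb).
 * Writes the quotient to c (length ma - mb + 1) and the remainder
 * a - conv(c, b) to rem (length ma). Requires ma >= mb and b[0] != 0. */
void deconv1d(const double *a, int ma, const double *b, int mb,
              double *c, double *rem)
{
    assert(mb > 0 && ma >= mb && b[0] != 0.0);
    int mc = ma - mb + 1;

    for (int i = 0; i < ma; i++)
        rem[i] = a[i];

    for (int k = 0; k < mc; k++) {
        c[k] = rem[k] / b[0];             /* next quotient coefficient */
        for (int j = 0; j < mb; j++)
            rem[k + j] -= c[k] * b[j];    /* subtract c[k] times shifted b */
    }
}
```

For a = {1, 3, 5, 3} and b = {1, 1}, the sketch yields c = {1, 2, 3} with a zero remainder, so conv(c,b)=a. For a = {1, 3, 5, 4}, the quotient is unchanged and the remainder is {0, 0, 0, 1}.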

Note -

S3L_deconv is most efficient when all arrays have the same length and when this length is such that it can be computed efficiently by S3L_fft or S3L_rc_fft. See "S3L_fft " and "S3L_rc_fft and S3L_cr_fft " for additional information.

Restriction

The dimensions of the array c must be such that the 1D or 2D complex-to-complex FFT or real-to-complex FFT can be computed.

Scaling

The results of the deconvolution are scaled according to the underlying FFT that is used. In particular, if a and b are real 1D arrays, the result is scaled by n/2 for multiple processes and by n for a single process, where n is the length of c. In all other cases, the result is scaled by the product of the extents of c.

Side Effects

Because a and b are used internally for auxiliary storage, they may be destroyed after the deconvolution calculation is complete. If a and b must be used after the deconvolution, they should first be copied to temporary arrays.

Syntax

The C and Fortran syntax for S3L_deconv are shown below.

C/C++ Syntax

Example 8-100

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_deconv(a, b, c, setup_id)
    S3L_array_t       a
    S3L_array_t       b
    S3L_array_t       c
    int               *setup_id

F77/F90 Syntax

Example 8-101

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_deconv(a, b, c, setup_id, ier)
    integer*8         a
    integer*8         b
    integer*8         c
    integer*4         setup_id
    integer*4         ier

Input

a - S3L array handle describing a parallel array that contains the convolution of an unknown vector c with b. a is of size ma (1D case) or ma x na (2D case); each extent of a must be at least equal to the corresponding extent of b.
b - S3L array handle describing the parallel array that contains the vector to be deconvolved out of a. b is of size mb (1D case) or mb x nb (2D case).
setup_id - Valid deconvolution setup ID as returned from a previous call to S3L_deconv_setup.

Output

This function uses the following arguments for output:

c - S3L array handle describing a parallel array. Its length must be at least ma-mb+1 (1D case) or ma-mb+1 x na-nb+1 (2D case). Upon successful completion, the results of deconvolving a will be stored in c.
ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_deconv returns S3L_SUCCESS.

S3L_deconv performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions cause the function to terminate and return one of the following codes:

S3L_ERR_MATCH_DTYPE - a, b, and c do not have the same data type.

S3L_ERR_MATCH_RANK - a, b, and c do not have the same rank.

S3L_ERR_ARG_RANK - The rank of an array argument is larger than 2.

S3L_ERR_ARG_DTYPE - The data type of one of the array arguments is invalid. It must be one of:
- S3L_float
- S3L_double
- S3L_complex
- S3L_double_complex

S3L_ERR_ARG_EXTENTS - The extents of c are smaller than the corresponding extents ext(a) - ext(b) + 1, or the extents of a are smaller than the corresponding extents of b.

In addition, since S3L_fft or S3L_rc_fft is used internally to compute the deconvolution, if the dimensions of c are not appropriate for using S3L_fft or S3L_rc_fft, an error code indicating the unsuitability is returned. See "S3L_fft " and "S3L_rc_fft and S3L_cr_fft " for more details.

Examples

../examples/s3l/deconv/ex_deconv.c
../examples/s3l/deconv-f/ex_deconv.f

Related Functions

S3L_deconv_setup(3)
S3L_deconv_free_setup(3)

Multidimensional Sort and Grade

`S3L_grade_down`, `S3L_grade_up`, `S3L_grade_down_detailed`, `S3L_grade_up_detailed`

Description

The S3L_grade family of functions computes the grade of the elements of a parallel array A. Grading is done in either descending or ascending order, either across the whole array or along a specified axis. The resulting indices are stored in array G, using zero-based indexing when called from a C or C++ program and one-based indexing when called from an F77 or F90 program.

`S3L_grade_down` and `S3L_grade_up`

These two functions grade the elements across the entire array A and store the indices of the elements in descending or ascending order (S3L_grade_down or S3L_grade_up, respectively).

If A is an array of rank n and the product of its extents is l, G is a two-dimensional array whose extents are n x l.

Upon return of the function, the j-th column of array G is set to the indices of the j-th largest (S3L_grade_down) or j-th smallest (S3L_grade_up) element of array A.

For example, if A is the 3 x 3 array

Example 8-102

   _         _
  | 6   2   4 |
  |           |
  | 1   3   8 |
  |           |
  | 9   7   5 |
   -         -

and S3L_grade_down is called from a C program, it will store the following values in G.

Example 8-103

   _                         _
  | 2  1  2  0  2  0  1  0  1 |
  |                           |
  | 0  2  1  0  2  2  1  1  0 |
   -                         -

For the same array A, S3L_grade_up would store the following values in G (again, using zero-based indexing).

Example 8-104

   _                         _
  | 1  0  1  0  2  0  2  1  2 |
  |                           |
  | 0  1  1  2  2  0  1  2  0 |
   -                         -

When called by a Fortran program (F77/F90) each value in G would be one greater. For example, S3L_grade_up would store the following set of values.

Example 8-105

   _                         _
  | 2  1  2  1  3  1  3  2  3 |
  |                           |
  | 1  2  2  3  3  1  2  3  1 |
   -                         -
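
The whole-array grade can be reproduced serially for a 2D array. The C sketch below is an illustration only: it assumes a row-major C array in place of an S3L array handle, uses the standard qsort function, and leaves the relative order of equal values unspecified. The names grade_item and grade_2d are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { double value; int flat; } grade_item;

static int cmp_up(const void *x, const void *y)
{
    double a = ((const grade_item *)x)->value;
    double b = ((const grade_item *)y)->value;
    return (a > b) - (a < b);
}

static int cmp_down(const void *x, const void *y)
{
    return cmp_up(y, x);
}

/* Grade a row-major nrows x ncols array A. G holds 2 rows of l = nrows*ncols
 * entries: G[j] is the zero-based row index and G[l + j] the zero-based
 * column index of the j-th element in graded order.
 * descending != 0 mimics S3L_grade_down; 0 mimics S3L_grade_up.
 * Returns 0 on success, -1 if memory allocation fails. */
int grade_2d(const double *A, int nrows, int ncols, int descending, int *G)
{
    assert(nrows > 0 && ncols > 0);
    int l = nrows * ncols;
    grade_item *items = malloc((size_t)l * sizeof *items);
    if (items == NULL)
        return -1;

    for (int i = 0; i < l; i++) {
        items[i].value = A[i];
        items[i].flat = i;          /* remember the original position */
    }
    qsort(items, (size_t)l, sizeof *items, descending ? cmp_down : cmp_up);

    for (int j = 0; j < l; j++) {
        G[j]     = items[j].flat / ncols;   /* row index */
        G[l + j] = items[j].flat % ncols;   /* column index */
    }
    free(items);
    return 0;
}
```

Applied to the 3 x 3 array above, the sketch reproduces the zero-based index tables shown for S3L_grade_down and S3L_grade_up. Adding 1 to every entry gives the one-based Fortran form.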

`S3L_grade_detailed_down` and `S3L_grade_detailed_up`

The S3L_grade_detailed_down and S3L_grade_detailed_up functions differ from S3L_grade_down and S3L_grade_up in two respects:

Both grade along a single axis of A, as specified by the axis argument.

Both store a set of indices, but these indices do not indicate element positions directly. Instead, each index j stored in G indicates that the corresponding element of A has either
- The j-th largest value along the specified axis - S3L_grade_detailed_down
- The j-th smallest value along the specified axis - S3L_grade_detailed_up

This means G is an integer array whose rank and extents are the same as those of A.

Repeating the 3 x 3 sample array shown above,

Example 8-106

   _         _
  | 6   2   4 |
  |           |
  | 1   3   8 |
  |           |
  | 9   7   5 |
   -         -

if S3L_grade_detailed_down is called from a C program with the axis argument = 0, upon completion, G will contain the following values:

Example 8-107

   _         _
  | 1   2   2 |
  |           |
  | 2   1   0 |
  |           |
  | 0   0   1 |
   -         -

If, instead, axis = 1, G will contain

Example 8-108

   _         _
  | 0   2   1 |
  |           |
  | 2   1   0 |
  |           |
  | 0   1   2 |
   -         -

If S3L_grade_detailed_up is called from a C program with axis = 0, G will contain

Example 8-109

   _         _
  | 1   0   0 |
  |           |
  | 0   1   2 |
  |           |
  | 2   2   1 |
   -         -

If S3L_grade_detailed_up is called from a C program with axis = 1, G will contain

Example 8-110

   _         _
  | 2   0   1 |
  |           |
  | 0   1   2 |
  |           |
  | 2   1   0 |
   -         -

For F77 or F90 calls, each index value in these examples, including the axis argument, would be increased by 1.
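
The per-axis results in these examples are consistent with storing, for each element of A, its zero-based rank among the elements that share its line along the chosen axis. The following serial C sketch works on that reading. It is an illustration only: it assumes a row-major 2D C array and distinct values along each line, and the name grade_detailed_2d is hypothetical.

```c
#include <assert.h>

/* For a row-major nrows x ncols array A, store in G the zero-based rank of
 * each element along the given axis. Axis 0 varies the row index (grading
 * within each column); axis 1 varies the column index (within each row).
 * descending != 0 mimics S3L_grade_detailed_down; 0 mimics
 * S3L_grade_detailed_up. Assumes distinct values along each line. */
void grade_detailed_2d(const double *A, int nrows, int ncols,
                       int axis, int descending, int *G)
{
    assert(axis == 0 || axis == 1);
    int len    = (axis == 0) ? nrows : ncols;   /* elements per graded line */
    int stride = (axis == 0) ? ncols : 1;       /* step between them */

    for (int r = 0; r < nrows; r++) {
        for (int c = 0; c < ncols; c++) {
            int self  = r * ncols + c;
            int start = (axis == 0) ? c : r * ncols;  /* first element of line */
            int rank  = 0;

            /* Count the elements that come before this one in graded order. */
            for (int k = 0; k < len; k++) {
                double other = A[start + k * stride];
                if (descending ? other > A[self] : other < A[self])
                    rank++;
            }
            G[self] = rank;
        }
    }
}
```

For the sample array, calling the sketch with axis = 0 and descending set gives the rows {1, 2, 2}, {2, 1, 0}, and {0, 0, 1}, matching the S3L_grade_detailed_down example for axis = 0.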

Syntax

The C and Fortran syntax for these functions are shown below.

C/C++ Syntax

Example 8-111

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_grade_up(A, grade)
S3L_grade_down(A, grade)
S3L_grade_up_detailed(A, grade, axis)
S3L_grade_down_detailed(A, grade, axis)
    S3L_array_t       A
    S3L_array_t       grade
    int               axis

F77/F90 Syntax

Example 8-112

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_grade_up(A, grade, ier)
S3L_grade_down(A, grade, ier)
S3L_grade_up_detailed(A, grade, axis, ier)
S3L_grade_down_detailed(A, grade, axis, ier)
    integer*8         A
    integer*8         grade
    integer*4         axis
    integer*4         ier

Input

A - S3L internal array handle for the array to be graded. Its type can be real, double, integer, or long integer.
axis - The axis along which S3L_grade_detailed_down or S3L_grade_detailed_up is to be computed. It may not be used in S3L_grade_down or S3L_grade_up calls.

Output

These functions use the following arguments for output:

grade - S3L internal array handle for an integer array. Upon successful completion, grade contains the indices of the order of the elements.
ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, these functions return S3L_SUCCESS.

These functions perform generic checking of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following condition will cause the functions to terminate and return the associated code:

S3L_ERR_ARG_AXISNUM - The axis argument has an invalid value. The correct values for axis are
- 0 <= axis < rank of A (C/C++)
- 0 < axis <= rank of A (F77/F90)

Examples

../examples/s3l/grade/ex_grade.c
../examples/s3l/grade-f/ex_grade.f

Related Functions

S3L_sort(3)
S3L_sort_detailed_up(3)
S3L_sort_detailed_down(3)

`S3L_sort`, `S3L_sort_up`, `S3L_sort_down`, `S3L_sort_detailed_up`, `S3L_sort_detailed_down`

Description

The S3L_sort function sorts the elements of a one-dimensional array in ascending order.

S3L_sort_up and S3L_sort_down sort the elements of one-dimensional or multidimensional arrays in ascending and descending order, respectively.

Note -

S3L_sort is a special case of S3L_sort_up.

When A is one-dimensional, the result is a vector that contains the same elements as A, but arranged in ascending order (S3L_sort or S3L_sort_up) or descending order (S3L_sort_down). For example, if A contains

Example 8-113

   _                                 _
  | 7   2   4   3   1   8   6   9   5 |
   -                                 -

calling S3L_sort or S3L_sort_up would produce the result

Example 8-114

   _                                 _
  | 1   2   3   4   5   6   7   8   9 |
   -                                 -

If A is multidimensional, the elements are sorted into an index-based sequence, starting with the first row-column index and progressing through the row indices first before advancing to the next column index position.

For example if A contains

Example 8-115

   _         _
  | 6   2   7 |
  |           |
  | 1   4   3 |
  |           |
  | 9   5   8 |
   -         -

S3L_sort_up would produce the result

Example 8-116

   _         _
  | 1   4   7 |
  |           |
  | 2   5   8 |
  |           |
  | 3   6   9 |
   -         -

and S3L_sort_down would produce the result

Example 8-117

   _         _
  | 9   6   3 |
  |           |
  | 8   5   2 |
  |           |
  | 7   4   1 |
   -         -
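
The ordering in these two results (the sorted sequence runs down the first column, then the second, and so on) can be reproduced with a short serial C sketch. It is an illustration only, assuming a row-major 2D C array in place of an S3L array handle. The name sort_2d is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

static int cmp_double(const void *x, const void *y)
{
    double a = *(const double *)x;
    double b = *(const double *)y;
    return (a > b) - (a < b);
}

/* Sort all elements of the row-major nrows x ncols array A in place so that
 * the sorted sequence fills column 0 from top to bottom, then column 1, and
 * so on. descending != 0 mimics S3L_sort_down; 0 mimics S3L_sort_up.
 * Returns 0 on success, -1 if memory allocation fails. */
int sort_2d(double *A, int nrows, int ncols, int descending)
{
    assert(nrows > 0 && ncols > 0);
    int l = nrows * ncols;
    double *tmp = malloc((size_t)l * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    for (int i = 0; i < l; i++)
        tmp[i] = A[i];
    qsort(tmp, (size_t)l, sizeof *tmp, cmp_double);   /* ascending */

    for (int j = 0; j < l; j++) {
        int src = descending ? l - 1 - j : j;   /* read backwards if descending */
        int row = j % nrows;                    /* row index varies fastest */
        int col = j / nrows;
        A[row * ncols + col] = tmp[src];
    }
    free(tmp);
    return 0;
}
```

Applied to the 3 x 3 array above, the sketch gives the rows {1, 4, 7}, {2, 5, 8}, {3, 6, 9} in ascending mode and {9, 6, 3}, {8, 5, 2}, {7, 4, 1} in descending mode.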

S3L_sort_detailed_up and S3L_sort_detailed_down sort the elements of one-dimensional or multidimensional arrays in ascending and descending order, respectively, along the axis specified by the axis argument.

Note -

The value of the axis argument is language dependent. For C/C++ applications, it must be zero-based and for F77/F90 applications, it must be one-based.

If the array referenced by A contains

Example 8-118

   _         _
  | 6   2   7 |
  |           |
  | 1   4   3 |
  |           |
  | 9   5   8 |
   -         -

and a C program calls S3L_sort_detailed_up with axis = 0, upon completion, A will contain

Example 8-119

   _         _
  | 1   2   3 |
  |           |
  | 6   4   7 |
  |           |
  | 9   5   8 |
   -         -

Or, if a C program calls S3L_sort_detailed_up with axis = 1, upon completion, A will contain

   _         _
  | 2   6   7 |
  |           |
  | 1   3   4 |
  |           |
  | 5   8   9 |
   -         -

If these calls were made from an F77 or F90 program, the axis values would need to be one greater (that is, 1 and 2, respectively) to achieve the same results.
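
Sorting along a single axis can be sketched serially as an independent sort of each line of the array. The C example below is an illustration only: it assumes a row-major 2D C array, a zero-based axis as in the C examples above, and a simple insertion sort. The name sort_detailed_2d is hypothetical.

```c
#include <assert.h>

/* Sort each line of the row-major nrows x ncols array A in place along the
 * given axis. Axis 0 sorts within each column; axis 1 sorts within each row.
 * descending != 0 mimics S3L_sort_detailed_down; 0 mimics
 * S3L_sort_detailed_up. */
void sort_detailed_2d(double *A, int nrows, int ncols,
                      int axis, int descending)
{
    assert(axis == 0 || axis == 1);
    int len    = (axis == 0) ? nrows : ncols;   /* elements per line */
    int stride = (axis == 0) ? ncols : 1;       /* step between them */
    int nlines = (axis == 0) ? ncols : nrows;   /* number of lines */

    for (int line = 0; line < nlines; line++) {
        double *base = A + ((axis == 0) ? line : line * ncols);

        /* Insertion sort over the strided line. */
        for (int i = 1; i < len; i++) {
            double key = base[i * stride];
            int j = i - 1;
            while (j >= 0 && (descending ? base[j * stride] < key
                                         : base[j * stride] > key)) {
                base[(j + 1) * stride] = base[j * stride];
                j--;
            }
            base[(j + 1) * stride] = key;
        }
    }
}
```

For the sample array, ascending mode with axis = 0 gives the rows {1, 2, 3}, {6, 4, 7}, {9, 5, 8}, and with axis = 1 gives {2, 6, 7}, {1, 3, 4}, {5, 8, 9}, matching the two results shown above.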

Syntax

The C and Fortran syntax for these functions are shown below.

C/C++ Syntax

Example 8-120

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_sort(A)
S3L_sort_up(A)
S3L_sort_down(A)
S3L_sort_detailed_up(A, axis)
S3L_sort_detailed_down(A, axis)
    S3L_array_t       A
    int               axis

F77/F90 Syntax

Example 8-121

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_sort(A, ier)
S3L_sort_up(A, ier)
S3L_sort_down(A, ier)
S3L_sort_detailed_up(A, axis, ier)
S3L_sort_detailed_down(A, axis, ier)
    integer*8         A
    integer*4         axis
    integer*4         ier

Input

A - For S3L_sort, A must be a one-dimensional array. For S3L_sort_up, S3L_sort_down, S3L_sort_detailed_up, and S3L_sort_detailed_down, A can be one-dimensional or multidimensional.
axis - Used with S3L_sort_detailed_up and S3L_sort_detailed_down to specify which axis of A is to be sorted. If A is one-dimensional, axis must be zero (for C/C++) or 1 (for F77/F90). It may not be used in S3L_sort, S3L_sort_up, or S3L_sort_down calls.

Output

These functions use the following arguments for output:

A - On output, A contains the sorted array.

ier (Fortran only) - When called from a Fortran program, these functions return error status in ier.

Error Handling

On success, these functions return S3L_SUCCESS.

These functions all check the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause the functions to terminate and return the associated code:

S3L_ERR_ARG_DTYPE - The type of the array is invalid. It must be one of: S3L_integer, S3L_long_integer, S3L_float or S3L_double.

S3L_ERR_ARG_AXISNUM - The axis argument has an invalid value. The correct values for axis are
- 0 <= axis < rank of A (C/C++)
- 0 < axis <= rank of A (F77/F90)
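The two axis ranges differ only in their index base, which a caller can check before invoking a sort routine. The sketch below is a minimal illustration in plain C; the names axis_is_valid, c_axis_to_fortran, BASE_C, and BASE_FORTRAN are hypothetical and are not part of Sun S3L.

```c
#include <assert.h>

/* Index base used by the caller: 0 for C/C++, 1 for F77/F90. */
enum { BASE_C = 0, BASE_FORTRAN = 1 };

/*
 * Return 1 if axis is a legal axis number for an array of the given rank,
 * that is, base <= axis < rank + base; return 0 otherwise.
 */
int axis_is_valid(int axis, int rank, int base)
{
    assert(base == BASE_C || base == BASE_FORTRAN);
    return axis >= base && axis < rank + base;
}

/* Convert a zero-based (C) axis number to its one-based (Fortran) equivalent. */
int c_axis_to_fortran(int axis)
{
    return axis + 1;
}
```

For a rank-2 array, this accepts axis values 0 and 1 under the C convention and 1 and 2 under the Fortran convention.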

Examples

../examples/s3l/sort/sort1.c
../examples/s3l/sort/ex_sort2.c
../examples/s3l/sort-f/sort1.f

Related Functions

S3L_grade_up(3)
S3L_grade_detailed_down(3)
S3L_grade_detailed_up(3)

Parallel Transpose

`S3L_trans`

Description

S3L_trans performs a generalized transposition of a parallel array. A generalized transposition is defined as a general permutation of the axes. The array axis_perm contains a description of the permutation to be performed.

The distribution characteristics of a and b must be compatible; that is, they must have the same rank and type, and each axis of b must have the same length as the axis of a that axis_perm maps to it.
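A serial sketch in plain C may help make the idea of an axis permutation concrete. It assumes the convention implied by the extents relation given under Error Handling, namely that axis i of b corresponds to axis axis_perm[i] of a, and it works on an ordinary row-major buffer with zero-based axis numbers. The function permute_axes and the MAX_RANK limit are hypothetical; the sketch ignores data distribution entirely and is not the S3L_trans implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RANK 8   /* arbitrary limit for this sketch */

/*
 * Permute the axes of a dense row-major array.
 * ext_a[k] is the extent of axis k of a; perm[i] names the axis of a that
 * becomes axis i of b, so ext(b, i) == ext(a, perm[i]).
 * perm is zero-based and must be a valid permutation of 0..rank-1.
 */
void permute_axes(const double *a, double *b,
                  const int *ext_a, const int *perm, int rank)
{
    assert(rank >= 1 && rank <= MAX_RANK);

    size_t stride_a[MAX_RANK];   /* row-major strides of a */
    int ext_b[MAX_RANK];         /* extents of b */
    int idx_b[MAX_RANK] = {0};   /* current multi-index into b */
    size_t total = 1;

    stride_a[rank - 1] = 1;
    for (int k = rank - 2; k >= 0; k--)
        stride_a[k] = stride_a[k + 1] * (size_t)ext_a[k + 1];
    for (int i = 0; i < rank; i++) {
        ext_b[i] = ext_a[perm[i]];
        total *= (size_t)ext_b[i];
    }

    /* Walk b in row-major order; each element is read from a at the
       index obtained by sending coordinate i of b to axis perm[i] of a. */
    for (size_t n = 0; n < total; n++) {
        size_t off_a = 0;
        for (int i = 0; i < rank; i++)
            off_a += (size_t)idx_b[i] * stride_a[perm[i]];
        b[n] = a[off_a];

        /* Advance idx_b like an odometer, last axis fastest. */
        for (int i = rank - 1; i >= 0; i--) {
            if (++idx_b[i] < ext_b[i])
                break;
            idx_b[i] = 0;
        }
    }
}
```

With rank 2 and perm = {1, 0}, this reduces to the ordinary matrix transpose; with the identity permutation it is a plain copy.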

A faster algorithm is used in the 2D case when the array meets the following conditions:

- The first axis of the array is local.
- The second axis of the array is global.
- The size of each dimension is divisible by the number of processes.
- The blocksizes are equal to the result of the division.
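The four conditions above can be read as a simple predicate. The sketch below is one possible interpretation in plain C, not an S3L facility: struct dist2d and qualifies_for_fast_2d are hypothetical names, and the sketch assumes that the divisibility and block-size conditions apply to both axes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a 2D distributed array; not an S3L type. */
struct dist2d {
    int ext[2];        /* extent of each axis */
    int is_local[2];   /* nonzero if the axis is local (not distributed) */
    int blocksize[2];  /* block size used for each axis */
};

/* Return 1 if all four conditions for the faster 2D algorithm hold. */
int qualifies_for_fast_2d(const struct dist2d *d, int nprocs)
{
    assert(d != NULL && nprocs > 0);

    /* The first axis must be local and the second axis global. */
    if (!d->is_local[0] || d->is_local[1])
        return 0;

    for (int k = 0; k < 2; k++) {
        /* Each extent must be divisible by the number of processes. */
        if (d->ext[k] % nprocs != 0)
            return 0;
        /* Each block size must equal the result of that division. */
        if (d->blocksize[k] != d->ext[k] / nprocs)
            return 0;
    }
    return 1;
}
```

For example, under these assumptions an 8 x 8 array on 4 processes with block sizes of 2 qualifies, while a 9 x 8 array on 4 processes does not.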

Syntax

The C and Fortran syntax for S3L_trans is shown below.

C/C++ Syntax

Example 8-122

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_trans(a, b, axis_perm)
    S3L_array_t       a
    S3L_array_t       b
    int               *axis_perm

F77/F90 Syntax

Example 8-123

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_trans(a, b, axis_perm, ier)
    integer*8         a
    integer*8         b
    integer*4         axis_perm
    integer*4         ier

where <type> is real*4 or real*8 for both C/C++ and F77/F90.

Input

a - S3L_array handle for the parallel array to be transposed.
axis_perm - A vector of integers that specifies the axis permutation to be performed.

Output

This function uses the following arguments for output:

b - S3L_array handle for a parallel array. Upon successful completion, S3L_trans stores the transposed array in b.
ier (Fortran only) - When called from a Fortran program, this function returns error status in ier.

Error Handling

On success, S3L_trans returns S3L_SUCCESS.

S3L_trans checks the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

In addition, the following conditions will cause the function to terminate and return the associated code:

S3L_ERR_MATCH_RANK - The ranks of a and b do not match.

S3L_ERR_MATCH_EXTENTS - The extents of a and b are not compatible with the transpose operation requested. That is, the following relationship does not hold for every array axis i:

ext(a, axis_perm[i]) = ext(b, i)

S3L_ERR_TRANS_PERMAX - The supplied permutation is not valid (every axis must appear exactly once).

S3L_ERR_ARG_AXISNUM - An axis number supplied in axis_perm has an invalid value. The correct values for an axis number are
- 0 <= axis < rank of the array (C/C++)
- 0 < axis <= rank of the array (F77/F90)
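The permutation and extent conditions listed above can be checked by the caller before the transpose is attempted. The following plain C sketch shows one way to do so for zero-based axis numbers. The function check_trans_args and the CHECK_* values are hypothetical stand-ins chosen for this illustration; they are not the S3L error codes, and the order in which S3L_trans itself performs these checks is not specified here.

```c
#include <assert.h>

#define MAX_RANK 8   /* arbitrary limit for this sketch */

/* Hypothetical status values for this sketch; not the S3L error codes. */
enum check_status {
    CHECK_OK,
    CHECK_BAD_AXIS,     /* an axis number is outside 0..rank-1 */
    CHECK_BAD_PERM,     /* an axis appears more than once */
    CHECK_BAD_EXTENTS   /* ext(a, perm[i]) != ext(b, i) for some i */
};

enum check_status check_trans_args(const int *ext_a, const int *ext_b,
                                   const int *perm, int rank)
{
    int seen[MAX_RANK] = {0};

    assert(rank >= 1 && rank <= MAX_RANK);

    /* rank in-range entries with no repeats means every axis appears
       exactly once, so the vector is a valid permutation. */
    for (int i = 0; i < rank; i++) {
        if (perm[i] < 0 || perm[i] >= rank)
            return CHECK_BAD_AXIS;
        if (seen[perm[i]]++)
            return CHECK_BAD_PERM;
    }

    /* Axis i of b must have the extent of axis perm[i] of a. */
    for (int i = 0; i < rank; i++)
        if (ext_a[perm[i]] != ext_b[i])
            return CHECK_BAD_EXTENTS;

    return CHECK_OK;
}
```

For a 2 x 3 array a and a 3 x 2 array b, the permutation {1, 0} passes all checks, whereas {0, 0} is rejected as a repeated axis and {0, 1} is rejected because the extents do not line up.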

Examples

../examples/s3l/transpose/transp.c
../examples/s3l/transpose/ex_trans1.c
../examples/s3l/transpose-f/transp.f
