This chapter contains general information about the Sun Scalable Scientific Subroutine Library (Sun S3L).
Sun S3L provides a set of parallel and scalable functions and tools widely used in scientific and engineering computing. It can be used on all Sun HPC Systems, from a single processor on an SMP, through multiple processors on a stand-alone SMP, to a cluster of SMPs.
The chief advantages offered by Sun S3L are summarized below.
Sun S3L is optimized for Sun HPC Systems.
Sun S3L functions have a simple array syntax interface that is callable from message-passing programs written in C, C++, F77, or F90.
Sun S3L supports multiple instances.
Sun S3L is thread safe.
Sun S3L uses the Sun Performance Library(TM) for nodal computation.
Extensive and detailed programming examples are provided online.
Sun S3L is supported by Sun.
Sun S3L includes built-in diagnostics.
Sun S3L uses array handles to provide array syntax support to message-passing programs. Array handles, which are closely analogous to the array descriptors found in the public domain packages ScaLAPACK and PETSc, facilitate argument passing by encapsulating information about distributed arrays.
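To make the idea of a handle that encapsulates distribution information concrete, here is a minimal sketch. It assumes a one-dimensional block-cyclic layout of the kind ScaLAPACK descriptors describe; the `ArrayDescriptor` class and its field names are hypothetical and are not the Sun S3L handle layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayDescriptor:
    """Hypothetical 1D descriptor; field names are illustrative, not Sun S3L's."""
    global_size: int   # number of elements in the global array
    block_size: int    # elements per contiguous block
    num_procs: int     # processes the array is spread across


def owner_and_local_index(desc, g):
    """Map 0-based global index g to (owning process, local index)."""
    block = g // desc.block_size              # which block holds g
    owner = block % desc.num_procs            # blocks are dealt round-robin
    local_block = block // desc.num_procs     # earlier blocks held by the owner
    local = local_block * desc.block_size + g % desc.block_size
    return owner, local


desc = ArrayDescriptor(global_size=10, block_size=2, num_procs=3)
layout = [owner_and_local_index(desc, g) for g in range(desc.global_size)]
```

Because the descriptor carries the sizes and the process count, a routine that receives it needs no further arguments to locate any element, which is the convenience the handle mechanism is meant to provide.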
Sun S3L operates on multidimensional arrays of up to 32 dimensions. This lets it implement the multiple-instance paradigm, in which the same function is applied concurrently to multiple, disjoint data sets.
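The multiple-instance idea can be sketched in a few lines. This is an illustration only: the helper names are hypothetical, the instances are stored along a leading axis by assumption, and the loop runs sequentially, whereas the library applies the function to the instances concurrently.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product for one instance."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def apply_to_instances(func, *instance_arrays):
    """Apply func to each instance; the leading axis indexes the instances."""
    return [func(*operands) for operands in zip(*instance_arrays)]


# Two disjoint 2x2 problems stored along a leading "instance" axis.
matrices = [
    [[1, 2], [3, 4]],
    [[0, 1], [1, 0]],
]
vectors = [
    [1, 1],
    [5, 7],
]

results = apply_to_instances(matvec, matrices, vectors)
```

Each entry of `results` is the product for one instance, so a single call covers every data set held in the higher-dimensional array.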
The Sun S3L user interface includes a communicator setup routine that allows Sun S3L functions to be used in multithreaded applications. This routine causes Sun S3L to establish an independent Sun MPI communicator and thread-safe data for each thread from which the routine is called.
Sun S3L routines use the Sun Performance Library for nodal operations. The Sun Performance Library is a collection of libraries for dense linear algebra and Fourier transforms based on the standard libraries BLAS, LINPACK, LAPACK, FFTPACK, and VFFTPACK. Besides providing appropriate nodal support to Sun S3L, routines from the Sun Performance Library can be called independently from any user code running locally on a Sun Ultra HPC Server node.
The Sun Performance Library is available to Sun S3L users as part of WorkShop Compilers Fortran or Performance WorkShop Fortran, v4.2 and v5.0.
Sun S3L routines operate on objects of various data types. The data type of each array is encoded in its array handle and decoded at run time, so the appropriate branch is selected during execution. Consequently, there is no need for separate routines with different names to implement the different data types; a single routine suffices for all types.
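The run-time decoding described above might look roughly like the following sketch. The `ArrayHandle` class, the type tags, and `zero_elements` are all hypothetical stand-ins chosen for illustration; they are not the Sun S3L definitions.

```python
from dataclasses import dataclass

# Hypothetical type tags; the real library defines its own constants.
ZERO_BY_TYPE = {
    "integer": 0,
    "float": 0.0,
    "complex": 0j,
}


@dataclass
class ArrayHandle:
    """Hypothetical handle: carries the element type alongside the data."""
    type_tag: str
    data: list


def zero_elements(handle):
    """One routine for every type: decode the tag, then branch on it."""
    try:
        zero = ZERO_BY_TYPE[handle.type_tag]
    except KeyError:
        raise TypeError(f"unsupported element type: {handle.type_tag}") from None
    handle.data[:] = [zero] * len(handle.data)
    return handle


ints = zero_elements(ArrayHandle("integer", [1, 2, 3]))
cplx = zero_elements(ArrayHandle("complex", [1 + 2j, 3j]))
```

The caller uses the same routine name for both arrays; only the tag stored in the handle differs.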
An extensive set of online examples illustrates correct use of all Sun S3L functions. These examples can be used as templates in developing actual code. Separate examples are provided to demonstrate C and Fortran interfaces.
Sun S3L consists of a set of core library functions--that is, subroutines that perform the linear algebra, Fourier transform, and other scientific computations--plus a set of auxiliary utilities, referred to as the toolkit functions.
The toolkit functions are introduced in "Sun S3L Toolkit Functions", with detailed descriptions provided in Chapter 7, Sun S3L Toolkit Routines. The core library functions are introduced in "Core Scientific Library Routines", with detailed descriptions in Chapter 8, Sun S3L Core Library Functions. They are also described in their online man pages.
Many of the Sun S3L core routines support the corresponding ScaLAPACK application programming interfaces (APIs). Table 1-1 lists the ScaLAPACK APIs that are supported.
Table 1-1 Supported ScaLAPACK APIs
Category | Routine |
---|---|
PBLAS 1,2,3 | p{s,d}dot, p{c,z}dotu, p{s,d}nrm2, p{sc,dz}nrm2, p{s,d}ger, p{c,z}geru, p{s,d,c,z}gemv, p{s,d,c,z}gemm |
LU factor, solve, inverse | p{s,d,c,z}getrf, p{s,d,c,z}getrs, p{s,d,c,z}getri |
Tridiagonal solvers | p{s,d,c,z}dttrf, p{s,d,c,z}dttrs |
Banded solvers | p{s,d,c,z}gbsv, p{s,d,c,z}gbtrf, p{s,d,c,z}gbtrs |
Symmetric eigensolver | p{s,d}syevx, p{c,z}heevx |
Singular Value Decomposition | p{s,d,c,z}geqrf |
Least Squares Solver | p{s,d,c,z}gels |
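The brace notation in Table 1-1 is shorthand for a family of routine names, one per data type prefix (by the usual BLAS/LAPACK convention, s and d are single and double precision real, c and z are single and double precision complex). The helper below is a hypothetical illustration of how one table entry expands; it is not part of Sun S3L.

```python
import re

_GROUP = re.compile(r"\{([^{}]*)\}")


def expand_routine_pattern(pattern):
    """Expand every {a,b,...} group into one name per alternative."""
    match = _GROUP.search(pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    names = []
    for prefix in match.group(1).split(","):
        # Recurse so that any later groups in the tail are expanded too.
        names.extend(expand_routine_pattern(head + prefix.strip() + tail))
    return names


gemm = expand_routine_pattern("p{s,d,c,z}gemm")
nrm2 = expand_routine_pattern("p{sc,dz}nrm2")
```

Here `gemm` holds the four names psgemm, pdgemm, pcgemm, and pzgemm, and `nrm2` holds pscnrm2 and pdznrm2.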
Sun S3L includes an extensive set of functions that enable Sun MPI programmers to perform a variety of auxiliary tasks, such as:
Initializing and exiting from the S3L environment.
Creating and destroying S3L array handles for defining parallel arrays.
Creating and destroying S3L process grid handles for defining process grids.
Performing operations on array elements.
Extracting information about parallel arrays and array subgrids.
Reading a file into all or part of an S3L parallel array.
Writing all or part of an S3L parallel array into a file.
Printing all or part of an S3L parallel array to standard output.
Converting ScaLAPACK descriptors into S3L array handles and S3L array handles into ScaLAPACK descriptors.
Creating Sun MPI communicators to allow thread-safe operation of S3L functions.
Controlling the S3L safety mechanism.
The Sun S3L core routines consist of:
Inner product - Compute the global inner product over all axes of two source parallel arrays. The inner product is added to the destination. A routine that takes the conjugate of the second operand is provided for complex data.
Outer product - Compute one or more instances of an outer product of two vectors. The result is added to the destination. For complex data, a routine that takes the conjugate of the second operand is provided.
Matrix-vector multiplication - Compute one or more instances of a matrix-vector product. The result is added to the destination, or is added to a second parallel array. For complex data, a routine that takes the conjugate of the matrix is provided.
LU-factorization and LU-solve routines
LU-factorization routine - For each m x n coefficient matrix A of the parallel array a, computes the LU factorization using partial pivoting with row interchanges.
LU-solve routine - Uses the L and U factors produced by the LU-factorization routine to produce solutions to the system AX=B. B may represent one or more right-hand sides for each instance of the systems of equations.
Setup and deallocation of FFT handles - Initialize and deallocate FFT handles for both complex and real data types. Separate routines are used for the two data types.
Simple complex-to-complex, mixed-radix, forward and inverse FFT routines - Performs forward or inverse Fast Fourier Transform of a parallel array of type complex or double complex. Supports both power-of-two and arbitrary radix parameters.
Detailed complex-to-complex FFT routine - Allows independent specification along each data axis of the transform direction in a complex-to-complex FFT. Can improve performance over the simple FFT in some cases.
Structured solver
Tridiagonal solver - Solves collections of tridiagonal linear systems of equations using Gaussian elimination with pivoting.
Banded solver - Solves collections of banded linear systems of equations using Gaussian elimination with pivoting.
Dense symmetric eigenvalue solver - Computes selected eigenvalues and, optionally, eigenvectors of Hermitian matrices.
Dense Singular Value Decomposition (SVD) - Computes the singular value decomposition of an M x N matrix and, optionally, the left and right singular vectors.
Sparse routines
Declare array handle for a sparse matrix.
Read data from a file into a distributed matrix, with support for both COO and CSR sparse storage formats.
Compute the product of a sparse matrix with a dense vector.
Iterative solver - Solves a general sparse linear system of equations using iterative methods, with or without preconditioning.
Convolution/Deconvolution
Convolve - Computes 1D or 2D convolution of one array with another.
Deconvolve - Deconvolves an array into a vector.
Iterative eigensolver - Computes selected eigenpairs of dense or sparse matrices, with optional specification of eigenpair properties.
Autocorrelation - Computes 1D or 2D autocorrelation of a signal.
Sort and grade - Sort and grade arrays.
Parallel random number generators
Fibonacci RNG setup and deallocation - Initializes and deallocates the state table of a lagged Fibonacci random number generator (LFG).
LCG RNG setup - Defines the parameters used in the Sun S3L linear congruential random number generator (LCG).
Zero array elements - Replaces all elements in an array with zero.
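The LU-factorization and LU-solve pair in the list above follows a factor-once, solve-many pattern. The sketch below shows that pattern serially for a single square matrix, using partial pivoting with row interchanges. The function names are hypothetical, and the code is a textbook illustration under those assumptions rather than the Sun S3L implementation, which works on distributed arrays and on multiple instances.

```python
def lu_factor(a):
    """Factor a copy of square matrix a as P*A = L*U; return (lu, pivots)."""
    n = len(a)
    lu = [list(map(float, row)) for row in a]
    pivots = []
    for k in range(n):
        # Partial pivoting: pick the largest-magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]          # row interchange
        pivots.append(p)
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]             # multiplier, stored in place of L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, pivots


def lu_solve(lu, pivots, b):
    """Solve A x = b for one right-hand side using the stored factors."""
    n = len(lu)
    x = list(map(float, b))
    for k, p in enumerate(pivots):           # replay the row interchanges
        x[k], x[p] = x[p], x[k]
    for i in range(n):                       # forward substitution (unit L)
        x[i] -= sum(lu[i][j] * x[j] for j in range(i))
    for i in reversed(range(n)):             # back substitution with U
        x[i] = (x[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


a = [[0, 2, 1], [1, 1, 0], [2, 0, 3]]
lu, pivots = lu_factor(a)
# The same factors serve several right-hand sides.
solutions = [lu_solve(lu, pivots, b) for b in ([7, 3, 11], [3, 2, 5])]
```

The zero in the top-left corner of `a` forces a row interchange in the first step, which is the case partial pivoting exists to handle.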