Sun Performance Library User's Guide


Chapter 3

SPARC Optimization and Parallel Processing

This chapter describes how to use compiler and linking options to optimize applications for specific SPARC instruction set architectures, 64-bit code, and parallel processing.

Using Sun Performance Library on SPARC Platforms

The Sun Performance Library was compiled using the f95 compiler provided with this release. The Sun Performance Library routines were compiled using -dalign and -xarch set to v8, v8plusa, or v9a.

For each -xarch option used to compile the libraries, there is a library compiled with -xparallel and a library compiled without -xparallel. When linking the program, use -dalign, -xlic_lib=sunperf, and the same -xarch option that was used when compiling. If -dalign cannot be used in the program, supply a trap 6 handler as described in Getting Started With Sun Performance Library. If compiling with a value of -xarch that is not one of [v8|v8plusa|v9a], the compiler driver will select the closest match.

Sun Performance Library is linked into an application with the -xlic_lib switch rather than the -l switch that is used to link in other libraries, as shown below:

 my_system% f95 -dalign my_file.f -xlic_lib=sunperf

The -xlic_lib switch has the same effect as using -l to specify the Sun Performance Library and adding -l switches for all of the supporting libraries that Sun Performance Library requires.

Compiling for SPARC Platforms

Applications using Sun Performance Library can be optimized for specific SPARC instruction set architectures and for 64-bit code. The optimization for each architecture is targeted at one implementation of that architecture and includes optimizations for other architectures when they do not degrade the performance of the primary target.

Compile with the most appropriate -xarch= option for best performance. At link time, use the same -xarch= option that was used at compile time to select the version of the Sun Performance Library optimized for a specific SPARC instruction set architecture.


Note – Using SPARC-specific optimization options increases application performance on the selected instruction set architecture, but limits code portability. When using these optimization options, the resulting code can be run only on systems using the specific SPARC chip from Sun Microsystems and, in some cases, a specific Solaris operating environment (32- or 64-bit Solaris 7 or Solaris 8).

The SunOS™ command isalist(1) can be used to display a list of the native instruction sets executable on a particular platform. The names output by isalist are space-separated and are ordered in the sense of best performance.
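Because the names are ordered by performance, a program that wants the best native instruction set can take the first name in the list. The following C sketch illustrates the idea. best_isa() is a hypothetical helper, not part of SunOS or Sun Performance Library, and the sample string in the usage note is illustrative rather than output captured from a particular system.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the first (best-performing) name from a space-separated,
 * isalist-style string into out. Returns the length of the name, or 0
 * if the list is empty or the name does not fit in out.
 * best_isa() is a hypothetical helper used only for illustration.
 */
size_t best_isa(const char *isalist_output, char *out, size_t out_size)
{
    size_t start;
    size_t len;

    assert(isalist_output != NULL && out != NULL);

    /* Skip any leading whitespace, then measure the first name. */
    start = strspn(isalist_output, " \t\n");
    len = strcspn(isalist_output + start, " \t\n");

    if (len == 0 || len >= out_size)
        return 0;

    memcpy(out, isalist_output + start, len);
    out[len] = '\0';
    return len;
}
```

Given the illustrative string "sparcv9+vis sparcv9 sparcv8plus", best_isa() copies sparcv9+vis into the caller's buffer.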

For a detailed description of the different -xarch options, refer to the Fortran User's Guide or C User's Guide.

To compile for 32-bit addressing in a 32-bit enabled Solaris operating environment:

To compile for 64-bit addressing in a 64-bit enabled Solaris operating environment:

Compiling Code for 64-Bit UltraSPARC

To compile 64-bit code on UltraSPARC, use -xarch=v9[a|b] and convert all integer arguments to 64-bit arguments. 64-bit routines require the use of 64-bit integers.

Sun Performance Library provides 32-bit and 64-bit interfaces. To use the 64-bit interfaces:

To control promotion of integer arguments, do one of the following:

When passing constants in Fortran 95 code that has not been compiled with -xtypemap, append _8 to literal constants to effect the promotion. For example, when using Fortran 95, change CALL DSCAL(20,5.26D0,X,1) to CALL DSCAL(20_8,5.26D0,X,1_8). This example assumes USE SUNPERF is included in the code.

The following example shows calling CAXPY from FORTRAN 77 or Fortran 95 using 32-bit arguments:

     SUBROUTINE CAXPY (N, ALPHA, X, INCX, Y, INCY)
     COMPLEX ALPHA
     INTEGER INCX, INCY, N
     COMPLEX X( * ), Y( * )

The following example shows calling CAXPY from FORTRAN 77 or Fortran 95 (without the USE SUNPERF statement) using 64-bit arguments:

     SUBROUTINE CAXPY_64 (N, ALPHA, X, INCX, Y, INCY)
     COMPLEX   ALPHA
     INTEGER*8 INCX, INCY, N
     COMPLEX   X( * ), Y( * )

The following example shows calling CAXPY from Fortran 95 (with the USE SUNPERF statement) using 64-bit arguments:

     SUBROUTINE CAXPY (N, ALPHA, X, INCX, Y, INCY)
     COMPLEX   ALPHA
     INTEGER*8 INCX, INCY, N
     COMPLEX   X( * ), Y( * )

In C routines, the size of long is 32 bits when compiling for V8 or V8plus and 64 bits when compiling for V9. The following example shows calling the dgbcon routine using 32-bit arguments.

void dgbcon(char norm, int n, int nsub, int nsuper, double *da,
            int lda, int *ipivot, double danorm, double *drcond,
            int *info)

The following example shows calling the dgbcon routine using 64-bit arguments.

void dgbcon_64(char norm, long n, long nsub, long nsuper,
               double *da, long lda, long *ipivot, double danorm,
               double *drcond, long *info)
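Because the 64-bit C interfaces take long arguments, it can help to confirm how wide long is for the current compilation target before passing large values. The following C sketch does not call Sun Performance Library. long_bits(), fits_in_long(), and checked_long() are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Width of long, in bits, for the target this file is compiled for. */
size_t long_bits(void)
{
    return sizeof(long) * CHAR_BIT;
}

/* Nonzero if value can be stored in a long without loss on this target. */
int fits_in_long(long long value)
{
    return value >= LONG_MIN && value <= LONG_MAX;
}

/*
 * Convert to long. The assertion aborts a debug build if the value
 * would be truncated, for example a 64-bit dimension on a 32-bit target.
 */
long checked_long(long long value)
{
    assert(fits_in_long(value));
    return (long)value;
}
```

A caller could wrap each dimension or stride in checked_long() before passing it to a routine that takes long arguments, so that a mismatch between the data and the target's long width is caught early.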

Optimizing for Parallel Processing


Note – The Fortran compiler parallelization features require a Sun WorkShop HPC license.

Sun Performance Library can be used with the shared or dedicated mode of parallelization, which is user selectable at link time. Specifying the parallelization mode improves application performance by using the parallelization enhancements made to Sun Performance Library routines.

The shared multiprocessor model of parallelism has the following features:

The dedicated multiprocessor model of parallelism has the following features:

On a dedicated system, the dedicated model can be faster than the shared model due to lower synchronization overhead. On a system running many different tasks, the shared model can make better use of available resources.

Specifying the Parallelization Mode

To specify the parallelization mode:

If compiling with one of the compiler parallelization options:

For example, to use 24 processors, type the commands shown below:

 my_system% f95 -dalign -mt my_app.f -xlic_lib=sunperf
 my_system% setenv PARALLEL 24
 my_system% ./a.out
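An application can inspect the same PARALLEL variable, for example to log how many processors were requested. The following C sketch is an illustration only. requested_processors() is a hypothetical helper, and treating an unset or malformed value as 1 is an assumption of this sketch, not documented library behavior.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Interpret the text of the PARALLEL environment variable as a
 * processor count. Falls back to 1 when the value is missing or is not
 * a complete positive decimal number (an assumption of this sketch).
 */
int requested_processors(const char *parallel_value)
{
    long n = 1;

    if (parallel_value != NULL) {
        char *end;
        long parsed = strtol(parallel_value, &end, 10);

        /* Accept only a complete, positive number that fits in int. */
        if (end != parallel_value && *end == '\0' &&
            parsed >= 1 && parsed <= INT_MAX)
            n = parsed;
    }

    assert(n >= 1);
    return (int)n;
}
```

After setenv PARALLEL 24, a call such as requested_processors(getenv("PARALLEL")) returns 24.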


Note – Parallel processing options require either using the -dalign command-line option or establishing a trap 6 handler, as described in Enabling Trap 6. When using C, do not use -misalign.

Starting Threads

When Sun Performance Library starts threads in shared mode, it uses a stack size that it determines as follows:

  1. Checks the value of the STACKSIZE environment variable and interprets the units as kbytes (1024 bytes).

  2. Computes the maximum stack size required by Sun Performance Library.

  3. Uses the largest of the values determined in steps 1 and 2 for the size of the stack in the created thread.
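The three steps above amount to taking the larger of two sizes. The following C sketch models that selection. shared_stack_bytes() is a hypothetical function, not a Sun Performance Library entry point, and the library's own requirement (step 2) is passed in as a parameter because the source does not say how it is computed. The sketch does not guard against overflow or negative input.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Model of the shared-mode stack size choice.
 * stacksize_value:   text of the STACKSIZE variable, or NULL if unset.
 * library_min_bytes: stack the library needs (step 2), supplied by the
 *                    caller because its computation is not documented here.
 * Returns the stack size in bytes for the created thread.
 */
unsigned long shared_stack_bytes(const char *stacksize_value,
                                 unsigned long library_min_bytes)
{
    unsigned long requested = 0;

    assert(library_min_bytes > 0);

    /* Step 1: read STACKSIZE and interpret it as kbytes (1024 bytes). */
    if (stacksize_value != NULL) {
        char *end;
        unsigned long kbytes = strtoul(stacksize_value, &end, 10);

        if (end != stacksize_value)
            requested = kbytes * 1024UL;
    }

    /* Step 3: use the larger of the step 1 and step 2 values. */
    return requested > library_min_bytes ? requested : library_min_bytes;
}
```

With STACKSIZE set to 4000 and a hypothetical library requirement of 1000000 bytes, the function returns 4096000, because 4000 kbytes exceeds the library's minimum.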

When Sun Performance Library starts threads in dedicated mode, use the STACKSIZE environment variable to specify a stack size of at least 4 MB:

setenv STACKSIZE 4000

Parallel Processing Examples

The following sections demonstrate how to use the PARALLEL environment variable and the compile and link options to create code that uses a single processor, multiple processors in shared mode, or multiple processors in dedicated mode.

Using a Single Processor

To use a single processor:

1. Call one or more of the routines.

2. Set PARALLEL equal to 1.

3. Link with -xlic_lib=sunperf specified at the end of the command line.

Do not compile or link with -parallel, -explicitpar, or -autopar.

For example, compile and link with libsunperf.so (default):

cc -dalign -xarch=... any.c -xlic_lib=sunperf
or
f77 -dalign -xarch=... any.f -xlic_lib=sunperf
or
f95 -dalign -xarch=... any.f95 -xlic_lib=sunperf

For example, compile and link with libsunperf.a statically:

cc -dalign -xarch=... any.c -Bstatic -xlic_lib=sunperf -Bdynamic
or
f77 -dalign -xarch=... any.f -Bstatic -xlic_lib=sunperf -Bdynamic
or
f95 -dalign -xarch=... any.f95 -Bstatic -xlic_lib=sunperf -Bdynamic

Using Multiple Processors in Shared Mode

To use multiple processors in shared mode:

1. Call one or more of the routines.

2. Set PARALLEL to a number greater than 1.

3. Compile and link with -mt.

4. Link with -xlic_lib=sunperf specified at the end of the command line.

Do not compile or link with -parallel, -explicitpar, or -autopar.

For example, compile and link with libsunperf.so (default):

cc -dalign -xarch=... any.c -xlic_lib=sunperf -mt
or
f77 -dalign -xarch=... any.f -xlic_lib=sunperf -mt
or
f95 -dalign -xarch=... any.f95 -xlic_lib=sunperf -mt

For example, compile and link with libsunperf.a statically:

cc -dalign -xarch=... any.c -Bstatic -xlic_lib=sunperf -Bdynamic -mt
or
f77 -dalign -xarch=... any.f -Bstatic -xlic_lib=sunperf -Bdynamic -mt
or
f95 -dalign -xarch=... any.f95 -Bstatic -xlic_lib=sunperf -Bdynamic -mt

Using Multiple Processors in Dedicated Mode (With Parallelization Options)

To use multiple processors in dedicated mode:

1. Call one or more of the routines.

2. Set PARALLEL to the number of available processors.

3. Link with -xlic_lib=sunperf specified at the end of the command line.

Compile and link with -parallel, -explicitpar, or -autopar.

For example, compile and link with libsunperf_mt.so (default):

cc -dalign -xarch=... -xparallel any.c -xlic_lib=sunperf
or
f77 -dalign -xarch=... -parallel any.f -xlic_lib=sunperf
or
f95 -dalign -xarch=... -parallel any.f95 -xlic_lib=sunperf

For example, compile and link with libsunperf_mt.a statically:

cc -dalign -xarch=... -xparallel any.c -Bstatic -xlic_lib=sunperf -Bdynamic
or
f77 -dalign -xarch=... -parallel any.f -Bstatic -xlic_lib=sunperf -Bdynamic
or
f95 -dalign -xarch=... -parallel any.f95 -Bstatic -xlic_lib=sunperf -Bdynamic


Sun Microsystems, Inc.
Copyright information. All rights reserved.