Sun Performance Library User's Guide
SPARC Optimization and Parallel Processing
This chapter describes how to use compiler and linking options to optimize applications for:
- Specific SPARC instruction set architectures
- 64-bit code
- Parallel processing
Using Sun Performance Library on SPARC Platforms
The Sun Performance Library was compiled using the f95 compiler provided with this release. The Sun Performance Library routines were compiled using -dalign and -xarch set to v8, v8plusa, or v9a.
For each -xarch option used to compile the libraries, there is a library compiled with -xparallel and a library compiled without -xparallel. When linking the program, use -dalign, -xlic_lib=sunperf, and the same -xarch option that was used when compiling. If -dalign cannot be used in the program, supply a trap 6 handler as described in Getting Started With Sun Performance Library. If compiling with a value of -xarch that is not one of [v8|v8plusa|v9a], the compiler driver will select the closest match.

Sun Performance Library is linked into an application with the -xlic_lib switch rather than the -l switch that is used to link in other libraries, as shown below:
my_system% f95 -dalign my_file.f -xlic_lib=sunperf
The -xlic_lib switch has the same effect as using -l to specify the Sun Performance Library and adding -l switches for all of the supporting libraries that Sun Performance Library requires.

Compiling for SPARC Platforms
Applications using Sun Performance Library can be optimized for specific SPARC instruction set architectures and for 64-bit code. The optimization for each architecture is targeted at one implementation of that architecture and includes optimizations for other architectures when it does not degrade the performance of the primary target.
Compile with the most appropriate -xarch= option for best performance. At link time, use the same -xarch= option that was used at compile time to select the version of the Sun Performance Library optimized for a specific SPARC instruction set architecture.
Note Using SPARC-specific optimization options increases application performance on the selected instruction set architecture, but limits code portability. When using these optimization options, the resulting code can be run only on systems using the specific SPARC chip from Sun Microsystems and, in some cases, a specific Solaris operating environment (32- or 64-bit Solaris 7 or Solaris 8).
The SunOS™ command isalist(1) can be used to display a list of the native instruction sets executable on a particular platform. The names output by isalist are space-separated and are listed in order of decreasing performance, with the best-performing instruction set first.

For a detailed description of the different -xarch options, refer to the Fortran User's Guide or C User's Guide.

To compile for 32-bit addressing in a 32-bit enabled Solaris operating environment:
- UltraSPARC I™ or UltraSPARC II™ systems - use -xarch=v8plus or -xarch=v8plusa.
- UltraSPARC III™ systems - use -xarch=v8plus or -xarch=v8plusb.

To compile for 64-bit addressing in a 64-bit enabled Solaris operating environment:

- UltraSPARC I or UltraSPARC II systems - use -xarch=v9 or -xarch=v9a.
- UltraSPARC III systems - use -xarch=v9 or -xarch=v9b.

Compiling Code for 64-Bit UltraSPARC
To compile 64-bit code on UltraSPARC, use -xarch=v9[a|b] and convert all integer arguments to 64-bit arguments. 64-bit routines require the use of 64-bit integers.

Sun Performance Library provides 32-bit and 64-bit interfaces. To use the 64-bit interfaces:
- Modify the Sun Performance Library routine name: For C, FORTRAN 77, and Fortran 95 code (without the USE SUNPERF statement), _64 must be appended to the names of Sun Performance Library routines (for example, dgbcon_64 or CAXPY_64). For f95 code with the USE SUNPERF statement, do not append _64 to the Sun Performance Library routine names. The compiler will infer the correct interface from the presence or absence of INTEGER*8 arguments.
- Promote integers to 64 bits. Double precision variables and the real and imaginary parts of double complex variables are already 64 bits. Only the size of the integers is affected.
To control promotion of integer arguments, do one of the following:

- To promote all integers from 32 bits to 64 bits, compile with -xtypemap=integer:64.
- When using Fortran, to avoid promoting all integers, change INTEGER or INTEGER*4 declarations to INTEGER*8.

When passing constants in Fortran 95 code that has not been compiled with -xtypemap, append _8 to literal constants to effect the promotion. For example, when using Fortran 95, change CALL DSCAL(20,5.26D0,X,1) to CALL DSCAL(20_8,5.26D0,X,1_8). This example assumes USE SUNPERF is included in the code.

The following example shows the CAXPY interface used when calling from FORTRAN 77 or Fortran 95 with 32-bit arguments:
      SUBROUTINE CAXPY (N, ALPHA, X, INCX, Y, INCY)
      COMPLEX ALPHA
      INTEGER INCX, INCY, N
      COMPLEX X( * ), Y( * )

The following example shows the CAXPY_64 interface used when calling from FORTRAN 77 or Fortran 95 (without the USE SUNPERF statement) with 64-bit arguments:
      SUBROUTINE CAXPY_64 (N, ALPHA, X, INCX, Y, INCY)
      COMPLEX ALPHA
      INTEGER*8 INCX, INCY, N
      COMPLEX X( * ), Y( * )

The following example shows the CAXPY interface used when calling from Fortran 95 (with the USE SUNPERF statement) with 64-bit arguments:
      SUBROUTINE CAXPY (N, ALPHA, X, INCX, Y, INCY)
      COMPLEX ALPHA
      INTEGER*8 INCX, INCY, N
      COMPLEX X( * ), Y( * )

In C routines, the size of long is 32 bits when compiling for V8 or V8plus and 64 bits when compiling for V9. The following example shows the dgbcon prototype used for calls with 32-bit arguments.
void dgbcon(char norm, int n, int nsub, int nsuper, double *da,
            int lda, int *ipivot, double danorm, double *drcond,
            int *info)

The following example shows the dgbcon_64 prototype used for calls with 64-bit arguments.
void dgbcon_64 (char norm, long n, long nsub, long nsuper,
                double *da, long lda, long *ipivot, double danorm,
                double *drcond, long *info)

Optimizing for Parallel Processing
Note The Fortran compiler parallelization features require a Sun WorkShop HPC license.
Sun Performance Library can be used with the shared or dedicated modes of parallelization, which are user-selectable at link time. Specifying the parallelization mode improves application performance by using the parallelization enhancements made to Sun Performance Library routines.
The shared multiprocessor model of parallelism has the following features:
- Delivers peak performance to applications that do not use compiler parallelization and that run on a platform shared with other applications.
- Parallelization is implemented with threads library synchronization primitives.
The dedicated multiprocessor model of parallelism has the following features:
- Delivers peak performance to applications using automatic compiler parallelization and running on an MP platform dedicated to a single processor-intensive application.
- Parallelization is implemented with spin locks.
On a dedicated system, the dedicated model can be faster than the shared model due to lower synchronization overhead. On a system running many different tasks, the shared model can make better use of available resources.
Specifying the Parallelization Mode
To specify the parallelization mode:
- Shared model - Use -mt on the link line without one of the compiler parallelization options.
- Dedicated model - Use one of the compiler parallelization options [-xparallel|-xexplicitpar|-xautopar] on the compile and link lines.
- Single processor - Do not specify any of the compiler parallelization options or -mt on the link line.
Note Using the shared model with one of the compiler parallelization options, -xparallel, -xexplicitpar, or -xautopar, produces unpredictable behavior.
If compiling with one of the compiler parallelization options:

- Use the same parallelization option on the linking command.
- To use multiple processors, add -mt to the link line, and then specify the number of processors at runtime with the PARALLEL environment variable.

For example, to use 24 processors, type the commands shown below:
my_system% f95 -dalign -mt my_app.f -xlic_lib=sunperf
my_system% setenv PARALLEL 24
my_system% ./a.out
Note Parallel processing options require using either the -dalign command-line option or establishing a trap 6 handler, as described in Enabling Trap 6. When using C, do not use -misalign.
Starting Threads
When Sun Performance Library starts threads in shared mode, it uses a stack size that it determines as follows:

1. Checks the value of the STACKSIZE environment variable and interprets the units as kbytes (1024 bytes).
2. Computes the maximum stack size required by Sun Performance Library.
3. Uses the larger of the values determined in steps 1 and 2 for the size of the stack in the created thread.
When Sun Performance Library starts threads in dedicated mode, use the STACKSIZE environment variable to specify a stack size of at least 4 MB:
setenv STACKSIZE 4000
Parallel Processing Examples
The following sections demonstrate using the PARALLEL environment variable and the compile and linking options for creating code that supports using:
- A single processor
- Multiple processors in shared mode
- Multiple processors in dedicated mode
Using a Single Processor
1. Call one or more of the routines.
2. Set PARALLEL equal to 1.
3. Link with -xlic_lib=sunperf specified at the end of the command line.

Do not compile or link with -parallel, -explicitpar, or -autopar.
For example, compile and link with libsunperf.so (default):

cc -dalign -xarch=... any.c -xlic_lib=sunperf
or
f77 -dalign -xarch=... any.f -xlic_lib=sunperf
or
f95 -dalign -xarch=... any.f95 -xlic_lib=sunperf
For example, compile and link with libsunperf.a statically:
Using Multiple Processors in Shared Mode
To use multiple processors in shared mode:
1. Call one or more of the routines.
2. Set PARALLEL to a number greater than 1.
3. Compile and link with -mt.
4. Link with -xlic_lib=sunperf specified at the end of the command line.

Do not compile or link with -parallel, -explicitpar, or -autopar.
For example, compile and link with libsunperf.so (default):

cc -dalign -xarch=... any.c -xlic_lib=sunperf -mt
or
f77 -dalign -xarch=... any.f -xlic_lib=sunperf -mt
or
f95 -dalign -xarch=... any.f95 -xlic_lib=sunperf -mt
For example, compile and link with libsunperf.a statically:
Using Multiple Processors in Dedicated Mode (With Parallelization Options)
To use multiple processors in dedicated mode:
1. Call one or more of the routines.
2. Set PARALLEL to the number of available processors.
3. Link with -xlic_lib=sunperf specified at the end of the command line.

Compile and link with -parallel, -explicitpar, or -autopar.
For example, compile and link with libsunperf_mt.so (default):

cc -dalign -xarch=... -xparallel any.c -xlic_lib=sunperf
or
f77 -dalign -xarch=... -parallel any.f -xlic_lib=sunperf
or
f95 -dalign -xarch=... -parallel any.f95 -xlic_lib=sunperf
For example, compile and link with libsunperf_mt.a statically:
Sun Microsystems, Inc. Copyright information. All rights reserved.