Sun Performance Library User's Guide
SPARC Optimization and Parallel Processing
This chapter describes how to use compiler and linking options to optimize applications for:
- Specific SPARC instruction set architectures
- 64-bit code
- Parallel processing
Using Sun Performance Library on SPARC Platforms
The Sun Performance Library was compiled using the f95 compiler provided with this release. The Sun Performance Library routines were compiled using -dalign and -xarch set to v8, v8plusa, or v9a.
For each -xarch option used to compile the libraries, there is a library compiled with -xparallel and a library compiled without -xparallel. When linking the program, use -dalign, -xlic_lib=sunperf, and the same -xarch option that was used when compiling. If -dalign cannot be used in the program, supply a trap 6 handler as described in Getting Started With Sun Performance Library. If compiling with a value of -xarch that is not one of [v8|v8plusa|v9a], the compiler driver will select the closest match.
Sun Performance Library is linked into an application with the -xlic_lib switch rather than the -l switch that is used to link in other libraries, as shown below:
    my_system% f95 -dalign my_file.f -xlic_lib=sunperf
The -xlic_lib switch gives the same effect as if -l was used to specify the Sun Performance Library and added -l switches for all of the supporting libraries that Sun Performance Library requires.
Compiling for SPARC Platforms
Applications using Sun Performance Library can be optimized for specific SPARC instruction set architectures and for 64-bit code. The optimization for each architecture is targeted at one implementation of that architecture and includes optimizations for other architectures when it does not degrade the performance of the primary target.
Compile with the most appropriate -xarch= option for best performance. At link time, use the same -xarch= option that was used at compile time to select the version of the Sun Performance Library optimized for a specific SPARC instruction set architecture.
Note Using SPARC-specific optimization options increases application performance on the selected instruction set architecture, but limits code portability. When using these optimization options, the resulting code can be run only on systems using the specific SPARC chip from Sun Microsystems and, in some cases, a specific Solaris operating environment (32- or 64-bit Solaris 7 or Solaris 8).
The SunOS™ command isalist(1) can be used to display a list of the native instruction sets executable on a particular platform. The names output by isalist are space-separated and are ordered by performance, best first.
For a detailed description of the different -xarch options, refer to the Fortran User's Guide or C User's Guide.
To compile for 32-bit addressing in a 32-bit enabled Solaris operating environment:
- UltraSPARC I™ or UltraSPARC II™ systems - use -xarch=v8plus or -xarch=v8plusa.
- UltraSPARC III™ systems - use -xarch=v8plus or -xarch=v8plusb.
To compile for 64-bit addressing in a 64-bit enabled Solaris operating environment:
- UltraSPARC I or UltraSPARC II systems - use -xarch=v9 or -xarch=v9a.
- UltraSPARC III systems - use -xarch=v9 or -xarch=v9b.
Compiling Code for 64-Bit UltraSPARC
To compile 64-bit code on UltraSPARC, use -xarch=v9[a|b] and convert all integer arguments to 64-bit arguments. 64-bit routines require the use of 64-bit integers.
Sun Performance Library provides 32-bit and 64-bit interfaces. To use the 64-bit interfaces:
- Modify the Sun Performance Library routine name: For C, FORTRAN 77, and Fortran 95 code (without the USE SUNPERF statement), _64 must be appended to the names of Sun Performance Library routines (for example, dgbcon_64 or CAXPY_64). For f95 code with the USE SUNPERF statement, do not append _64 to the Sun Performance Library routine names. The compiler will infer the correct interface from the presence or absence of INTEGER*8 arguments.
- Promote integers to 64 bits. Double precision variables and the real and imaginary parts of double complex variables are already 64 bits. Only the size of the integers is affected.
To control promotion of integer arguments, do one of the following:
- To promote all integers from 32 bits to 64 bits, compile with -xtypemap=integer:64.
- When using Fortran, to avoid promoting all integers, change INTEGER or INTEGER*4 declarations to INTEGER*8.
When passing constants in Fortran 95 code that has not been compiled with -xtypemap, append _8 to literal constants to effect the promotion. For example, when using Fortran 95, change CALL DSCAL(20,5.26D0,X,1) to CALL DSCAL(20_8,5.26D0,X,1_8). This example assumes USE SUNPERF is included in the code.
The following example shows the interface for calling CAXPY from FORTRAN 77 or Fortran 95 using 32-bit arguments:
    SUBROUTINE CAXPY (N, ALPHA, X, INCX, Y, INCY)
    COMPLEX ALPHA
    INTEGER INCX, INCY, N
    COMPLEX X( * ), Y( * )
The following example shows the interface for calling CAXPY from FORTRAN 77 or Fortran 95 (without the USE SUNPERF statement) using 64-bit arguments:
    SUBROUTINE CAXPY_64 (N, ALPHA, X, INCX, Y, INCY)
    COMPLEX ALPHA
    INTEGER*8 INCX, INCY, N
    COMPLEX X( * ), Y( * )
The following example shows the interface for calling CAXPY from Fortran 95 (with the USE SUNPERF statement) using 64-bit arguments:
    SUBROUTINE CAXPY (N, ALPHA, X, INCX, Y, INCY)
    COMPLEX ALPHA
    INTEGER*8 INCX, INCY, N
    COMPLEX X( * ), Y( * )
In C routines, the size of long is 32 bits when compiling for V8 or V8plus and 64 bits when compiling for V9. The following example shows the interface for calling the dgbcon routine using 32-bit arguments.
    void dgbcon(char norm, int n, int nsub, int nsuper, double *da,
                int lda, int *ipivot, double danorm, double *drcond,
                int *info)
The following example shows the interface for calling the dgbcon routine using 64-bit arguments.
    void dgbcon_64 (char norm, long n, long nsub, long nsuper,
                    double *da, long lda, long *ipivot, double danorm,
                    double *drcond, long *info)
Optimizing for Parallel Processing
Note The Fortran compiler parallelization features require a Sun WorkShop HPC license.
Sun Performance Library can be used with the shared or dedicated mode of parallelization, which is user-selectable at link time. Specifying the parallelization mode improves application performance by using the parallelization enhancements made to Sun Performance Library routines.
The shared multiprocessor model of parallelism has the following features:
- Delivers peak performance to applications that do not use compiler parallelization and that run on a platform shared with other applications.
- Parallelization is implemented with threads library synchronization primitives.
The dedicated multiprocessor model of parallelism has the following features:
- Delivers peak performance to applications using automatic compiler parallelization and running on an MP platform dedicated to a single processor-intensive application.
- Parallelization is implemented with spin locks.
On a dedicated system, the dedicated model can be faster than the shared model due to lower synchronization overhead. On a system running many different tasks, the shared model can make better use of available resources.
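The trade-off between the two models can be illustrated with a spin lock. The following is a minimal sketch using C11 atomics, not the actual implementation inside Sun Performance Library; the names spin_lock_t, SPIN_LOCK_INIT, spin_trylock, spin_lock, and spin_unlock are hypothetical. A waiting thread loops on a flag instead of sleeping in the threads library. On a dedicated machine this avoids the cost of putting a thread to sleep and waking it, while on a shared machine the loop consumes processor time that other tasks could have used.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin lock type, for illustration only. */
typedef struct {
    atomic_flag held;
} spin_lock_t;

#define SPIN_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to take the lock; returns true if the lock was acquired. */
bool spin_trylock(spin_lock_t *lock)
{
    /* test_and_set returns the previous value: false means the lock was free. */
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

/* Busy-wait until the lock is acquired. The thread never sleeps. */
void spin_lock(spin_lock_t *lock)
{
    while (!spin_trylock(lock)) {
        /* Keep polling; this is the processor time a shared system loses. */
    }
}

/* Release the lock so that a spinning thread can take it. */
void spin_unlock(spin_lock_t *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

A lock from the threads library would instead block the waiting thread and let the operating system schedule other work, which is the behavior the shared model relies on.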
Specifying the Parallelization Mode
To specify the parallelization mode:
- Shared model - Use -mt on the link line without one of the compiler parallelization options.
- Dedicated model - Use one of the compiler parallelization options [-xparallel|-xexplicitpar|-xautopar] on the compile and link lines.
- Single processor - Do not specify any of the compiler parallelization options or -mt on the link line.
Note Using the shared model with one of the compiler parallelization options, -xparallel, -xexplicitpar, or -xautopar, produces unpredictable behavior.
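The selection rules above amount to a small decision table. The following sketch encodes that table in C so the three cases can be checked at a glance; the type par_mode and the function select_par_mode are hypothetical names used only for illustration and are not part of Sun Performance Library or the compilers.

```c
#include <stdbool.h>

/* Hypothetical labels for the three modes described in the text. */
typedef enum {
    MODE_SINGLE,     /* single processor */
    MODE_SHARED,     /* shared multiprocessor model */
    MODE_DEDICATED   /* dedicated multiprocessor model */
} par_mode;

/*
 * uses_par_option: one of -xparallel, -xexplicitpar, or -xautopar is on
 *                  the compile and link lines.
 * uses_mt:         -mt is on the link line.
 */
par_mode select_par_mode(bool uses_par_option, bool uses_mt)
{
    if (uses_par_option)
        return MODE_DEDICATED;  /* a parallelization option selects dedicated */
    if (uses_mt)
        return MODE_SHARED;     /* -mt alone selects shared */
    return MODE_SINGLE;         /* neither option: single processor */
}
```

In this sketch a parallelization option takes precedence over -mt, which matches the rule that the shared model is selected only when no parallelization option is present.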
If compiling with one of the compiler parallelization options:
- Use the same parallelization option on the linking command.
- To use multiple processors, add -mt to the link line, and then specify the number of processors at runtime with the PARALLEL environment variable.
For example, to use 24 processors, type the commands shown below:
    my_system% f95 -dalign -mt my_app.f -xlic_lib=sunperf
    my_system% setenv PARALLEL 24
    my_system% ./a.out
Note Parallel processing options require using either the -dalign command-line option or establishing a trap 6 handler, as described in Enabling Trap 6. When using C, do not use -misalign.
Starting Threads
When Sun Performance Library starts threads in shared mode, it uses a stack size that it determines as follows:
1. Checks the value of the STACKSIZE environment variable and interprets the units as kbytes (1024 bytes).
2. Computes the maximum stack size required by Sun Performance Library.
3. Uses the larger of the values determined in steps 1 and 2 for the size of the stack in the created thread.
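The three steps above can be sketched in C as follows. This is an illustration of the rule as described, not the library's code. The function names stacksize_env_bytes and thread_stack_bytes are hypothetical, and library_min_bytes stands in for the library's internally computed requirement, whose actual value is not documented here. Treating an unset or malformed STACKSIZE value as zero is also an assumption.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Step 1: interpret a STACKSIZE value as kbytes (1024 bytes each).
 * Returns 0 when the value is unset, malformed, or too large to
 * represent (an assumption made for this sketch).
 */
size_t stacksize_env_bytes(const char *value)
{
    char *end = NULL;
    unsigned long kbytes;

    if (value == NULL || !isdigit((unsigned char)value[0]))
        return 0;
    errno = 0;
    kbytes = strtoul(value, &end, 10);
    if (errno != 0 || *end != '\0' || kbytes > SIZE_MAX / 1024u)
        return 0;
    return (size_t)kbytes * 1024u;
}

/*
 * Steps 2 and 3: take the larger of the requested size and the
 * library's own requirement (passed in by the caller in this sketch).
 */
size_t thread_stack_bytes(const char *stacksize_value,
                          size_t library_min_bytes)
{
    size_t requested = stacksize_env_bytes(stacksize_value);

    return requested > library_min_bytes ? requested : library_min_bytes;
}
```

A caller would typically pass getenv("STACKSIZE") as the first argument. With STACKSIZE set to 4000 and a library requirement of 1 Mbyte, the result is 4000 x 1024 bytes, because the requested size is the larger of the two.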
When Sun Performance Library starts threads in dedicated mode, use the STACKSIZE environment variable to specify a stack size of at least 4 MB:
    setenv STACKSIZE 4000
Parallel Processing Examples
The following sections demonstrate using the PARALLEL environment variable and the compile and linking options for creating code that supports using:
- A single processor
- Multiple processors in shared mode
- Multiple processors in dedicated mode
Using a Single Processor
1. Call one or more of the routines.
2. Set PARALLEL equal to 1.
3. Link with -xlic_lib=sunperf specified at the end of the command line.
Do not compile or link with -parallel, -explicitpar, or -autopar.
For example, compile and link with libsunperf.so (default):
    cc -dalign -xarch=... any.c -xlic_lib=sunperf
or
    f77 -dalign -xarch=... any.f -xlic_lib=sunperf
or
    f95 -dalign -xarch=... any.f95 -xlic_lib=sunperf
For example, compile and link with libsunperf.a statically:
Using Multiple Processors in Shared Mode
To use multiple processors in shared mode:
1. Call one or more of the routines.
2. Set PARALLEL to a number greater than 1.
3. Compile and link with -mt.
4. Link with -xlic_lib=sunperf specified at the end of the command line.
Do not compile or link with -parallel, -explicitpar, or -autopar.
For example, compile and link with libsunperf.so (default):
    cc -dalign -xarch=... any.c -xlic_lib=sunperf -mt
or
    f77 -dalign -xarch=... any.f -xlic_lib=sunperf -mt
or
    f95 -dalign -xarch=... any.f95 -xlic_lib=sunperf -mt
For example, compile and link with libsunperf.a statically:
Using Multiple Processors in Dedicated Mode (With Parallelization Options)
To use multiple processors in dedicated mode:
1. Call one or more of the routines.
2. Set PARALLEL to the number of available processors.
3. Link with -xlic_lib=sunperf specified at the end of the command line.
Compile and link with -parallel, -explicitpar, or -autopar.
For example, compile and link with libsunperf_mt.so (default):
    cc -dalign -xarch=... -xparallel any.c -xlic_lib=sunperf
or
    f77 -dalign -xarch=... -parallel any.f -xlic_lib=sunperf
or
    f95 -dalign -xarch=... -parallel any.f95 -xlic_lib=sunperf
For example, compile and link with libsunperf_mt.a statically: