CHAPTER 3

Optimization

This chapter describes how to use compiler and linking options to optimize applications for:

Specific instruction-set architectures

32-bit and 64-bit enabled operating environments

TABLE 3-1 shows a comparison of the 32-bit and 64-bit operating environments. These items are described in greater detail in the following sections.

TABLE 3-1 Comparison of 32-bit and 64-bit Operating Environments
	32-bit (ILP32)	64-bit (LP64)
`-xarch` on SPARC platforms	`v8`, `sparcvis`, `sparcvis2`, `sparcfmaf`	`sparcvis`, `sparcvis2`, `sparcfmaf`
`-xarch` on x86 platforms	`generic`, `sse2`	`sse2`
Addressing	`-m32`	`-m64`
Fortran Integers	`INTEGER`, `INTEGER*4`	`INTEGER*8`
C Integers	`int`	`long`
Floating-point	`S/D/C/Z`	`S/D/C/Z`
API	Names of routines	Names of routines with `_64` suffix

3.1 Using the Sun Performance Library

The Sun Performance Library was compiled with the f95 compiler provided with this release, using the -dalign and -xparallel options.

3.1.1 Fortran and C

When linking the program, use -dalign -xlic_lib=sunperf and the same command line options that were used when compiling.

Sun Performance Library is linked into an application with the -xlic_lib switch rather than the -l switch that is used to link in other libraries, as shown here.

 my_system% f95 -dalign my_file.f -xlic_lib=sunperf

3.1.2 C++

When linking your program, use -dalign -library=sunperf, and the same command line options that were used when compiling. Sun Performance Library is linked into an application with the -library switch, as shown here, rather than the -l switch.

 my_system% CC -dalign my_file.cpp -library=sunperf

If -dalign cannot be used in the program, supply a trap 6 handler as described in Getting Started With Sun Performance Library.

3.2 Compiling

Compile with the most appropriate -xarch= option for best performance. At link time, use the same -xarch= option that was used at compile time to select the version of the Sun Performance Library optimized for a specific instruction-set architecture.

Note - Using instruction-set specific optimization options improves application performance on the selected instruction set architecture, but limits code portability.

For a detailed description of the different -xarch options, refer to the Fortran User’s Guide or the C User’s Guide.

The following lists the -xarch values for SPARC instruction-set architectures:

SPARC64 VI platforms. Use -xarch=sparcfmaf.

UltraSPARC III, IV, or IV+ platforms. Use -xarch=sparcvis2.

UltraSPARC I or UltraSPARC II platforms. Use -xarch=sparcvis.

The following are the -xarch values for x86 instruction-set architectures:

AMD Opteron platforms. Use -xarch=sse2.

Intel Core Duo and AMD Barcelona platforms. Use -xarch=sse3.

Generic x86 systems. Use -xarch=generic.

3.2.1 Compiling Code for a 64-Bit Enabled Operating Environment

To compile code for a 64-bit enabled operating environment, use -m64 and convert all integer arguments to 64-bit arguments. 64-bit routines require the use of 64-bit integers.

Sun Performance Library provides 32-bit and 64-bit interfaces. To use the 64-bit interfaces:

Modify the Sun Performance Library routine name. For C and Fortran 95 code, append _64 to the names of Sun Performance Library routines (for example, rfftf_64 or CFFTB_64). For Fortran 95 code with the USE SUNPERF statement, the _64 suffix is not strictly required for specific interfaces, such as DGEMM. The _64 suffix is still required for the generic interfaces, such as GEMM.

Promote integers to 64 bits. Double precision variables and the real and imaginary parts of double complex variables are already 64 bits. Only the integers are promoted to 64 bits.
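The two steps above (rename the routine, promote the integers) can be sketched in C as follows. This is only an illustration: perf_int, PERF_NAME, my_scal, my_scal_64, and scale_vector are hypothetical names that stand in for library routines so that the sketch compiles without Sun Performance Library. It assumes that a long wider than 32 bits means the program is being built for a 64-bit (LP64) environment.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper names; none of these come from Sun Performance Library. */
#if LONG_MAX > 2147483647L
typedef long perf_int;                /* LP64 build: long is 64 bits */
#define PERF_NAME(name) name##_64     /* my_scal becomes my_scal_64 */
#else
typedef int perf_int;                 /* 32-bit build: keep the int interface */
#define PERF_NAME(name) name
#endif

/* Stand-in for a routine with 32-bit integer arguments: x := alpha * x.
   Assumes incx > 0. */
void my_scal(int n, double alpha, double *x, int incx)
{
    for (int i = 0; i < n; i++)
        x[(size_t)i * (size_t)incx] *= alpha;
}

/* Stand-in for the same routine with 64-bit integer arguments. */
void my_scal_64(long n, double alpha, double *x, long incx)
{
    for (long i = 0; i < n; i++)
        x[(size_t)i * (size_t)incx] *= alpha;
}

/* Scales a contiguous vector through whichever interface matches the build. */
void scale_vector(double *x, size_t len, double alpha)
{
    perf_int n = (perf_int)len;       /* integer arguments use the promoted type */
    perf_int incx = 1;
    PERF_NAME(my_scal)(n, alpha, x, incx);
}
```

With this arrangement the calling code is written once, and the integer type and routine suffix change together when the build switches between -m32 and -m64.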

3.2.2 64-Bit Integer Arguments

These additional 64-bit-integer interfaces are available only when linking with -m64. Code compiled for 32-bit operating environments (-m32) cannot call the 64-bit-integer interfaces.

To call the 64-bit-integer interfaces directly, append the suffix _64 to the standard library name. For example, use daxpy_64() in place of daxpy().

To call the 64-bit-integer interfaces indirectly, do not append _64 to the name of the Sun Performance Library routine. The call then goes through a 32-bit wrapper that promotes the 32-bit integers to 64-bit integers, calls the 64-bit routine, and then demotes the 64-bit integers back to 32-bit integers.

For best performance, call the routine directly by appending _64 to the routine name.
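The promote, call, demote sequence performed by the wrapper can be pictured with the following sketch. It is not the library's implementation: my_iamax_64 and my_iamax are hypothetical stand-ins, the index returned is 0-based for simplicity, and the increment is assumed to be positive.

```c
#include <assert.h>
#include <limits.h>

/* Absolute value without depending on the math library. */
static double abs_val(double v)
{
    return v < 0.0 ? -v : v;
}

/* Hypothetical stand-in for a 64-bit-integer routine: returns the 0-based
   index of the element with the largest absolute value, or -1 if n <= 0.
   Assumes incx > 0. */
long my_iamax_64(long n, const double *x, long incx)
{
    if (n <= 0)
        return -1;
    long best = 0;
    for (long i = 1; i < n; i++) {
        if (abs_val(x[i * incx]) > abs_val(x[best * incx]))
            best = i;
    }
    return best;
}

/* Hypothetical 32-bit wrapper: promote the arguments, call the 64-bit
   routine, then demote the integer result. */
int my_iamax(int n, const double *x, int incx)
{
    long n64 = n;                                    /* promote */
    long incx64 = incx;
    long result = my_iamax_64(n64, x, incx64);       /* call */
    assert(result >= INT_MIN && result <= INT_MAX);  /* result must fit in int */
    return (int)result;                              /* demote */
}
```

The extra conversions on every call are the overhead that a direct call with the _64 suffix avoids.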

For C programs, use long instead of int arguments. The following code example shows calling the 64-bit integer interfaces directly.

#include <sunperf.h>
long n, incx, incy;
double alpha, *x, *y;
daxpy_64(n, alpha, x, incx, y, incy);

The following code example shows calling the 64-bit integer interfaces indirectly.

#include <sunperf.h>
int  n, incx, incy;
double alpha, *x, *y;
daxpy   (n, alpha, x, incx, y, incy);

For Fortran programs, use 64-bit integers for all integer arguments. The following methods can be used to convert integer arguments to 64 bits:

To promote all default integers (integers declared without explicit byte sizes) and literal integer constants from 32 bits to 64 bits, compile with -xtypemap=integer:64.

To promote specific integer declarations, change INTEGER or INTEGER*4 to INTEGER*8.

To promote integer literal constants, append _8 to the constant.

Consider the following code example.

INTEGER*8 N
REAL*8 ALPHA, X(N), Y(N)
 
! _64 SUFFIX: N AND 1_8 ARE 64-BIT INTEGERS
CALL DAXPY_64(N,ALPHA,X,1_8,Y,1_8)

INTEGER*8 arguments cannot be used in a 32-bit environment. Routines in the 32-bit libraries, v8, v8plusa, v8plusb, cannot be called with 64-bit arguments. However, the 64-bit routines can be called with 32-bit arguments.

When passing constants in Fortran 95 code that has not been compiled with -xtypemap, append _8 to literal constants to effect the promotion. For example, when using Fortran 95, change CALL DSCAL(20,5.26D0,X,1) to CALL DSCAL(20_8,5.26D0,X,1_8). This example assumes USE SUNPERF is included in the code, because the _64 suffix has not been appended to the routine name.

The following code example shows calling CAXPY from Fortran 95 using 32-bit arguments.

       PROGRAM TEST
       COMPLEX ALPHA
       INTEGER,PARAMETER :: INCX=1, INCY=1, N=10
       COMPLEX X(N), Y(N)
 
       CALL CAXPY(N, ALPHA, X, INCX, Y, INCY)

The following code example shows calling CAXPY from Fortran 95 (without the USE SUNPERF statement) using 64-bit arguments.

       PROGRAM TEST
       COMPLEX   ALPHA
       INTEGER*8, PARAMETER :: INCX=1, INCY=1, N=10
       COMPLEX   X(N), Y(N)
 
       CALL CAXPY_64(N, ALPHA, X, INCX, Y, INCY)

When using 64-bit arguments, the _64 suffix must be appended to the routine name if the USE SUNPERF statement is not used.

The following Fortran 95 code example shows calling CAXPY with the USE SUNPERF statement and 64-bit arguments.

       PROGRAM TEST
       USE SUNPERF
       .
       .
       .
       COMPLEX   ALPHA
       INTEGER*8, PARAMETER :: INCX=1, INCY=1, N=10
       COMPLEX   X(N), Y(N)
 
       CALL CAXPY(N, ALPHA, X, INCX, Y, INCY)

In C, the size of long is 32 bits when compiling for V8 or V8plus and 64 bits when compiling for V9. The following code example shows the prototype of the dgbcon routine, which takes 32-bit integer arguments.

void dgbcon(char norm, int n, int nsub, int nsuper, double *da,
            int lda, int *ipivot, double danorm, double *drcond,
            int *info);

The following code example shows the prototype of the dgbcon_64 routine, which takes 64-bit integer arguments.

void dgbcon_64(char norm, long n, long nsub, long nsuper,
               double *da, long lda, long *ipivot, double danorm,
               double *drcond, long *info);
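Because the width of long depends on the target, a program can check it at run time before deciding which interface to call. The following is a minimal sketch under stated assumptions: long_width_bits and needs_64bit_interface are hypothetical helpers, not library routines, and the second one assumes that a dimension larger than INT_MAX is the reason for choosing the 64-bit-integer interface.

```c
#include <limits.h>

/* Width of long in bits for the current build:
   32 when compiled with -m32, 64 when compiled with -m64. */
int long_width_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Nonzero when a problem dimension does not fit in an int and therefore
   cannot be passed to a routine that takes int arguments. */
int needs_64bit_interface(unsigned long long n)
{
    return n > (unsigned long long)INT_MAX;
}
```

A caller might use needs_64bit_interface on each array dimension and fall back to the int-based routine only when every dimension fits.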