CHAPTER 3
Compiling for OpenMP
This chapter describes how to compile programs that utilize the OpenMP API.
To run a parallelized program in a multithreaded environment, you must set the OMP_NUM_THREADS environment variable prior to program execution. This tells the runtime system the maximum number of threads the program can create. The default is 1. In general, set OMP_NUM_THREADS to the available number of processors on the target platform.
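As a minimal sketch of the recommendation above, the following Bourne/Korn shell fragment sets OMP_NUM_THREADS from the number of online processors before a program is launched. Using getconf to obtain the count is an assumption for illustration and is not part of this guide; the parameter name differs between platforms, so the sketch tries two spellings and falls back to 1. The program name ./a.out is hypothetical.

```shell
# Query the number of online processors (assumption: getconf supports one of
# these parameter names on the target platform; otherwise fall back to 1).
nproc_count=$(getconf _NPROCESSORS_ONLN 2>/dev/null || getconf NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Guard against empty, non-numeric, or zero results.
case "$nproc_count" in
  ''|*[!0-9]*|0) nproc_count=1 ;;
esac

# Tell the OpenMP runtime how many threads the program may create.
OMP_NUM_THREADS=$nproc_count
export OMP_NUM_THREADS

# ./a.out    # hypothetical program name; launch the parallelized program here
```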
The compiler readme files contain information about limitations and known deficiencies regarding their OpenMP implementation. Readme files are viewable directly by invoking the compiler with the -xhelp=readme flag, or by pointing an HTML browser to the documentation index for the installed software at
file:/opt/SUNWspro/docs/index.html
To enable explicit parallelization with OpenMP directives, compile your program with the cc, CC, or f95 option flag -xopenmp. This flag can take an optional keyword argument. (The f95 compiler accepts both -xopenmp and -openmp as synonyms.)
The -xopenmp flag accepts the following keyword sub-options: parallel, noopt, stubs, and none.
If you do not specify -xopenmp on the command line, the compiler assumes -xopenmp=none (disabling recognition of OpenMP pragmas).
If you specify -xopenmp but without a keyword sub-option, the compiler assumes -xopenmp=parallel.
Do not specify -xopenmp together with -xparallel or -xexplicitpar on the command line.
Specifying -xopenmp= with parallel, noopt, or stubs defines the _OPENMP preprocessor token as YYYYMM (specifically 200203L for C/C++ and 200011 for Fortran 95).
When debugging OpenMP programs with dbx, compile with -xopenmp=noopt -g.
The default optimization level for -xopenmp might change in future releases. Warning messages can be avoided by specifying an appropriate optimization level explicitly.
With Fortran 95, -xopenmp, -xopenmp=parallel, and -xopenmp=noopt add -stackvar automatically.
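The option rules above might be wrapped in small shell helpers along the following lines. This is only a sketch: the function names build_omp and build_omp_debug are hypothetical, and -xO3 is an assumed choice of explicit optimization level. Only the flags themselves come from this chapter and the compiler documentation.

```shell
# Hypothetical helper: build a C source file with OpenMP enabled.
# Usage: build_omp source.c output
# The optimization level is stated explicitly (-xO3 is an assumed choice)
# rather than relying on the default that -xopenmp selects.
build_omp() {
  cc -xopenmp=parallel -xO3 -o "$2" "$1"
}

# Hypothetical helper: build the same source for debugging with dbx.
# Usage: build_omp_debug source.c output
build_omp_debug() {
  cc -xopenmp=noopt -g -o "$2" "$1"
}
```

For example, build_omp prog.c prog would compile a hypothetical source file prog.c. Neither helper combines -xopenmp with -xparallel or -xexplicitpar.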
You can obtain a static, interprocedural validation of a Fortran 95 program's OpenMP directives by using the f95 compiler's global program checking feature. Enable OpenMP checking by compiling with the -XlistMP flag. (Diagnostic messages from -XlistMP appear in a separate file named after the source file, with a .lst extension.) The compiler will diagnose the following violations and parallelization inhibitors:
For example, compiling a source file ord.f with -XlistMP produces a diagnostic file ord.lst:
In this example, the ORDERED directive in subroutine WORK receives a diagnostic that refers to the second DO directive because it lacks an ORDERED clause.
The OpenMP specifications define four environment variables that control the execution of OpenMP programs: OMP_SCHEDULE, OMP_NUM_THREADS, OMP_DYNAMIC, and OMP_NESTED.
Additional multiprocessing environment variables, such as STACKSIZE, affect the execution of OpenMP programs but are not part of the OpenMP specifications.
The executing program maintains a main memory stack for the initial thread executing the program, as well as distinct stacks for each helper thread. Stacks are temporary memory address spaces used to hold arguments and automatic variables over subprogram or function references.
In general, the default main stack size is about 8 megabytes. Compiling Fortran programs with the f95 -stackvar option forces the allocation of local variables and arrays on the stack as if they were automatic variables. The -stackvar option is implied for explicitly parallelized OpenMP programs because it improves the optimizer's ability to parallelize calls in loops. (See the Fortran User's Guide for a discussion of the -stackvar flag.) However, this can lead to stack overflow if not enough memory is allocated for the stack.
Use the limit C-shell command, or the ulimit ksh/sh command, to display or set the size of the main stack.
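A minimal sketch of the ulimit form for Bourne/Korn-style shells follows. The value 65536 is an assumption chosen for illustration, and the sketch assumes that ulimit -s reports and accepts kilobytes, which is typical but worth confirming for the shell in use.

```shell
# Display the current soft limit for the main stack
# (assumed to be in kilobytes; may also read "unlimited").
main_stack_kb=$(ulimit -s)
echo "main stack limit: $main_stack_kb"

# Request a 64 MB main stack for this shell and the programs it starts.
# The request is refused if it exceeds the hard limit, so report that
# instead of stopping.
ulimit -s 65536 2>/dev/null || echo "could not change main stack limit"
```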
Each helper thread of a multithreaded program has its own thread stack. This stack mimics the initial (or main) thread stack but is unique to the thread. The thread's PRIVATE arrays and variables (local to the thread) are allocated on the thread stack. The default size is 4 megabytes on 32-bit systems and 8 megabytes on 64-bit systems. The size of the helper thread stack is set with the STACKSIZE environment variable.
demo% setenv STACKSIZE 16384     <-Set thread stack size to 16 Mb (C shell)
demo% STACKSIZE=16384            <-Same, using Bourne/Korn shell
demo% export STACKSIZE
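The value in the example above is given in kilobytes, so 16384 corresponds to 16 megabytes. As a small sketch, the conversion can be done in the shell itself; the variable name stack_mb is hypothetical.

```shell
# Desired helper thread stack size in megabytes (assumed value for illustration).
stack_mb=16

# STACKSIZE is given in kilobytes, so multiply by 1024.
STACKSIZE=$((stack_mb * 1024))
export STACKSIZE

echo "STACKSIZE=$STACKSIZE"    # prints STACKSIZE=16384
```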
The best stack size might have to be found by trial and error. If the stack is too small for a thread to run, the result can be silent data corruption in neighboring threads or a segmentation fault. If you are unsure about stack overflows, compile your Fortran or C programs with the -xcheck=stkovf flag to force a segmentation fault on stack overflow. This stops the program before any data corruption can occur.
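The trial-and-error search could be scripted roughly as follows. This is a sketch under stated assumptions, not a procedure from this guide: the function name find_stacksize is hypothetical, and the approach is only meaningful if the program was compiled with -xcheck=stkovf, so that an undersized stack produces a failing exit status rather than silent corruption.

```shell
# Hypothetical helper: run a command with a doubling STACKSIZE (in kilobytes)
# until it exits successfully, then print the first size that worked.
# Usage: find_stacksize start_kb max_kb command [args...]
find_stacksize() {
  size_kb=$1
  max_kb=$2
  shift 2
  while [ "$size_kb" -le "$max_kb" ]; do
    # Run the command with STACKSIZE set only for that one invocation.
    if env STACKSIZE="$size_kb" "$@" >/dev/null 2>&1; then
      echo "$size_kb"
      return 0
    fi
    size_kb=$((size_kb * 2))
  done
  # No size up to max_kb let the command succeed.
  return 1
}
```

For example, find_stacksize 4096 65536 ./a.out would try 4096, 8192, and so on up to 65536 for a hypothetical program ./a.out. A successful exit only shows that no overflow was detected on that run, so some headroom above the reported size is prudent.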
Copyright © 2003, Sun Microsystems, Inc. All rights reserved.