Fortran Programming Guide

Cray-Style Parallelization Directives

Parallel directives have two forms: Sun style and Cray style. The f77 and f90 default is Sun style (-mp=sun). To use Cray-style directives, you must compile with -mp=cray.

Mixing program units compiled with both Sun and Cray directives can produce different results.

A major difference between Sun and Cray directives is that Cray style requires explicit scoping of every scalar and array in the loop as either SHARED or PRIVATE.

The following table shows Cray-style directive syntax.


!MIC$ DOALL
!MIC$&  SHARED( v1
, v2,   )
!MIC$&  PRIVATE( u1
, u2,   )
    ...optional qualifiers

Cray Directive Syntax

A parallel directive consists of one or more directive lines. A directive line is defined as follows:

With f90 -free free-format, leading blanks can appear before !MIC$.

Qualifiers (Cray Style)

For Cray-style directives, the PRIVATE qualifier is required. Each variable within the DO loop must be qualified as private or shared, and the DO loop index must always be private. The following table summarizes available Cray-style qualifiers.

Table 10-7 DOALL Qualifiers (Cray Style)

Qualifier 

Assertion 

SHARED( v1, v2, ... )

Share the variables v1, v2, ... between parallel processes. That is, they are accessible to all the tasks.

PRIVATE( x1, x2, ... )

Do not share the variables x1, x2, ... between parallel processes. That is, each task has its own private copy of these variables.

SAVELAST

Save the values of private variables from the last DO iteration.

MAXCPUS( n )

Use no more than n CPUs.

For Cray-style directives, the DOALL directive allows a single scheduling qualifier, for example, !MIC$& CHUNKSIZE(100). Table 10-8 shows the Cray-style DOALL directive

Table 10-8 DOALL Cray Scheduling

Qualifier 

Assertion 

GUIDED

Distribute the iterations by use of guided self-scheduling. 

This distribution minimizes synchronization overhead, with acceptable dynamic load balancing. 

SINGLE

Distribute one iteration to each available processor.

CHUNKSIZE( n )

Distribute n iterations to each available processor.

n may be an expression. For best performance, n must be an integer constant. Example: With 100 iterations and CHUNKSIZE(4), distribute 4 iterations to each CPU.

NUMCHUNKS( m )

If there are n iterations, distribute n/m iterations to each available processor. There can be one smaller residual chunk.

m is an expression. For best performance, m must be an integer constant. Example: With 100 iterations and NUMCHUNKS(4), distribute 25 iterations to each CPU.

scheduling qualifiers:

The f77 default scheduling type is the Sun-style STATIC. The f90 default is GUIDED.

Inhibitors to f90 Explicit Parallelization

With the explicit parallelization situations listed in "Inhibitors to Explicit Parallelization", the additional parallelization inhibitors for f90 include: