Sun WorkShop Compiler C 5.0 User's Guide

Chapter 3 Sun ANSI/ISO C Compiler-Specific Information

The Sun ANSI/ISO C compiler is compatible with the C language described in the American National Standard for Programming Language--C, ANSI/ISO 9899-1990. This chapter documents those areas specific to the Sun ANSI/ISO C compiler.

Environment Variables

TMPDIR

cc normally creates temporary files in the directory /tmp. You can specify another directory by setting the environment variable TMPDIR to the directory of your choice. However, if TMPDIR is not a valid directory, cc uses /tmp. The -xtemp option has precedence over the TMPDIR environment variable.

If you use a Bourne shell, type:

$ TMPDIR=dir; export TMPDIR

If you use a C shell, type:

% setenv TMPDIR dir

SUNPRO_SB_INIT_FILE_NAME

The absolute path name of the directory containing the .sbinit(5) file. This variable is used only if the -xsb or -xsbfast flag is used.

PARALLEL

(SPARC) Refer to "Environment Variables" for details.

Global Behavior: Value versus unsigned Preserving

A program that depends on unsigned preserving arithmetic conversions behaves differently. This is considered to be the most serious change made by ANSI/ISO C.

In the first edition of K&R, The C Programming Language (Prentice-Hall, 1978), unsigned specified exactly one type; there were no unsigned chars, unsigned shorts, or unsigned longs, but most C compilers added these very soon thereafter.

In previous C compilers, the unsigned preserving rule is used for promotions: when an unsigned type needs to be widened, it is widened to an unsigned type; when an unsigned type mixes with a signed type, the result is an unsigned type.

The other rule, specified by ANSI/ISO C, came to be called "value preserving," in which the result type depends on the relative sizes of the operand types. When an unsigned char or unsigned short is widened, the result type is int if an int is large enough to represent all the values of the smaller type. Otherwise, the result type is unsigned int. The value preserving rule produces the least surprise arithmetic result for most expressions.

Only in the -Xt and -Xs modes does the compiler use the unsigned preserving promotions; in the other modes, -Xc and -Xa, the value preserving promotion rules are used. When the -xtransition option is used, the compiler warns about each expression whose behavior might depend on the promotion rules used.
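The difference between the two rules can be seen in a small sketch. The helper below is hypothetical (it is not part of the compiler or its libraries) and assumes that int is wider than unsigned char, so that value preserving promotion yields int.

```c
/* Hypothetical helper, for illustration only. */
int difference_is_negative(unsigned char a, unsigned char b)
{
    /* Value preserving (-Xa, -Xc): a and b are promoted to int, so
     * a - b is a signed result and may be less than zero.
     * Unsigned preserving (-Xt, -Xs): a and b would be widened to
     * unsigned int, and the comparison below could never be true. */
    return (a - b) < 0;
}
```

Under the value preserving rules, difference_is_negative(1, 2) returns 1. An expression of this kind is one whose behavior depends on the promotion rules in effect.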

Keywords

asm Keyword

The _asm keyword is a synonym for the asm keyword. asm is available under all compilation modes, although a warning is issued when it is used under the -Xc mode.

The asm statement has the form:


asm("string");

where string is a valid assembly language statement.

For example:


main()
{
	int i;

	/* i = 10 */
	asm("mov 10,%l0");
	asm("st  %l0,[%fp-8]");

	printf("i = %d\n",i);
}
% cc foo.c
% a.out
i = 10
%

asm statements must appear within function bodies.

_Restrict Keyword

For a compiler to effectively perform parallel execution of a loop, it needs to determine if certain lvalues designate distinct regions of storage. Aliases are lvalues whose regions of storage are not distinct. Determining if two pointers to objects are aliases is a difficult and time-consuming process because it could require analysis of the entire program.

Example: the function vsq()


void vsq(int n, double * a, double * b)
{
	int i;
	for (i=0; i<n; i++) b[i] = a[i] * a[i];
}

The compiler can parallelize the execution of the different iterations of the loops if it knows that pointers a and b access different objects. If there is an overlap in objects accessed through pointers a and b then it would be unsafe for the compiler to execute the loops in parallel. At compile time, the compiler does not know if the objects accessed by a and b overlap by simply analyzing the function vsq(); the compiler may need to analyze the whole program to get this information.

Restricted pointers are used to specify pointers which designate distinct objects so that the compiler can perform pointer alias analysis. To support restricted pointers, the keyword _Restrict is recognized by the Sun ANSI/ISO C compiler as an extension. Below is an example of declaring function parameters of vsq() as restricted pointers:


void vsq(int n, double * _Restrict a, double * _Restrict b)

Pointers a and b are declared as restricted pointers, so the compiler knows that the regions of storage pointed to by a and b are distinct. With this alias information, the compiler is able to parallelize the loop.

The _Restrict keyword is a type qualifier, like volatile, and it qualifies pointer types only. _Restrict is recognized as a keyword only for compilation modes -Xa (default) and -Xt.

For these two modes, the compiler defines the macro __RESTRICT to enable users to write portable code with restricted pointers. For example, the following code works on the Sun ANSI/ISO C compiler in all compilation modes, and should work on other compilers which do not support restricted pointers:


#ifdef __RESTRICT
#define restrict _Restrict
#else
#define restrict
#endif

void vsq(int n, double * restrict a, double * restrict b)
{
	int i;
	for (i=0; i<n; i++) b[i] = a[i] * a[i];
}

If restricted pointers become a part of the ANSI/ISO C Standard, it is likely that "restrict" will be the keyword. Users may want to write code with restricted pointers using:


#define restrict _Restrict

as in vsq() because this way there will be minimal changes should "restrict" become a keyword in the ANSI/ISO C Standard. The Sun ANSI/ISO C compiler uses _Restrict as the keyword because it is in the implementor's name space, so there is no conflict with identifiers in the user's name space.

There are situations where a user may not want to change the source code. One can specify pointer-valued function parameters to be treated as restricted pointers with the command-line option -xrestrict; refer to "-xrestrict=f" for details.

If a function list is specified, pointer parameters in the specified functions are treated as restricted; otherwise, all pointer parameters in the entire C file are treated as restricted. For example, -xrestrict=vsq would qualify the pointers a and b given in the example with the keyword _Restrict.

It is critical that _Restrict be used correctly. If pointers qualified as restricted pointers point to objects which are not distinct, loops may be incorrectly parallelized, resulting in undefined behavior. For example, assume that pointers a and b of function vsq() point to objects which overlap, such that b[i] and a[i+1] are the same object. If a and b are not declared as restricted pointers, the loops will be executed serially. If a and b are incorrectly qualified as restricted pointers, the compiler may parallelize the execution of the loops; this is not safe, because b[i+1] should only be computed after b[i] has been computed.
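The dependence described above can be made concrete with a short sketch. The driver overlap_demo() is hypothetical; it calls the unqualified vsq() with b equal to a + 1, so that b[i] and a[i+1] are the same object, and relies on serial execution.

```c
/* vsq() as defined earlier, without restricted pointers. */
void vsq(int n, double *a, double *b)
{
    int i;
    for (i = 0; i < n; i++) b[i] = a[i] * a[i];
}

/* Hypothetical driver: b overlaps a, shifted by one element.
 * Executed serially, each iteration reads the value written by the
 * previous iteration. */
double overlap_demo(void)
{
    double v[4] = { 2.0, 0.0, 0.0, 0.0 };

    vsq(3, v, v + 1);   /* v becomes { 2, 4, 16, 256 } */
    return v[3];
}
```

Executed serially, the call leaves 256 in v[3]. If a and b were qualified with _Restrict for this call, the iterations could run in any order and the result would no longer be defined.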

long long Data Type

The Sun ANSI/ISO C compiler includes the data types long long and unsigned long long, which are similar to the data type long. long long can store 64 bits of information; long can store 32 bits of information. long long is not available in -Xc mode.

Printing long long Data Types

To print or scan long long data types, prefix the conversion specifier with the letters "ll." For example, to print llvar, a variable of long long data type, in signed decimal format, use:


printf("%lld\n", llvar);
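Scanning works the same way. The following is a minimal sketch, assuming a C library whose scanf family accepts the ll prefix; the helper name parse_ll is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: scans a long long from text using the ll
 * prefix. Returns 0 if no number could be read. */
long long parse_ll(const char *text)
{
    long long llvar = 0;

    if (sscanf(text, "%lld", &llvar) != 1)
        return 0;
    return llvar;
}
```

A value such as -9000000000, which does not fit in a 32-bit long, is read back intact.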

Usual Arithmetic Conversions

Some binary operators convert the types of their operands to yield a common type, which is also the type of the result. These are called the usual arithmetic conversions:

Constants

This section contains information related to constants that is specific to the Sun ANSI/ISO C compiler.

Integral Constants

Decimal, octal, and hexadecimal integral constants can be suffixed to indicate type, as shown in Table 3-1.

Table 3-1 Data Type Suffixes

Suffix                                    Type
u or U                                    unsigned
l or L                                    long
ll or LL                                  long long (1)
lu, LU, Lu, lU, ul, uL, Ul, or UL         unsigned long
llu, LLU, LLu, llU, ull, ULL, uLL, Ull    unsigned long long (1)

(1) long long and unsigned long long are not available in -Xc mode.

When assigning types to unsuffixed constants, the compiler uses the first of this list in which the value can be represented, depending on the size of the constant:

Character Constants

A multiple-character constant that is not an escape sequence has a value derived from the numeric values of each character. For example, the constant '123' has a value of:

Table 3-2 Multiple-character Constant (ANSI/ISO)

0    '3'    '2'    '1'

or 0x333231.

With the -Xs option and in other, non-ANSI/ISO versions of C, the value is:

Table 3-3 Multiple-character Constant (non-ANSI/ISO)

0    '1'    '2'    '3'

or 0x313233.
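The two byte orders can be reproduced with explicit shifts. This is only a sketch: the helper names are hypothetical, and an ASCII character set with 8-bit characters is assumed.

```c
/* Hypothetical helpers, for illustration only. */

/* ANSI/ISO layout: the first character ends up in the least
 * significant byte, so '1','2','3' gives 0x333231. */
int pack_ansi(unsigned char c1, unsigned char c2, unsigned char c3)
{
    return (c3 << 16) | (c2 << 8) | c1;
}

/* Non-ANSI/ISO layout (-Xs): the last character ends up in the least
 * significant byte, so '1','2','3' gives 0x313233. */
int pack_non_ansi(unsigned char c1, unsigned char c2, unsigned char c3)
{
    return (c1 << 16) | (c2 << 8) | c3;
}
```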

Include Files

To include any of the standard header files supplied with the C compilation system, use this format:


#include <stdio.h> 

The angle brackets (<>) cause the preprocessor to search for the header file in the standard place for header files on your system, usually the /usr/include directory.

The format is different for header files that you have stored in your own directories:


#include "header.h"

The quotation marks (" ") cause the preprocessor to search for header.h first in the directory of the file containing the #include line.

If your header file is not in the same directory as the source files that include it, specify the path of the directory in which it is stored with the -I option to cc. Suppose, for instance, that you have included both stdio.h and header.h in the source file mycode.c:


#include <stdio.h>
#include "header.h"

Suppose further that header.h is stored in the directory ../defs. The command:

% cc -I../defs mycode.c

directs the preprocessor to search for header.h first in the directory containing mycode.c, then in the directory ../defs, and finally in the standard place. It also directs the preprocessor to search for stdio.h first in ../defs, then in the standard place. The difference is that the current directory is searched only for header files whose names you have enclosed in quotation marks.

You can specify the -I option more than once on the cc command-line. The preprocessor searches the specified directories in the order they appear. You can specify multiple options to cc on the same command-line:

% cc -o prog -I../defs mycode.c

Nonstandard Floating Point

IEEE 754 floating-point default arithmetic is "nonstop." Underflows are "gradual." A brief summary follows; see the Numerical Computation Guide for details.

Nonstop means that execution does not halt on occurrences like division by zero, floating-point overflow, or invalid operation exceptions. For example, consider the following, where x is zero and y is positive:

z = y / x;

By default, z is set to the value +Inf, and execution continues. With the -fnonstd option, however, this code causes an exit, such as a core dump.

Here is how gradual underflow works. Suppose you have the following code:


x = 10;
for (i = 0; i < LARGE_NUMBER; i++)
	x = x / 10;

The first time through the loop, x is set to 1; the second time through, to 0.1; the third time through, to 0.01; and so on. Eventually, x reaches the lower limit of the machine's capacity to represent its value. What happens the next time the loop runs?

Suppose the smallest number representable at full precision is:

1.234567e-38

The next time the loop runs, the number is modified by "stealing" from the mantissa and "giving" to the exponent:

1.23456e-39

and, subsequently,

1.2345e-40

and so on. This is known as "gradual underflow," which is the default behavior. In nonstandard behavior, none of this "stealing" takes place; typically, x is simply set to zero.
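Both default behaviors can be checked with a short sketch. The helper names are hypothetical, and the code assumes IEEE 754 default arithmetic is in effect (that is, no nonstandard mode has been requested); volatile is used only to keep the operations from being folded at compile time.

```c
#include <float.h>

/* Hypothetical helpers, for illustration only. */

/* Nonstop: dividing a positive value by zero yields +Inf, which
 * compares greater than the largest finite double. */
int divide_by_zero_gives_inf(void)
{
    volatile double x = 0.0, y = 1.0;
    volatile double z = y / x;

    return z > DBL_MAX;
}

/* Gradual underflow: halving the smallest normalized double gives a
 * nonzero subnormal value instead of zero. */
int underflow_is_gradual(void)
{
    volatile double x = DBL_MIN;
    volatile double half = x / 2.0;

    return half > 0.0 && half < DBL_MIN;
}
```

Both functions are expected to return nonzero under the default behavior described above.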

Preprocessing Directives

This section describes assertions, pragmas, and predefined names.

Assertions

A line of the form:


#assert predicate (token-sequence) 

associates the token-sequence with the predicate in the assertion name space (separate from the space used for macro definitions). The predicate must be an identifier token.


#assert predicate 

asserts that predicate exists, but does not associate any token sequence with it.

The compiler provides the following predefined predicates by default (not in -Xc mode):


#assert system (unix)
#assert machine (sparc) (SPARC)
#assert machine (i386) (Intel)
#assert cpu (sparc) (SPARC)
#assert cpu (i386) (Intel)

lint provides the following predefinition predicate by default (not in -Xc mode):


#assert lint (on)

Any assertion may be removed by using #unassert, which uses the same syntax as #assert. Using #unassert with no argument deletes all assertions on the predicate; specifying an assertion deletes only that assertion.

An assertion may be tested in a #if statement with the following syntax:


#if #predicate(non-empty token-list) 

For example, the predefined predicate system can be tested with the following line:


#if #system(unix) 

which evaluates true.

Pragmas

Preprocessing lines of the form:


#pragma pp-tokens 

specify implementation-defined actions.

The following #pragmas are recognized by the compilation system. The compiler ignores unrecognized pragmas. Using the -v option will give a warning on unrecognized pragmas.

#pragma align integer (variable[, variable])

The align pragma makes all the mentioned variables memory aligned to integer bytes, overriding the default. The following limitations apply:

#pragma does_not_read_global_data (funcname [, funcname])

This pragma asserts that the specified list of routines do not read global data directly or indirectly. This allows for better optimization of code around calls to such routines. In particular, assignment statements or stores could be moved around such calls.

This pragma is permitted only after the prototypes for the specified functions are declared. If the assertion about global access is not true, then the behavior of the program is undefined.

#pragma does_not_return (funcname [, funcname])

This pragma is an assertion to the compiler backend that the calls to the specified routines will not return. This allows the optimizer to perform optimizations consistent with that assumption. For example, register life-times will terminate at the call sites which in turn allows more optimizations.

If the specified function does return, then the behavior of the program is undefined.

This pragma is permitted only after the prototypes for the specified functions are declared, as the following example shows:


extern void exit(int);
#pragma does_not_return(exit);

extern void __assert(int);
#pragma does_not_return(__assert);

#pragma does_not_write_global_data (funcname [, funcname])

This pragma asserts that the specified list of routines do not write global data directly or indirectly. This allows for better optimization of code around calls to such routines. In particular, assignment statements or stores could be moved around such calls.

This pragma is permitted only after the prototypes for the specified functions are declared. If the assertion about global access is not true, then the behavior of the program is undefined.

#pragma error_messages (on|off|default, tag... tag)

The error message pragma provides control within the source program over the messages issued by the C compiler and lint. For the C compiler, the pragma has an effect on warning messages only. The -w option of the C compiler overrides this pragma by suppressing all warning messages.

#pragma fini (f1[, f2...,fn])

Causes the implementation to call functions f1 to fn (finalization functions) after it calls the main() routine. Such functions are expected to be of type void and to accept no arguments, and are called either when a program terminates under program control or when the containing shared object is removed from memory. As with "initialization functions," finalization functions are executed in the order processed by the link editors.

#pragma ident string

Places string in the .comment section of the executable.

#pragma init (f1[, f2...,fn])

Causes the implementation to call functions f1 to fn (initialization functions) before it calls main(). Such functions are expected to be of type void and to accept no arguments, and are called while constructing the memory image of the program at the start of execution. In the case of initializers in a shared object, they are executed during the operation that brings the shared object into memory, either program start-up or some dynamic loading operation, such as dlopen(). The only ordering of calls to initialization functions is the order in which they were processed by the link editors, both static and dynamic.

#pragma inline (funcname[, funcname])

This pragma controls the inlining of the routine names listed in the argument of the pragma. The scope of this pragma is over the entire file. Only global inlining control is allowed; call-site specific control is not permitted by this pragma.

This pragma provides a suggestion to the compiler to inline the calls in the current file that match the list of routines listed in the pragma. This suggestion may be ignored under certain cases. For example, the suggestion is ignored when the body of the function is in a different module and the crossfile option is not used.

This pragma is permitted only after the prototypes for the specified functions are declared, as the following example shows:


static void foo(int);
static int bar(int, char *);
#pragma inline(foo, bar);

#pragma int_to_unsigned (funcname)

For a function that returns a type of unsigned, in -Xt or -Xs mode, changes the function return to be of type int.

(SPARC) #pragma MP serial_loop

Refer to "Serial Pragmas" for details.

(SPARC) #pragma MP serial_loop_nested

Refer to "Serial Pragmas" for details.

(SPARC) #pragma MP taskloop

Refer to "Parallel Pragmas" for details.

#pragma no_inline (funcname[, funcname])

This pragma controls the inlining of the routine names listed in the argument of the pragma. The scope of this pragma is over the entire file. Only global inlining control is allowed; call-site specific control is not permitted by this pragma.

This pragma provides a suggestion to the compiler to not inline the calls in the current file that match the list of routines listed in the pragma.

This pragma is permitted only after the prototypes for the specified functions are declared.

(SPARC) #pragma nomemorydepend

This pragma specifies that for any iteration of a loop, there are no memory dependences. That is, within any iteration of a loop there are no references to the same memory. This pragma permits the compiler (pipeliner) to schedule instructions more effectively within a single iteration of a loop. If any memory dependences exist within any iteration of a loop, the results of executing the program are undefined. The pragma applies to the next for loop within the current block. The compiler takes advantage of this information at optimization level 3 or above.

(SPARC) #pragma no_side_effect (funcname)

funcname specifies the name of a function within the current translation unit. The function must be declared prior to the pragma. The pragma must be specified prior to the function's definition. For the named function, funcname, the pragma declares that the function has no side effects of any kind. The compiler can use this information when doing optimizations using the function. If the function does have side effects, the results of executing a program which calls this function are undefined. The compiler takes advantage of this information at optimization level of 3 or above.

#pragma opt level (funcname[, funcname])

The value of level specifies the optimization level for the funcname subprograms. You can assign level the values 0, 1, 2, 3, 4, and 5. You can turn off optimization by setting level to 0. The funcname subprograms must be prototyped prior to the pragma.

The level of optimization for any function listed in the pragma is reduced to the value of -xmaxopt. The pragma is ignored when -xmaxopt=off.

#pragma pack(n)

Use #pragma pack(n) to affect member packing of a structure. By default, members of a structure are aligned on their natural boundaries: one byte for a char, two bytes for a short, four bytes for an int, and so on. If n is present, it must be zero or a power of 2 specifying the strictest natural alignment for any structure member.

You can use #pragma pack(n) to specify a different alignment of a structure member. For example, #pragma pack(2) aligns int, long, long long, float, double, long double, and pointers on two-byte boundaries instead of their natural alignment boundaries.

If n is the same as or greater than the strictest alignment on your platform (four on Intel, eight on SPARC v8, and 16 on SPARC v9), the directive has the effect of natural alignment. Also, if n is omitted, member alignment reverts to the natural alignment boundaries.

The #pragma pack(n) directive applies to all structure definitions which follow it until the next pack directive. If the same structure is defined in different translation units with different packing, your program may fail in unpredictable ways. In particular, you should not use #pragma pack(n) prior to including a header that defines the interface of a precompiled library. The recommended usage of #pragma pack(n) is to place it in your program code immediately before any structure to be packed. Follow the packed structure immediately with #pragma pack( ).
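The recommended usage can be sketched as follows. The structure and function names are hypothetical, and the offsets noted in the comments assume a four-byte int whose natural alignment is at least two bytes.

```c
#include <stddef.h>

/* Hypothetical structures, for illustration only. */
struct natural {
    char c;
    int  i;      /* placed on its natural boundary */
};

#pragma pack(2)
struct packed2 {
    char c;
    int  i;      /* placed on a two-byte boundary: offset 2 */
};
#pragma pack()   /* revert to natural alignment */

size_t natural_offset(void) { return offsetof(struct natural, i); }
size_t packed_offset(void)  { return offsetof(struct packed2, i); }
```

Because the pack directive is reset immediately after struct packed2, structures defined later in the file, or in headers included later, keep their natural layout.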

(SPARC) #pragma pipeloop(n)

This pragma accepts a positive constant integer value, or 0, for the argument n. This pragma specifies that a loop is pipelinable and the minimum dependence distance of the loop-carried dependence is n. If the distance is 0, then the loop is effectively a Fortran-style doall loop and should be pipelined on the target processors. If the distance is greater than 0, then the compiler (pipeliner) will only try to pipeline n successive iterations. The pragma applies to the next for loop within the current block. The compiler takes advantage of this information at optimization level of 3 or above.

#pragma rarely_called(funcname[, funcname])

This pragma provides a hint to the compiler backend that the specified functions are called infrequently. This allows the compiler to perform profile-feedback style optimizations on the call-sites of such routines without the overhead of a profile-collections phase. Since this pragma is a suggestion, the compiler optimizer may not perform any optimizations based on this pragma.

The #pragma rarely_called preprocessor directive is only permitted after the prototypes for the specified functions are declared. The following is an example of #pragma rarely_called:


extern void error (char *message);
#pragma rarely_called(error);

#pragma redefine_extname old_extname new_extname

This pragma causes every externally defined occurrence of the name old_extname in the object code to be replaced by new_extname. As a result, the linker only sees the name new_extname at link time. If #pragma redefine_extname is encountered after the first use of old_extname, as a function definition, an initializer, or an expression, the effect is undefined. (This pragma is not supported in -Xs mode.)

When #pragma redefine_extname is available, the compiler provides a definition of the predefined macro __PRAGMA_REDEFINE_EXTNAME which lets you write portable code that works both with and without #pragma redefine_extname.

The purpose of #pragma redefine_extname is to allow an efficient means of redefining a function interface when the name of the function cannot be changed. One example is when the original function definition must be maintained in a library, for compatibility with existing programs, along with a new definition of the same function for use by new programs. This can be accomplished by adding the new function definition to the library under a new name. The header file that declares the function then uses #pragma redefine_extname so that all of the uses of the function are linked with the new definition of that function.


#if    defined(__STDC__)

#ifdef __PRAGMA_REDEFINE_EXTNAME
extern int myroutine(const long *, int *);
#pragma redefine_extname myroutine __fixed_myroutine
#else /* __PRAGMA_REDEFINE_EXTNAME */

static int
myroutine(const long * arg1, int * arg2)
{
    extern int __fixed_myroutine(const long *, int *);
    return (__fixed_myroutine(arg1, arg2));
}
#endif /* __PRAGMA_REDEFINE_EXTNAME */

#else /* __STDC__ */

#ifdef __PRAGMA_REDEFINE_EXTNAME
extern int myroutine();
#pragma redefine_extname myroutine __fixed_myroutine
#else /* __PRAGMA_REDEFINE_EXTNAME */

static int
myroutine(arg1, arg2)
    long *arg1;
    int *arg2;
{
    extern int __fixed_myroutine();
    return (__fixed_myroutine(arg1, arg2));
}
#endif /* __PRAGMA_REDEFINE_EXTNAME */

#endif /* __STDC__ */

#pragma returns_new_memory (funcname[, funcname])

This pragma asserts that the return value of the specified functions does not alias with any memory at the call site. In effect, this call returns a new memory location. This information allows the optimizer to better track pointer values and clarify memory location. This results in improved scheduling, pipelining, and parallelization of loops. However, if the assertion is false, the behavior of the program is undefined.

This pragma is permitted only after the prototypes for the specified functions are declared, as the following example shows:


void *malloc(unsigned);
#pragma returns_new_memory(malloc);

#pragma unknown_control_flow (name[, name])

Specifies a list of routines that violate the usual control flow properties of procedure calls. For example, the statement following a call to setjmp() can be reached from an arbitrary call to any other routine. The statement is reached by a call to longjmp(). Since such routines render standard flowgraph analysis invalid, routines that call them cannot be safely optimized; hence, they are compiled with the optimizer disabled.
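The control flow in question can be illustrated with standard setjmp() and longjmp(). The sketch below is hypothetical code, not taken from the compiler documentation; it counts how many times execution continues from the setjmp() call.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical routine: never returns normally; control reappears at
 * the setjmp() call in the caller. */
static void bail_out(void)
{
    longjmp(env, 1);
}

/* Returns how many times execution continued from setjmp(). */
int count_setjmp_returns(void)
{
    volatile int reached = 0;

    if (setjmp(env) == 0) {
        reached++;      /* first, direct return from setjmp()   */
        bail_out();     /* transfers control back to setjmp()   */
    } else {
        reached++;      /* reached a second time, via longjmp() */
    }
    return reached;
}
```

The function returns 2: the code following setjmp() is entered once directly and once through the call to bail_out(), which is the kind of flow that standard flowgraph analysis does not expect.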

(SPARC) #pragma unroll (unroll_factor)

This pragma accepts a positive constant integer value for the argument unroll_factor. The pragma applies to the next for loop within the current block. For unroll factor other than 1, this directive serves as a suggestion to the compiler that the specified loop should be unrolled by the given factor. The compiler will, when possible, use that unroll factor. When the unroll factor value is 1, this directive serves as a command which specifies to the compiler that the loop is not to be unrolled. The compiler takes advantage of this information at optimization level of 3 or above.

#pragma weak (symbol1 [= symbol2])

Defines a weak global symbol. This pragma is used mainly in source files for building libraries. The linker does not produce an error message if it is unable to resolve a weak symbol.


#pragma weak symbol

defines symbol to be a weak symbol. The linker does not produce an error message if it does not find a definition for symbol.


#pragma weak symbol1 = symbol2

defines symbol1 to be a weak symbol, which is an alias for the symbol symbol2. This form of the pragma can only be used in the same translation unit where symbol2 is defined, either in the source file or in one of its included header files. Otherwise, a compilation error will result.

If your program calls but does not define symbol1, and symbol1 is a weak symbol in a library being linked, the linker uses the definition from that library. However, if your program defines its own version of symbol1, then the program's definition is used and the weak global definition of symbol1 in the library is not used. If the program directly calls symbol2, the definition from the library is used; a duplicate definition of symbol2 causes an error.

Predefined Names

The following identifier is predefined as an object-like macro:

Table 3-4 Predefined Identifier

Identifier    Description
__STDC__      __STDC__ 1     -Xc
              __STDC__ 0     -Xa, -Xt
              Not defined    -Xs

The compiler will issue a warning if __STDC__ is undefined (#undef __STDC__). __STDC__ is not defined in -Xs mode.

Predefinitions (not valid in -Xc mode):

The following predefinitions are valid in all modes:

The compiler also predefines the object-like macro __PRAGMA_REDEFINE_EXTNAME to indicate that the pragma will be recognized.

The following is predefined in -Xa and -Xt modes only:

__RESTRICT

MP C (SPARC)

SunSoft MP C is an extended ANSI/ISO C compiler that can optimize code to run on SPARC shared-memory multiprocessor machines. The process is called parallelizing. The compiled code can execute in parallel using the multiple processors on the system.

The SunSoft WorkShop includes the license required to use the features of MP C.

This section contains an overview and example of using MP C, and documents the environment variable, keyword, pragmas, and options used with MP C.

Refer to the "MP C" white paper, located in /opt/SUNWspro/READMEs/mpc.ps, for examples on using MP C and for further reference information.

Overview

The MP C compiler generates parallel code for those loops that it determines are safe to parallelize. Typically, these loops have iterations that are independent of each other. For such loops, it does not matter in what order the iterations are executed or if they are executed in parallel. Many, although not all, vector loops fall into this category.

Because of the way aliasing works in C, it is difficult to determine the safety of parallelization. To help the compiler, MP C offers pragmas and additional pointer qualifications to provide aliasing information known to the programmer that the compiler cannot determine.

Example of Use

The following example illustrates the use of MP C and how parallel execution can be controlled. To enable parallelization of the target program, the -xautopar option can be used as follows:

% cc -fast -xO4 -xautopar example.c -o example

This generates an executable called example, which can be executed normally. For more information see "-xautopar".

Environment Variable

If multiprocessor execution is desired, the PARALLEL environment variable needs to be set. It specifies the number of processors available to the program:

% setenv PARALLEL 2

This will enable the execution of the program on two threads. If the target machine has multiple processors, the threads can map to independent processors.

% example

Running the program will lead to creation of two threads that will execute the parallelized portions of the program.

Keyword

The keyword _Restrict can be used with MP C. Refer to the section "_Restrict Keyword" for details.

Explicit Parallelization and Pragmas

Often, there is not enough information available for the compiler to make a decision on the legality or profitability of parallelization. MP C supports pragmas that allow the programmer to effectively parallelize loops that otherwise would be too difficult or impossible for the compiler to handle.

Serial Pragmas

There are two serial pragmas, and both apply to "for" loops:

The #pragma MP serial_loop pragma indicates to the compiler that the next for loop is not to be implicitly/automatically parallelized.

The #pragma MP serial_loop_nested pragma indicates to the compiler that the next for loop and any for loops nested within the scope of this for loop are not to be implicitly/automatically parallelized. The scope of the serial_loop_nested pragma does not extend beyond the scope of the loop to which it applies.

Parallel Pragmas

There is one parallel pragma: #pragma MP taskloop [options].

The MP taskloop pragma can, optionally, take one or more of the following arguments.

Only one option can be specified per MP taskloop pragma; however, the pragmas are cumulative and apply to the next for loop encountered within the current block in the source code:

#pragma MP taskloop maxcpus(4)

#pragma MP taskloop shared(a,b)

#pragma MP taskloop storeback(x)

These options may appear multiple times prior to the for loop to which they apply. In case of conflicting options, the compiler will issue a warning message.

Nesting of for loops

An MP taskloop pragma applies to the next for loop within the current block. There is no nesting of parallelized for loops by MP C.

Eligibility for Parallelizing

An MP taskloop pragma suggests to the compiler that, unless otherwise disallowed, the specified for loop should be parallelized.

Loops with irregular control flow, and loops with an unknown loop iteration increment, are not eligible for parallelization. For example, for loops containing setjmp, longjmp, exit, abort, return, goto, labels, or break should not be considered as candidates for parallelization.

Note in particular that for loops with inter-iteration dependencies can be eligible for explicit parallelization. If an MP taskloop pragma is specified for such a loop, the compiler simply honors it, unless the for loop is otherwise disqualified. It is the user's responsibility to make sure that such explicit parallelization does not lead to incorrect results.

If a serial_loop or serial_loop_nested pragma and a taskloop pragma are both specified for a for loop, the last one specified prevails.

Consider the following example:

#pragma MP serial_loop_nested
    for (i=0; i<100; i++) {
#pragma MP taskloop
        for (j=0; j<1000; j++) {
            ...
        }
    }

The i loop will not be parallelized but the j loop might be.

Number of Processors

#pragma MP taskloop maxcpus (number_of_processors) specifies the number of processors to be used for this loop, if possible.

The value of maxcpus must be a positive integer. If maxcpus equals 1, the specified loop is executed serially. (Setting maxcpus to 1 is equivalent to specifying the serial_loop pragma.) The number of processors used is the smaller of maxcpus and the interpreted value of the PARALLEL environment variable. When the environment variable PARALLEL is not set, it is interpreted as having the value 1.

If more than one maxcpus pragma is specified for a for loop, the last one specified will prevail.
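The rule for combining maxcpus with PARALLEL can be sketched in plain C. This is only an illustration of the rule as stated above, not the runtime's implementation: the function names are hypothetical, and treating unparsable PARALLEL text as 1 is an assumption that the text does not confirm.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Interprets the text of PARALLEL; an unset (NULL) variable counts as 1.
   ASSUMPTION: text that is not a positive integer also counts as 1. */
int parallel_value(const char *text)
{
    long n;

    if (text == NULL)
        return 1;
    n = strtol(text, NULL, 10);
    return (n >= 1 && n <= INT_MAX) ? (int)n : 1;
}

/* Returns the smaller of maxcpus and the interpreted PARALLEL value. */
int effective_cpus(int maxcpus, const char *parallel_text)
{
    int limit = parallel_value(parallel_text);

    assert(maxcpus >= 1);   /* maxcpus must be a positive integer */
    return maxcpus < limit ? maxcpus : limit;
}
```

A caller would typically pass getenv("PARALLEL") as the second argument. With maxcpus(4) and PARALLEL set to 2, the sketch yields 2; with PARALLEL unset it yields 1, so the loop runs serially.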

Classifying Variables

A variable used in a loop is classified as being either a "private", "shared", "reduction", or "readonly" variable. The variable belongs to only one of these classifications. A variable can be classified as a reduction or readonly variable only via an explicit pragma. See #pragma MP taskloop reduction and #pragma MP taskloop readonly. A variable can be classified as being either a "private" or "shared" variable via an explicit pragma or through the following default scoping rules.

Default Scoping Rules for Private and Shared Variables

A private variable is one whose value is private to each processor processing some iterations of a for loop. In other words, the value assigned to a private variable in one iteration of a for loop is not propagated to other processors processing other iterations of that for loop. A shared variable, on the other hand, is a variable whose current value is accessible by all processors processing iterations of a for loop. The value assigned to a shared variable by one processor working on iterations of a loop may be seen by other processors working on other iterations of the loop.

When a loop that is explicitly parallelized through #pragma MP taskloop directives contains references to shared variables, the programmer must ensure that such sharing of values does not cause any correctness problems (such as race conditions). The compiler provides no synchronization on updates and accesses to shared variables in an explicitly parallelized loop.

In analyzing explicitly parallelized loops, the compiler uses the following "default scoping rules" to determine whether a variable is private or shared:

It is highly recommended that all variables used in an explicitly parallelized for loop be explicitly classified as one of shared, private, reduction, or readonly, to avoid the "default scoping rules."

Since the compiler does not perform any synchronization on accesses to shared variables, extreme care must be exercised before using an MP taskloop pragma for a loop that contains, for example, array references. If inter-iteration data dependencies exist in such an explicitly parallelized loop, then its parallel execution may give erroneous results. The compiler may or may not be able to detect such a potential problem situation and issue a warning message. In any case, the compiler will not disable the explicit parallelization of loops with potential shared variable problems.

Private Variables

#pragma MP taskloop private (list_of_private_variables) specifies all the variables that should be treated as private variables for this loop. All other variables used in the loop that are not explicitly specified as shared, readonly, or reduction variables, will be either shared or private as defined by the default scoping rules.

A private variable is one whose value is private to each processor processing some iterations of a loop. In other words, the value assigned to a private variable by one of the processors working on iterations of a loop is not propagated to other processors processing other iterations of that loop. A private variable has no initial value at the start of each iteration of a loop and must be set to a value within the iteration of a loop prior to its first use within that iteration. Execution of a program with a loop containing an explicitly declared private variable whose value is used prior to being set will result in undefined behavior.

Shared Variables

#pragma MP taskloop shared (list_of_shared_variables) specifies all the variables that should be treated as shared variables for this loop. All other variables used in the loop that are not explicitly specified as private, readonly, storeback or reduction variables, will be either shared or private as defined by the default scoping rules.

A shared variable is a variable whose current value is accessible by all processors processing iterations of a for loop. The value assigned to a shared variable by one processor working on iterations of a loop may be seen by other processors working on other iterations of the loop.
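As an illustration of explicit classification, the following sketch marks a scalar temporary as private and two arrays as shared, using one option per pragma. The function and variable names are hypothetical, and the sketch leaves the loop index i and the bound n to the default scoping rules. It assumes a compiler that recognizes the MP pragmas; a compiler that does not recognize them ignores them and runs the loop serially.

```c
/* Hypothetical helper: b[i] = a[i] * a[i] + 1.0 for n elements. */
void square_plus_one(const double a[], double b[], int n)
{
    int i;
    double t;

#pragma MP taskloop private(t)
#pragma MP taskloop shared(a,b)
    for (i = 0; i < n; i++) {
        t = a[i] * a[i];   /* t is set before its first use in each iteration */
        b[i] = t + 1.0;    /* each iteration writes a different element of b */
    }
}
```

No iteration reads a value written by another iteration, so sharing a and b here does not introduce a race condition.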

Read-only Variables

Read-only variables are a special class of shared variables that are not modified in any iteration of a loop. #pragma MP taskloop readonly (list_of_readonly_variables) indicates to the compiler that it may use a separate copy of that variable's value for each processor processing iterations of the loop.

Storeback Variables

#pragma MP taskloop storeback (list_of_storeback_variables) specifies all the variables to be treated as storeback variables.

A storeback variable is one whose value is computed in a loop and then used after the termination of the loop. The last loop iteration values of storeback variables are available for use after the termination of the loop. A private variable, whether declared private explicitly or made private by the default scoping rules, is a good candidate to be declared explicitly as a storeback variable via this directive.

Note that the storeback operation for a storeback variable occurs at the last iteration of the explicitly parallelized loop, regardless of whether or not that iteration updates the value of the storeback variable. In other words, the processor that processes the last iteration of a loop may not be the same processor that currently contains the last updated value for a storeback variable. Consider the following example:


#pragma MP taskloop private(x)
#pragma MP taskloop storeback(x)
   for (i=1; i <= n; i++) {
      if (...) {
          x=...
      }
   }
   printf ("%d", x);

In the previous example, the value of the storeback variable x printed by the printf() call may differ from the value printed by a serial version of the i loop. In the explicitly parallelized case, the processor that processes the last iteration of the loop (when i==n), and therefore performs the storeback operation for x, may not be the same processor that currently contains the last updated value for x. The compiler will attempt to issue a warning message to alert the user of such potential problems.
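The effect can be reproduced with a small serial simulation. This is a sketch under stated assumptions, not the compiler's generated code: the iterations are assumed to be split into contiguous blocks, one per processor, the UNSET constant stands in for a private copy that was never assigned, and the function names are hypothetical.

```c
#include <assert.h>

#define UNSET (-1)   /* stands in for a private copy with no defined value */

/* Serial loop: x = i whenever i is a multiple of step; returns the final x. */
int serial_last(int n, int step)
{
    int i, x = UNSET;

    for (i = 1; i <= n; i++)
        if (i % step == 0)
            x = i;
    return x;
}

/* Simulated parallel loop: iterations 1..n are split into nprocs contiguous
   blocks (an assumption), each with its own private x. The storeback copies
   the private x of whichever block ran iteration n. */
int storeback_last(int n, int step, int nprocs)
{
    int p, i, x = UNSET;

    assert(nprocs >= 1 && n >= nprocs);
    for (p = 0; p < nprocs; p++) {
        int lo = 1 + p * (n / nprocs);
        int hi = (p == nprocs - 1) ? n : lo + n / nprocs - 1;
        int private_x = UNSET;

        for (i = lo; i <= hi; i++)
            if (i % step == 0)
                private_x = i;
        if (hi == n)
            x = private_x;   /* storeback from the block that ran i == n */
    }
    return x;
}
```

With n = 100, step = 60, and four simulated processors, the serial loop ends with x equal to 60, but the block that runs iteration 100 never assigns its private x, so the storeback yields the UNSET value instead.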

In an explicitly parallelized loop, variables referenced as arrays are not treated as storeback variables. Hence it is important to include them in the list_of_storeback_variables if such storeback operation is desired (for example, if the variables referenced as arrays have been declared as private variables).

Savelast

#pragma MP taskloop savelast specifies that all the private variables of a loop be treated as storeback variables. The syntax of this pragma is as follows:

#pragma MP taskloop savelast

It is often convenient to use this form rather than listing each private variable of a loop in a storeback pragma.

Reduction Variables

#pragma MP taskloop reduction (list_of_reduction_variables) specifies that all the variables appearing in the reduction list will be treated as reduction variables for the loop. A reduction variable is one whose partial values can be individually computed by each of the processors processing iterations of the loop, and whose final value can be computed from all its partial values. The presence of a list of reduction variables can help the compiler identify the loop as a reduction loop, allowing generation of parallel reduction code for it. Consider the following example:


#pragma MP taskloop reduction(x)
    for (i=0; i<n; i++) {
         x = x + a[i];
    }

In this example, the variable x is a (sum) reduction variable and the i loop is a (sum) reduction loop.
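The idea of partial values can be sketched as a serial simulation in plain C. This is an illustration only: the function name is hypothetical, and the contiguous slicing of the array among the simulated processors is an assumption, not a description of the code the compiler generates.

```c
#include <assert.h>

/* Sums a[0..n-1] the way a sum reduction can be split up: each of nprocs
   simulated processors accumulates a partial sum over its own slice, and the
   partial sums are then combined into the final value. */
long reduce_sum(const int *a, int n, int nprocs)
{
    long total = 0;
    int p, i;

    assert(nprocs >= 1);
    for (p = 0; p < nprocs; p++) {
        long partial = 0;                             /* per-processor partial value */
        int lo = (int)((long)n * p / nprocs);         /* first index of this slice */
        int hi = (int)((long)n * (p + 1) / nprocs);   /* one past the last index */

        for (i = lo; i < hi; i++)
            partial += a[i];
        total += partial;                             /* combine step */
    }
    return total;
}
```

Because integer addition is associative, the combined result equals the serial sum no matter how many slices are used.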

Scheduling Control

The MP C compiler supports a schedtype option that can be used with the taskloop pragma to control the loop scheduling strategy for a given loop. The syntax for this pragma is:

#pragma MP taskloop schedtype (scheduling_type)

This pragma specifies the scheduling_type to be used to schedule the parallelized loop. Scheduling_type can be one of the following: static, self, gss, or factoring. Each is described below.

In static scheduling all the iterations of the loop are uniformly distributed among all the participating processors. Consider the following example:


#pragma MP taskloop maxcpus(4)
#pragma MP taskloop schedtype(static)
for (i=0; i<1000; i++) {
...
}

In the above example, each of the four processors will process 250 iterations of the loop.

In self scheduling, each participating processor processes a fixed number of iterations at a time (called the "chunk size") until all the iterations of the loop have been processed. The optional chunk_size parameter specifies the chunk size to be used. Chunk_size must be a positive integer constant or a variable of integral type. If specified as a variable, chunk_size must evaluate to a positive integer value at the beginning of the loop. If this optional parameter is not specified or its value is not positive, the compiler will select the chunk size to be used. Consider the following example:


#pragma MP taskloop maxcpus(4)
#pragma MP taskloop schedtype(self(120))
for (i=0; i<1000; i++) {
...
}

In the above example, the numbers of iterations of the loop assigned to the participating processors, in order of work request, are:

120, 120, 120, 120, 120, 120, 120, 120, 40.
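The sequence above follows directly from handing out fixed-size chunks until the iterations run out. A minimal sketch, with a hypothetical function name and output array, is:

```c
#include <assert.h>

/* Fills sizes[] with the successive chunk sizes handed out by self
   scheduling and returns the number of chunks (at most max_chunks). */
int self_chunks(int iterations, int chunk_size, int *sizes, int max_chunks)
{
    int count = 0;

    assert(chunk_size > 0);
    while (iterations > 0 && count < max_chunks) {
        int take = iterations < chunk_size ? iterations : chunk_size;

        sizes[count++] = take;   /* the next requesting processor gets this chunk */
        iterations -= take;
    }
    return count;
}
```

For 1000 iterations and a chunk size of 120, the sketch produces eight chunks of 120 followed by a final chunk of 40.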

In guided self scheduling, each participating processor processes a variable number of iterations at a time (a variable chunk size) until all the iterations of the loop have been processed. The optional min_chunk_size parameter specifies that each variable chunk size used must be at least min_chunk_size in size. Min_chunk_size must be a positive integer constant or a variable of integral type. If specified as a variable, min_chunk_size must evaluate to a positive integer value at the beginning of the loop. If this optional parameter is not specified or its value is not positive, the compiler will select the chunk size to be used. Consider the following example:


#pragma MP taskloop maxcpus(4)
#pragma MP taskloop schedtype(gss(10))
for (i=0; i<1000; i++) {
...
}

In the above example, the numbers of iterations of the loop assigned to the participating processors, in order of work request, are:

250, 188, 141, 106, 79, 59, 45, 33, 25, 19, 14, 11, 10, 10, 10.
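The text does not state the exact rule used to size each chunk. The usual guided self scheduling rule, in which each request receives the remaining iterations divided by the number of processors, rounded up and never less than the minimum chunk size, reproduces the sequence listed above. The sketch below assumes that rule; the function name and output array are hypothetical.

```c
#include <assert.h>

/* Fills sizes[] with guided-self-scheduling chunk sizes and returns the
   number of chunks (at most max_chunks).
   ASSUMPTION: chunk = ceil(remaining / nprocs), raised to min_chunk. */
int gss_chunks(int iterations, int nprocs, int min_chunk,
               int *sizes, int max_chunks)
{
    int count = 0;

    assert(nprocs > 0 && min_chunk > 0);
    while (iterations > 0 && count < max_chunks) {
        int take = (iterations + nprocs - 1) / nprocs;   /* round up */

        if (take < min_chunk)
            take = min_chunk;      /* enforce the minimum chunk size */
        if (take > iterations)
            take = iterations;     /* never hand out more than remains */
        sizes[count++] = take;
        iterations -= take;
    }
    return count;
}
```

Called with 1000 iterations, 4 processors, and a minimum chunk size of 10, the sketch yields 15 chunks that begin 250, 188, 141 and end with three chunks of 10.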

In factoring scheduling, each participating processor processes a variable number of iterations at a time (a variable chunk size) until all the iterations of the loop have been processed. The optional min_chunk_size parameter specifies that each variable chunk size used must be at least min_chunk_size in size. Min_chunk_size must be a positive integer constant or a variable of integral type. If specified as a variable, min_chunk_size must evaluate to a positive integer value at the beginning of the loop. If this optional parameter is not specified or its value is not positive, the compiler will select the chunk size to be used. Consider the following example:


#pragma MP taskloop maxcpus(4)
#pragma MP taskloop schedtype(factoring(10))
for (i=0; i<1000; i++) {
...
}

In the above example, the numbers of iterations of the loop assigned to the participating processors, in order of work request, are:

125, 125, 125, 125, 62, 62, 62, 62, 32, 32, 32, 32, 16, 16, 16, 16, 10, 10, 10, 10, 10, 10.
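The text does not give the formula behind these sizes. One rule that happens to reproduce the listed sequence is to hand out chunks in batches of one chunk per processor, sizing each batch's chunks as the remaining iterations divided by twice the number of processors, rounded to the nearest integer with ties going to the even value, and never less than the minimum chunk size. The sketch below assumes that rule. The rounding behavior in particular is inferred only from the numbers above and may not match the compiler, and all names are hypothetical.

```c
#include <assert.h>

/* Rounds num/den to the nearest integer, ties going to the even result.
   ASSUMPTION: this tie rule is inferred from the chunk sizes listed above. */
int div_round_half_even(int num, int den)
{
    int q = num / den;
    int twice_rem = 2 * (num % den);

    if (twice_rem > den || (twice_rem == den && (q & 1)))
        q++;
    return q;
}

/* Fills sizes[] with factoring chunk sizes and returns the number of
   chunks (at most max_chunks). */
int factoring_chunks(int iterations, int nprocs, int min_chunk,
                     int *sizes, int max_chunks)
{
    int count = 0;

    assert(nprocs > 0 && min_chunk > 0);
    while (iterations > 0 && count < max_chunks) {
        /* One batch: up to nprocs equal chunks sized from half the remaining work. */
        int chunk = div_round_half_even(iterations, 2 * nprocs);
        int p;

        if (chunk < min_chunk)
            chunk = min_chunk;     /* enforce the minimum chunk size */
        for (p = 0; p < nprocs && iterations > 0 && count < max_chunks; p++) {
            int take = chunk < iterations ? chunk : iterations;

            sizes[count++] = take;
            iterations -= take;
        }
    }
    return count;
}
```

Called with 1000 iterations, 4 processors, and a minimum chunk size of 10, the sketch yields 22 chunks: four each of 125, 62, 32, and 16, followed by six chunks of 10.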

Compiler Options

The following compiler options can be used in MP C.

"-xautopar"

"-xdepend"

"-xexplicitpar"

"-xloopinfo"

"-xparallel"

"-xreduction"

"-xrestrict=f"

"-xvpara"

"-Zlp"