The Sun ANSI/ISO C compiler is compatible with the C language described in the American National Standard for Programming Language--C, ANSI/ISO 9899-1990. This chapter documents those areas specific to the Sun ANSI/ISO C compiler.
cc normally creates temporary files in the directory /tmp. You can specify another directory by setting the environment variable TMPDIR to the directory of your choice. However, if TMPDIR is not a valid directory, cc uses /tmp. The -xtemp option has precedence over the TMPDIR environment variable.
If you use a Bourne shell, type:
$ TMPDIR=dir; export TMPDIR
If you use a C shell, type:
% setenv TMPDIR dir
The absolute path name of the directory containing the .sbinit(5) file. This variable is used only if the -xsb or -xsbfast flag is used.
(SPARC) Refer to "Environment Variables" for details.
A program that depends on unsigned preserving arithmetic conversions behaves differently under the value preserving rules. This is considered to be the most serious change made by ANSI/ISO C.
In the first edition of K&R, The C Programming Language (Prentice-Hall, 1978), unsigned specified exactly one type; there were no unsigned chars, unsigned shorts, or unsigned longs, but most C compilers added these very soon thereafter.
In previous C compilers, the unsigned preserving rule is used for promotions: when an unsigned type needs to be widened, it is widened to an unsigned type; when an unsigned type mixes with a signed type, the result is an unsigned type.
The other rule, specified by ANSI/ISO C, came to be called "value preserving," in which the result type depends on the relative sizes of the operand types. When an unsigned char or unsigned short is widened, the result type is int if an int is large enough to represent all the values of the smaller type. Otherwise, the result type is unsigned int. The value preserving rule produces the least surprise arithmetic result for most expressions.
Only in the -Xt and -Xs modes does the compiler use the unsigned preserving promotions; in the other modes, -Xc and -Xa, the value preserving promotion rules are used. When the -xtransition option is used, the compiler warns about each expression whose behavior might depend on the promotion rules used.
The _asm keyword is a synonym for the asm keyword. asm is available under all compilation modes, although a warning is issued when it is used under the -Xc mode.
The asm statement has the form:
asm("string");
where string is a valid assembly language statement.
main()
{
    int i;

    /* i = 10 */
    asm("mov 10,%l0");
    asm("st %l0,[%fp-8]");

    printf("i = %d\n", i);
}

% cc foo.c
% a.out
i = 10
%
asm statements must appear within function bodies.
For a compiler to effectively perform parallel execution of a loop, it needs to determine if certain lvalues designate distinct regions of storage. Aliases are lvalues whose regions of storage are not distinct. Determining if two pointers to objects are aliases is a difficult and time-consuming process because it could require analysis of the entire program.
void vsq(int n, double * a, double * b)
{
    int i;
    for (i = 0; i < n; i++)
        b[i] = a[i] * a[i];
}
The compiler can parallelize the execution of the different iterations of the loop if it knows that pointers a and b access different objects. If there is an overlap in the objects accessed through pointers a and b, it would be unsafe for the compiler to execute the iterations in parallel. By simply analyzing the function vsq(), the compiler cannot know at compile time whether the objects accessed through a and b overlap; it may need to analyze the whole program to get this information.
Restricted pointers are used to specify pointers which designate distinct objects so that the compiler can perform pointer alias analysis. To support restricted pointers, the keyword _Restrict is recognized by the Sun ANSI/ISO C compiler as an extension. Below is an example of declaring function parameters of vsq() as restricted pointers:
void vsq(int n, double * _Restrict a, double * _Restrict b)
Pointers a and b are declared as restricted pointers, so the compiler knows that the regions of storage pointed to by a and b are distinct. With this alias information, the compiler is able to parallelize the loop.
The _Restrict keyword is a type qualifier, like volatile, and it qualifies pointer types only. _Restrict is recognized as a keyword only in the -Xa (default) and -Xt compilation modes.
In these two modes, the compiler defines the macro __RESTRICT to enable users to write portable code with restricted pointers. For example, the following code works on the Sun ANSI/ISO C compiler in all compilation modes, and should also work on other compilers that do not support restricted pointers:
#ifdef __RESTRICT
#define restrict _Restrict
#else
#define restrict
#endif

void vsq(int n, double * restrict a, double * restrict b)
{
    int i;
    for (i = 0; i < n; i++)
        b[i] = a[i] * a[i];
}
If restricted pointers become a part of the ANSI/ISO C Standard, it is likely that "restrict" will be the keyword. Users may want to write code with restricted pointers using:
#define restrict _Restrict
as in vsq() because this way there will be minimal changes should "restrict" become a keyword in the ANSI/ISO C Standard. The Sun ANSI/ISO C compiler uses _Restrict as the keyword because it is in the implementor's name space, so there is no conflict with identifiers in the user's name space.
There are situations where a user may not want to change the source code. One can specify pointer-valued function parameters to be treated as restricted pointers with the command-line option -xrestrict; refer to "-xrestrict=f" for details.
If a function list is specified, pointer parameters in the specified functions are treated as restricted; otherwise, all pointer parameters in the entire C file are treated as restricted. For example, -xrestrict=vsq would qualify the pointers a and b given in the example with the keyword _Restrict.
It is critical that _Restrict be used correctly. If pointers qualified as restricted pointers point to objects which are not distinct, loops may be incorrectly parallelized, resulting in undefined behavior. For example, assume that pointers a and b of function vsq() point to objects which overlap, such that b[i] and a[i+1] are the same object. If a and b are not declared as restricted pointers, the loops will be executed serially. If a and b are incorrectly qualified as restricted pointers, the compiler may parallelize the execution of the loops; this is not safe, because b[i+1] should only be computed after b[i] has been computed.
The Sun ANSI/ISO C compiler includes the data types long long and unsigned long long, which are similar to the data type long. long long can store 64 bits of information, while long can store 32 bits. long long is not available in -Xc mode.
To print or scan long long data types, prefix the conversion specifier with the letters "ll." For example, to print llvar, a variable of long long data type, in signed decimal format, use:
printf("%lld\n", llvar);
Some binary operators convert the types of their operands to yield a common type, which is also the type of the result. These are called the usual arithmetic conversions:
If either operand is type long double, the other operand is converted to long double.
Otherwise, if either operand has type double, the other operand is converted to double.
Otherwise, if either operand has type float, the other operand is converted to float.
Otherwise, the integral promotions are performed on both operands. Then, these rules are applied:
If either operand has type unsigned long long int, the other operand is converted to unsigned long long int.
Otherwise, if either operand has type long long int, the other operand is converted to long long int.
Otherwise, if either operand has type unsigned long int, the other operand is converted to unsigned long int.
Otherwise, if one operand has type long int and the other has type unsigned int, both operands are converted to unsigned long int.
Otherwise, if either operand has type long int, the other operand is converted to long int.
Otherwise, if either operand has type unsigned int, the other operand is converted to unsigned int.
Otherwise, both operands have type int.
This section contains information related to constants that is specific to the Sun ANSI/ISO C compiler.
Decimal, octal, and hexadecimal integral constants can be suffixed to indicate type, as shown in Table 3-1.
Table 3-1 Data Type Suffixes
When assigning types to unsuffixed constants, the compiler uses the first type in the following list in which the value can be represented, depending on the size of the constant:
int
long int
unsigned long int
long long int
unsigned long long int
A multiple-character constant that is not an escape sequence has a value derived from the numeric values of each character. For example, the constant '123' has a value of:
Table 3-2 Multiple-character Constant (ANSI/ISO)
0 | '3' | '2' | '1'
or 0x333231.
With the -Xs option and in other, non-ANSI/ISO versions of C, the value is:
Table 3-3 Multiple-character Constant (non-ANSI/ISO)
0 | '1' | '2' | '3'
or 0x313233.
To include any of the standard header files supplied with the C compilation system, use this format:
#include <stdio.h>
The angle brackets (<>) cause the preprocessor to search for the header file in the standard place for header files on your system, usually the /usr/include directory.
The format is different for header files that you have stored in your own directories:
#include "header.h"
The quotation marks (" ") cause the preprocessor to search for header.h first in the directory of the file containing the #include line.
If your header file is not in the same directory as the source files that include it, specify the path of the directory in which it is stored with the -I option to cc. Suppose, for instance, that you have included both stdio.h and header.h in the source file mycode.c:
#include <stdio.h> #include "header.h"
Suppose further that header.h is stored in the directory ../defs. The command:
% cc -I../defs mycode.c
directs the preprocessor to search for header.h first in the directory containing mycode.c, then in the directory ../defs, and finally in the standard place. It also directs the preprocessor to search for stdio.h first in ../defs, then in the standard place. The difference is that the current directory is searched only for header files whose names you have enclosed in quotation marks.
You can specify the -I option more than once on the cc command-line. The preprocessor searches the specified directories in the order they appear. You can specify multiple options to cc on the same command-line:
% cc -o prog -I../defs mycode.c
IEEE 754 floating-point default arithmetic is "nonstop," and underflows are "gradual." The following is a summary; see the Numerical Computation Guide for details.
Nonstop means that execution does not halt on occurrences like division by zero, floating-point overflow, or invalid operation exceptions. For example, consider the following, where x is zero and y is positive:
z = y / x;
By default, z is set to the value +Inf, and execution continues. With the -fnonstd option, however, this code causes an exit, such as a core dump.
Here is how gradual underflow works. Suppose you have the following code:
x = 10;
for (i = 0; i < LARGE_NUMBER; i++)
    x = x / 10;
The first time through the loop, x is set to 1; the second time through, to 0.1; the third time through, to 0.01; and so on. Eventually, x reaches the lower limit of the machine's capacity to represent its value. What happens the next time the loop runs?
Let's say that the smallest representable number is:
1.234567e-38
The next time the loop runs, the number is modified by "stealing" from the mantissa and "giving" to the exponent:
1.23456e-39
and, subsequently,
1.2345e-40
and so on. This is known as "gradual underflow," which is the default behavior. In nonstandard behavior, none of this "stealing" takes place; typically, x is simply set to zero.
This section describes assertions, pragmas, and predefined names.
#assert predicate (token-sequence)
associates the token-sequence with the predicate in the assertion name space (separate from the space used for macro definitions). The predicate must be an identifier token.
#assert predicate
asserts that predicate exists, but does not associate any token sequence with it.
The compiler provides the following predefined predicates by default (not in -Xc mode):
#assert system (unix)
#assert machine (sparc) (SPARC)
#assert machine (i386) (Intel)
#assert cpu (sparc) (SPARC)
#assert cpu (i386) (Intel)
lint provides the following predefined predicate by default (not in -Xc mode):
#assert lint (on)
Any assertion may be removed by using #unassert, which uses the same syntax as #assert. Using #unassert with no argument deletes all assertions on the predicate; specifying an assertion deletes only that assertion.
An assertion may be tested in a #if statement with the following syntax:
#if #predicate(non-empty token-list)
For example, the predefined predicate system can be tested with the following line:
#if #system(unix)
which evaluates true.
Preprocessing lines of the form:
#pragma pp-tokens
specify implementation-defined actions.
The following #pragmas are recognized by the compilation system. The compiler ignores unrecognized pragmas. Using the -v option will give a warning on unrecognized pragmas.
The align pragma aligns each of the mentioned variables on a boundary of integer bytes, overriding the default. The following limitations apply:
The integer value must be a power of 2 between 1 and 128; valid values are: 1, 2, 4, 8, 16, 32, 64, and 128.
variable is a global or static variable; it cannot be an automatic variable.
If the specified alignment is smaller than the default, the default is used.
The pragma line must appear before the declaration of the variables which it mentions; otherwise, it is ignored.
Any variable that is mentioned but not declared in the text following the pragma line is ignored. For example:
#pragma align 64 (aninteger, astring, astruct)
int aninteger;
static char astring[256];
struct astruct {int a; char *b;};
This pragma asserts that the routines in the specified list do not read global data directly or indirectly. This allows for better optimization of code around calls to such routines. In particular, assignment statements or stores could be moved around such calls.
This pragma is permitted only after the prototypes for the specified functions are declared. If the assertion about global access is not true, then the behavior of the program is undefined.
This pragma is an assertion to the compiler back end that calls to the specified routines will not return. This allows the optimizer to perform optimizations consistent with that assumption. For example, register lifetimes terminate at the call sites, which in turn allows more optimizations.
If the specified function does return, then the behavior of the program is undefined.
This pragma is permitted only after the prototypes for the specified functions are declared, as the following example shows:
extern void exit(int);
#pragma does_not_return(exit);
extern void __assert(int);
#pragma does_not_return(__assert);
This pragma asserts that the routines in the specified list do not write global data directly or indirectly. This allows for better optimization of code around calls to such routines. In particular, assignment statements or stores could be moved around such calls.
This pragma is permitted only after the prototypes for the specified functions are declared. If the assertion about global access is not true, then the behavior of the program is undefined.
The error message pragma provides control within the source program over the messages issued by the C compiler and lint. For the C compiler, the pragma has an effect on warning messages only. The -w option of the C compiler overrides this pragma by suppressing all warning messages.
#pragma error_messages (on, tag... tag)
The on option ends the scope of any preceding #pragma error_messages option, such as the off option, and overrides the effect of the -erroff option.
#pragma error_messages (off, tag... tag)
The off option prevents the C compiler or the lint program from issuing the given messages beginning with the token specified in the pragma. The scope of the pragma for any specified error message remains in effect until overridden by another #pragma error_messages, or the end of compilation.
#pragma error_messages (default, tag... tag)
The default option ends the scope of any preceding #pragma error_messages directive for the specified tags.
Causes the implementation to call the functions f1 through fn (finalization functions) after it calls main(). Such functions are expected to be of type void and to accept no arguments, and are called either when a program terminates under program control or when the containing shared object is removed from memory. As with initialization functions, finalization functions are executed in the order processed by the link editors.
Places string in the .comment section of the executable.
Causes the implementation to call functions f1 to fn (initialization functions) before it calls main(). Such functions are expected to be of type void and to accept no arguments, and are called while constructing the memory image of the program at the start of execution. In the case of initializers in a shared object, they are executed during the operation that brings the shared object into memory, either program start-up or some dynamic loading operation, such as dlopen(). The only ordering of calls to initialization functions is the order in which they were processed by the link editors, both static and dynamic.
This pragma controls the inlining of the routine names listed in the argument of the pragma. The scope of this pragma is the entire file. Only global inlining control is allowed; call-site-specific control is not permitted by this pragma.
This pragma provides a suggestion to the compiler to inline the calls in the current file that match the list of routines listed in the pragma. This suggestion may be ignored in certain cases. For example, the suggestion is ignored when the body of the function is in a different module and the crossfile option is not used.
This pragma is permitted only after the prototypes for the specified functions are declared, as the following example shows:
static void foo(int);
static int bar(int, char *);
#pragma inline_routines(foo, bar);
For a function that returns a type of unsigned, in -Xt or -Xs mode, changes the function return to be of type int.
Refer to "Serial Pragmas" for details.
Refer to "Serial Pragmas" for details.
Refer to "Parallel Pragmas" for details.
This pragma controls the inlining of the routine names listed in the argument of the pragma. The scope of this pragma is the entire file. Only global inlining control is allowed; call-site-specific control is not permitted by this pragma.
This pragma provides a suggestion to the compiler to not inline the calls in the current file that match the list of routines listed in the pragma.
This pragma is permitted only after the prototypes for the specified functions are declared.
This pragma specifies that, within any iteration of the loop, there are no memory dependences; that is, no two references within an iteration access the same memory. This permits the compiler (pipeliner) to schedule instructions more effectively within a single iteration of the loop. If any memory dependences exist within any iteration of the loop, the results of executing the program are undefined. The pragma applies to the next for loop within the current block. The compiler takes advantage of this information at optimization level 3 or above.
funcname specifies the name of a function within the current translation unit. The function must be declared prior to the pragma. The pragma must be specified prior to the function's definition. For the named function, funcname, the pragma declares that the function has no side effects of any kind. The compiler can use this information when doing optimizations using the function. If the function does have side effects, the results of executing a program which calls this function are undefined. The compiler takes advantage of this information at optimization level of 3 or above.
The value of opt specifies the optimization level for the funcname subprograms. You can assign opt levels zero, one, two, three, four, and five. You can turn off optimization by setting the level to 0. The funcname subprograms must be prototyped prior to the pragma.
The level of optimization for any function listed in the pragma is reduced to the value of -xmaxopt. The pragma is ignored when -xmaxopt=off.
Use #pragma pack(n) to affect the packing of structure members. By default, members of a structure are aligned on their natural boundaries: one byte for a char, two bytes for a short, four bytes for an int, and so on. If n is present, it must be zero or a power of 2 specifying the strictest natural alignment for any structure member.
You can use #pragma pack(n) to specify a different alignment for structure members. For example, #pragma pack(2) aligns int, long, long long, float, double, long double, and pointers on two-byte boundaries instead of their natural alignment boundaries.
If n is the same as or greater than the strictest alignment on your platform (four on Intel, eight on SPARC v8, and 16 on SPARC v9), the directive has the effect of natural alignment. Also, if n is omitted, member alignment reverts to the natural alignment boundaries.
The #pragma pack(n) directive applies to all structure definitions that follow it, until the next pack directive. If the same structure is defined in different translation units with different packing, your program may fail in unpredictable ways. In particular, you should not use #pragma pack(n) prior to including a header that defines the interface of a precompiled library. The recommended usage of #pragma pack(n) is to place it in your program code immediately before any structure to be packed, and to follow the packed structure immediately with #pragma pack().
This pragma accepts a positive constant integer value, or 0, for the argument n. This pragma specifies that a loop is pipelinable and the minimum dependence distance of the loop-carried dependence is n. If the distance is 0, then the loop is effectively a Fortran-style doall loop and should be pipelined on the target processors. If the distance is greater than 0, then the compiler (pipeliner) will only try to pipeline n successive iterations. The pragma applies to the next for loop within the current block. The compiler takes advantage of this information at optimization level of 3 or above.
This pragma provides a hint to the compiler back end that the specified functions are called infrequently. This allows the compiler to perform profile-feedback style optimizations on the call sites of such routines without the overhead of a profile-collection phase. Since this pragma is a suggestion, the compiler may not perform any optimizations based on it.
The #pragma rarely_called preprocessor directive is permitted only after the prototypes for the specified functions are declared. The following is an example of #pragma rarely_called:
extern void error (char *message);
#pragma rarely_called(error);
This pragma causes every externally defined occurrence of the name old_extname in the object code to be replaced by new_extname. As a result, the linker only sees the name new_extname at link time. If #pragma redefine_extname is encountered after the first use of old_extname, as a function definition, an initializer, or an expression, the effect is undefined. (This pragma is not supported in -Xs mode.)
When #pragma redefine_extname is available, the compiler provides a definition of the predefined macro PRAGMA_REDEFINE_EXTNAME which lets you write portable code that works both with and without #pragma redefine_extname.
The purpose of #pragma redefine_extname is to allow an efficient means of redefining a function interface when the name of the function cannot be changed. For example, when the original function definition must be maintained in a library, for compatibility with existing programs, along with a new definition of the same function for use by new programs. This can be accomplished by adding the new function definition to the library by a new name. Consequently, the header file that declares the function uses #pragma redefine_extname so that all of the uses of the function are linked with the new definition of that function.
#if defined(__STDC__)

#ifdef __PRAGMA_REDEFINE_EXTNAME
extern int myroutine(const long *, int *);
#pragma redefine_extname myroutine __fixed_myroutine
#else /* __PRAGMA_REDEFINE_EXTNAME */
static int
myroutine(const long * arg1, int * arg2)
{
    extern int __fixed_myroutine(const long *, int *);
    return (__fixed_myroutine(arg1, arg2));
}
#endif /* __PRAGMA_REDEFINE_EXTNAME */

#else /* __STDC__ */

#ifdef __PRAGMA_REDEFINE_EXTNAME
extern int myroutine();
#pragma redefine_extname myroutine __fixed_myroutine
#else /* __PRAGMA_REDEFINE_EXTNAME */
static int
myroutine(arg1, arg2)
    long *arg1;
    int *arg2;
{
    extern int __fixed_myroutine();
    return (__fixed_myroutine(arg1, arg2));
}
#endif /* __PRAGMA_REDEFINE_EXTNAME */

#endif /* __STDC__ */
This pragma asserts that the return value of the specified functions does not alias with any memory at the call site; in effect, the call returns a new memory location. This information allows the optimizer to better track pointer values and disambiguate memory locations, which results in improved scheduling, pipelining, and parallelization of loops. However, if the assertion is false, the behavior of the program is undefined.
This pragma is permitted only after the prototypes for the specified functions are declared, as the following example shows:
void *malloc(unsigned);
#pragma returns_new_memory(malloc);
Specifies a list of routines that violate the usual control flow properties of procedure calls. For example, the statement following a call to setjmp() can be reached from an arbitrary call to any other routine. The statement is reached by a call to longjmp(). Since such routines render standard flowgraph analysis invalid, routines that call them cannot be safely optimized; hence, they are compiled with the optimizer disabled.
This pragma accepts a positive constant integer value for the argument unroll_factor. The pragma applies to the next for loop within the current block. For an unroll factor other than 1, this directive serves as a suggestion to the compiler that the specified loop should be unrolled by the given factor. The compiler, when possible, uses that unroll factor. When the unroll factor value is 1, this directive serves as a command specifying that the loop is not to be unrolled. The compiler takes advantage of this information at optimization level 3 or above.
Defines a weak global symbol. This pragma is used mainly in source files for building libraries. The linker does not produce an error message if it is unable to resolve a weak symbol.
#pragma weak symbol
defines symbol to be a weak symbol. The linker does not produce an error message if it does not find a definition for symbol.
#pragma weak symbol1 = symbol2
defines symbol1 to be a weak symbol, which is an alias for the symbol symbol2. This form of the pragma can only be used in the same translation unit where symbol2 is defined, either in the source files or in one of its included header files. Otherwise, a compilation error results.
If your program calls but does not define symbol1, and symbol1 is a weak symbol in a library being linked, the linker uses the definition from that library. However, if your program defines its own version of symbol1, then the program's definition is used and the weak global definition of symbol1 in the library is not used. If the program directly calls symbol2, the definition from the library is used; a duplicate definition of symbol2 causes an error.
The following identifier is predefined as an object-like macro:
Table 3-4 Predefined Identifier
Identifier | Description
__STDC__ | 1 (-Xc); 0 (-Xa, -Xt); not defined (-Xs)
The compiler will issue a warning if __STDC__ is undefined (#undef __STDC__). __STDC__ is not defined in -Xs mode.
Predefinitions (not valid in -Xc mode):
sun
unix
sparc (SPARC)
i386 (Intel)
The following predefinitions are valid in all modes:
__sun
__unix
__SUNPRO_C=0x500
__`uname -s`_`uname -r` (example: __SunOS_5_7)
__sparc (SPARC)
__i386 (Intel)
__BUILTIN_VA_ARG_INCR
__SVR4
__sparcv9 (-Xarch=v9, v9a)
The compiler also predefines the object-like macro __PRAGMA_REDEFINE_EXTNAME to indicate that the pragma will be recognized.
The following is predefined in -Xa and -Xt modes only:
__RESTRICT
SunSoft MP C is an extended ANSI/ISO C compiler that can optimize code to run on SPARC shared-memory multiprocessor machines. The process is called parallelizing. The compiled code can execute in parallel using the multiple processors on the system.
The SunSoft WorkShop includes the license required to use the features of MP C.
This section contains an overview and example of using MP C, and documents the environment variable, keyword, pragmas, and options used with MP C.
Refer to the "MP C" white paper, located in /opt/SUNWspro/READMEs/mpc.ps, for examples on using MP C and for further reference information.
The MP C compiler generates parallel code for those loops that it determines are safe to parallelize. Typically, these loops have iterations that are independent of each other. For such loops, it does not matter in what order the iterations are executed or if they are executed in parallel. Many, although not all, vector loops fall into this category.
Because of the way aliasing works in C, it is difficult to determine the safety of parallelization. To help the compiler, MP C offers pragmas and additional pointer qualifications to provide aliasing information known to the programmer that the compiler cannot determine.
The following example illustrates the use of MP C and how parallel execution can be controlled. To enable parallelization of the target program, use the -xautopar option as follows:
% cc -fast -xO4 -xautopar example.c -o example
This generates an executable called example, which can be executed normally. For more information see "-xautopar".
If multiprocessor execution is desired, the PARALLEL environment variable needs to be set. It specifies the number of processors available to the program:
% setenv PARALLEL 2
This will enable the execution of the program on two threads. If the target machine has multiple processors, the threads can map to independent processors.
% example
Running the program will lead to creation of two threads that will execute the parallelized portions of the program.
The keyword _Restrict can be used with MP C. Refer to the section "_Restrict Keyword" for details.
Often, there is not enough information available for the compiler to make a decision on the legality or profitability of parallelization. MP C supports pragmas that allow the programmer to effectively parallelize loops that otherwise would be too difficult or impossible for the compiler to handle.
There are two serial pragmas, and both apply to "for" loops:
#pragma MP serial_loop
#pragma MP serial_loop_nested
The #pragma MP serial_loop pragma indicates to the compiler that the next for loop is not to be implicitly/automatically parallelized.
The #pragma MP serial_loop_nested pragma indicates to the compiler that the next for loop and any for loops nested within the scope of this for loop are not to be implicitly/automatically parallelized. The scope of the serial_loop_nested pragma does not extend beyond the scope of the loop to which it applies.
There is one parallel pragma: #pragma MP taskloop [options].
The MP taskloop pragma can, optionally, take one or more of the following arguments.
maxcpus (number_of_processors)
private (list_of_private_variables)
shared (list_of_shared_variables)
readonly (list_of_readonly_variables)
storeback (list_of_storeback_variables)
savelast
reduction (list_of_reduction_variables)
schedtype (scheduling_type)
Only one option can be specified per MP taskloop pragma; however, the pragmas are cumulative and apply to the next for loop encountered within the current block in the source code:
#pragma MP taskloop maxcpus(4)
#pragma MP taskloop shared(a,b)
#pragma MP taskloop storeback(x)
These options may appear multiple times prior to the for loop to which they apply. In case of conflicting options, the compiler will issue a warning message.
An MP taskloop pragma applies to the next for loop within the current block. There is no nesting of parallelized for loops by MP C.
An MP taskloop pragma suggests to the compiler that, unless otherwise disallowed, the specified for loop should be parallelized.
For loops with irregular control flow or an unknown loop iteration increment are not eligible for parallelization. For example, for loops containing setjmp, longjmp, exit, abort, return, goto, labels, or break are not considered candidates for parallelization.
Note, in particular, that for loops with inter-iteration dependencies can be eligible for explicit parallelization. This means that if an MP taskloop pragma is specified for such a loop, the compiler simply honors it, unless the for loop is disqualified. It is the user's responsibility to ensure that such explicit parallelization does not lead to incorrect results.
If a serial_loop (or serial_loop_nested) pragma and a taskloop pragma are both specified for the same for loop, the last one specified prevails.
Consider the following example:
#pragma MP serial_loop_nested
for (i=0; i<100; i++) {
    #pragma MP taskloop
    for (j=0; j<1000; j++) {
        ...
    }
}
The i loop will not be parallelized but the j loop might be.
#pragma MP taskloop maxcpus (number_of_processors) specifies the number of processors to be used for this loop, if possible.
The value of maxcpus must be a positive integer. If maxcpus equals 1, then the specified loop will be executed in serial. (Note that setting maxcpus to be 1 is equivalent to specifying the serial_loop pragma.) The smaller of the values of maxcpus or the interpreted value of the PARALLEL environment variable will be used. When the environment variable PARALLEL is not specified, it is interpreted as having the value 1.
If more than one maxcpus pragma is specified for a for loop, the last one specified will prevail.
A variable used in a loop is classified as a "private", "shared", "reduction", or "readonly" variable. A variable belongs to only one of these classifications. A variable can be classified as a reduction or readonly variable only via an explicit pragma. See #pragma MP taskloop reduction and #pragma MP taskloop readonly. A variable can be classified as either a "private" or a "shared" variable via an explicit pragma or through the following default scoping rules.
A private variable is one whose value is private to each processor processing some iterations of a for loop. In other words, the value assigned to a private variable in one iteration of a for loop is not propagated to other processors processing other iterations of that for loop. A shared variable, on the other hand, is a variable whose current value is accessible by all processors processing iterations of a for loop. The value assigned to a shared variable by one processor working on iterations of a loop may be seen by other processors working on other iterations of the loop. Loops being explicitly parallelized through use of #pragma MP taskloop directives, that contain references to shared variables, must ensure that such sharing of values does not cause any correctness problems (such as race conditions). No synchronization is provided by the compiler on updates and accesses to shared variables in an explicitly parallelized loop.
In analyzing explicitly parallelized loops, the compiler uses the following "default scoping rules" to determine whether a variable is private or shared:
If a variable is not explicitly classified via a pragma, the variable will default to being classified as a shared variable if it is declared as a pointer or array, and is only referenced using array syntax within the loop. Otherwise, it will be classified as a private variable.
The loop index variable is always treated as a private variable and is always a storeback variable.
It is highly recommended that all variables used in an explicitly parallelized for loop be explicitly classified as one of shared, private, reduction, or readonly, to avoid the "default scoping rules."
Since the compiler does not perform any synchronization on accesses to shared variables, extreme care must be exercised before using an MP taskloop pragma for a loop that contains, for example, array references. If inter-iteration data dependencies exist in such an explicitly parallelized loop, then its parallel execution may give erroneous results. The compiler may or may not be able to detect such a potential problem situation and issue a warning message. In any case, the compiler will not disable the explicit parallelization of loops with potential shared variable problems.
#pragma MP taskloop private (list_of_private_variables) specifies all the variables that should be treated as private variables for this loop. All other variables used in the loop that are not explicitly specified as shared, readonly, or reduction variables, will be either shared or private as defined by the default scoping rules.
A private variable is one whose value is private to each processor processing some iterations of a loop. In other words, the value assigned to a private variable by one of the processors working on iterations of a loop is not propagated to other processors processing other iterations of that loop. A private variable has no initial value at the start of each iteration of a loop and must be set to a value within the iteration of a loop prior to its first use within that iteration. Execution of a program with a loop containing an explicitly declared private variable whose value is used prior to being set will result in undefined behavior.
#pragma MP taskloop shared (list_of_shared_variables) specifies all the variables that should be treated as shared variables for this loop. All other variables used in the loop that are not explicitly specified as private, readonly, storeback or reduction variables, will be either shared or private as defined by the default scoping rules.
A shared variable is a variable whose current value is accessible by all processors processing iterations of a for loop. The value assigned to a shared variable by one processor working on iterations of a loop may be seen by other processors working on other iterations of the loop.
Read-only variables are a special class of shared variables that are not modified in any iteration of a loop. #pragma MP taskloop readonly (list_of_readonly_variables) indicates to the compiler that it may use a separate copy of that variable's value for each processor processing iterations of the loop.
#pragma MP taskloop storeback (list_of_storeback_variables) specifies all the variables to be treated as storeback variables.
A storeback variable is one whose value is computed in a loop and then used after the loop terminates. The last loop iteration values of storeback variables are available for use after the termination of the loop. A variable is a good candidate for explicit declaration as a storeback variable via this directive when it is a private variable, whether it was explicitly declared private or classified as private by the default scoping rules.
Note that the storeback operation for a storeback variable occurs at the last iteration of the explicitly parallelized loop, regardless of whether or not that iteration updates the value of the storeback variable. In other words the processor that processes the last iteration of a loop may not be the same processor that currently contains the last updated value for a storeback variable. Consider the following example:
#pragma MP taskloop private(x)
#pragma MP taskloop storeback(x)
for (i=1; i <= n; i++) {
    if (...) {
        x=...
    }
}
printf ("%d", x);
In the previous example, the value of the storeback variable x printed by the printf() call may not be the same as that printed by a serial version of the i loop, because in the explicitly parallelized case the processor that processes the last iteration of the loop (when i==n), and therefore performs the storeback operation for x, may not be the processor that currently contains the last updated value of x. The compiler attempts to issue a warning message to alert the user to such potential problems.
In an explicitly parallelized loop, variables referenced as arrays are not treated as storeback variables. Hence it is important to include them in the list_of_storeback_variables if such storeback operation is desired (for example, if the variables referenced as arrays have been declared as private variables).
#pragma MP taskloop savelast specifies that all the private variables of a loop be treated as storeback variables. The syntax of this pragma is as follows:
#pragma MP taskloop savelast
It is often convenient to use this form, rather than listing each private variable of the loop and declaring each as a storeback variable.
#pragma MP taskloop reduction (list_of_reduction_variables) specifies that all the variables appearing in the reduction list will be treated as reduction variables for the loop. A reduction variable is one whose partial values can be individually computed by each of the processors processing iterations of the loop, and whose final value can be computed from all its partial values. The presence of a list of reduction variables helps the compiler identify the loop as a reduction loop, allowing it to generate parallel reduction code. Consider the following example:
#pragma MP taskloop reduction(x)
for (i=0; i<n; i++) {
    x = x + a[i];
}
The variable x is a (sum) reduction variable, and the i loop is a (sum) reduction loop.
MP C supports a scheduling pragma that can be used in conjunction with the taskloop pragma to control the loop scheduling strategy for a given loop. The syntax for this pragma is:
#pragma MP taskloop schedtype (scheduling_type)
This pragma can be used to specify the specific scheduling_type to be used to schedule the parallelized loop. Scheduling_type can be one of the following:
static
In static scheduling all the iterations of the loop are uniformly distributed among all the participating processors. Consider the following example:
#pragma MP taskloop maxcpus(4)
#pragma MP taskloop schedtype(static)
for (i=0; i<1000; i++) {
    ...
}
In the above example, each of the four processors will process 250 iterations of the loop.
self [(chunk_size)]
In self scheduling, each participating processor processes a fixed number of iterations (called the "chunk size") at a time until all the iterations of the loop have been processed. The optional chunk_size parameter specifies the chunk size to be used. Chunk_size must be a positive integer constant or a variable of integral type. If specified as a variable, chunk_size must evaluate to a positive integer value at the beginning of the loop. If this optional parameter is not specified, or its value is not positive, the compiler selects the chunk size to be used. Consider the following example:
#pragma MP taskloop maxcpus(4)
#pragma MP taskloop schedtype(self(120))
for (i=0; i<1000; i++) {
    ...
}
In the above example, the number of iterations of the loop assigned to each participating processor, in order of work request, are:
120, 120, 120, 120, 120, 120, 120, 120, 40.
gss [(min_chunk_size)]
In guided self scheduling, each participating processor processes a variable-sized chunk of iterations until all the iterations of the loop have been processed. The optional min_chunk_size parameter specifies that each chunk must contain at least min_chunk_size iterations. Min_chunk_size must be a positive integer constant or a variable of integral type. If specified as a variable, min_chunk_size must evaluate to a positive integer value at the beginning of the loop. If this optional parameter is not specified, or its value is not positive, the compiler selects the chunk sizes to be used. Consider the following example:
#pragma MP taskloop maxcpus(4)
#pragma MP taskloop schedtype(gss(10))
for (i=0; i<1000; i++) {
    ...
}
In the above example, the number of iterations of the loop assigned to each participating processor, in order of work request, are:
250, 188, 141, 106, 79, 59, 45, 33, 25, 19, 14, 11, 10, 10, 10.
factoring [(min_chunk_size)]
In factoring scheduling, each participating processor processes a variable-sized chunk of iterations until all the iterations of the loop have been processed. The optional min_chunk_size parameter specifies that each chunk must contain at least min_chunk_size iterations. Min_chunk_size must be a positive integer constant or a variable of integral type. If specified as a variable, min_chunk_size must evaluate to a positive integer value at the beginning of the loop. If this optional parameter is not specified, or its value is not positive, the compiler selects the chunk sizes to be used. Consider the following example:
#pragma MP taskloop maxcpus(4)
#pragma MP taskloop schedtype(factoring(10))
for (i=0; i<1000; i++) {
    ...
}
In the above example, the number of iterations of the loop assigned to each participating processor, in order of work request, are:
125, 125, 125, 125, 62, 62, 62, 62, 32, 32, 32, 32, 16, 16, 16, 16, 10, 10, 10, 10, 10, 10.
The following compiler options can be used in MP C.