Set the PARALLEL environment variable if you can take advantage of multiprocessor execution. The PARALLEL environment variable specifies the number of processors available to the program. The following example sets PARALLEL to two:
% setenv PARALLEL 2
If the target machine has multiple processors, the threads can map to independent processors. Running the program leads to the creation of two threads that execute the parallelized portions of the program.
Currently, the starting thread of a program creates bound threads. Once created, these bound threads participate in executing the parallel part of a program (parallel loop, parallel region, etc.) and keep spin-waiting while the sequential part of the program runs. These bound threads never sleep or stop until the program terminates. Having these threads spin-wait generally gives the best performance when a parallelized program runs on a dedicated system. However, threads that are spin-waiting use system resources.
Use the SUNW_MP_THR_IDLE environment variable to control the status of each thread after it finishes its share of a parallel job.
% setenv SUNW_MP_THR_IDLE value
You can substitute either spin or sleep[n s|n ms] for value. The default is sleep. With sleep, the thread spin-waits for n units after it finishes a parallel task and then goes to sleep. The wait unit can be seconds (s, the default unit) or milliseconds (ms), so 1s means one second and 10ms means ten milliseconds. If a new job arrives before n units have elapsed, the thread stops spin-waiting and starts the new job. sleep with no argument puts the thread to sleep immediately after it completes a parallel task; sleep, sleep0, sleep0s, and sleep0ms are all equivalent.
The other choice, spin, means the thread spins (busy-waits) after completing a parallel task until a new parallel task arrives.
If SUNW_MP_THR_IDLE contains an illegal value or isn’t set, sleep is used as the default.
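The value grammar described above can be sketched as a small parser. This is only an illustration of the rules as stated in this section, not the runtime library's actual code: the idle_policy type and the parse_thr_idle() function are hypothetical names, and treating a negative count or an unknown unit suffix as an illegal value is an assumption.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; not part of any runtime library API. */
typedef struct {
    int  spin;     /* 1 = spin until new work arrives, 0 = sleep */
    long wait_ms;  /* spin-wait time before sleeping, in milliseconds */
} idle_policy;

/* Parse "spin" or "sleep[n][s|ms]". An unset or illegal value
   falls back to plain "sleep", as the text describes. */
idle_policy parse_thr_idle(const char *value)
{
    idle_policy fallback = { 0, 0 };  /* "sleep" with no argument */
    idle_policy p = fallback;

    if (value == NULL)
        return fallback;              /* variable is not set */
    if (strcmp(value, "spin") == 0) {
        p.spin = 1;
        return p;
    }
    if (strncmp(value, "sleep", 5) != 0)
        return fallback;              /* illegal value */

    const char *rest = value + 5;
    if (*rest == '\0')
        return p;                     /* sleep immediately */
    if (!isdigit((unsigned char)*rest))
        return fallback;              /* assumption: count must be digits */

    char *end;
    long n = strtol(rest, &end, 10);

    if (strcmp(end, "ms") == 0)
        p.wait_ms = n;                /* milliseconds */
    else if (*end == '\0' || strcmp(end, "s") == 0)
        p.wait_ms = n * 1000;         /* seconds are the default unit */
    else
        return fallback;              /* assumption: unknown unit is illegal */
    return p;
}
```

Under these assumptions, parse_thr_idle(getenv("SUNW_MP_THR_IDLE")) yields a zero wait for sleep, sleep0, sleep0s, and sleep0ms alike, and a 10-millisecond wait for sleep10ms.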
Set the SUNW_MP_WARN environment variable to TRUE to print runtime warning messages:
% setenv SUNW_MP_WARN TRUE
If you registered a function with sunw_mp_register_warn() to handle warning messages, no warning messages are printed, even if SUNW_MP_WARN is set to TRUE. If you did not register a function and SUNW_MP_WARN is set to TRUE, the warning messages are printed to stderr. If you did not register a function and SUNW_MP_WARN is not set, no warning messages are issued. For more information on sunw_mp_register_warn(), see 3.2.1 Handling OpenMP Runtime Warnings.
The executing program maintains a main memory stack for the master thread and a distinct stack for each slave thread. Stacks are temporary memory address spaces used to hold arguments and automatic variables during subprogram invocations.
The default size of the main stack is about eight megabytes. Use the limit command to display or set the current main stack size.
% limit
cputime         unlimited
filesize        unlimited
datasize        2097148 kbytes
stacksize       8192 kbytes     <- current main stack size
coredumpsize    0 kbytes
descriptors     256
memorysize      unlimited
% limit stacksize 65536         <- set main stack to 64 Mb
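A program can also query the same limit itself. The following is a minimal sketch that assumes a POSIX system providing getrlimit() and RLIMIT_STACK; the function name main_stack_kbytes() is hypothetical and is not taken from this guide.

```c
#include <sys/resource.h>

/* Return the current (soft) main stack limit in kilobytes,
   0 if the limit is unlimited, or -1 if the query fails. */
long main_stack_kbytes(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;                    /* query failed */
    if (rl.rlim_cur == RLIM_INFINITY)
        return 0;                     /* no limit is set */
    return (long)(rl.rlim_cur / 1024);
}
```

With the limits shown above, the function would be expected to return the same figure that limit reports for stacksize.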
Each slave thread of a multithreaded program has its own thread stack. This stack mimics the main stack of the master thread but is unique to the thread. The thread’s private arrays and variables (local to the thread) are allocated on the thread stack.
All slave threads have the same stack size, which by default is four megabytes for 32-bit applications and eight megabytes for 64-bit applications. Set the size, in kilobytes, with the STACKSIZE environment variable:
% setenv STACKSIZE 16384 <- Set thread stack size to 16 Mb
Setting the thread stack size to a value larger than the default may be necessary for some parallelized code.
Sometimes the compiler generates a warning message that indicates a larger stack size is needed. However, it may not be possible to know how large to set it except by trial and error, especially if private or local arrays are involved. If the stack size is too small for a thread to run, the program aborts with a segmentation fault.
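A first guess for the trial-and-error process can come from simple arithmetic on the sizes of the private arrays. The helper below is a hypothetical sketch: stacksize_kbytes() is not part of any library, it assumes STACKSIZE is given in kilobytes, and the headroom for other locals and call frames is a value you would choose yourself.

```c
#include <stddef.h>

/* Hypothetical helper: round the bytes used by a thread's private
   arrays up to whole kilobytes, then add headroom (in kilobytes)
   for other automatic variables and call frames. */
size_t stacksize_kbytes(size_t private_bytes, size_t headroom_kbytes)
{
    return (private_bytes + 1023) / 1024 + headroom_kbytes;
}
```

For example, 16,000,000 bytes of private arrays round up to 15625 kilobytes, so with 1024 kilobytes of headroom the starting value to try would be 16649.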
The keyword restrict can be used with parallelized C. Proper use of restrict helps the optimizer understand data aliasing well enough to determine whether a code sequence can be parallelized. Refer to D.1.2 C99 Keywords for details.
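As a minimal illustration, the function below uses restrict to state that its three arrays do not overlap. The function name vec_add() is hypothetical, and whether a particular compiler actually parallelizes the loop depends on its options and analysis; the sketch only shows where the keyword goes.

```c
#include <stddef.h>

/* restrict is the caller's promise that out, a, and b refer to
   non-overlapping storage, so no iteration's store to out[i]
   can change a value that another iteration reads. */
void vec_add(size_t n, double *restrict out,
             const double *restrict a, const double *restrict b)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Without restrict, the compiler would have to assume that out might alias a or b, which can prevent it from treating the iterations as independent. Calling the function with overlapping arrays breaks the promise and results in undefined behavior.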