Oracle Solaris Studio 12.2: C User's Guide

3.6 Load Balance and Loop Scheduling

Loop scheduling is the process of distributing the iterations of a parallel loop among multiple threads. To maximize speedup, the work must be distributed evenly among the threads without imposing too much scheduling overhead. The compiler offers several types of scheduling for different situations.

3.6.1 Static or Chunk Scheduling

When the work performed by each iteration of a loop is the same, it is best to divide the iterations evenly among the threads on the system. This approach is known as static scheduling.

Example 3-13 A Good Loop for Static Scheduling

for (i=1; i < 1000; i++) {
    sum += a[i]*b[i];       /* S1 */
}

Under static or chunk scheduling, each thread gets the same number of iterations. With 4 threads, each thread in the example above gets a contiguous chunk of about 250 iterations. Provided there are no interruptions and each thread progresses at the same rate, all the threads complete at the same time.
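
For loops parallelized explicitly with OpenMP (see Section 3.2), the schedule(static) clause requests this behavior. The following sketch is illustrative and is not part of the original example; it assumes the file is compiled with -xopenmp:

#include <omp.h>

#define N 1000

/* Each thread receives one contiguous block of roughly N/nthreads
   iterations, as described above for static or chunk scheduling. */
double dot(const double a[N], const double b[N])
{
    double sum = 0.0;
    int i;

    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (i = 1; i < N; i++) {
        sum += a[i]*b[i];       /* S1 */
    }
    return sum;
}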

3.6.2 Self Scheduling

In general, static scheduling does not achieve good load balance when the work performed by each iteration varies. Under static scheduling, each thread grabs the same-sized chunk of iterations; on completing its chunk, every thread except the master thread waits to participate in the next parallel loop execution, while the master thread continues executing the program. Under self scheduling, each thread grabs a different small chunk of iterations and, after completing its assigned chunk, tries to acquire more chunks from the same loop.
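
For OpenMP code, self scheduling corresponds to the schedule(dynamic) clause with a small chunk size. The sketch below is illustrative only; work() is a hypothetical function whose cost varies from iteration to iteration:

#include <omp.h>

extern double work(int i);    /* hypothetical variable-cost iteration */

double total(int n)
{
    double sum = 0.0;
    int i;

    /* Each thread takes 8 iterations at a time and, when it finishes
       that chunk, grabs another chunk from the same loop. */
    #pragma omp parallel for reduction(+:sum) schedule(dynamic, 8)
    for (i = 0; i < n; i++) {
        sum += work(i);
    }
    return sum;
}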

3.6.3 Guided Self Scheduling

In guided self scheduling (GSS), each thread gets successively smaller chunks of iterations. In cases where the cost of each iteration varies, GSS can help balance the load.
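
For OpenMP code, the schedule(guided) clause provides comparable behavior. The sketch below is illustrative only and reuses the hypothetical work() function from the previous sketch:

#include <omp.h>

extern double work(int i);    /* hypothetical variable-cost iteration */

double total(int n)
{
    double sum = 0.0;
    int i;

    /* Chunks start large and shrink as the loop nears completion, so
       the final chunks are small enough to even out the load. */
    #pragma omp parallel for reduction(+:sum) schedule(guided)
    for (i = 0; i < n; i++) {
        sum += work(i);
    }
    return sum;
}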