Oracle® Developer Studio 12.5: Performance Library User's Guide

Updated: June 2016
 
 

Degree of Parallelism

Selected routines in the Oracle Developer Studio Performance Library are parallelized using compiler directives, library routines, and environment variables from the OpenMP Fortran Application Program Interface. The number of threads these routines use is controlled by the environment variable OMP_NUM_THREADS, which you set at run time. You can also set the environment variable PARALLEL, but if you set both, they must have the same value; otherwise a fatal error occurs at run time. Calling the Oracle Developer Studio Performance Library routine USE_THREADS or the OpenMP routine OMP_SET_NUM_THREADS in the user code overrides both environment variables.
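
As a minimal sketch of the run-time override described above, the following program requests four threads through the OpenMP routine before calling DGEMM. The program name, matrix size, and thread count are illustrative assumptions. The commented-out USE_THREADS line assumes that routine takes the thread count as a single INTEGER argument; check USE_THREADS(3P) before relying on it. The program must be compiled with OpenMP enabled and linked against the Performance Library.

```fortran
PROGRAM THREADS_SKETCH
    USE OMP_LIB                          ! OpenMP run-time routines
    IMPLICIT NONE
    INTEGER, PARAMETER :: N = 500        ! illustrative matrix size
    DOUBLE PRECISION, ALLOCATABLE :: A(:,:), B(:,:), C(:,:)

    ALLOCATE (A(N,N), B(N,N), C(N,N))
    A = 1.0D0
    B = 2.0D0
    C = 0.0D0

    ! Override OMP_NUM_THREADS (and PARALLEL) for subsequent parallel work.
    CALL OMP_SET_NUM_THREADS(4)
    ! Assumed Performance Library alternative (see USE_THREADS(3P)):
    ! CALL USE_THREADS(4)

    ! Called outside any parallel region, so DGEMM may run in parallel.
    ! Computes C = A * B; every element of C equals 2*N.
    CALL DGEMM('N', 'N', N, N, N, 1.0D0, A, N, B, N, 0.0D0, C, N)

    PRINT *, 'C(1,1) =', C(1,1)
END PROGRAM THREADS_SKETCH
```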

User code can be parallelized by doing the following:

  • Set environment variable OMP_NUM_THREADS to a value greater than 1

  • Use compiler parallel directives such as those from the OpenMP API

  • Use appropriate compiler flags: -xopenmp=parallel or -xautopar

The Oracle Developer Studio Performance Library routines execute in parallel if the following conditions are met:

  • OMP_NUM_THREADS is set to a value greater than 1

  • The routines are not being called from a parallel region

The Oracle Developer Studio Performance Library employs OpenMP directives in its parallelization and does not support nested parallelism. If the user code is parallelized as stated above and calls an Oracle Developer Studio Performance Library routine, the routine executes in serial if it detects that it is being called from a parallel region. Otherwise, the routine executes in parallel.
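
The rule above can be illustrated with standard OpenMP run-time routines. The sketch below is not the library's actual implementation, and the module and function names are hypothetical. It returns 1 when called from inside an active parallel region and the number of available threads otherwise.

```fortran
MODULE NESTING_RULE_SKETCH               ! hypothetical name, illustration only
    USE OMP_LIB                          ! OMP_IN_PARALLEL, OMP_GET_MAX_THREADS
    IMPLICIT NONE
CONTAINS
    ! Number of threads a routine following the rule above would use.
    INTEGER FUNCTION THREADS_FOR_CALL()
        IF (OMP_IN_PARALLEL()) THEN
            ! The caller is already in an active parallel region: run serially.
            THREADS_FOR_CALL = 1
        ELSE
            ! Otherwise use the threads OpenMP makes available.
            THREADS_FOR_CALL = OMP_GET_MAX_THREADS()
        END IF
    END FUNCTION THREADS_FOR_CALL
END MODULE NESTING_RULE_SKETCH
```

Called from serial code with OMP_NUM_THREADS set to 4, THREADS_FOR_CALL() would be expected to return 4. Called from inside an active !$OMP PARALLEL region, it returns 1.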

POSIX or Oracle Solaris threads can also be created to execute selected regions of the user code in parallel. When a Performance Library routine is called under this parallel model, the routine cannot detect that it is being called from a parallel region. Therefore, either leave the environment variable OMP_NUM_THREADS unset or set it to 1, or call USE_THREADS(3P) in appropriate places in the user code. Otherwise, nested parallelism occurs and the results are undefined.
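
A minimal sketch of the USE_THREADS option follows: the routine that each user-created thread executes restricts the library to one thread before calling DGEMM. The subroutine name and argument list are hypothetical, and thread creation itself is omitted. The call assumes USE_THREADS takes the thread count as a single INTEGER argument and that calling it at the start of each thread's work is an appropriate place; check USE_THREADS(3P) for both.

```fortran
! Hypothetical work routine executed by each POSIX or Oracle Solaris thread.
SUBROUTINE THREAD_WORK(N, A, B, C)
    IMPLICIT NONE
    INTEGER, INTENT(IN) :: N
    DOUBLE PRECISION, INTENT(IN) :: A(N,N), B(N,N)
    DOUBLE PRECISION, INTENT(OUT) :: C(N,N)

    ! Assumed signature: one INTEGER argument giving the thread count.
    ! Limits the library to one thread so that DGEMM does not add a
    ! second, nested level of parallelism inside this thread.
    CALL USE_THREADS(1)

    ! C = A * B, computed serially within the calling thread.
    CALL DGEMM('N', 'N', N, N, N, 1.0D0, A, N, B, N, 0.0D0, C, N)
END SUBROUTINE THREAD_WORK
```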

For example, if the program containing the following code segment is linked with -xopenmp=parallel and OMP_NUM_THREADS is set to 4, the loop will execute in parallel, and there will be four instances of DGEMM running concurrently. However, each DGEMM instance will run in serial since only one level of parallelization is supported.

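! The loop iterations are divided among the OpenMP threads;
! each DGEMM call then runs serially within its thread.
!$OMP PARALLEL DO
    DO I = 1, N
        CALL DGEMM(...)
    END DO
!$OMP END PARALLEL DO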

In the following code example, if the program is not linked with -xautopar, the loop will not be parallelized, but each DGEMM call will be executed by four threads, assuming OMP_NUM_THREADS is still set to 4.

    DO I = 1, N
        CALL DGEMM(...)
    END DO

If the program containing the following code segment is linked with -xopenmp=parallel and if OMP_NUM_THREADS is set to a value greater than 1, the region shown will be executed by a single thread. However, each DGEMM call will be executed by OMP_NUM_THREADS threads.

!$OMP SINGLE
    DO I = 1, N
        CALL DGEMM(...)
    END DO
!$OMP END SINGLE

In the following code example, there will be at most two-way parallelism, regardless of the number of OpenMP threads available for execution. Only one level of parallelism exists: the two sections. Further parallelism within a DGEMM call is suppressed.

!$OMP PARALLEL SECTIONS
!$OMP SECTION
    DO I = 1, N / 2
        CALL DGEMM(...)
    END DO
!$OMP SECTION
    DO I = N / 2 + 1, N
        CALL DGEMM(...)
    END DO
!$OMP END PARALLEL SECTIONS