Fortran Examples

To increase the performance of single-processor applications, identify code constructs that can be replaced by calls to Oracle Developer Studio Performance Library routines. To increase the performance of multiprocessor applications, also identify opportunities for parallelization.

To increase application performance by modifying code to use Oracle Developer Studio Performance Library routines, identify blocks of code that exactly duplicate the capability of an Oracle Developer Studio Performance Library routine. The following code example computes the matrix-vector product y ← Ax + y, which can be replaced with a call to the DGEMV subroutine.

      DO I = 1, N
          DO J = 1, N
              Y(I) = Y(I) + A(I,J) * X(J)
          END DO
      END DO
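As a minimal sketch of that replacement, the free-form program below runs the explicit loop and a single DGEMV call on the same data and prints the largest difference between the two results. The program name, the array sizes, the test data, and the leading dimension LDA = N are illustrative assumptions, and the program is assumed to be linked against Performance Library or another BLAS implementation.

```fortran
! Sketch: replace the double loop y <- A*x + y with one DGEMV call.
! Sizes and data are illustrative; link against a BLAS library.
program gemv_check
  implicit none
  integer, parameter :: n = 3, lda = n
  double precision :: a(lda,n), x(n), y_loop(n), y_blas(n)
  integer :: i, j
  external :: dgemv

  ! Fill A and x with arbitrary values; start both copies of y at 1.
  do j = 1, n
    do i = 1, n
      a(i,j) = dble(i + 2*j)
    end do
    x(j) = dble(j)
  end do
  y_loop = 1.0d0
  y_blas = 1.0d0

  ! Original loop nest: y <- A*x + y
  do i = 1, n
    do j = 1, n
      y_loop(i) = y_loop(i) + a(i,j) * x(j)
    end do
  end do

  ! Library call: TRANS='N', ALPHA=1, BETA=1 computes y <- 1*A*x + 1*y
  call dgemv('N', n, n, 1.0d0, a, lda, x, 1, 1.0d0, y_blas, 1)

  print *, 'max difference:', maxval(abs(y_loop - y_blas))
end program gemv_check
```

Setting BETA to 1.0D0 is what preserves the accumulation into Y; a BETA of zero would overwrite Y with Ax instead.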

In other cases, a block of code can be equivalent to several Oracle Developer Studio Performance Library calls or contain portions of code that can be replaced with calls to Oracle Developer Studio Performance Library routines. Consider the following code example.

      DO I = 1, N
          IF (V2(I,K) .LT. 0.0) THEN
              V2(I,K) = 0.0
          ELSE
              DO J = 1, M
                  X(J,I) = X(J,I) + V1(J,K) * V2(I,K)
              END DO
          END IF 
      END DO

The code example can be rewritten to use the Oracle Developer Studio Performance Library routine DGER, as shown here.

      DO I = 1, N
          IF (V2(I,K) .LT. 0.0) THEN
             V2(I,K) = 0.0
          END IF 
      END DO
      CALL DGER (M, N, 1.0D0, V1(1,K), 1, V2(1,K), 1, X, LDX)

The same code example can also be rewritten using Fortran 95 specific statements, as shown here.

      WHERE (V2(1:N,K) .LT. 0.0)
          V2(1:N,K) = 0.0
      END WHERE
      CALL DGER (M, N, 1.0D0, V1(1,K), 1, V2(1,K), 1, X, LDX)

Because the code that replaces negative numbers with zero in V2 has no natural analog in Oracle Developer Studio Performance Library, that code is pulled out of the outer loop into a loop of its own. Once V2 has been clamped, every remaining iteration adds V1(J,K) * V2(I,K) to X(J,I), and a zero entry in V2 contributes nothing. The rest of the loop is therefore a rank-1 update of the general matrix X, which can be replaced with the DGER routine from BLAS.

The amount of performance increase can also depend on the data the Oracle Developer Studio Performance Library routine uses. For example, if V2 contains many negative or zero values, the majority of the time might not be spent in the rank-1 update. In this case, replacing the code with a call to DGER might not increase performance.

Evaluating other loop indexes can affect the Oracle Developer Studio Performance Library routine used. For example, if the reference to K is a loop index, the loops in the code sample shown above might be part of a larger code structure, where the loops over DGEMV or DGER could be converted to some form of matrix multiplication. If so, a single call to a matrix multiplication routine can increase performance more than using a loop with calls to DGER.
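The following sketch illustrates that conversion under stated assumptions: K is taken to be a loop index running from 1 to a hypothetical KMAX, nothing else modifies V2 between iterations, and X accumulates every rank-1 update. In that case the sum of the updates equals V1 times the transpose of V2, so one DGEMM call can stand in for the loop of DGER calls. The program name, sizes, and data are illustrative, and a BLAS library is assumed at link time.

```fortran
! Sketch: a loop of rank-1 updates (DGER) collapsed into one DGEMM.
! KMAX, the sizes, and the data are illustrative assumptions.
program ger_to_gemm
  implicit none
  integer, parameter :: m = 4, n = 3, kmax = 5
  double precision :: v1(m,kmax), v2(n,kmax)
  double precision :: x_ger(m,n), x_gemm(m,n)
  integer :: i, j, k
  external :: dger, dgemm

  ! Arbitrary data; some entries of V2 start out negative.
  do k = 1, kmax
    do j = 1, m
      v1(j,k) = dble(j - k)
    end do
    do i = 1, n
      v2(i,k) = dble(i*k - 4)
    end do
  end do
  x_ger = 0.0d0
  x_gemm = 0.0d0

  ! Clamp negative entries of V2 once, for all K.
  where (v2 < 0.0d0) v2 = 0.0d0

  ! One rank-1 update per K: X <- V1(:,K) * V2(:,K)**T + X
  do k = 1, kmax
    call dger(m, n, 1.0d0, v1(1,k), 1, v2(1,k), 1, x_ger, m)
  end do

  ! Single matrix multiplication: X <- V1 * V2**T + X
  call dgemm('N', 'T', m, n, kmax, 1.0d0, v1, m, v2, n, &
             1.0d0, x_gemm, m)

  print *, 'max difference:', maxval(abs(x_ger - x_gemm))
end program ger_to_gemm
```

Whether the single call is actually faster depends on the problem sizes and the data, so it is worth timing both forms on representative input.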

Because all Oracle Developer Studio Performance Library routines are MT-safe (multithread safe), using the auto-parallelizing compiler to parallelize loops that contain calls to Oracle Developer Studio Performance Library routines can increase performance on multiprocessor platforms.

The following code example combines an Oracle Developer Studio Performance Library routine with a parallelization directive for the auto-parallelizing compiler.

C$PAR DOALL
      DO I = 1, N
          CALL DGBMV ('No transpose', N, N, KL, KU, ALPHA, A, LDA,
     $                B(1,I), 1, BETA, C(1,I), 1)
      END DO

Oracle Developer Studio Performance Library contains a routine named DGBMV that multiplies a banded matrix by a vector. By placing this routine in a properly constructed loop, you can use it to multiply a banded matrix by a matrix, one column at a time. The compiler does not parallelize this loop by default, because the presence of subroutine calls in a loop inhibits parallelization. However, Oracle Developer Studio Performance Library routines are MT-safe, so you can use parallelization directives that instruct the compiler to parallelize this loop.
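A self-contained sketch of the banded matrix times matrix loop is shown below. It stores a small tridiagonal matrix in BLAS band storage, multiplies it by each column of B with DGBMV, and compares the result with a dense MATMUL. Note the assumptions: an OpenMP directive is substituted for the C$PAR DOALL directive (it is treated as a comment unless OpenMP is enabled at compile time), the names NRHS and DENSE, the sizes, and the data are hypothetical, and the linked BLAS is assumed to be safe to call from multiple threads.

```fortran
! Sketch: banded matrix times matrix as a loop of DGBMV calls.
! OpenMP stands in for C$PAR DOALL; names and sizes are illustrative.
program band_times_matrix
  implicit none
  integer, parameter :: n = 5, nrhs = 3, kl = 1, ku = 1
  integer, parameter :: lda = kl + ku + 1
  double precision :: a(lda,n), dense(n,n)
  double precision :: b(n,nrhs), c(n,nrhs), ref(n,nrhs)
  integer :: i, j
  external :: dgbmv

  ! Build a tridiagonal matrix in dense form and in band storage,
  ! where dense element (i,j) is held in A(KU+1+i-j, j).
  dense = 0.0d0
  a = 0.0d0
  do j = 1, n
    do i = max(1, j - ku), min(n, j + kl)
      dense(i,j) = dble(10*i + j)
      a(ku + 1 + i - j, j) = dense(i,j)
    end do
  end do

  do j = 1, nrhs
    do i = 1, n
      b(i,j) = dble(i + j)
    end do
  end do
  c = 0.0d0

  ! Each iteration writes a distinct column of C, so the
  ! iterations are independent of one another.
  !$omp parallel do
  do i = 1, nrhs
    call dgbmv('No transpose', n, n, kl, ku, 1.0d0, a, lda, &
               b(1,i), 1, 0.0d0, c(1,i), 1)
  end do
  !$omp end parallel do

  ! Reference result from the dense copy of the same matrix.
  ref = matmul(dense, b)
  print *, 'max difference:', maxval(abs(c - ref))
end program band_times_matrix
```

The loop is a reasonable candidate for parallelization only because no iteration reads a column of C that another iteration writes.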

Compiler directives can also be used to parallelize a loop with a subroutine call that ordinarily would not be parallelizable. For example, it is ordinarily not possible to parallelize a loop containing a call to some of the linear system solvers, because some vendors have implemented those routines using code that is not MT-safe. Loops containing calls to the expert drivers of the linear system solvers (routines whose names end in SVX or SVXX) are usually not parallelizable with other implementations of LAPACK. Because the implementation of LAPACK in Oracle Developer Studio Performance Library enables parallelization of loops containing such calls, users of multiprocessor platforms can get additional performance by parallelizing these loops.