Sun S3L 3.0 Programming and Reference Guide

S3L_rc_fft and S3L_cr_fft

Description

S3L_rc_fft and S3l_cr_fft are used for computing the Fast Fourier Transform of real 1D, 2D, or 3D arrays. S3L_rc_fft performs a forward FFT of a real array and S3l_cr_fft performs the inverse FFT of a complex array with certain symmetry properties. The result of S3l_cr_fft is real.

S3L_rc_fft accepts as input a real (single- or double precision) parallel array and, upon successful completion, overwrites the contents of the real array with the complex Discrete Fourier Transform (DFT) of the data in a packed format.

S3L_cr_fft accepts as input a real array, which contains the packed representation of a complex array.

S3L_rc_fft and S3l_cr_fft have been optimized for cases where the arrays are distributed only along their last dimension. They also work, however, for any CYCLIC(n) array layout.

For the 2D FFT, a more efficient transposition algorithm is used when the blocksizes along each dimension are equal to the extents divided by the number of processors. This arrangement can result in significantly higher performance.

The algorithms used are non-standard extensions of the Cooley-Tuckey factorization and the Chinese Remainder Theorem. Both power-of-two and arbitrary radix FFTs are supported.

The nodal FFTs upon which the parallel FFT is based are mixed radix with prime factors 2, 3, 5, 7, 11, and 13. The parallel FFT will be more efficient when the size of the array is a product of powers of these factors. When the size of an array cannot be factored into these prime factors, a slower DFT is used for the remainder.

Supported Array Sizes

One Dimension: The array size must be divisible by 4 x p2, where p is the number of processors.

Two Dimensions: Each of the array lengths must be divisible by 2 x p, where p is the number of processors.

Three Dimensions: The first dimension must be even and must have a length of at least 4. The second and third dimensions must be divisible by 2 x p, where p is the number of processors.

Scaling

The real-to-complex and complex-to-real S3L parallel FFTs do not include scaling of the data. Consequently, for a forward 1D real-to-complex FFT of a vector of length n, followed by an inverse 1D complex-to-real FFT of the result, the original vector is multiplied by n/2.

If the data fits in a single process, a 1D real-to-complex FFT of a vector of length n, followed by a 1D complex-to-real FFT results in the original vector being scaled by n.

For a real-to-complex FFT of a 2D real array of size n x m, followed by a complex-to-real FFT, the original array is scaled by n x m.

Similarly, a real-to-complex FFT applied to a 3D real array of size n x m x k, followed by a complex-to-real FFT, results in the original array being scaled by n x m x k.

Complex Data Packed Representation

1D Real-to-Complex Periodic Fourier Transforms: The periodic Fourier Transform of a real sequence x[i], i=0,...,N-1 is Hermitian (exhibits conjugate symmetry around its middle point).

If X[i],i=0,...,N-1 are the complex values of the Fourier Transform, then


Example 8-37

 X[i] = conj(X[N-i]), i=1,...,N-1       (eq. 1)

Consider for example the real sequence:


Example 8-38

   X =

   0
   1
   2
   3
   4
   5
   6
   7

Its Fourier Transform is:


Example 8-39

   X =

   28.0000
   -4.0000 + 9.6569i
   -4.0000 + 4.0000i
   -4.0000 + 1.6569i
   -4.0000
   -4.0000 - 1.6569i
   -4.0000 - 4.0000i
   -4.0000 - 9.6569i

As you can see:


Example 8-40

   X[1] = conj(X[7])
   X[2] = conj(X[6])
   X[3] = conj(X[5])
   X[4] = conj(X[4]) (i.e.,
X[4] is real) 
   X[5] = conj(X[3])
   X[6] = conj(X[2])
   X[7] = conj(X[1])

Because of the Hermitian symmetry, only N/2+1 = 5 values of the complex sequence X need to be calculated and stored. The rest can be computed from (1).

Note that X[0] and X[N/2] are real valued so they can be grouped together as one complex number. In fact S3L stores the sequence X as:


Example 8-41

   X[0]    X[N/2]
   X[1]
   X[2]

   or

   X =
   28.0000 - 4.0000i
   -4.0000 + 9.6569i
   -4.0000 - 4.0000i
   -4.0000 + 1.6569i

The first line in this example represent the real and imaginary parts of a complex number.

To summarize, in S3L, the Fourier Transform of a real-valued sequence of length N (where N is even), is stored as a real sequence of length N. This is equivalent to a complex sequence of length N/2.

2D Fourier Transform: The method used for 2D FFTs is similar to that used for 1D FFTs. When transforming each of the array columns, only half of the data is stored.

3D Real to Hermitian FFT: As with the 1D and 2D FFTs, no extra storage is required for the 3D FFT of real data, since advantage is taken of all possible symmetries. For an array a(M,N,K), the result is packed in complex b(M/2,N,K) array. Hermitian symmetries exist along the planes a(0,:,:) and a(M/2,:,:) and along dimension 1.

See the rc_fft.c and rc_fft.f program examples for illustrations of these concepts. The paths for these online examples are provided at the end of this section.

Syntax

The C and Fortran syntax for S3L_rc_fft and S3L_cr_fft are shown below.

C/C++ Syntax


Example 8-42

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_rc_fft(a, setup_id)
S3L_cr_fft(a, setup_id)
    S3L_array_t        a
    int                setup_id

F77/F90 Syntax


Example 8-43

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_rc_fft(a, setup_id, ier)
S3L_cr_fft(a, setup_id, ier)
    integer*8          a
    integer*4          setup_id
    integer*4          ier

Input

Output

These functions use the following arguments for output:

Error Handling

On success, S3L_rc_fft and S3L_cr_fft return S3L_SUCCESS.

The following condition will cause these functions to terminate and return the associated error code.

Examples

../examples/s3l/rc_fft/rc_fft.c
../examples/s3l/rc_fft-f/rc_fft.f

Related Functions

S3L_rc_fft_setup(3)
S3L_rc_fft_free_setup(3)