Sun S3L 3.0 Programming and Reference Guide

Fast Fourier Transforms

S3L_fft

Description

S3L_fft performs a simple FFT on the complex parallel array a. The same FFT operation is performed along all axes of the array.

Both power-of-two and arbitrary radix FFTs are supported. The 1D parallel FFT can be used for sizes that are a multiple of the square of the number of processes. The 2D and 3D FFTs can be used for arbitrary sizes and distributions.

The S3L_fft routine computes a multidimensional transform by performing a one-dimensional transform along each axis in turn.

The sign of the twiddle factor exponents determines the direction of an FFT. Twiddle factors with a negative exponent imply a forward transform, and twiddle factors with positive exponents are used for an inverse transform.

For the 2D FFT, a more efficient transpose algorithm will be used if the blocksizes along each dimension are equal to the extents divided by the number of processes, resulting in significant performance improvements.

S3L_fft (and S3L_ifft) can only be used for complex and double complex data types. To compute a real-data forward FFT, use S3L_rc_fft. This performs a forward FFT on the real data, yielding packed representation of the complex results. To compute the corresponding inverse FFT, use S3L_cr_fft, which will perform an inverse FFT on the complex data, overwriting the original real array with real-valued results of the inverse FFT.

The floating-point precision of the result always matches that of the input.


Note -

S3L_fft and S3L_ifft do not perform any scaling. Consequently, when a forward FFT is followed by an inverse FFT, the original data will be scaled by the product of the extents of the array.


Syntax

The C and Fortran syntax for S3L_fft are shown below.

C/C++ Syntax


Example 8-31

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_fft(a, setup_id)
    S3L_array_t        a
    int                setup_id

F77/F90 Syntax


Example 8-32

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_fft(a, setup_id, ier)
    integer*8          a
    integer*4          setup_id
    integer*4          ier

Input

Output

This function uses the following arguments for output:

Error Handling

On success, S3L_fft returns S3L_SUCCESS.

S3L_fft performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.

The following conditions will cause the function to terminate and return the associated error code.

Examples

../examples/s3l/fft/fft.c
../examples/s3l/fft/ex_fft1.c
../examples/s3l/fft/ex_fft2.c
../examples/s3l/fft-f/fft.f

Related Functions

S3L_fft_setup(3)
S3L_fft_free_setup(3)
S3L_ifft(3)
S3L_fft_detailed(3)
S3L_cr_fft(3)
S3L_rc_fft(3)
S3L_rc_fft_setup(3)

S3L_fft_detailed

Description

S3L_fft_detailed computes the in-place forward or inverse FFT along a specified axis of a complex or double complex parallel array, a. FFT direction and axis are specified by the arguments iflag and axis, respectively. Both power-of-two and arbitrary radix FFTs are supported. Upon completion, a is overwritten with the FFT result.

A 1D parallel FFT can be used for array sizes that are a multiple of the square of the number of processes. Higher dimensionality FFTs can be used for arbitrary sizes and distributions.

For the 2D FFT, a more efficient transpose algorithm is employed when the blocksizes along each dimension are equal to the extents divided by the number of processes. This yields significant performance benefits.

S3L_fft_detailed can only be used for complex and double complex data types. To compute a real-data forward FFT, use S3L_rc_fft. This performs a forward FFT on the real data, yielding packed representation of the complex results. To compute the corresponding inverse FFT, use S3L_cr_fft, which will perform an inverse FFT on the complex data, overwriting the original real array with real-valued results of the inverse FFT.

The floating-point precision of the result always matches that of the input.


Note -

S3L_fft_detailed and S3L_ifft do not perform any scaling. Consequently, when a forward FFT is followed by an inverse FFT, the original data will be scaled by the product of the extents of the array.


Syntax

The C and Fortran syntax for S3L_fft_detailed are shown below.

C/C++ Syntax


Example 8-33

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_fft_detailed(a, setup_id, iflag, axis)
    S3L_array_t        a
    int                setup_id
    int                iflag
    int                axis

F77/F90 Syntax


Example 8-34

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_fft_detailed(a, setup_id, iflag, axis, ier)
    integer*8          a
    integer*4          setup_id
    integer*4          iflag
    integer*4          axis
    integer*4          ier

Input

Output

This function uses the following arguments for output:

Error Handling

On success, S3L_fft_detailed returns S3L_SUCCESS.

S3L_fft_detailed performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.

The following conditions will cause the function to terminate and return the associated error code.

Examples

../examples/s3l/fft/fft.c
../examples/s3l/fft/ex_fft1.c
../examples/s3l/fft/ex_fft2.c
../examples/s3l/fft-f/fft.f

Related Functions

S3L_fft_setup(3)
S3L_fft_free_setup(3)
S3L_ifft(3)
S3L_fft(3)
S3L_cr_fft(3)
S3L_rc_fft(3)
S3L_rc_fft_setup(3)

S3L_ifft

Description

Run S3L_ifft to compute the inverse FFT of the complex or double complex parallel array a. Use the setup ID returned by S3L_fft_setup to specify the array of interest.

Both power-of-two and arbitrary radix FFT are supported. The 1D parallel FFT can be used for sizes that are a multiple of the square of the number of nodes; the 2D and 3D FFTs can be used for arbitrary sizes and distributions.

Upon completion, a is overwritten with the result. The floating-point precision of the result always matches that of the input.

For the 2D FFT, if the blocksizes along each dimension are equal to the extents divided by the number of processes, a more efficient transpose algorithm is employed, which yields significant performance improvements.

S3L_ifft can only be used for complex and double complex data types. To compute a real-data forward FFT, use S3L_rc_fft. This performs a forward FFT on the real data, yielding packed representation of the complex results. To compute the corresponding inverse FFT, use S3L_cr_fft, which will perform an inverse FFT on the complex data, overwriting the original real array with real-valued results of the inverse FFT.


Note -

S3L_fft and S3L_ifft do not perform any scaling. Consequently, when a forward FFT is followed by an inverse FFT, the original data will be scaled by the product of the extents of the array.


Syntax

The C and Fortran syntax for S3L_ifft are shown below.

C/C++ Syntax


Example 8-35

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_ifft(a, setup_id)
    S3L_array_t        a
    int                setup_id

F77/F90 Syntax


Example 8-36

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_ifft(a, setup_id, ier)
    integer*8          a
    integer*4          setup_id
    integer*4          ier

Input

Output

This function uses the following arguments for output:

Error Handling

On success, S3L_ifft returns S3L_SUCCESS.

S3L_ifft performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.

The following conditions will cause the function to terminate and return the associated error code.

Examples

../examples/s3l/fft/fft.c
../examples/s3l/fft-f/fft.f

Related Functions

S3L_fft_setup(3)
S3L_fft_free_setup(3)
S3L_fft_detailed(3)

S3L_rc_fft and S3L_cr_fft

Description

S3L_rc_fft and S3l_cr_fft are used for computing the Fast Fourier Transform of real 1D, 2D, or 3D arrays. S3L_rc_fft performs a forward FFT of a real array and S3l_cr_fft performs the inverse FFT of a complex array with certain symmetry properties. The result of S3l_cr_fft is real.

S3L_rc_fft accepts as input a real (single- or double precision) parallel array and, upon successful completion, overwrites the contents of the real array with the complex Discrete Fourier Transform (DFT) of the data in a packed format.

S3L_cr_fft accepts as input a real array, which contains the packed representation of a complex array.

S3L_rc_fft and S3l_cr_fft have been optimized for cases where the arrays are distributed only along their last dimension. They also work, however, for any CYCLIC(n) array layout.

For the 2D FFT, a more efficient transposition algorithm is used when the blocksizes along each dimension are equal to the extents divided by the number of processors. This arrangement can result in significantly higher performance.

The algorithms used are non-standard extensions of the Cooley-Tuckey factorization and the Chinese Remainder Theorem. Both power-of-two and arbitrary radix FFTs are supported.

The nodal FFTs upon which the parallel FFT is based are mixed radix with prime factors 2, 3, 5, 7, 11, and 13. The parallel FFT will be more efficient when the size of the array is a product of powers of these factors. When the size of an array cannot be factored into these prime factors, a slower DFT is used for the remainder.

Supported Array Sizes

One Dimension: The array size must be divisible by 4 x p2, where p is the number of processors.

Two Dimensions: Each of the array lengths must be divisible by 2 x p, where p is the number of processors.

Three Dimensions: The first dimension must be even and must have a length of at least 4. The second and third dimensions must be divisible by 2 x p, where p is the number of processors.

Scaling

The real-to-complex and complex-to-real S3L parallel FFTs do not include scaling of the data. Consequently, for a forward 1D real-to-complex FFT of a vector of length n, followed by an inverse 1D complex-to-real FFT of the result, the original vector is multiplied by n/2.

If the data fits in a single process, a 1D real-to-complex FFT of a vector of length n, followed by a 1D complex-to-real FFT results in the original vector being scaled by n.

For a real-to-complex FFT of a 2D real array of size n x m, followed by a complex-to-real FFT, the original array is scaled by n x m.

Similarly, a real-to-complex FFT applied to a 3D real array of size n x m x k, followed by a complex-to-real FFT, results in the original array being scaled by n x m x k.

Complex Data Packed Representation

1D Real-to-Complex Periodic Fourier Transforms: The periodic Fourier Transform of a real sequence x[i], i=0,...,N-1 is Hermitian (exhibits conjugate symmetry around its middle point).

If X[i],i=0,...,N-1 are the complex values of the Fourier Transform, then


Example 8-37

 X[i] = conj(X[N-i]), i=1,...,N-1       (eq. 1)

Consider for example the real sequence:


Example 8-38

   X =

   0
   1
   2
   3
   4
   5
   6
   7

Its Fourier Transform is:


Example 8-39

   X =

   28.0000
   -4.0000 + 9.6569i
   -4.0000 + 4.0000i
   -4.0000 + 1.6569i
   -4.0000
   -4.0000 - 1.6569i
   -4.0000 - 4.0000i
   -4.0000 - 9.6569i

As you can see:


Example 8-40

   X[1] = conj(X[7])
   X[2] = conj(X[6])
   X[3] = conj(X[5])
   X[4] = conj(X[4]) (i.e.,
X[4] is real) 
   X[5] = conj(X[3])
   X[6] = conj(X[2])
   X[7] = conj(X[1])

Because of the Hermitian symmetry, only N/2+1 = 5 values of the complex sequence X need to be calculated and stored. The rest can be computed from (1).

Note that X[0] and X[N/2] are real valued so they can be grouped together as one complex number. In fact S3L stores the sequence X as:


Example 8-41

   X[0]    X[N/2]
   X[1]
   X[2]

   or

   X =
   28.0000 - 4.0000i
   -4.0000 + 9.6569i
   -4.0000 - 4.0000i
   -4.0000 + 1.6569i

The first line in this example represent the real and imaginary parts of a complex number.

To summarize, in S3L, the Fourier Transform of a real-valued sequence of length N (where N is even), is stored as a real sequence of length N. This is equivalent to a complex sequence of length N/2.

2D Fourier Transform: The method used for 2D FFTs is similar to that used for 1D FFTs. When transforming each of the array columns, only half of the data is stored.

3D Real to Hermitian FFT: As with the 1D and 2D FFTs, no extra storage is required for the 3D FFT of real data, since advantage is taken of all possible symmetries. For an array a(M,N,K), the result is packed in complex b(M/2,N,K) array. Hermitian symmetries exist along the planes a(0,:,:) and a(M/2,:,:) and along dimension 1.

See the rc_fft.c and rc_fft.f program examples for illustrations of these concepts. The paths for these online examples are provided at the end of this section.

Syntax

The C and Fortran syntax for S3L_rc_fft and S3L_cr_fft are shown below.

C/C++ Syntax


Example 8-42

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_rc_fft(a, setup_id)
S3L_cr_fft(a, setup_id)
    S3L_array_t        a
    int                setup_id

F77/F90 Syntax


Example 8-43

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_rc_fft(a, setup_id, ier)
S3L_cr_fft(a, setup_id, ier)
    integer*8          a
    integer*4          setup_id
    integer*4          ier

Input

Output

These functions use the following arguments for output:

Error Handling

On success, S3L_rc_fft and S3L_cr_fft return S3L_SUCCESS.

The following condition will cause these functions to terminate and return the associated error code.

Examples

../examples/s3l/rc_fft/rc_fft.c
../examples/s3l/rc_fft-f/rc_fft.f

Related Functions

S3L_rc_fft_setup(3)
S3L_rc_fft_free_setup(3)

S3L_fft_setup

Description

A call to S3L_fft_setup is the first step in executing Sun S3L Fast Fourier Transforms. You supply it with the parallel array (a) that is to be transformed. It returns a setup value in setup_id, which you use in subsequent calls to other S3L FFT routines.

When calling S3L_fft_setup, you may supply arbitrary values in a; the setup routine neither examines nor modifies the contents of this parallel array. It simply uses its size and type to create the setup object.

The setup ID computed by the S3L_fft_setup call can be used for any parallel arrays that have the same rank, extents, and type as the a argument supplied in the S3L_fft_setup call--but only for such parallel arrays. If a transform is to be performed on two parallel arrays, a and b, identical in rank, extents, and type, then one call to the setup routine suffices, even if transforms are performed on different axes of the two parallel arrays. But if a and b differ in rank, extents, or type, a separate setup call is required for each.

You may have more than one setup ID active at a time; that is, you may call the setup routine more than once before deallocating any setup IDs. For this reason, be careful that you specify the correct setup ID for calls to S3L_fft, S3L_ifft, S3L_fft_detailed, and S3L_fft_free_setup.

The time required to compute the contents of an FFT setup_id structure is substantially longer than the time required to actually perform an FFT. For this reason, and because it is common to perform FFTs on many parallel variables with the same rank, extents, and type, Sun S3L keeps the setup phase and transform phases distinct.

When a is no longer needed, call S3L_fft_free_setup to deallocate the FFT setup_id.

Syntax

The C and Fortran syntax for S3L_fft_setup are shown below.

C/C++ Syntax


Example 8-44

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_fft_setup(a, setup_id)
    S3L_array_t        a
    int                setup_id

F77/F90 Syntax


Example 8-45

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_fft_setup(a, setup_id, ier)
    integer*8          a
    integer*4          setup_id
    integer*4          ier

Input

Output

This function uses the following argument for output:

Error Handling

On success, S3L_fft_setup returns S3L_SUCCESS.

S3L_fft_setup performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code indicating which value of the array handle was invalid is returned. See Appendix A of this manual for a detailed list of these error codes.

The following conditions will cause S3L_fft_setup to terminate and return the associated error code:

Examples

../examples/s3l/fft/fft.c
../examples/s3l/fft/ex_fft1.c
../examples/s3l/fft/ex_fft2.c
../examples/s3l/fft-f/fft.f
../examples/s3l/fft-f/ex_fft1.f

Related Functions

S3L_fft(3)
S3L_fft_free_setup(3)
S3L_ifft(3)
S3L_fft_detailed(3)

S3L_rc_fft_setup

Description

S3L_rc_fft_setup allocates a real-to-complex FFT setup that includes the twiddle factors necessary for the computation and other internal structures. This setup depends only on the dimensions of the array whose FFT needs to be computed, and can be used both for the forward (real-to-complex) and inverse (complex-to-real) FFTs. Therefore, to compute multiple real-to-complex or complex-to-real Fourier transforms of different arrays whose extents are the same, the S3L_rc_fft_setup function has to be called only once.

Syntax

The C and Fortran syntax for S3L_rc_fft_setup are shown below.

C/C++ Syntax


Example 8-46

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_rc_fft_setup(a, setup_id)
    S3L_array_t        a
    int                setup_id

F77/F90 Syntax


Example 8-47

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_rc_fft_setup(a, setup_id, ier)
    integer*8          a
    integer*4          setup_id
    integer*4          ier

Input

Output

This function uses the following argument for output:

Error Handling

On success, S3L_rc_fft_setup returns S3L_SUCCESS.

The following conditions will cause S3L_rc_fft_setup to terminate and return the associated error code:

Examples

../examples/s3l/rc_fft/rc_fft.c
../examples/s3l/rc_fft-f/rc_fft.f

Related Functions

S3L_rc_fft(3)
S3L_cr_fft(3)
S3L_rc_fft_free_setup(3)

S3L_fft_free_setup

Description

S3L_fft_free_setup deallocates internal memory associated with setup_id by a previous call to S3L_fft_setup.

Syntax

The C and Fortran syntax for S3L_fft_free_setup are shown below.

C/C++ Syntax


Example 8-48

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_fft_free_setup(setup_id)
    int                setup_id

F77/F90 Syntax


Example 8-49

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_fft_free_setup(setup_id, ier)
    integer*4          setup_id
    integer*4          ier

Input

Output

This function uses the following argument for output:

Error Handling

On success, S3L_fft_free_setup returns S3L_SUCCESS.

The following condition will cause S3L_fft_free_setup to terminate and return the associated error code:

Examples

../examples/s3l/fft/fft.c
../examples/s3l/fft/ex_fft1.c
../examples/s3l/fft/ex_fft2.c
../examples/s3l/fft-f/fft.f
../examples/s3l/fft-f/ex_fft1.f

Related Functions

S3L_fft_setup(3)
S3L_fft(3)
S3L_ifft(3)
S3L_fft_detailed(3)

S3L_rc_fft_free_setup

Description

S3L_rc_fft_free_setup deallocates internal memory associated with setup_id by a previous call to S3L_rc_fft_setup.

Syntax

The C and Fortran syntax for S3L_rc_fft_free_setup are shown below.

C/C++ Syntax


Example 8-50

#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_rc_fft_free_setup(setup_id)
    int                setup_id

F77/F90 Syntax


Example 8-51

include `s3l/s3l-f.h'
include `s3l/s3l_errno-f.h'
subroutine
S3L_rc_fft_free_setup(setup_id, ier)
    integer*4          setup_id
    integer*4          ier

Input

Output

This function uses the following argument for output:

Error Handling

On success, S3L_rc_fft_free_setup returns S3L_SUCCESS.

The following condition will cause S3L_rc_fft_free_setup to terminate and return the associated error code:

Examples

../examples/s3l/rc_fft/rc_fft.c
../examples/s3l/rc_fft-f/rc_fft.f

Related Functions

S3L_rc_fft_setup(3)
S3L_rc_fft(3)