C H A P T E R 5 - Using Sun Performance Library Signal Processing Routines

C H A P T E R 5

Using Sun Performance Library Signal Processing Routines

The discrete Fourier transform (DFT) has always been an important analytical tool in many areas in science and engineering. However, it was not until the development of the fast Fourier transform (FFT) that the DFT became widely used. This is because the DFT requires O(N²) computations, while the FFT only requires O(Nlog₂N) operations.

Sun Performance Library contains a set of routines that computes the FFT, related FFT operations, such as convolution and correlation, and trigonometric transforms.

This chapter is divided into the following three sections.

Forward and Inverse FFT Routines

Sine and Cosine Transforms

Convolution and Correlation

Each section includes examples that show how the routines might be used.

For information on the Fortran 95 and C interfaces and types of arguments used in each routine, see the section 3P man pages for the individual routines. For example, to display the man page for the SFFTC routine, type man -s 3P sfftc. Routine names must be lowercase. For an overview of the FFT routines, type man -s 3P fft.

Forward and Inverse FFT Routines

TABLE 5-1 lists the names of the FFT routines and their calling sequence. Double precision routine names are in square brackets. See the individual man pages for detailed information on the data type and size of the arguments.


Routine Name	Arguments
Linear Routines
`CFFTS` `[ZFFTD]`	`(OPT,` `N1,` `SCALE,` `X,` `Y,` `TRIGS,` `IFAC,` `WORK,` `LWORK,` `ERR)`
`SFFTC` `[DFFTZ]`	`(OPT,` `N1,` `SCALE,` `X,` `Y,` `TRIGS,` `IFAC,` `WORK,` `LWORK,` `ERR)`
`CFFTSM` `[ZFFTDM]`	`(OPT,` `N1,` `N2,` `SCALE,` `X,` `LDX1,` `Y,` `LDY1,` `TRIGS,` `IFAC,` `WORK,` `LWORK,` `ERR)`
`SFFTCM` `[DFFTZM]`	`(OPT,` `N1,` `N2,` `SCALE,` `X,` `LDX1,` `Y,` `LDY1,` `TRIGS,` `IFAC,` `WORK,` `LWORK,` `ERR)`
`CFFTC` `[ZFFTZ]`	`(OPT,` `N1,` `SCALE,` `X,` `Y,` `TRIGS,` `IFAC,` `WORK,` `LWORK,` `ERR)`
`CFFTCM` `[ZFFTZM]`	`(OPT,` `N1,` `N2,` `SCALE,` `X,` `LDX1,` `Y,` `LDY1,` `TRIGS,` `IFAC,` `WORK,` `LWORK,` `ERR)`
Two-Dimensional Routines
`CFFTS2` `[ZFFTD2]`	`(OPT,` `N1,` `N2,` `SCALE,` `X,` `LDX1,` `Y,` `LDY1,` `TRIGS,` `IFAC,` `WORK,` `LWORK,` `ERR)`
`SFFTC2` `[DFFTZ2]`	`(OPT,` `N1,` `N2,` `SCALE,` `X,` `LDX1,` `Y,` `LDY1,` `TRIGS,` `IFAC,` `WORK,` `LWORK,` `ERR)`
`CFFTC2` `[ZFFTZ2]`	`(OPT,` `N1,` `N2,` `SCALE,` `X,` `LDX1,` `Y,` `LDY1,` `TRIGS,` `IFAC,` `WORK,` `LWORK,` `ERR)`
Three-Dimensional Routines
`CFFTS3` `[ZFFTD3]`	`(OPT,` `N1,` `N2,` `N3,` `SCALE,` `X,` `LDX1,` `LDX2,` `Y,` `LDY1,` `LDY2,` `TRIGS,` `IFAC,` `WORK,` `LWORK,` `ERR)`
`SFFTC3` `[DFFTZ3]`	`(OPT,` `N1,` `N2,` `N3,` `SCALE,` `X,` `LDX1,` `LDX2,` `Y,` `LDY1,` `LDY2,` `TRIGS,` `IFAC,` `WORK,` `LWORK,` `ERR)`
`CFFTC3` `[ZFFTZ3]`	`(OPT,` `N1,` `N2,` `N3,` `SCALE,` `X,` `LDX1,` `LDX2,` `Y,` `LDY1,` `LDY2,` `TRIGS,` `IFAC,` `WORK,` `LWORK,` `ERR)`

Sun Performance Library FFT routines use the following arguments.

OPT: Flag indicating whether the routine is called to initialize or to compute the transform.

N1, N2, N3: Problem dimensions for one, two, and three dimensional transforms.

X: Input array where X is of type COMPLEX if the routine is a complex-to-complex transform or a complex-to-real tranform. X is of type REAL for a real-to-complex transform.

Y: Output array where Y is of type COMPLEX if the routine is a complex-to-complex transform or a real-to-complex tranform. Y is of type REAL for a complex-to-real transform.

LDX1, LDX2 and LDY1, LDY2: LDX1 and LDX2 are the leading dimensions of the input array, and LDY1 and LDY2 are the leading dimensions of the output array. The FFT routines allow the output to overwrite the input, which is an in-place transform, or to be stored in a separate array apart from the input array, which is an out-of-place transform. In complex-to-complex tranforms, the input data is of the same size as the output data. However, real-to-complex and complex-to-real transforms have different memory requirements for input and output data. Care must be taken to ensure that the input array is large enough to acommodate the transform results when computing an in-place tranform.

TRIGS: Array containing the trigonometric weights.

IFAC: Array containing factors of the problem dimensions. The problem sizes are as follows:

Linear FFT: Problem size of dimension N1

Two-dimensional FFT: Problem size of dimensions N1 and N2

Three-dimensional FFT: Problem size of dimensions N1, N2, and N3

While N1, N2, and N3 can be of any size, a real-to-complex or a complex-to-real transform can be computed most efficiently when

$N1, N2, N3=2^{p}\times 3^{q}\times 4^{r}\times 5^{s}$

and a complex-to-complex transform can be computed most efficiently when

$N1, N2, N3=2^{p}\times 3^{q}\times 4^{r}\times 5^{s}\times 7^{t}\times 11^{u}\times 13^{v}$

where p, q, r, s, t, u, and v are integers and p, q, r, s, t, u, v greater than or equal 0.

WORK: Workspace whose size depends on the routine and the number of threads that are being used to compute the transform if the routine is parallelized.

LWORK: Size of workspace. If LWORK is zero, the routine will allocate a workspace with the required size.

SCALE: A scalar with which the output is scaled. Occasionally in literature, the inverse transform is defined with a scaling factor of

mathematical equation mathematical equation mathematical equation for one-dimensional transforms, for two-dimensional transforms, and for three-dimensional transforms. In such case, the inverse transform is said to be normalized. If a normalized FFT is followed by its inverse FFT, the result is the original input data. The Sun Performance Library FFT routines are not normalized. However, normalization can be done easily by calling the inverse FFT routine with the appropriate scaling factor stored in SCALE.

ERR: A flag returning a nonzero value if an error is encountered in the routine and zero otherwise.

Linear FFT Routines

Linear FFT routines compute the FFT of real or complex data in one dimension only. The data can be one or more complex or real sequences. For a single sequence, the data is stored in a vector. If more than one sequence is being transformed, the sequences are stored column-wise in a two-dimensional array and a one-dimensional FFT is computed for each sequence along the column direction. The linear forward FFT routines compute

$X(k)=\sum _{n=0}^{N1-1}x(n)e^{\frac{-2\pi ink}{N1}},\, \, k=0,\ldots N1-1$ ,

where $( i=\sqrt{-1})$ , or expressed in polar form,

$X(k)=\sum _{n=0}^{N1-1}x(n)(\cos (\frac{2\pi nk}{N1})-i\sin (\frac{2\pi nk}{N1})),\, \, k=0,\ldots N1-1$ .

The inverse FFT routines compute

$x(n)=\sum _{k=0}^{N1-1}X(k)e^{\frac{2\pi ink}{N1}},\, \, n=0,\ldots N1-1$ ,

or in polar form,

$x(n)=\sum _{k=0}^{N1-1}X(k)(\cos (\frac{2\pi nk}{N1})+i\sin (\frac{2\pi nk}{N1}))\, \, n=0,\ldots N1-1$ .

With the forward transform, if the input is one or more complex sequences of size N1, the result will be one or more complex sequences, each consisting of N1 unrelated data points. However, if the input is one or more real sequences, each containing N1 real data points, the result will be one or more complex sequences that are conjugate symmetric. That is,

$X(k)=X^{\star }(N1-k),\, \, k=\frac{N1}{2}+1,\ldots N1-1$ .

The imaginary part of X(0) is always zero. If N1 is even, the imaginary part of mathematical equationmathematical equation is also zero. Both zeros are stored explicitly. Because the second half of each sequence can be derived from the first half, only complex data points are computed and stored in the output array. Here and elsewhere in this chapter, integer division is rounded down.

With the inverse transform, if an N1-point complex-to-complex transform is being computed, then N1 unrelated data points are expected in each input sequence and N1 data points will be returned in the output array. However, if an N1-point complex-to-real transform is being computed, only the first mathematical equation complex data points of each conjugate symmetric input sequence are expected in the input, and the routine will return N1 real data points in each output sequence.

For each value of N1, either the forward or the inverse routine must be called to compute the factors of N1 and the trigonometric weights associated with those factors before computing the actual FFT. The factors and trigonometric weights can be reused in subsequent transforms as long as N1 remains unchanged.

TABLE 5-2 lists the single precision linear FFT routines and their purposes. For routines that have two-dimensional arrays as input and output, TABLE 5-2 also lists the leading dimension requirements. The same information applies to the corresponding double precision routines except that their data types are double precision and double complex. See TABLE 5-2 for the mapping. See the individual man pages for a complete description of the routines and their arguments.


Name	Purpose	Size and Type of Input	Size and Type of Output	Leading Dimension Requirements
				In-place	Out-of-Place
`SFFTC`	`OPT` = 0 initialization
	`OPT` = -1 real-to-complex forward linear FFT of a single vector	N1, Real	$\frac{N1}{2}+1$ , Complex
`SFFTC`	`OPT` = 0 initialization
	`OPT` = 1 complex-to-real inverse linear FFT of single vector	$\frac{N1}{2}+1$ , Complex	N1 Real
`CFFTC`	`OPT` = 0 initialization
	`OPT` = -1 complex-to-complex forward linear FFT of a single vector	N1, Complex	N1, Complex
	`OPT` = 1 complex-to-complex inverse linear FFT of a single vector	N1, Complex	N1, Complex
`SFFTCM`	`OPT` = 0 initialization
	`OPT` = -1 real-to-complex forward linear FFT of M vectors	N1 × M, Real	$(\frac{N1}{2}+1)\times M$ , Complex	`LDX1` = 2 × `LDY1`	`LDX1` N1
`CFFTSM`	`OPT` = 0 initialization
	`OPT` = 1 complex-to-real inverse linear FFT of M vectors	$(\frac{N1}{2}+1)\times M$ , Complex	N1 × M, Real	`LDX1` $\frac{N1}{2}+1$ `LDY1`=2 × `LDX1`	`LDX1` $\frac{N1}{2}+1$ `LDY1` N1
`CFFTCM`	`OPT` = 0 initialization
	`OPT` = -1 complex-to-complex forward linear FFT of M vectors	N1 × M, Complex	N1 × M, Complex	`LDX1` N1 `LDY1` N1	`LDX1` N1 `LDY1` N1
	`OPT` = 1 complex-to-complex inverse linear FFT of M vectors	N1 × M, Complex	N1 × M, Complex	`LDX1` N1 `LDY1` N1	`LDX1` N1 `LDY1` N1

TABLE 5-2 Notes.

LDX1 is the leading dimension of the input array.

LDY1 is the leading dimension of the output array.

N1 is the first dimension of the FFT problem.

N2 is the second dimension of the FFT problem.

When calling routines with OPT = 0 to initialize the routine, the only error checking that is done is to determine if N1 < 0

CODE EXAMPLE 5-1 shows how to compute the linear real-to-complex and complex-to-real FFT of a set of sequences.

my_system% `cat testscm.f` PROGRAM TESTSCM IMPLICIT NONE INTEGER :: LW, IERR, I, J, K, LDX, LDC INTEGER,PARAMETER :: N1 = 3, N2 = 2, LDZ = N1, $ LDC = N1, LDX = 2LDC INTEGER, DIMENSION(:) :: IFAC(128) REAL :: SCALE REAL, PARAMETER :: ONE = 1.0 REAL, DIMENSION(:) :: SW(N1), TRIGS(2N1) REAL, DIMENSION(0:LDX-1,0:N2-1) :: X, V, Y COMPLEX, DIMENSION(0:LDZ-1, 0:N2-1) :: Z * workspace size LW = N1 SCALE = ONE/N1 WRITE(,) $ 'Linear complex-to-real and real-to-complex FFT of a sequence' WRITE(,) X = RESHAPE(SOURCE = (/.1, .2, .3,0.0,0.0,0.0,7.,8.,9., $ 0.0, 0.0, 0.0/), SHAPE=(/6,2/)) V = X WRITE(,) 'X = ' DO I = 0,N1-1 WRITE(,'(2(F4.1,2x))') (X(I,J), J = 0, N2-1) END DO WRITE(,) intialize trig table and compute factors of N1 CALL SFFTCM(0, N1, N2, ONE, X, LDX, Z, LDZ, TRIGS, IFAC, $ SW, LW, IERR) IF (IERR .NE. 0) THEN PRINT,'ROUTINE RETURN WITH ERROR CODE = ', IERR STOP END IF Compute out-of-place forward linear FFT. * Let FFT routine allocate memory. CALL SFFTCM(-1, N1, N2, ONE, X, LDX, Z, LDZ, TRIGS, IFAC, $ SW, 0, IERR) IF (IERR .NE. 0) THEN PRINT,'ROUTINE RETURN WITH ERROR CODE = ', IERR STOP END IF WRITE(,) 'out-of-place forward FFT of X:' WRITE(,)'Z =' DO I = 0, N1/2 WRITE(,'(2(A1, F4.1,A1,F4.1,A1,2x))') ('(',REAL(Z(I,J)), $ ',',AIMAG(Z(I,J)),')', J = 0, N2-1) END DO WRITE(,) * Compute in-place forward linear FFT. * X must be large enough to store N1/2+1 complex values CALL SFFTCM(-1, N1, N2, ONE, X, LDX, X, LDC, TRIGS, IFAC, $ SW, LW, IERR) IF (IERR .NE. 0) THEN PRINT,'ROUTINE RETURN WITH ERROR CODE = ', IERR STOP END IF WRITE(,) 'in-place forward FFT of X:' CALL PRINT_REAL_AS_COMPLEX(N1/2+1, N2, 1, X, LDC, N2) WRITE(,) Compute out-of-place inverse linear FFT. CALL CFFTSM(1, N1, N2, SCALE, Z, LDZ, X, LDX, TRIGS, IFAC, $ SW, LW, IERR) IF (IERR .NE. 0) THEN PRINT,'ROUTINE RETURN WITH ERROR CODE = ', IERR STOP END IF WRITE(,) 'out-of-place inverse FFT of Z:' DO I = 0, N1-1 WRITE(,'(2(F4.1,2X))') (X(I,J), J = 0, N2-1) END DO WRITE(,) * Compute in-place inverse linear FFT. CALL CFFTSM(1, N1, N2, SCALE, Z, LDZ, Z, LDZ2, TRIGS, $ IFAC, SW, 0, IERR) IF (IERR .NE. 0) THEN PRINT,'ROUTINE RETURN WITH ERROR CODE = ', IERR STOP END IF WRITE(,) 'in-place inverse FFT of Z:' CALL PRINT_COMPLEX_AS_REAL(N1, N2, 1, Z, LDZ2, N2) WRITE(,) END PROGRAM TESTSCM SUBROUTINE PRINT_COMPLEX_AS_REAL(N1, N2, N3, A, LD1, LD2) INTEGER N1, N2, N3, I, J, K REAL A(LD1, LD2, ) DO K = 1, N3 DO I = 1, N1 WRITE(,'(5(F4.1,2X))') (A(I,J,K), J = 1, N2) END DO WRITE(,) END DO END SUBROUTINE PRINT_REAL_AS_COMPLEX(N1, N2, N3, A, LD1, LD2) INTEGER N1, N2, N3, I, J, K COMPLEX A(LD1, LD2, ) DO K = 1, N3 DO I = 1, N1 WRITE(,'(5(A1, F4.1,A1,F4.1,A1,2X))') ('(',REAL(A(I,J,K)), $ ',',AIMAG(A(I,J,K)),')', J = 1, N2) END DO WRITE(,*) END DO END my_system% `f95 -dalign testscm.f -xlic_lib=sunperf` my_system% `a.out` Linear complex-to-real and real-to-complex FFT of a sequence X = 0.1 7.0 0.2 8.0 0.3 9.0 out-of-place forward FFT of X: Z = ( 0.6, 0.0) (24.0, 0.0) (-0.2, 0.1) (-1.5, 0.9) in-place forward FFT of X: ( 0.6, 0.0) (24.0, 0.0) (-0.2, 0.1) (-1.5, 0.9) out-of-place inverse FFT of Z: 0.1 7.0 0.2 8.0 0.3 9.0 in-place inverse FFT of Z: 0.1 7.0 0.2 8.0 0.3 9.0

CODE EXAMPLE 5-1 Linear Real-to-Complex FFT and Complex-to-Real FFT

my_system% cat testscm.f

       PROGRAM TESTSCM

       IMPLICIT NONE

       INTEGER :: LW, IERR, I, J, K, LDX, LDC

       INTEGER,PARAMETER :: N1 = 3, N2 = 2, LDZ = N1,

      $        LDC = N1, LDX = 2*LDC

       INTEGER, DIMENSION(:) :: IFAC(128)

       REAL :: SCALE

       REAL, PARAMETER :: ONE = 1.0

       REAL, DIMENSION(:) :: SW(N1), TRIGS(2*N1)

       REAL, DIMENSION(0:LDX-1,0:N2-1) :: X, V, Y

       COMPLEX, DIMENSION(0:LDZ-1, 0:N2-1) :: Z

* workspace size LW = N1 SCALE = ONE/N1

       WRITE(*,*) $ 'Linear complex-to-real and real-to-complex FFT of a sequence'

       WRITE(*,*)

       X = RESHAPE(SOURCE = (/.1, .2, .3,0.0,0.0,0.0,7.,8.,9.,

      $    0.0, 0.0, 0.0/), SHAPE=(/6,2/)) V = X

       WRITE(*,*) 'X = '

       DO I = 0,N1-1

         WRITE(*,'(2(F4.1,2x))') (X(I,J), J = 0, N2-1)

       END DO

       WRITE(*,*)

* intialize trig table and compute factors of N1

       CALL SFFTCM(0, N1, N2, ONE, X, LDX, Z, LDZ, TRIGS, IFAC,

      $ SW, LW, IERR)

       IF (IERR .NE. 0) THEN

          PRINT*,'ROUTINE RETURN WITH ERROR CODE = ', IERR

          STOP

       END IF

* Compute out-of-place forward linear FFT.

* Let FFT routine allocate memory.

       CALL SFFTCM(-1, N1, N2, ONE, X, LDX, Z, LDZ, TRIGS, IFAC,

      $            SW, 0, IERR)

      IF (IERR .NE. 0) THEN

        PRINT*,'ROUTINE RETURN WITH ERROR CODE = ', IERR

        STOP

      END IF

      WRITE(*,*) 'out-of-place forward FFT of X:'

      WRITE(*,*)'Z ='

      DO I = 0, N1/2

         WRITE(*,'(2(A1, F4.1,A1,F4.1,A1,2x))') ('(',REAL(Z(I,J)),

     $ ',',AIMAG(Z(I,J)),')', J = 0, N2-1)

      END DO

      WRITE(*,*)

* Compute in-place forward linear FFT.

* X must be large enough to store N1/2+1 complex values

      CALL SFFTCM(-1, N1, N2, ONE, X, LDX, X, LDC, TRIGS, IFAC,

     $            SW, LW, IERR)

      IF (IERR .NE. 0) THEN

         PRINT*,'ROUTINE RETURN WITH ERROR CODE = ', IERR

         STOP

      END IF

      WRITE(*,*) 'in-place forward FFT of X:'

      CALL PRINT_REAL_AS_COMPLEX(N1/2+1, N2, 1, X, LDC, N2)

      WRITE(*,*)

* Compute out-of-place inverse linear FFT.

      CALL CFFTSM(1, N1, N2, SCALE, Z, LDZ, X, LDX, TRIGS, IFAC,

     $            SW, LW, IERR)

      IF (IERR .NE. 0) THEN

         PRINT*,'ROUTINE RETURN WITH ERROR CODE = ', IERR

         STOP

      END IF

      WRITE(*,*) 'out-of-place inverse FFT of Z:'

      DO I = 0, N1-1

         WRITE(*,'(2(F4.1,2X))') (X(I,J), J = 0, N2-1)

      END DO

      WRITE(*,*)

* Compute in-place inverse linear FFT.

      CALL CFFTSM(1, N1, N2, SCALE, Z, LDZ, Z, LDZ*2, TRIGS,

     $            IFAC, SW, 0, IERR)

      IF (IERR .NE. 0) THEN

         PRINT*,'ROUTINE RETURN WITH ERROR CODE = ', IERR

         STOP

      END IF

      WRITE(*,*) 'in-place inverse FFT of Z:'

      CALL PRINT_COMPLEX_AS_REAL(N1, N2, 1, Z, LDZ*2, N2)

      WRITE(*,*)

      END PROGRAM TESTSCM

      SUBROUTINE PRINT_COMPLEX_AS_REAL(N1, N2, N3, A, LD1, LD2)

      INTEGER N1, N2, N3, I, J, K

      REAL A(LD1, LD2, *)

      DO K = 1, N3

         DO I = 1, N1

            WRITE(*,'(5(F4.1,2X))') (A(I,J,K), J = 1, N2)

         END DO

         WRITE(*,*)

      END DO

END

      SUBROUTINE PRINT_REAL_AS_COMPLEX(N1, N2, N3, A, LD1, LD2)

      INTEGER N1, N2, N3, I, J, K

      COMPLEX A(LD1, LD2, *)

      DO K = 1, N3

         DO I = 1, N1

            WRITE(*,'(5(A1, F4.1,A1,F4.1,A1,2X))') ('(',REAL(A(I,J,K)),

     $            ',',AIMAG(A(I,J,K)),')', J = 1, N2)

         END DO

         WRITE(*,*)

      END DO

END

my_system% f95 -dalign testscm.f -xlic_lib=sunperf

my_system% a.out

Linear complex-to-real and real-to-complex FFT of a sequence

X =

0.1 7.0

0.2 8.0

0.3 9.0

out-of-place forward FFT of X:

Z =

( 0.6, 0.0) (24.0, 0.0)

(-0.2, 0.1) (-1.5, 0.9)

in-place forward FFT of X:

( 0.6, 0.0) (24.0, 0.0)

(-0.2, 0.1) (-1.5, 0.9)

out-of-place inverse FFT of Z:

0.1 7.0

0.2 8.0

0.3 9.0

in-place inverse FFT of Z:

0.1 7.0

0.2 8.0

0.3 9.0

CODE EXAMPLE 5-1 Notes:

The forward FFT of X is actually

	(0.6, 0.0)	(24.0, 0.0)
Z =	(-0.2, 0.1)	(-1.5, 0.9)
	(-0.2, -0.1)	(-1.5, -0.9)

Because of symmetry, Z(2) is the complex conjugate of Z(1), and therefore only the first two $(\frac{N1}{2}+1=2)$ complex values are stored. For the in-place forward transform, SFFTCM is called with real array X as the input and output. Because SFFTCM expects the output array to be of type COMPLEX, the leading dimension of X as an output array must be as if X were complex. Since the leading dimension of real array X is LDX = 2 × LDC, the leading dimension of X as a complex output array must be LDC. Similarly, in the in-place inverse transform, CFFTSM is called with complex array Z as the input and output. Because CFFTSM expects the output array to be of type REAL, the leading dimension of Z as an output array must be as if Z were real. Since the leading dimension of complex array Z is LDZ, the leading dimension of Z as a real output array must be LDZ × 2.

CODE EXAMPLE 5-2 shows how to compute the linear complex-to-complex FFT of a set of sequences.

my_system% `cat testccm.f` PROGRAM TESTCCM IMPLICIT NONE INTEGER :: LDX1, LDY1, LW, IERR, I, J, K, LDZ1, NCPUS, $ USING_THREADS, IFAC(128) INTEGER, PARAMETER :: N1 = 3, N2 = 4, LDX1 = N1, LDZ1 = N1, $ LDY1 = N1+2 REAL, PARAMETER :: ONE = 1.0, SCALE = ONE/N1 COMPLEX :: Z(0:LDZ1-1,0:N2-1), X(0:LDX1-1,0:N2-1), $ Y(0:LDY1-1,0:N2-1) REAL :: TRIGS(2N1) REAL, DIMENSION(:), ALLOCATABLE :: SW get number of threads NCPUS = USING_THREADS() * workspace size LW = 2 * N1 * NCPUS WRITE(,)'Linear complex-to-complex FFT of one or more sequences' WRITE(,) ALLOCATE(SW(LW)) X = RESHAPE(SOURCE =(/(.1,.2),(.3,.4),(.5,.6),(.7,.8),(.9,1.0), $ (1.1,1.2),(1.3,1.4),(1.5,1.6),(1.7,1.8),(1.9,2.0),(2.1,2.2), $ (1.2,2.0)/), SHAPE=(/LDX1,N2/)) Z = X WRITE(,) 'X = ' DO I = 0, N1-1 WRITE(,'(5(A1, F5.1,A1,F5.1,A1,2X))') ('(',REAL(X(I,J)), $ ',',AIMAG(X(I,J)),')', J = 0, N2-1) END DO WRITE(,) intialize trig table and compute factors of N1 CALL CFFTCM(0, N1, N2, SCALE, X, LDX1, Y, LDY1, TRIGS, IFAC, $ SW, LW, IERR) IF (IERR .NE. 0) THEN PRINT,'ROUTINE RETURN WITH ERROR CODE = ', IERR STOP END IF Compute out-of-place forward linear FFT. * Let FFT routine allocate memory. CALL CFFTCM(-1, N1, N2, ONE, X, LDX1, Y, LDY1, TRIGS, IFAC, $ SW, 0, IERR) IF (IERR .NE. 0) THEN PRINT,'ROUTINE RETURN WITH ERROR CODE = ', IERR STOP END IF Compute in-place forward linear FFT. LDZ1 must equal LDX1 CALL CFFTCM(-1, N1, N2, ONE, Z, LDX1, Z, LDZ1, TRIGS, $ IFAC, SW, 0, IERR) WRITE(,) 'in-place forward FFT of X:' DO I = 0, N1-1 WRITE(,'(5(A1, F5.1,A1,F5.1,A1,2X))') ('(',REAL(Z(I,J)), $ ',',AIMAG(Z(I,J)),')', J = 0, N2-1) END DO WRITE(,) WRITE(,) 'out-of-place forward FFT of X:' WRITE(,) 'Y =' DO I = 0, N1-1 WRITE(,'(5(A1, F5.1,A1,F5.1,A1,2X))') ('(',REAL(Y(I,J)), $ ',',AIMAG(Y(I,J)),')', J = 0, N2-1) END DO WRITE(,) * Compute in-place inverse linear FFT. CALL CFFTCM(1, N1, N2, SCALE, Y, LDY1, Y, LDY1, TRIGS, IFAC, $ SW, LW, IERR) IF (IERR .NE. 0) THEN PRINT,'ROUTINE RETURN WITH ERROR CODE = ', IERR STOP END IF WRITE(,) 'in-place inverse FFT of Y:' WRITE(,) 'Y =' DO I = 0, N1-1 WRITE(,'(5(A1, F5.1,A1,F5.1,A1,2X))') ('(',REAL(Y(I,J)), $ ',',AIMAG(Y(I,J)),')', J = 0, N2-1) END DO DEALLOCATE(SW) END PROGRAM TESTCCM my_system% `f95 -dalign testccm.f -xlic_lib=sunperf` my_system% `a.out` Linear complex-to-complex FFT of one or more sequences X = ( 0.1, 0.2) ( 0.7, 0.8) ( 1.3, 1.4) ( 1.9, 2.0) ( 0.3, 0.4) ( 0.9, 1.0) ( 1.5, 1.6) ( 2.1, 2.2) ( 0.5, 0.6) ( 1.1, 1.2) ( 1.7, 1.8) ( 1.2, 2.0) in-place forward FFT of X: ( 0.9, 1.2) ( 2.7, 3.0) ( 4.5, 4.8) ( 5.2, 6.2) ( -0.5, -0.1) ( -0.5, -0.1) ( -0.5, -0.1) ( 0.4, -0.9) ( -0.1, -0.5) ( -0.1, -0.5) ( -0.1, -0.5) ( 0.1, 0.7) out-of-place forward FFT of X: Y = ( 0.9, 1.2) ( 2.7, 3.0) ( 4.5, 4.8) ( 5.2, 6.2) ( -0.5, -0.1) ( -0.5, -0.1) ( -0.5, -0.1) ( 0.4, -0.9) ( -0.1, -0.5) ( -0.1, -0.5) ( -0.1, -0.5) ( 0.1, 0.7) in-place inverse FFT of Y: Y = ( 0.1, 0.2) ( 0.7, 0.8) ( 1.3, 1.4) ( 1.9, 2.0) ( 0.3, 0.4) ( 0.9, 1.0) ( 1.5, 1.6) ( 2.1, 2.2) ( 0.5, 0.6) ( 1.1, 1.2) ( 1.7, 1.8) ( 1.2, 2.0)

CODE EXAMPLE 5-2 Linear Complex-to-Complex FFT

my_system% cat testccm.f

       PROGRAM TESTCCM

       IMPLICIT NONE

       INTEGER :: LDX1, LDY1, LW, IERR, I, J, K, LDZ1, NCPUS,

      $           USING_THREADS, IFAC(128)

       INTEGER, PARAMETER :: N1 = 3, N2 = 4, LDX1 = N1, LDZ1 = N1,

      $                      LDY1 = N1+2

       REAL, PARAMETER :: ONE = 1.0, SCALE = ONE/N1

       COMPLEX :: Z(0:LDZ1-1,0:N2-1), X(0:LDX1-1,0:N2-1),

      $           Y(0:LDY1-1,0:N2-1)

       REAL :: TRIGS(2*N1)

       REAL, DIMENSION(:), ALLOCATABLE :: SW

* get number of threads

       NCPUS = USING_THREADS()

* workspace size

       LW = 2 * N1 * NCPUS

       WRITE(*,*)'Linear complex-to-complex FFT of one or more sequences'

       WRITE(*,*)

       ALLOCATE(SW(LW))

       X = RESHAPE(SOURCE =(/(.1,.2),(.3,.4),(.5,.6),(.7,.8),(.9,1.0),

      $ (1.1,1.2),(1.3,1.4),(1.5,1.6),(1.7,1.8),(1.9,2.0),(2.1,2.2),

      $ (1.2,2.0)/), SHAPE=(/LDX1,N2/))

       Z = X

       WRITE(*,*) 'X = '

       DO I = 0, N1-1

          WRITE(*,'(5(A1, F5.1,A1,F5.1,A1,2X))') ('(',REAL(X(I,J)),

      $           ',',AIMAG(X(I,J)),')', J = 0, N2-1)

       END DO WRITE(*,*)

* intialize trig table and compute factors of N1

       CALL CFFTCM(0, N1, N2, SCALE, X, LDX1, Y, LDY1, TRIGS, IFAC,

      $            SW, LW, IERR)

       IF (IERR .NE. 0) THEN

         PRINT*,'ROUTINE RETURN WITH ERROR CODE = ', IERR

         STOP

       END IF

* Compute out-of-place forward linear FFT.

* Let FFT routine allocate memory.

       CALL CFFTCM(-1, N1, N2, ONE, X, LDX1, Y, LDY1, TRIGS, IFAC,

      $            SW, 0, IERR)

       IF (IERR .NE. 0) THEN

          PRINT*,'ROUTINE RETURN WITH ERROR CODE = ', IERR

          STOP

       END IF

* Compute in-place forward linear FFT. LDZ1 must equal LDX1

       CALL CFFTCM(-1, N1, N2, ONE, Z, LDX1, Z, LDZ1, TRIGS,

      $            IFAC, SW, 0, IERR)

       WRITE(*,*) 'in-place forward FFT of X:'

       DO I = 0, N1-1

          WRITE(*,'(5(A1, F5.1,A1,F5.1,A1,2X))') ('(',REAL(Z(I,J)),

      $           ',',AIMAG(Z(I,J)),')', J = 0, N2-1)

       END DO

       WRITE(*,*)

       WRITE(*,*) 'out-of-place forward FFT of X:'

       WRITE(*,*) 'Y ='

       DO I = 0, N1-1

       WRITE(*,'(5(A1, F5.1,A1,F5.1,A1,2X))') ('(',REAL(Y(I,J)),

      $        ',',AIMAG(Y(I,J)),')', J = 0, N2-1)

       END DO

       WRITE(*,*)

* Compute in-place inverse linear FFT.

       CALL CFFTCM(1, N1, N2, SCALE, Y, LDY1, Y, LDY1, TRIGS, IFAC,

      $            SW, LW, IERR)

       IF (IERR .NE. 0) THEN

          PRINT*,'ROUTINE RETURN WITH ERROR CODE = ', IERR

          STOP

       END IF

       WRITE(*,*) 'in-place inverse FFT of Y:'

       WRITE(*,*) 'Y ='

       DO I = 0, N1-1

          WRITE(*,'(5(A1, F5.1,A1,F5.1,A1,2X))') ('(',REAL(Y(I,J)),

      $           ',',AIMAG(Y(I,J)),')', J = 0, N2-1)

       END DO

       DEALLOCATE(SW)

       END PROGRAM TESTCCM

my_system% f95 -dalign testccm.f -xlic_lib=sunperf

my_system% a.out

Linear complex-to-complex FFT of one or more sequences

X =

( 0.1, 0.2) ( 0.7, 0.8) ( 1.3, 1.4) ( 1.9, 2.0)

( 0.3, 0.4) ( 0.9, 1.0) ( 1.5, 1.6) ( 2.1, 2.2)

( 0.5, 0.6) ( 1.1, 1.2) ( 1.7, 1.8) ( 1.2, 2.0)

in-place forward FFT of X:

(  0.9,  1.2) (  2.7,  3.0) (  4.5,  4.8) ( 5.2,  6.2)

( -0.5, -0.1) ( -0.5, -0.1) ( -0.5, -0.1) ( 0.4, -0.9)

( -0.1, -0.5) ( -0.1, -0.5) ( -0.1, -0.5) ( 0.1,  0.7)

out-of-place forward FFT of X:

Y =

(  0.9,  1.2) (  2.7,  3.0) (  4.5,  4.8) ( 5.2,  6.2)

( -0.5, -0.1) ( -0.5, -0.1) ( -0.5, -0.1) ( 0.4, -0.9)

( -0.1, -0.5) ( -0.1, -0.5) ( -0.1, -0.5) ( 0.1,  0.7)

in-place inverse FFT of Y:

Y =

( 0.1, 0.2) ( 0.7, 0.8) ( 1.3, 1.4) ( 1.9, 2.0)

( 0.3, 0.4) ( 0.9, 1.0) ( 1.5, 1.6) ( 2.1, 2.2)

( 0.5, 0.6) ( 1.1, 1.2) ( 1.7, 1.8) ( 1.2, 2.0)

Two-Dimensional FFT Routines

For the linear FFT routines, when the input is a two-dimensional array, the FFT is computed along one dimension only, namely, along the columns of the array. The two-dimensional FFT routines take a two-dimensional array as input and compute the FFT along both the column and row dimensions. Specifically, the forward two-dimensional FFT routines compute

$X(k,n)=\sum _{l=0}^{N2-1}\, \sum _{j=0}^{N1-1}x(j,l)e^{\frac{-2\pi iln}{N2}}e^{\frac{-2\pi ijk}{N1}},\, \, k=0,\ldots N1-1,\, \, \, n=0,\ldots N2-1$ ,

and the inverse two-dimensional FFT routines compute

$x(j,l)=\sum _{n=0}^{N2-1}\, \sum _{k=0}^{N1-1}X(k,n)e^{\frac{2\pi iln}{N2}}e^{\frac{2\pi ijk}{N1}},\, \, j=0,\ldots N1-1,\, \, \, l=0,\ldots N2-1$ .

For both the forward and inverse two-dimensional transforms, a complex-to-complex transform where the input problem is N1 × N2 will yield a complex array that is also N1 × N2.

When computing a real-to-complex two-dimensional transform (forward FFT), if the real input array is of dimensions N1 × N2, the result will be a complex array of dimensions $(\frac{N1}{2}+1)\times N2(\frac{N1}{2}+1)\times N2(\frac{N1}{2}+1)X(\frac{N1}{2}+1:N1-1, [:)]X(0:\frac{N1}{2}, [:)]$ . Conversely, when computing a complex-to-real transform (inverse FFT) of dimensions N1 × N2, an $(\frac{N1}{2}+1)\times N2(\frac{N1}{2}+1)\times N2(\frac{N1}{2}+1)X(\frac{N1}{2}+1:N1-1, [:)]X(0:\frac{N1}{2}, [:)]$ complex array is required as input. As with the real-to-complex and complex-to-real linear FFT, because of conjugate symmetry, only the first $(\frac{N1}{2}+1)\times N2(\frac{N1}{2}+1)\times N2(\frac{N1}{2}+1)X(\frac{N1}{2}+1:N1-1, [:)]X(0:\frac{N1}{2}, [:)]$ complex data points need to be stored in the input or output array along the first dimension. The complex subarray $(\frac{N1}{2}+1)\times N2(\frac{N1}{2}+1)\times N2(\frac{N1}{2}+1)X(\frac{N1}{2}+1:N1-1, [:)]X(0:\frac{N1}{2}, [:)]$ can be obtained from $(\frac{N1}{2}+1)\times N2(\frac{N1}{2}+1)\times N2(\frac{N1}{2}+1)X(\frac{N1}{2}+1:N1-1, [:)]X(0:\frac{N1}{2}, [:)]$ as follows:

$X(k,n) & = & X^{\star }(N1-k,n)\\ k & = & \frac{N1}{2}+1,\ldots N1-1\\ n & = & 0,\ldots N2-1$

To compute a two-dimensional transform, an FFT routine must be called twice. One call initializes the routine and the second call actually computes the transform. The initialization includes computing the factors of N1 and N2 and the trigonometric weights associated with those factors. In subsequent forward or inverse transforms, initialization is not necessary as long as N1 and N2 remain unchanged.

IMPORTANT: Upon returning from a two-dimensional FFT routine, Y(0 : N - 1, :) contains the transform results and the original contents of Y(N : LDY-1, :) is overwritten. Here, N = N1 in the complex-to-complex and complex-to-real transforms and N = $(\frac{N1}{2}+1)$ in the real-to-complex transform.

TABLE 5-3 lists the single precision two-dimensional FFT routines and their purposes. The same information applies to the corresponding double precision routines except that their data types are double precision and double complex. See TABLE 5-3 for the mapping. Refer to the individual man pages for a complete description of the routines and their arguments.


Name	Purpose	Size, Type of Input	Size, Type of Output	Leading Dimension Requirements
				In-place	Out-of-Place
`SFFTC2`	`OPT` = 0 initialization
	`OPT` = -1 real-to-complex forward two-dimensional FFT	N1 × N2, Real	$(\frac{N1}{2}+1)\times N2$ , Complex	`LDX1` = 2 × `LDY1` `LDY1` $\frac{N1}{2}+1$	`LDX1` N1 `LDY1` $\frac{N1}{2}+1$
`CFFTS2`	`OPT` = 0 initialization
	`OPT` = 1 complex-to-real inverse two-dimensional FFT	$(\frac{N1}{2}+1)\times N2$ , Complex	N1 × N2, Real	`LDX1` $\frac{N1}{2}+1$ `LDY1`=2 × `LDX1`	`LDX1` $\frac{N1}{2}+1$ `LDY1` 2 × `LDX1` `LDY1` is even
`CFFTC2`	`OPT` = 0 initialization
	`OPT` = -1 complex-to-complex forward two-dimensional FFT	N1 × N2, Complex	N1 × N2, Complex	`LDX1` N1 `LDY1` = `LDX1`	`LDX1` N1 `LDY1` N1
	`OPT` = 1 complex-to-complex inverse two-dimensional FFT	N1 × N2, Complex	N1 × N2, Complex	`LDX1` N1 `LDY1` = `LDX1`	`LDX1` N1 `LDY1` = `LDX1`

TABLE 5-3 Notes:

LDX1 is leading dimension of input array.

LDY1 is leading dimension of output array.

N1 is first dimension of the FFT problem.

N2 is second dimension of the FFT problem.

When calling routines with OPT = 0 to initialize the routine, the only error checking that is done is to determine if N1, N2 < 0.

The following example shows how to compute a two-dimensional real-to-complex FFT and complex-to-real FFT of a two-dimensional array.

my_system% `cat testsc2.f` PROGRAM TESTSC2 IMPLICIT NONE INTEGER, PARAMETER :: N1 = 3, N2 = 4, LDX1 = N1, $ LDY1 = N1/2+1, LDR1 = 2(N1/2+1) INTEGER LW, IERR, I, J, K, IFAC(1282) REAL, PARAMETER :: ONE = 1.0, SCALE = ONE/(N1N2) REAL :: V(LDR1,N2), X(LDX1, N2), Z(LDR1,N2), $ SW(2N2), TRIGS(2(N1+N2)) COMPLEX :: Y(LDY1,N2) WRITE(,) $'Two-dimensional complex-to-real and real-to-complex FFT' WRITE(,) X = RESHAPE(SOURCE = (/.1, .2, .3, .4, .5, .6, .7, .8, $ 2.0,1.0, 1.1, 1.2/), SHAPE=(/LDX1,N2/)) DO I = 1, N2 V(1:N1,I) = X(1:N1,I) END DO WRITE(,) 'X =' DO I = 1, N1 WRITE(,'(5(F5.1,2X))') (X(I,J), J = 1, N2) END DO WRITE(,) * Initialize trig table and get factors of N1, N2 CALL SFFTC2(0,N1,N2,ONE,X,LDX1,Y,LDY1,TRIGS, $ IFAC,SW,0,IERR) * Compute 2-dimensional out-of-place forward FFT. * Let FFT routine allocate memory. * cannot do an in-place transform in X because LDX1 < 2(N1/2+1) CALL SFFTC2(-1,N1,N2,ONE,X,LDX1,Y,LDY1,TRIGS, $ IFAC,SW,0,IERR) WRITE(,) 'out-of-place forward FFT of X:' WRITE(,)'Y =' DO I = 1, N1/2+1 WRITE(,'(5(A1, F5.1,A1,F5.1,A1,2X))')('(',REAL(Y(I,J)), $ ',',AIMAG(Y(I,J)),')', J = 1, N2) END DO WRITE(,) * Compute 2-dimensional in-place forward FFT. * Use workspace already allocated. * V which is real array containing input data is also * used to store complex results; as a complex array, its first * leading dimension is LDR1/2. CALL SFFTC2(-1,N1,N2,ONE,V,LDR1,V,LDR1/2,TRIGS, $ IFAC,SW,LW,IERR) WRITE(,) 'in-place forward FFT of X:' CALL PRINT_REAL_AS_COMPLEX(N1/2+1, N2, 1, V, LDR1/2, N2) * Compute 2-dimensional out-of-place inverse FFT. * Leading dimension of Z must be even CALL CFFTS2(1,N1,N2,SCALE,Y,LDY1,Z,LDR1,TRIGS, $ IFAC,SW,0,IERR) WRITE(,) 'out-of-place inverse FFT of Y:' DO I = 1, N1 WRITE(,'(5(F5.1,2X))') (Z(I,J), J = 1, N2) END DO WRITE(,) Compute 2-dimensional in-place inverse FFT. * Y which is complex array containing input data is also * used to store real results; as a real array, its first * leading dimension is 2LDY1. CALL CFFTS2(1,N1,N2,SCALE,Y,LDY1,Y,2LDY1, $ TRIGS,IFAC,SW,0,IERR) WRITE(,) 'in-place inverse FFT of Y:' CALL PRINT_COMPLEX_AS_REAL(N1, N2, 1, Y, 2LDY1, N2) END PROGRAM TESTSC2 SUBROUTINE PRINT_COMPLEX_AS_REAL(N1, N2, N3, A, LD1, LD2) INTEGER N1, N2, N3, I, J, K REAL A(LD1, LD2, ) DO K = 1, N3 DO I = 1, N1 WRITE(,'(5(F5.1,2X))') (A(I,J,K), J = 1, N2) END DO WRITE(,) END DO END SUBROUTINE PRINT_REAL_AS_COMPLEX(N1, N2, N3, A, LD1, LD2) INTEGER N1, N2, N3, I, J, K COMPLEX A(LD1, LD2, ) DO K = 1, N3 DO I = 1, N1 WRITE(,'(5(A1, F5.1,A1,F5.1,A1,2X))') ('(',REAL(A(I,J,K)), $ ',',AIMAG(A(I,J,K)),')', J = 1, N2) END DO WRITE(,*) END DO END my_system% `f95 -dalign testsc2.f -xlic_lib=sunperf` my_system% `a.out` Two-dimensional complex-to-real and real-to-complex FFT x = 0.1 0.4 0.7 1.0 0.2 0.5 0.8 1.1 0.3 0.6 2.0 1.2 out-of-place forward FFT of X: Y = ( 8.9, 0.0) ( -2.9, 1.8) ( -0.7, 0.0) ( -2.9, -1.8) ( -1.2, 1.3) ( 0.5, -1.0) ( -0.5, 1.0) ( 0.5, -1.0) in-place forward FFT of X: ( 8.9, 0.0) ( -2.9, 1.8) ( -0.7, 0.0) ( -2.9, -1.8) ( -1.2, 1.3) ( 0.5, -1.0) ( -0.5, 1.0) ( 0.5, -1.0) out-of-place inverse FFT of Y: 0.1 0.4 0.7 1.0 0.2 0.5 0.8 1.1 0.3 0.6 2.0 1.2 in-place inverse FFT of Y: 0.1 0.4 0.7 1.0 0.2 0.5 0.8 1.1 0.3 0.6 2.0 1.2

CODE EXAMPLE 5-3 Two-Dimensional Real-to-Complex FFT and Complex-to-Real FFT of a Two-Dimensional Array

my_system% cat testsc2.f

       PROGRAM TESTSC2

       IMPLICIT NONE

       INTEGER, PARAMETER :: N1 = 3, N2 = 4, LDX1 = N1,

      $         LDY1 = N1/2+1, LDR1 = 2*(N1/2+1)

       INTEGER LW, IERR, I, J, K, IFAC(128*2)

       REAL, PARAMETER :: ONE = 1.0, SCALE = ONE/(N1*N2)

       REAL :: V(LDR1,N2), X(LDX1, N2), Z(LDR1,N2),

      $        SW(2*N2), TRIGS(2*(N1+N2))

       COMPLEX :: Y(LDY1,N2)

       WRITE(*,*) $'Two-dimensional complex-to-real and real-to-complex FFT'

       WRITE(*,*)

       X = RESHAPE(SOURCE = (/.1, .2, .3, .4, .5, .6, .7, .8,

      $            2.0,1.0, 1.1, 1.2/), SHAPE=(/LDX1,N2/))

       DO I = 1, N2

          V(1:N1,I) = X(1:N1,I)

       END DO

       WRITE(*,*) 'X ='

       DO I = 1, N1

          WRITE(*,'(5(F5.1,2X))') (X(I,J), J = 1, N2)

       END DO

       WRITE(*,*)

* Initialize trig table and get factors of N1, N2

       CALL SFFTC2(0,N1,N2,ONE,X,LDX1,Y,LDY1,TRIGS,

      $            IFAC,SW,0,IERR)

* Compute 2-dimensional out-of-place forward FFT.

* Let FFT routine allocate memory.

* cannot do an in-place transform in X because LDX1 < 2*(N1/2+1)

       CALL SFFTC2(-1,N1,N2,ONE,X,LDX1,Y,LDY1,TRIGS,

      $            IFAC,SW,0,IERR)

       WRITE(*,*) 'out-of-place forward FFT of X:'

       WRITE(*,*)'Y ='

       DO I = 1, N1/2+1

          WRITE(*,'(5(A1, F5.1,A1,F5.1,A1,2X))')('(',REAL(Y(I,J)),

      $           ',',AIMAG(Y(I,J)),')', J = 1, N2)

       END DO

       WRITE(*,*)

* Compute 2-dimensional in-place forward FFT.

* Use workspace already allocated.

* V which is real array containing input data is also

* used to store complex results; as a complex array, its first

* leading dimension is LDR1/2.

       CALL SFFTC2(-1,N1,N2,ONE,V,LDR1,V,LDR1/2,TRIGS,

      $            IFAC,SW,LW,IERR)

       WRITE(*,*) 'in-place forward FFT of X:'

       CALL PRINT_REAL_AS_COMPLEX(N1/2+1, N2, 1, V, LDR1/2, N2)

* Compute 2-dimensional out-of-place inverse FFT.

* Leading dimension of Z must be even

       CALL CFFTS2(1,N1,N2,SCALE,Y,LDY1,Z,LDR1,TRIGS,

      $            IFAC,SW,0,IERR)

       WRITE(*,*) 'out-of-place inverse FFT of Y:'

       DO I = 1, N1

          WRITE(*,'(5(F5.1,2X))') (Z(I,J), J = 1, N2)

       END DO

       WRITE(*,*)

* Compute 2-dimensional in-place inverse FFT.

* Y which is complex array containing input data is also

* used to store real results; as a real array, its first

* leading dimension is 2*LDY1.

       CALL CFFTS2(1,N1,N2,SCALE,Y,LDY1,Y,2*LDY1,

      $            TRIGS,IFAC,SW,0,IERR)

       WRITE(*,*) 'in-place inverse FFT of Y:'

       CALL PRINT_COMPLEX_AS_REAL(N1, N2, 1, Y, 2*LDY1, N2)

       END PROGRAM TESTSC2

       SUBROUTINE PRINT_COMPLEX_AS_REAL(N1, N2, N3, A, LD1, LD2)

       INTEGER N1, N2, N3, I, J, K

       REAL A(LD1, LD2, *)

       DO K = 1, N3

          DO I = 1, N1

             WRITE(*,'(5(F5.1,2X))') (A(I,J,K), J = 1, N2)

          END DO

          WRITE(*,*)

       END DO

END

       SUBROUTINE PRINT_REAL_AS_COMPLEX(N1, N2, N3, A, LD1, LD2)

       INTEGER N1, N2, N3, I, J, K

       COMPLEX A(LD1, LD2, *)

       DO K = 1, N3

          DO I = 1, N1

             WRITE(*,'(5(A1, F5.1,A1,F5.1,A1,2X))') ('(',REAL(A(I,J,K)),

      $              ',',AIMAG(A(I,J,K)),')', J = 1, N2)

          END DO

          WRITE(*,*)

       END DO

END

my_system% f95 -dalign testsc2.f -xlic_lib=sunperf

my_system% a.out

Two-dimensional complex-to-real and real-to-complex FFT

x =

0.1 0.4 0.7 1.0

0.2 0.5 0.8 1.1

0.3 0.6 2.0 1.2

out-of-place forward FFT of X:

Y =

(  8.9, 0.0) ( -2.9,  1.8) ( -0.7, 0.0) ( -2.9, -1.8)

( -1.2, 1.3) (  0.5, -1.0) ( -0.5, 1.0) (  0.5, -1.0)

in-place forward FFT of X:

( 8.9, 0.0) ( -2.9, 1.8) ( -0.7, 0.0) ( -2.9, -1.8)

( -1.2, 1.3) ( 0.5, -1.0) ( -0.5, 1.0) ( 0.5, -1.0)

out-of-place inverse FFT of Y:

0.1 0.4 0.7 1.0

0.2 0.5 0.8 1.1

0.3 0.6 2.0 1.2

in-place inverse FFT of Y:

0.1 0.4 0.7 1.0

0.2 0.5 0.8 1.1

0.3 0.6 2.0 1.2

Three-Dimensional FFT Routines

Sun Performance Library includes routines that compute three-dimensional FFT. In this case, the FFT is computed along all three dimensions of a three-dimensional array. The forward FFT computes

$X(k,n,m)=\sum _{h=0}^{N3-1}\, \sum _{l=0}^{N2-1}\, \sum _{j=0}^{N1-1}x(j,l,h)e^{\frac{-2\pi ihm}{N3}}e^{\frac{-2\pi iln}{N2}}e^{\frac{-2\pi ijk}{N1}},$ ,

k = 0, ..., N1 - 1

n = 0, ..., N2 - 1

m = 0, ..., N3 - 1

and the inverse FFT computes

$x(j,l,h)=\sum _{m=0}^{N3-1}\, \sum _{n=0}^{N2-1}\, \sum _{k=0}^{N1-1}x(k,n,m)e^{\frac{2\pi ihm}{N3}}e^{\frac{2\pi iln}{N2}}e^{\frac{2\pi ijk}{N1}},$ ,

j = 0, ..., N1 - 1

l = 0, ..., N2 - 1

h = 0, ..., N3 - 1

In the complex-to-complex transform, if the input problem is N1 × N2 × N3, a three-dimensional transform will yield a complex array that is also N1 × N2 × N3. When computing a real-to-complex three-dimensional transform, if the real input array is of dimensions N1 × N2 × N3, the result will be a complex array of dimensions $(\frac{N1}{2}+1)\times N2\times N3(\frac{N1}{2}+1)\times N2\times N3(\frac{N1}{2}+1) X(\frac{N1}{2}+1:N1-1,:, [:)]X(0:\frac{N1}{2},:, [:)]$ . Conversely, when computing a complex-to-real FFT of dimensions N1 × N2 × N3, an $(\frac{N1}{2}+1)\times N2\times N3(\frac{N1}{2}+1)\times N2\times N3(\frac{N1}{2}+1) X(\frac{N1}{2}+1:N1-1,:, [:)]X(0:\frac{N1}{2},:, [:)]$ complex array is required as input. As with the real-to-complex and complex-to-real linear FFT, because of conjugate symmetry, only the first $(\frac{N1}{2}+1)\times N2\times N3(\frac{N1}{2}+1)\times N2\times N3(\frac{N1}{2}+1) X(\frac{N1}{2}+1:N1-1,:, [:)]X(0:\frac{N1}{2},:, [:)]$ complex data points need to be stored along the first dimension. The complex subarray $(\frac{N1}{2}+1)\times N2\times N3(\frac{N1}{2}+1)\times N2\times N3(\frac{N1}{2}+1) X(\frac{N1}{2}+1:N1-1,:, [:)]X(0:\frac{N1}{2},:, [:)]$ can be obtained from $(\frac{N1}{2}+1)\times N2\times N3(\frac{N1}{2}+1)\times N2\times N3(\frac{N1}{2}+1) X(\frac{N1}{2}+1:N1-1,:, [:)]X(0:\frac{N1}{2},:, [:)]$ as follows:

$X(k,n,m)=X^{\star }(N1-k,n,m),$

To compute a three-dimensional transform, an FFT routine must be called twice: Once to initialize and once more to actually compute the transform. The initialization includes computing the factors of N1, N2, and N3 and the trigonometric weights associated with those factors. In subsequent forward or inverse transforms, initialization is not necessary as long as N1, N2, and N3 remain unchanged.

IMPORTANT: Upon returning from a three-dimensional FFT routine, Y(0 : N - 1, :, :) contains the transform results and the original contents of Y(N:LDY1-1, :, :) is overwritten. Here, N = N1 in the complex-to-complex and complex-to-real transforms and N = $\frac{N1}{2}+1$ in the real-to-complex transform.

TABLE 5-4 lists the single precision three-dimensional FFT routines and their purposes. The same information applies to the corresponding double precision routines except that their data types are double precision and double complex. See TABLE 5-4 for the mapping. See the individual man pages for a complete description of the routines and their arguments.


Name	Purpose	Size, Type of Input	Size, Type of Output	Leading Dimension Requirements
				In-place	Out-of-Place
`SFFTC3`	`OPT` = 0 initialization
	`OPT` = -1 real-to-complex forward three-dimensional FFT	N1 × N2 × N3, Real	$(\frac{N1}{2}+1)\times N2\times N3$ , Complex	`LDX1`=2 × `LDY1` `LDX2` N2 `LDY1` $frac{N1}{2}+1$ `LDY2` = `LDX2`	`LDX1` N1 `LDX2` N2 `LDY1` $frac{N1}{2}+1$ `LDY2` N2
`CFFTS3`	`OPT` = 0 initialization
	`OPT` = 1 complex-to-real inverse three-dimensional FFT	$(\frac{N1}{2}+1)\times N2\times N3$ , Complex	N1 × N2 × N3, Real	`LDX1` $frac{N1}{2}+1$ `LDX2` N2 `LDY1`=2 × `LDX1` `LDY2`=`LDX2`	`LDX1` $frac{N1}{2}+1$ `LDX2` N2 `LDY1` 2 × `LDX1` `LDY1` is even `LDY2` N2
`CFFTC3`	`OPT` = 0 initialization
	`OPT` = -1 complex-to-complex forward three-dimensional FFT	N1 × N2 × N3, Complex	N1 × N2 × N3, Complex	`LDX1` N1 `LDX2` N2 `LDY1`=`LDX1` `LDY2`=`LDX2`	`LDX1` N1 `LDX2` N2 `LDY1` N1 `LDY2` N2
	`OPT` = 1 complex-to-complex inverse three-dimensional FFT	N1 × N2 × N3, Complex	N1 × N2 × N3, Complex	`LDX1` N1 `LDX2` N2 `LDY1`=`LDX1` `LDY2`=`LDX2`	`LDX1` N1 `LDX2` N2 `LDY1` N1 `LDY2` N2

TABLE 5-4 Notes:

LDX1 is first leading dimension of input array.

LDX2 is the second leading dimension of the input array.

LDY1 is the first leading dimension of the output array.

LDY2 is the second leading dimension of the output array.

N1 is the first dimension of the FFT problem.

N2 is the second dimension of the FFT problem.

N3 is the third dimension of the FFT problem.

When calling routines with OPT = 0 to initialize the routine, the only error checking that is done is to determine if N1, N2, N3 < 0.

CODE EXAMPLE 5-4 shows how to compute the three-dimensional real-to-complex FFT and complex-to-real FFT of a three-dimensional array.

my_system% `cat testsc3.f` PROGRAM TESTSC3 IMPLICIT NONE INTEGER LW, NCPUS, IERR, I, J, K, USING_THREADS, IFAC(1283) INTEGER, PARAMETER :: N1 = 3, N2 = 4, N3 = 2, LDX1 = N1, $ LDX2 = N2, LDY1 = N1/2+1, LDY2 = N2, $ LDR1 = 2(N1/2+1), LDR2 = N2 REAL, PARAMETER :: ONE = 1.0, SCALE = ONE/(N1N2N3) REAL :: V(LDR1,LDR2,N3), X(LDX1,LDX2,N3), Z(LDR1,LDR2,N3), $ TRIGS(2(N1+N2+N3)) REAL, DIMENSION(:), ALLOCATABLE :: SW COMPLEX :: Y(LDY1,LDY2,N3) WRITE(,) $'Three-dimensional complex-to-real and real-to-complex FFT' WRITE(,) get number of threads NCPUS = USING_THREADS() * compute workspace size required LW = (MAX(MAX(N1,2N2),2N3) + 16N3) NCPUS ALLOCATE(SW(LW)) X = RESHAPE(SOURCE = $ (/ .1, .2, .3, .4, .5, .6, .7, .8, .9,1.0,1.1,1.2, $ 4.1,1.2,2.3,3.4,6.5,1.6,2.7,4.8,7.9,1.0,3.1,2.2/), $ SHAPE=(/LDX1,LDX2,N3/)) V = RESHAPE(SOURCE = $ (/.1,.2,.3,0.,.4,.5,.6,0.,.7,.8,.9,0.,1.0,1.1,1.2,0., $ 4.1,1.2,2.3,0.,3.4,6.5,1.6,0.,2.7,4.8,7.9,0., $ 1.0,3.1,2.2,0./), SHAPE=(/LDR1,LDR2,N3/)) WRITE(,) 'X =' DO K = 1, N3 DO I = 1, N1 WRITE(,'(5(F5.1,2X))') (X(I,J,K), J = 1, N2) END DO WRITE(,) END DO Initialize trig table and get factors of N1, N2 and N3 CALL SFFTC3(0,N1,N2,N3,ONE,X,LDX1,LDX2,Y,LDY1,LDY2,TRIGS, $ IFAC,SW,0,IERR) * Compute 3-dimensional out-of-place forward FFT. * Let FFT routine allocate memory. * cannot do an in-place transform because LDX1 < 2(N1/2+1) CALL SFFTC3(-1,N1,N2,N3,ONE,X,LDX1,LDX2,Y,LDY1,LDY2,TRIGS, $ IFAC,SW,0,IERR) WRITE(,) 'out-of-place forward FFT of X:' WRITE(,)'Y =' DO K = 1, N3 DO I = 1, N1/2+1 WRITE(,'(5(A1, F5.1,A1,F5.1,A1,2X))')('(',REAL(Y(I,J,K)), $ ',',AIMAG(Y(I,J,K)),')', J = 1, N2) END DO WRITE(,) END DO * Compute 3-dimensional in-place forward FFT. * Use workspace already allocated. * V which is real array containing input data is also * used to store complex results; as a complex array, its first * leading dimension is LDR1/2. CALL SFFTC3(-1,N1,N2,N3,ONE,V,LDR1,LDR2,V,LDR1/2,LDR2,TRIGS, $ IFAC,SW,LW,IERR) WRITE(,) 'in-place forward FFT of X:' CALL PRINT_REAL_AS_COMPLEX(N1/2+1, N2, N3, V, LDR1/2, LDR2) * Compute 3-dimensional out-of-place inverse FFT. * First leading dimension of Z (LDR1) must be even CALL CFFTS3(1,N1,N2,N3,SCALE,Y,LDY1,LDY2,Z,LDR1,LDR2,TRIGS, $ IFAC,SW,0,IERR) WRITE(,) 'out-of-place inverse FFT of Y:' DO K = 1, N3 DO I = 1, N1 WRITE(,'(5(F5.1,2X))') (Z(I,J,K), J = 1, N2) END DO WRITE(,) END DO Compute 3-dimensional in-place inverse FFT. * Y which is complex array containing input data is also * used to store real results; as a real array, its first * leading dimension is 2LDY1. CALL CFFTS3(1,N1,N2,N3,SCALE,Y,LDY1,LDY2,Y,2LDY1,LDY2, $ TRIGS,IFAC,SW,LW,IERR) WRITE(,) 'in-place inverse FFT of Y:' CALL PRINT_COMPLEX_AS_REAL(N1, N2, N3, Y, 2LDY1, LDY2) DEALLOCATE(SW) END PROGRAM TESTSC3 SUBROUTINE PRINT_COMPLEX_AS_REAL(N1, N2, N3, A, LD1, LD2) INTEGER N1, N2, N3, I, J, K REAL A(LD1, LD2, ) DO K = 1, N3 DO I = 1, N1 WRITE(,'(5(F5.1,2X))') (A(I,J,K), J = 1, N2) END DO WRITE(,) END DO END SUBROUTINE PRINT_REAL_AS_COMPLEX(N1, N2, N3, A, LD1, LD2) INTEGER N1, N2, N3, I, J, K COMPLEX A(LD1, LD2, ) DO K = 1, N3 DO I = 1, N1 WRITE(,'(5(A1, F5.1,A1,F5.1,A1,2X))') ('(',REAL(A(I,J,K)), $ ',',AIMAG(A(I,J,K)),')', J = 1, N2) END DO WRITE(,*) END DO END my_system% `f95 -dalign testsc3.f -xlic_lib=sunperf` my_system% `a.out` Three-dimensional complex-to-real and real-to-complex FFT X = 0.1 0.4 0.7 1.0 0.2 0.5 0.8 1.1 0.3 0.6 0.9 1.2 4.1 3.4 2.7 1.0 1.2 6.5 4.8 3.1 2.3 1.6 7.9 2.2 out-of-place forward FFT of X: Y = ( 48.6, 0.0) ( -9.6, -3.4) ( 3.4, 0.0) ( -9.6, 3.4) ( -4.2, -1.0) ( 2.5, -2.7) ( 1.0, 8.7) ( 9.5, -0.7) (-33.0, 0.0) ( 6.0, 7.0) ( -7.0, 0.0) ( 6.0, -7.0) ( 3.0, 1.7) ( -2.5, 2.7) ( -1.0, -8.7) ( -9.5, 0.7) in-place forward FFT of X: ( 48.6, 0.0) ( -9.6, -3.4) ( 3.4, 0.0) ( -9.6, 3.4) ( -4.2, -1.0) ( 2.5, -2.7) ( 1.0, 8.7) ( 9.5, -0.7) (-33.0, 0.0) ( 6.0, 7.0) ( -7.0, 0.0) ( 6.0, -7.0) ( 3.0, 1.7) ( -2.5, 2.7) ( -1.0, -8.7) ( -9.5, 0.7) out-of-place inverse FFT of Y: 0.1 0.4 0.7 1.0 0.2 0.5 0.8 1.1 0.3 0.6 0.9 1.2 4.1 3.4 2.7 1.0 1.2 6.5 4.8 3.1 2.3 1.6 7.9 2.2 in-place inverse FFT of Y: 0.1 0.4 0.7 1.0 0.2 0.5 0.8 1.1 0.3 0.6 0.9 1.2 4.1 3.4 2.7 1.0 1.2 6.5 4.8 3.1 2.3 1.6 7.9 2.2

CODE EXAMPLE 5-4 Three-Dimensional Real-to-Complex FFT and Complex-to-Real FFT of a Three-Dimensional Array

my_system% cat testsc3.f

       PROGRAM TESTSC3

       IMPLICIT NONE

       INTEGER LW, NCPUS, IERR, I, J, K, USING_THREADS, IFAC(128*3)

       INTEGER, PARAMETER :: N1 = 3, N2 = 4, N3 = 2, LDX1 = N1,

      $                      LDX2 = N2, LDY1 = N1/2+1, LDY2 = N2,

      $                      LDR1 = 2*(N1/2+1), LDR2 = N2

       REAL, PARAMETER :: ONE = 1.0, SCALE = ONE/(N1*N2*N3)

       REAL :: V(LDR1,LDR2,N3), X(LDX1,LDX2,N3), Z(LDR1,LDR2,N3),

      $        TRIGS(2*(N1+N2+N3))

       REAL, DIMENSION(:), ALLOCATABLE :: SW

       COMPLEX :: Y(LDY1,LDY2,N3)

       WRITE(*,*) $'Three-dimensional complex-to-real and real-to-complex FFT'

       WRITE(*,*)

* get number of threads

       NCPUS = USING_THREADS()

* compute workspace size required

       LW = (MAX(MAX(N1,2*N2),2*N3) + 16*N3) * NCPUS

       ALLOCATE(SW(LW))

       X = RESHAPE(SOURCE =

      $    (/ .1, .2, .3, .4, .5, .6, .7, .8, .9,1.0,1.1,1.2,

      $     4.1,1.2,2.3,3.4,6.5,1.6,2.7,4.8,7.9,1.0,3.1,2.2/),

      $     SHAPE=(/LDX1,LDX2,N3/))

       V = RESHAPE(SOURCE =

      $    (/.1,.2,.3,0.,.4,.5,.6,0.,.7,.8,.9,0.,1.0,1.1,1.2,0.,

      $     4.1,1.2,2.3,0.,3.4,6.5,1.6,0.,2.7,4.8,7.9,0.,

      $     1.0,3.1,2.2,0./), SHAPE=(/LDR1,LDR2,N3/))

       WRITE(*,*) 'X ='

       DO K = 1, N3

          DO I = 1, N1

             WRITE(*,'(5(F5.1,2X))') (X(I,J,K), J = 1, N2)

          END DO

          WRITE(*,*)

       END DO

* Initialize trig table and get factors of N1, N2 and N3

       CALL SFFTC3(0,N1,N2,N3,ONE,X,LDX1,LDX2,Y,LDY1,LDY2,TRIGS,

      $            IFAC,SW,0,IERR)

* Compute 3-dimensional out-of-place forward FFT.

* Let FFT routine allocate memory.

* cannot do an in-place transform because LDX1 < 2*(N1/2+1)

       CALL SFFTC3(-1,N1,N2,N3,ONE,X,LDX1,LDX2,Y,LDY1,LDY2,TRIGS,

      $            IFAC,SW,0,IERR)

       WRITE(*,*) 'out-of-place forward FFT of X:'

       WRITE(*,*)'Y ='

       DO K = 1, N3

          DO I = 1, N1/2+1

             WRITE(*,'(5(A1, F5.1,A1,F5.1,A1,2X))')('(',REAL(Y(I,J,K)),

      $              ',',AIMAG(Y(I,J,K)),')', J = 1, N2)

          END DO

          WRITE(*,*)

       END DO

* Compute 3-dimensional in-place forward FFT.

* Use workspace already allocated.

* V which is real array containing input data is also

* used to store complex results; as a complex array, its first

* leading dimension is LDR1/2.

       CALL SFFTC3(-1,N1,N2,N3,ONE,V,LDR1,LDR2,V,LDR1/2,LDR2,TRIGS,

      $            IFAC,SW,LW,IERR)

       WRITE(*,*) 'in-place forward FFT of X:'

       CALL PRINT_REAL_AS_COMPLEX(N1/2+1, N2, N3, V, LDR1/2, LDR2)

* Compute 3-dimensional out-of-place inverse FFT.

* First leading dimension of Z (LDR1) must be even

       CALL CFFTS3(1,N1,N2,N3,SCALE,Y,LDY1,LDY2,Z,LDR1,LDR2,TRIGS,

      $            IFAC,SW,0,IERR)

       WRITE(*,*) 'out-of-place inverse FFT of Y:'

       DO K = 1, N3

         DO I = 1, N1

            WRITE(*,'(5(F5.1,2X))') (Z(I,J,K), J = 1, N2)

         END DO

         WRITE(*,*)

       END DO

* Compute 3-dimensional in-place inverse FFT.

* Y which is complex array containing input data is also

* used to store real results; as a real array, its first

* leading dimension is 2*LDY1.

       CALL CFFTS3(1,N1,N2,N3,SCALE,Y,LDY1,LDY2,Y,2*LDY1,LDY2,

      $            TRIGS,IFAC,SW,LW,IERR)

       WRITE(*,*) 'in-place inverse FFT of Y:'

       CALL PRINT_COMPLEX_AS_REAL(N1, N2, N3, Y, 2*LDY1, LDY2)

       DEALLOCATE(SW)

       END PROGRAM TESTSC3

       SUBROUTINE PRINT_COMPLEX_AS_REAL(N1, N2, N3, A, LD1, LD2)

       INTEGER N1, N2, N3, I, J, K

       REAL A(LD1, LD2, *)

       DO K = 1, N3

          DO I = 1, N1

             WRITE(*,'(5(F5.1,2X))') (A(I,J,K), J = 1, N2)

          END DO

          WRITE(*,*)

       END DO

END

       SUBROUTINE PRINT_REAL_AS_COMPLEX(N1, N2, N3, A, LD1, LD2)

       INTEGER N1, N2, N3, I, J, K COMPLEX A(LD1, LD2, *)

       DO K = 1, N3

          DO I = 1, N1

             WRITE(*,'(5(A1, F5.1,A1,F5.1,A1,2X))') ('(',REAL(A(I,J,K)),

      $              ',',AIMAG(A(I,J,K)),')', J = 1, N2)

          END DO

          WRITE(*,*)

       END DO

END

my_system% f95 -dalign testsc3.f -xlic_lib=sunperf

my_system% a.out

Three-dimensional complex-to-real and real-to-complex FFT

X =

0.1 0.4 0.7 1.0

0.2 0.5 0.8 1.1

0.3 0.6 0.9 1.2

4.1 3.4 2.7 1.0

1.2 6.5 4.8 3.1

2.3 1.6 7.9 2.2

out-of-place forward FFT of X:

Y =

( 48.6, 0.0) ( -9.6, -3.4) ( 3.4, 0.0) ( -9.6, 3.4)

( -4.2, -1.0) ( 2.5, -2.7) ( 1.0, 8.7) ( 9.5, -0.7)

(-33.0, 0.0) (  6.0, 7.0) ( -7.0,  0.0) (  6.0, -7.0)

(  3.0, 1.7) ( -2.5, 2.7) ( -1.0, -8.7) ( -9.5,  0.7)

in-place forward FFT of X:

( 48.6, 0.0) ( -9.6, -3.4) ( 3.4, 0.0) ( -9.6, 3.4)

( -4.2, -1.0) ( 2.5, -2.7) ( 1.0, 8.7) ( 9.5, -0.7)

(-33.0, 0.0) (  6.0, 7.0) ( -7.0, 0.0) (  6.0, -7.0)

(  3.0, 1.7) ( -2.5, 2.7) ( -1.0, -8.7) ( -9.5, 0.7)

out-of-place inverse FFT of Y:

0.1 0.4 0.7 1.0

0.2 0.5 0.8 1.1

0.3 0.6 0.9 1.2

4.1 3.4 2.7 1.0

1.2 6.5 4.8 3.1

2.3 1.6 7.9 2.2

in-place inverse FFT of Y:

0.1 0.4 0.7 1.0

0.2 0.5 0.8 1.1

0.3 0.6 0.9 1.2

4.1 3.4 2.7 1.0

1.2 6.5 4.8 3.1

2.3 1.6 7.9 2.2

Comments

When doing an in-place real-to-complex or complex-to-real transform, care must be taken to ensure the size of the input array is large enough to hold the results. For example, if the input is of type complex stored in a complex array with first leading dimension N, then to use the same array to store the real results, its first leading dimension as a real output array would be 2 × N. Conversely, if the input is of type real stored in a real array with first leading dimension 2 × N, then to use the same array to store the complex results, its first leading dimension as a complex output array would be N. Leading dimension requirements for in-place and out-of-place transforms can be found in TABLE 5-2, TABLE 5-3, and TABLE 5-4.

In the linear and multi-dimensional FFT, the transform between real and complex data through a real-to-complex or complex-to-real transform can be confusing because N1 real data points correspond to $(\frac{N1}{2}+1)(\frac{N1}{2}+1)$ complex data points. N1 real data points do map to N1 complex data points, but because there is conjugate symmetry in the complex data, only $(\frac{N1}{2}+1)(\frac{N1}{2}+1)$ data points need to be stored as input in the complex-to-real transform and as output in the real-to-complex transform. In the multi-dimensional FFT, symmetry exists along all the dimensions, not just in the first. However, the two-dimensional and three-dimensional FFT routines store the complex data of the second and third dimensions in their entirety.

While the FFT routines accept any size of N1, N2 and N3, FFTs can be computed most efficiently when values of N1, N2 and N3 can be decomposed into relatively small primes. A real-to-complex or a complex-to-real transform can be computed most efficiently when

$N1, N2, N3=2^{p}\times 3^{q}\times 4^{r}\times 5^{s}$ ,

and a complex-to-complex transform can be computed most efficiently when

$N1, N2, N3=2^{p}\times 3^{q}\times 4^{r}\times 5^{s}\times 7^{t}\times 11^{u}\times 13^{v}$ ,

where p, q, r, s, t, u, and v are integers and p, q, r, s, t, u, v greater than or equal 0.

The function xFFTOPT can be used to determine the optimal sequence length, as shown in CODE EXAMPLE 5-5.

my_system% `cat fft_ex01.f` PROGRAM TEST INTEGER N, N1, N2, N3, RFFTOPT C N = 1024 N1 = 1019 N2 = 71 N3 = 49 C PRINT *, 'N Original N Suggested' PRINT '(I5, I12)', (N, RFFTOPT(N)) PRINT '(I5, I12)', (N1, RFFTOPT(N1)) PRINT '(I5, I12)', (N2, RFFTOPT(N2)) PRINT '(I5, I12)', (N3, RFFTOPT(N3)) END my_system% `f95 -dalign fft_ex01.f -xlic_lib=sunperf` my_system% `a.out` N Original N Suggested 1024 1024 1019 1024 71 72 49 49

CODE EXAMPLE 5-5 RFFTOPT Example

my_system% cat fft_ex01.f

      PROGRAM TEST

      INTEGER         N, N1, N2, N3, RFFTOPT

      N = 1024

      N1 = 1019

      N2 = 71

      N3 = 49

      PRINT *, 'N Original  N Suggested'

      PRINT '(I5, I12)', (N, RFFTOPT(N))

      PRINT '(I5, I12)', (N1, RFFTOPT(N1))

      PRINT '(I5, I12)', (N2, RFFTOPT(N2))

      PRINT '(I5, I12)', (N3, RFFTOPT(N3))

END

my_system% f95 -dalign fft_ex01.f -xlic_lib=sunperf

my_system% a.out

 N Original  N Suggested

 1024        1024

 1019        1024

   71          72

   49          49

Cosine and Sine Transforms

Input to the DFT that possess special symmetries occur in various applications. A transform that exploits symmetry usually saves in storage and computational count, such as with the real-to-complex and complex-to-real FFT transforms. The Sun Performance Library cosine and sine transforms are special cases of FFT routines that take advantage of the symmetry properties found in even and odd functions.

Note - Sun Performance Library sine and cosine transform routines are based on the routines contained in FFTPACK (http://www.netlib.org/fftpack/). Routines with a V prefix are vectorized routines that are based on the routines contained in VFFTPACK (http://www.netlib.org/vfftpack/).

Fast Cosine and Sine Transform Routines

TABLE 5-5 lists the Sun Performance Library fast cosine and sine transforms. Names of double precision routines are in square brackets. Routines whose name begins with 'V' can compute the transform of one or more sequences simultaneously. Those whose name ends with 'I' are initialization routines.


Name	Arguments
Fast Cosine Transforms for Even Sequences
`COST` `[DCOST]`	`(LEN+1`, `X,` `WORK)`
`COSTI` `[DCOSTI]`	`(LEN+1,` `WORK)`
`VCOST` `[VDCOST]`	`(M,` `LEN+1,` `X,` `WORK,` `LD,` `TABLE)`
`VCOSTI` `[VDCOSTI]`	`(LEN+1,` `TABLE)`
Fast Cosine Transforms for Quarter-Wave Even Sequences
`COSQF` `[DCOSQF]`	`(LEN,` `X,` `WORK)`
`COSQB` `[DCOSQB]`	`(LEN,` `X,` `WORK)`
`COSQI` `[DCOSQI]`	`(LEN,` `WORK)`
`VCOSQF` `[VDCOSQF]`	`(M,` `LEN,` `X,` `WORK,` `LD,` `TABLE)`
`VCOSQB` `[VDCOSQB]`	`(M,` `LEN,` `X,` `WORK,` `LD,` `TABLE)`
`VCOSQI` `[VDCOSQI]`	`(LEN,` `TABLE)`
Fast Sine Transforms for Odd Sequences
`SINT` `[DSINT]`	`(LEN-1,` `X,` `WORK)`
`SINTI` `[DSINTI]`	`(LEN-1,` `WORK)`
`VSINT` `[VDSINT]`	`(M,` `LEN-1,` `X,` `WORK,` `LD,` `TABLE)`
`VSINTI` `[VDSINTI]`	`(LEN-1,` `TABLE)`
Fast Sine Transforms for Quarter-Wave Odd Sequences
`SINQF` `[DSINQF]`	`(LEN,` `X,` `WORK)`
`SINQB` `[DSINQB]`	`(LEN,` `X,` `WORK)`
`SINQI` `[DSINQI]`	`(LEN,` `WORK)`
`VSINQF` `[VDSINQF]`	`(M,` `LEN,` `X,` `WORK,` `LD,` `TABLE)`
`VSINQB` `[VDSINQB]`	`(M,` `LEN,` `X,` `WORK,` `LD,` `TABLE)`
`VSINQI` `[VDSINQI]`	`(LEN,` `TABLE)`

TABLE 5-5 Notes:

M: Number of sequences to be transformed.

LEN, LEN-1, LEN+1: Length of the input sequence or sequences.

X: A real array which contains the sequence or sequences to be transformed. On output, the real transform results are stored in X.

TABLE: Array of constants particular to a transform size that is required by the transform routine. The constants are computed by the initialization routine.

WORK: Workspace required by the transform routine. In routines that operate on a single sequence, WORK also contains constants computed by the initialization routine.

Fast Cosine Transforms

A special form of the FFT that operates on real even sequences is the fast cosine transform (FCT). A real sequence x is said to have even symmetry if x(n) = x(-n) where n = -N + 1, ..., 0, ..., N. An FCT of a sequence of length 2N requires N + 1 input data points and produces a sequence of size N + 1. Routine COST computes the FCT of a single real even sequence while VCOST computes the FCT of one or more sequences. Before calling [V]COST, [V]COSTI must be called to compute trigonometric constants and factors associated with input length N + 1. The FCT is its own inverse transform. Calling VCOST twice will result in the original N +1 data points. Calling COST twice will result in the original N +1 data points multiplied by 2N.

An even sequence x with symmetry such that x(n) = x(-n - 1) where n = -N + 1, ... , 0, ..., N is said to have quarter-wave even symmetry. COSQF and COSQB compute the FCT and its inverse, respectively, of a single real quarter-wave even sequence. VCOSQF and VCOSQB operate on one or more sequences. The results of [V]COSQB are unormalized, and if scaled by $\frac{1}{4N}$ , the original sequences are obtained. An FCT of a real sequence of length 2N that has quarter-wave even symmetry requires N input data points and produces an N-point resulting sequence. Initialization is required before calling the transform routines by calling [V]COSQI.

Fast Sine Transforms

Another type of symmetry that is commonly encountered is the odd symmetry where x(n) = -x(-n) for n = -N+1, ..., 0, ..., N. As in the case of the fast cosine transform, the fast sine transform (FST) takes advantage of the odd symmetry to save memory and computation. For a real odd sequence x, symmetry implies that x(0) = -x(0) = 0. Therefore, if x is of length 2N then only N = 1 values of x are required to compute the FST. Routine SINT computes the FST of a single real odd sequence while VSINT computes the FST of one or more sequences. Before calling [V]SINT, [V]SINTI must be called to compute trigonometric constants and factors associated with input length N - 1. The FST is its own inverse transform. Calling VSINT twice will result in the original N -1 data points. Calling SINT twice will result in the original N -1 data points multiplied by 2N.

An odd sequence with symmetry such that x(n) = -x(-n - 1), where
n = -N + 1, ..., 0, ..., N is said to have quarter-wave odd symmetry. SINQF and SINQB compute the FST and its inverse, respectively, of a single real quarter-wave odd sequence while VSINQF and VSINQB operate on one or more sequences. SINQB is unnormalized, so using the results of SINQF as input in SINQB produces the original sequence scaled by a factor of 4N. However, VSINQB is normalized, so a call to VSINQF followed by a call to VSINQB will produce the original sequence. An FST of a real sequence of length 2N that has quarter-wave odd symmetry requires N input data points and produces an N-point resulting sequence. Initialization is required before calling the transform routines by calling [V]SINQI.

Discrete Fast Cosine and Sine Transforms and Their Inverse

Sun Performance Library routines use the equations in the following sections to compute the fast cosine and sine transforms and inverse transforms.

`[D]COST`: Forward and Inverse Fast Cosine Transform (FCT) of a Sequence

The forward and inverse FCT of a sequence is computed as

$X(k)=x(0)+2\sum _{n=1}^{N-1}x(n)\cos (\frac{\pi nk}{N})+x(N)\cos (\pi k), k=0,\ldots N$ .

[D]COST Notes:

N + 1 values are needed to compute the FCT of an N-point sequence.

[D]COST also computes the inverse transform. When [D]COST is called twice, the result will be the original sequence scaled by

$\frac{1}{2N}$ .

`V[D]COST`: Forward and Inverse Fast Cosine Transforms of Multiple Sequences (VFCT)

The forward and inverse FCTs of multiple sequences are computed as

For i = 0, M - 1

$X(i,k)=\frac{x(i,0)}{2N}+\frac{1}{N}\sum _{n=1}^{N-1}x(i,n)\cos (\frac{\pi nk}{N})+\frac{x(i,N)}{2N}\cos (\pi k), k=0,\ldots N$ .

V[D]COST Notes

M × (N+1) values are needed to compute the VFCT of M N-point sequences.

The input and output sequences are stored row-wise.

V[D]COST is normalized and is its own inverse. When V[D]COST is called twice, the result will be the original data.

`[D]COSQF`: Forward FCT of a Quarter-Wave Even Sequence

The forward FCT of a quarter-wave even sequence is computed as

$X(k)=x(0)+2\sum _{n=1}^{N-1}x(n)\cos (\frac{\pi n(2k+1)}{2N}), k=0,\ldots N-1$ .

N values are needed to compute the forward FCT of an N-point quarter-wave even sequence.

`[D]COSQB`: Inverse FCT of a Quarter-Wave Even Sequence

The inverse FCT of a quarter-wave even sequence is computed as

$x(n)=\sum _{k=0}^{N-1}X(k)\cos (\frac{\pi n(2k+1)}{2N}), n=0,\ldots N-1$ .

Calling the forward and inverse routines will result in the original input scaled by $\frac{1}{4N}$ .

`V[D]COSQF`: Forward FCT of One or More Quarter-Wave Even Sequences

The forward FCT of one or more quarter-wave even sequences is computed as

For i = 0, M - 1

$X(i,k)=\frac{1}{N}\left[ x(i,0)+2\sum _{n=1}^{N-1}x(i,n)\cos (\frac{\pi n(2k+1)}{2N})\right] , k=0,\ldots N-1$ .

V[D]COSQF Notes:

The input and output sequences are stored row-wise.

The transform is normalized so that if the inverse routine V[D]COSQB is called immediately after calling V[D]COSQF, the original data is obtained.

`V[D]COSQB`: Inverse FCT of One or More Quarter-Wave Even Sequences

The inverse FCT of one or more quarter-wave even sequences is computed as

For i = 0, M - 1

$x(i,n)=\sum _{k=0}^{N-1}X(i,k)\cos (\frac{\pi n(2k+1)}{2N}) n=0,\ldots N-1$ .

V[D]COSQB Notes:

The input and output sequences are stored row-wise.

The transform is normalized so that if V[D]COSQB is called immediately after calling V[D]COSQF, the original data is obtained.

`[D]SINT`: Forward and Inverse Fast Sine Transform (FST) of a Sequence

The forward and inverse FST of a sequence is computed as

$X(k)=2\sum _{n=0}^{N-2}x(n)\sin (\frac{\pi (n+1)(k+1)}{N}), k=0,\ldots N-2$ .

[D]SINT Notes:

N-1 values are needed to compute the FST of an N-point sequence.

[D]SINT also computes the inverse transform. When [D]SINT is called twice, the result will be the original sequence scaled by

$\frac{1}{2N}$ .

`V[D]SINT`: Forward and Inverse Fast Sine Transforms of Multiple Sequences (VFST)

The forward and inverse fast sine transforms of multiple sequences are computed as

For i = 0, M - 1

$X(i,k)=\frac{2}{\sqrt{2N}}\sum _{n=0}^{N-2}x(i,n)\sin (\frac{\pi (n+1)(k+1)}{N}), k=0,\ldots N-2$ .

V[D]SINT Notes:

M × (N - 1) values are needed to compute the VFST of M N-point sequences.

The input and output sequences are stored row-wise.

V[D]SINT is normalized and is its own inverse. Calling V[D]SINT twice yields the original data.

`[D]SINQF`: Forward FST of a Quarter-Wave Odd Sequence

The forward FST of a quarter-wave odd sequence is computed as

$X(k)=2\sum _{n=0}^{N-2}x(n)\sin (\frac{\pi (n+1)(2k+1)}{2N})+x(N-1)\cos (\pi k), k=0,\ldots N-1$ .

N values are needed to compute the forward FST of an N-point quarter-wave odd sequence.

`[D]SINQB`: Inverse FST of a Quarter-Wave Odd Sequence

The inverse FST of a quarter-wave odd sequence is computed as

$x(n)=2\sum _{k=0}^{N-1}X(k)\sin (\frac{\pi (n+1)(2k+1)}{2N}), n=0,\ldots N-1$ .

Calling the forward and inverse routines will result in the original input scaled by $\frac{1}{4N}$ .

`V[D]SINQF`: Forward FST of One or More Quarter-Wave Odd Sequences

The forward FST of one or more quarter-wave odd sequences is computed as

For i = 0, M - 1

$X(i,k)=\frac{1}{\sqrt{4N}}\left[ 2\sum _{n=0}^{N-2}x(n,i)\sin (\frac{\pi (n+1)(2k+1)}{2N})+x(N-1,i)\cos (\pi k)\right] , k=0,\ldots N-1$ .

V[D]SINQF Notes:

The input and output sequences are stored row-wise.

The transform is normalized so that if the inverse routine V[D]SINQB is called immediately after calling V[D]SINQF, the original data is obtained.

`V[D]SINQB`: Inverse FST of One or More Quarter-Wave Odd Sequences

The inverse FST of one or more quarter-wave odd sequences is computed as

For i = 0, M - 1

$x(n,i)=\frac{4}{\sqrt{4N}}\sum _{k=0}^{N-1}X(k,i)\sin (\frac{\pi (n+1)(2k+1)}{2N}), n=0,\ldots N-1$ .

V[D]SINQB Notes:

The input and output sequences are stored row-wise.

The transform is normalized, so that if V[D]SINQB is called immediately after calling V[D]SINQF, the original data is obtained.

Fast Cosine Transform Examples

CODE EXAMPLE 5-6 calls COST to compute the FCT and the inverse transform of a real even sequence. If the real sequence is of length 2N, only N + 1 input data points need to be stored and the number of resulting data points is also N + 1. The results are stored in the input array.

my_system% `cat cost.f` program cost implicit none integer,parameter :: len=4 real x(0:len),work(3(len+1)+15), z(0:len), scale integer i scale = 1.0/(2.0len) call RANDOM_NUMBER(x(0:len)) z(0:len) = x(0:len) write(,'(a25,i1,a10,i1,a12)')'Input sequence of length ', $ len,' requires ', len+1,' data points' write(,'(5(f8.3,2x),/)')(x(i),i=0,len) call costi(len+1, work) call cost(len+1, z, work) write(,)'Forward fast cosine transform' write(,'(5(f8.3,2x),/)')(z(i),i=0,len) call cost(len+1, z, work) write(,) $ 'Inverse fast cosine transform (results scaled by 1/2N)' write(,'(5(f8.3,2x),/)')(z(i)scale,i=0,len) end my_system% `f95 -dalign cost.f -xlic_lib=sunperf` my_system% `a.out` Input sequence of length 4 requires 5 data points 0.557 0.603 0.210 0.352 0.867 Forward fast cosine transform 3.753 0.046 1.004 -0.666 -0.066 Inverse fast cosine transform (results scaled by 1/2*N) 0.557 0.603 0.210 0.352 0.867

CODE EXAMPLE 5-6 Compute FCT and Inverse FCT of Single Real Even Sequence

my_system% cat cost.f

       program cost

       implicit none

       integer,parameter :: len=4

       real x(0:len),work(3*(len+1)+15), z(0:len), scale

       integer i

       scale = 1.0/(2.0*len)

       call RANDOM_NUMBER(x(0:len))

       z(0:len) = x(0:len)

       write(*,'(a25,i1,a10,i1,a12)')'Input sequence of length ',

     $          len,' requires ', len+1,' data points'

       write(*,'(5(f8.3,2x),/)')(x(i),i=0,len)

       call costi(len+1, work)

       call cost(len+1, z, work)

       write(*,*)'Forward fast cosine transform'

       write(*,'(5(f8.3,2x),/)')(z(i),i=0,len)

       call cost(len+1, z, work)

       write(*,*)

     $    'Inverse fast cosine transform (results scaled by 1/2*N)'

       write(*,'(5(f8.3,2x),/)')(z(i)*scale,i=0,len)

end

my_system% f95 -dalign cost.f -xlic_lib=sunperf

my_system% a.out

Input sequence of length 4 requires 5 data points

0.557 0.603 0.210 0.352 0.867

Forward fast cosine transform

3.753 0.046 1.004 -0.666 -0.066

Inverse fast cosine transform (results scaled by 1/2*N)

0.557 0.603 0.210 0.352 0.867

CODE EXAMPLE 5-7 calls VCOSQF and VCOSQB to compute the FCT and the inverse FCT, respectively, of two real quarter-wave even sequences. If the real sequences are of length 2N, only N input data points need to be stored, and the number of resulting data points is also N. The results are stored in the input array.

my_system% `cat vcosq.f` program vcosq implicit none integer,parameter :: len=4, m = 2, ld = m+1 real x(ld,len),xt(ld,len),work(3len+15), z(ld,len) integer i, j call RANDOM_NUMBER(x) z = x write(,'(a27,i1)')' Input sequences of length ',len do j = 1,m write(,'(a3,i1,a4,4(f5.3,2x),a1,/)') $ 'seq',j,' = (',(x(j,i),i=1,len),')' end do call vcosqi(len, work) call vcosqf(m,len, z, xt, ld, work) write(,) $ 'Forward fast cosine transform for quarter-wave even sequences' do j = 1,m write(,'(a3,i1,a4,4(f5.3,2x),a1,/)') $ 'seq',j,' = (',(z(j,i),i=1,len),')' end do call vcosqb(m,len, z, xt, ld, work) write(,) $ 'Inverse fast cosine transform for quarter-wave even sequences' write(,)'(results are normalized)' do j = 1,m write(*,'(a3,i1,a4,4(f5.3,2x),a1,/)') $ 'seq',j,' = (',(z(j,i),i=1,len),')' end do end my_system% `f95 -dalign vcosq.f -xlic_lib=sunperf` my_system% `a.out` Input sequences of length 4 seq1 = (0.557 0.352 0.990 0.539 ) seq2 = (0.603 0.867 0.417 0.156 ) Forward fast cosine transform for quarter-wave even sequences seq1 = (0.755 -.392 -.029 0.224 ) seq2 = (0.729 0.097 -.091 -.132 ) Inverse fast cosine transform for quarter-wave even sequences (results are normalized) seq1 = (0.557 0.352 0.990 0.539 ) seq2 = (0.603 0.867 0.417 0.156 )

CODE EXAMPLE 5-7 Compute the FCT and the Inverse FCT of Two Real Quarter-wave Even Sequences

my_system% cat vcosq.f

    program vcosq

    implicit none

    integer,parameter :: len=4, m = 2, ld = m+1

    real x(ld,len),xt(ld,len),work(3*len+15), z(ld,len)

    integer i, j

    call RANDOM_NUMBER(x)

    z = x

    write(*,'(a27,i1)')' Input sequences of length ',len

    do j = 1,m

       write(*,'(a3,i1,a4,4(f5.3,2x),a1,/)')

 $     'seq',j,' = (',(x(j,i),i=1,len),')'

    end do

    call vcosqi(len, work)

    call vcosqf(m,len, z, xt, ld, work)

    write(*,*)

 $ 'Forward fast cosine transform for quarter-wave even sequences'

    do j = 1,m

       write(*,'(a3,i1,a4,4(f5.3,2x),a1,/)')

 $     'seq',j,' = (',(z(j,i),i=1,len),')'

    end do

    call vcosqb(m,len, z, xt, ld, work)

    write(*,*)

 $  'Inverse fast cosine transform for quarter-wave even sequences'

    write(*,*)'(results are normalized)'

    do j = 1,m

       write(*,'(a3,i1,a4,4(f5.3,2x),a1,/)')

 $     'seq',j,' = (',(z(j,i),i=1,len),')'

    end do

end

my_system% f95 -dalign vcosq.f -xlic_lib=sunperf

my_system% a.out

Input sequences of length 4

seq1 = (0.557 0.352 0.990 0.539 )

seq2 = (0.603 0.867 0.417 0.156 )

Forward fast cosine transform for quarter-wave even sequences

seq1 = (0.755 -.392 -.029 0.224 )

seq2 = (0.729 0.097 -.091 -.132 )

Inverse fast cosine transform for quarter-wave even sequences

(results are normalized)

seq1 = (0.557 0.352 0.990 0.539 )

seq2 = (0.603 0.867 0.417 0.156 )

Fast Sine Transform Examples

In CODE EXAMPLE 5-8, SINT is called to compute the FST and the inverse transform of a real odd sequence. If the real sequence is of length 2N, only N - 1 input data points need to be stored and the number of resulting data points is also N - 1. The results are stored in the input array.

my_system% `cat sint.f` program sint implicit none integer,parameter :: len=4 real x(0:len-2),work(3(len-1)+15), z(0:len-2), scale integer i call RANDOM_NUMBER(x(0:len-2)) z(0:len-2) = x(0:len-2) scale = 1.0/(2.0len) write(,'(a25,i1,a10,i1,a12)')'Input sequence of length ', $ len,' requires ', len-1,' data points' write(,'(3(f8.3,2x),/)')(x(i),i=0,len-2) call sinti(len-1, work) call sint(len-1, z, work) write(,)'Forward fast sine transform' write(,'(3(f8.3,2x),/)')(z(i),i=0,len-2) call sint(len-1, z, work) write(,) $ 'Inverse fast sine transform (results scaled by 1/2N)' write(,'(3(f8.3,2x),/)')(z(i)scale,i=0,len-2) end my_system% `f95 -dalign sint.f -xlic_lib=sunperf` my_system% `a.out` Input sequence of length 4 requires 3 data points 0.557 0.603 0.210 Forward fast sine transform 2.291 0.694 -0.122 Inverse fast sine transform (results scaled by 1/2*N) 0.557 0.603 0.210

CODE EXAMPLE 5-8 Compute FST and the Inverse FST of a Real Odd Sequence

my_system% cat sint.f

      program sint

      implicit none

      integer,parameter :: len=4

      real x(0:len-2),work(3*(len-1)+15), z(0:len-2), scale

      integer i

      call RANDOM_NUMBER(x(0:len-2))

      z(0:len-2) = x(0:len-2)

      scale = 1.0/(2.0*len)

      write(*,'(a25,i1,a10,i1,a12)')'Input sequence of length ',

    $       len,' requires ', len-1,' data points'

      write(*,'(3(f8.3,2x),/)')(x(i),i=0,len-2)

      call sinti(len-1, work)

      call sint(len-1, z, work)

      write(*,*)'Forward fast sine transform'

      write(*,'(3(f8.3,2x),/)')(z(i),i=0,len-2)

      call sint(len-1, z, work)

      write(*,*) $ 'Inverse fast sine transform (results scaled by 1/2*N)'

      write(*,'(3(f8.3,2x),/)')(z(i)*scale,i=0,len-2)

end

my_system% f95 -dalign sint.f -xlic_lib=sunperf

my_system% a.out

Input sequence of length 4 requires 3 data points

0.557 0.603 0.210

Forward fast sine transform

2.291 0.694 -0.122

Inverse fast sine transform (results scaled by 1/2*N)

0.557 0.603 0.210

In CODE EXAMPLE 5-9 VSINQF and VSINQB are called to compute the FST and inverse FST, respectively, of two real quarter-wave odd sequences. If the real sequence is of length 2N, only N input data points need to be stored and the number of resulting data points is also N. The results are stored in the input array.

my_system% `cat vsinq.f` program vsinq implicit none integer,parameter :: len=4, m = 2, ld = m+1 real x(ld,len),xt(ld,len),work(3len+15), z(ld,len) integer i, j call RANDOM_NUMBER(x) z = x write(,'(a27,i1)')' Input sequences of length ',len do j = 1,m write(,'(a3,i1,a4,4(f5.3,2x),a1,/)') $ 'seq',j,' = (',(x(j,i),i=1,len),')' end do call vsinqi(len, work) call vsinqf(m,len, z, xt, ld, work) write(,) $ 'Forward fast sine transform for quarter-wave odd sequences' do j = 1,m write(,'(a3,i1,a4,4(f5.3,2x),a1,/)') $ 'seq',j,' = (',(z(j,i),i=1,len),')' end do call vsinqb(m,len, z, xt, ld, work) write(,) $ 'Inverse fast sine transform for quarter-wave odd sequences' write(,)'(results are normalized)' do j = 1,m write(*,'(a3,i1,a4,4(f5.3,2x),a1,/)') $ 'seq',j,' = (',(z(j,i),i=1,len),')' end do end my_system% `f95 vsinq.f -xlic_lib=sunperf` my_system% `a.out` Input sequences of length 4 seq1 = (0.557 0.352 0.990 0.539 ) seq2 = (0.603 0.867 0.417 0.156 ) Forward fast sine transform for quarter-wave odd sequences seq1 = (0.823 0.057 0.078 0.305 ) seq2 = (0.654 0.466 -.069 -.037 ) Inverse fast sine transform for quarter-wave odd sequences (results are normalized) seq1 = (0.557 0.352 0.990 0.539 ) seq2 = (0.603 0.867 0.417 0.156 )

CODE EXAMPLE 5-9 Compute FST and Inverse FST of Two Real Quarter-Wave Odd Sequences

my_system% cat vsinq.f

      program vsinq

      implicit none

      integer,parameter :: len=4, m = 2, ld = m+1

      real x(ld,len),xt(ld,len),work(3*len+15), z(ld,len)

      integer i, j

      call RANDOM_NUMBER(x)

      z = x

      write(*,'(a27,i1)')' Input sequences of length ',len

      do j = 1,m

         write(*,'(a3,i1,a4,4(f5.3,2x),a1,/)')

     $         'seq',j,' = (',(x(j,i),i=1,len),')'

      end do

      call vsinqi(len, work)

      call vsinqf(m,len, z, xt, ld, work)

      write(*,*)

    $ 'Forward fast sine transform for quarter-wave odd sequences'

      do j = 1,m

      write(*,'(a3,i1,a4,4(f5.3,2x),a1,/)')

    $       'seq',j,' = (',(z(j,i),i=1,len),')'

      end do

      call vsinqb(m,len, z, xt, ld, work)

      write(*,*)

    $ 'Inverse fast sine transform for quarter-wave odd sequences'

      write(*,*)'(results are normalized)'

      do j = 1,m

      write(*,'(a3,i1,a4,4(f5.3,2x),a1,/)')

    $       'seq',j,' = (',(z(j,i),i=1,len),')'

      end do

end

my_system% f95 vsinq.f -xlic_lib=sunperf

my_system% a.out

Input sequences of length 4

seq1 = (0.557 0.352 0.990 0.539 )

seq2 = (0.603 0.867 0.417 0.156 )

Forward fast sine transform for quarter-wave odd sequences

seq1 = (0.823 0.057 0.078 0.305 )

seq2 = (0.654 0.466 -.069 -.037 )

Inverse fast sine transform for quarter-wave odd sequences

(results are normalized)

seq1 = (0.557 0.352 0.990 0.539 )

seq2 = (0.603 0.867 0.417 0.156 )

Convolution and Correlation

Two applications of the FFT that are frequently encountered especially in the signal processing area are the discrete convolution and discrete correlation operations.

Convolution

Given two functions x(t) and y(t), the Fourier transform of the convolution of x(t) and y(t), denoted as x $star star \odotstar\odot$ y, is the product of their individual Fourier transforms: DFT(x $star star \odotstar\odot$ y)=X $star star \odotstar\odot$ Y where $star star \odotstar\odot$ denotes the convolution operation and $star star \odotstar\odot$ denotes pointwise multiplication.

Typically, x(t) is a continuous and periodic signal that is represented discretely by a set of N data points x_j, j = 0, ..., N -1, sampled over a finite duration, usually for one period of x(t) at equal intervals. y(t) is usually a response that starts out as zero, peaks to a maximum value, and then returns to zero. Discretizing y(t) at equal intervals produces a set of N data points, y_k, k = 0, ..., N -1. If the actual number of samplings in y_k is less than N, the data can be padded with zeros. The discrete convolution can then be defined as

(x $star \equiv \sum ^{\frac{N}{2}}_{k=\frac{-N}{2}+1}x_{j-k}y_{k},\, \, \, j=0,\ldots N-1$ y)j $star \equiv \sum ^{\frac{N}{2}}_{k=\frac{-N}{2}+1}x_{j-k}y_{k},\, \, \, j=0,\ldots N-1$ .

The values of $y_{k},k=\frac{-N}{2}+1,\ldots \frac{N}{2}k=0,\ldots N-1$ , are the same as those of $y_{k},k=\frac{-N}{2}+1,\ldots \frac{N}{2}k=0,\ldots N-1$ but in the wrap-around order.

The Sun Performance Library routines allow the user to compute the convolution by using the definition above with k = 0, ..., N -1, or by using the FFT. If the FFT is used to compute the convolution of two sequences, the following steps are performed:

Compute X = forward FFT of x

Compute Y = forward FFT of y

Compute Z = X $\odot\Leftrightarrowstar$

Y $\odot\Leftrightarrowstar$ DFT(x $\odot\Leftrightarrowstar$ y)

Compute z = inverse FFT of Z; z = (x

star y)

One interesting characteristic of convolution is that the product of two polynomials is actually a convolution. A product of an m-term polynomial

$a(x)=a_{0}+a_{1}x+\ldots +a_{m-1}x^{m-1}$

and an n-term polynomial

$b(x)=b_{0}+b_{1}x+...+b_{n-1}x^{n-1}$

has m + n - 1 coefficients that can be obtained by

$c_{k}=\sum _{j=max(k-(m-1),0)}^{min(k,n-1)}a_{j}b_{k-j}$ ,

where k = 0, ..., m + n - 2.

Correlation

Closely related to convolution is the correlation operation. It computes the correlation of two sequences directly superposed or when one is shifted relative to the other. As with convolution, we can compute the correlation of two sequences efficiently as follows using the FFT:

Compute the FFT of the two input sequences.

Compute the pointwise product of the resulting transform of one sequence and the complex conjugate of the transform of the other sequence.

Compute the inverse FFT of the product.

The routines in the Performance Library also allow correlation to be computed by the following definition:

$Corr(x,y)_{j}\equiv \sum _{k=0}^{N-1}x_{j+k}y_{k},\, \, \, j=0,\ldots N-1$ .

There are various ways to interpret the sampled input data of the convolution and correlation operations. The argument list of the convolution and correlation routines contain parameters to handle cases in which

The signal and/or response function can start at different sampling time

The user might want only part of the signal to contribute to the output

The signal and/or response function can begin with one or more zeros that are not explicitly stored.

Sun Performance Library Convolution and Correlation Routines

Sun Performance Library contains the convolution routines shown in TABLE 5-6.


Routine	Arguments	Function
`SCNVCOR`, `DCNVCOR`, `CCNVCOR`, `ZCNVCOR`	`CNVCOR,FOUR,NX,X,IFX, INCX,NY,NPRE,M,Y,IFY, INC1Y,INC2Y,NZ,K,Z, IFZ,INC1Z,INC2Z,WORK, LWORK`	Convolution or correlation of a filter with one or more vectors
`SCNVCOR2`, `DCNVCOR2`, `CCNVCOR2`, `ZCNVCOR2`	`CNVCOR,METHOD,TRANSX, SCRATCHX,TRANSY, SCRATCHY,MX,NX,X,LDX, MY,NY,MPRE,NPRE,Y,LDY, MZ,NZ,Z,LDZ,WORKIN, LWORK`	Two-dimensional convolution or correlation of two matrices
`SWIENER`, `DWIENER`	`N_POINTS,ACOR,XCOR, FLTR,EROP,ISW,IERR`	Wiener deconvolution of two signals

The [S,D,C,Z]CNVCOR routines are used to compute the convolution or correlation of a filter with one or more input vectors. The [S,D,C,Z]CNVCOR2 routines are used to compute the two-dimensional convolution or correlation of two matrices.

Arguments for Convolution and Correlation Routines

The one-dimensional convolution and correlation routines use the arguments shown in TABLE 5-7.


Argument	Definition
`CNVCOR`	`V' or `v' specifies that convolution is computed. `R' or `r' specifies that correlation is computed.
`FOUR`	`T' or `t' specifies that the Fourier transform method is used. `D' or `d' specifies that the direct method is used, where the convolution or correlation is computed from the definition of convolution and correlation. ^[1]
`NX`	Length of filter vector, where `NX` 0.
`X`	Filter vector
`IFX`	Index of first element of `X`, where `NX` `IFX` 1
`INCX`	Stride between elements of the vector in `X`, where `INCX` > 0.
`NY`	Length of input vectors, where `NY` 0.
`NPRE`	Number of implicit zeros prefixed to the `Y` vectors, where `NPRE` 0.
`M`	Number of input vectors, where `M` 0.
`Y`	Input vectors.
`IFY`	Index of the first element of `Y`, where `NY` `IFY` 1
`INC1Y`	Stride between elements of the input vectors in `Y`, where `INC1Y` > 0.
`INC2Y`	Stride between input vectors in `Y`, where `INC2Y` > 0.
`NZ`	Length of the output vectors, where `NZ` 0.
`K`	Number of `Z` vectors, where `K` 0. If `K` < `M`, only the first `K` vectors will be processed. If `K` > `M`, all input vectors will be processed and the last `M`-`K` output vectors will be set to zero on exit.
`Z`	Result vectors
`IFZ`	Index of the first element of `Z`, where `NZ` `IFZ` 1
`INC1Z`	Stride between elements of the output vectors in `Z`, where `INCYZ` > 0.
`INC2Z`	Stride between output vectors in `Z`, where `INC2Z` > 0.
`WORK`	Work array
`LWORK`	Length of work array

The two-dimensional convolution and correlation routines use the arguments shown in TABLE 5-8.


Argument	Definition
`CNVCOR`	`V' or `v' specifies that convolution is computed. `R' or `r' specifies that correlation is computed.
`METHOD`	`T' or `t' specifies that the Fourier transform method is used. `D' or `d' specifies that the direct method is used, where the convolution or correlation is computed from the definition of convolution and correlation. ^[2]
`TRANSX`	`N' or `n' specifies that `X` is the filter matrix `T' or `t' specifies that the transpose of `X` is the filter matrix
`SCRATCHX`	`N' or `n' specifies that `X` must be preserved `S' or `s' specifies that `X` can be used for scratch space. The contents of `X` are undefined after returning from a call where `X` is used for scratch space.
`TRANSY`	`N' or `n' specifies that `Y` is the input matrix `T' or `t' specifies that the transpose of `Y` is the input matrix
`SCRATCHY`	`N' or `n' specifies that `Y` must be preserved `S' or `s' specifies that `Y` can be used for scratch space. The contents of `X` are undefined after returning from a call where `Y` is used for scratch space.
`MX`	Number of rows in the filter matrix `X`, where `MX` 0
`NX`	Number of columns in the filter matrix `X`, where `NX` 0
`X`	Filter matrix. `X` is unchanged on exit when `SCRATCHX` is `N' or `n' and undefined on exit when `SCRATCHX` is `S' or `s'.
`LDX`	Leading dimension of array containing the filter matrix `X`.
`MY`	Number of rows in the input matrix `Y`, where `MY` 0.
`NY`	Number of columns in the input matrix `Y`, where `NY` 0
`MPRE`	Number of implicit zeros prefixed to each row of the input matrix `Y` vectors, where `MPRE` 0.
`NPRE`	Number of implicit zeros prefixed to each column of the input matrix `Y`, where `NPRE` 0.
`Y`	Input matrix. `Y` is unchanged on exit when `SCRATCHY` is `N' or `n' and undefined on exit when `SCRATCHY` is `S' or `s'.
`LDY`	Leading dimension of array containing the input matrix `Y`.
`MZ`	Number of output vectors, where `MZ` 0.
`NZ`	Length of output vectors, where `NZ` 0.
`Z`	Result vectors
`LDZ`	Leading dimension of the array containing the result matrix `Z`, where `LDZ` `MAX(1,MZ)`.
`WORKIN`	Work array
`LWORK`	Length of work array

Work Array `WORK` for Convolution and Correlation Routines

The minimum dimensions for the WORK work arrays used with the one-dimensional and two-dimensional convolution and correlation routines are shown in TABLE 5-11. The minimum dimensions for one-dimensional convolution and correlation routines depend upon the values of the arguments NPRE, NX, NY, and NZ.

The minimum dimensions for two-dimensional convolution and correlation routines depend upon the values of the arguments shown TABLE 5-9.


Argument	Definition
`MX`	Number of rows in the filter matrix
`MY`	Number of rows in the input matrix
`MZ`	Number of output vectors
`NX`	Number of columns in the filter matrix
`NY`	Number of columns in the input matrix
`NZ`	Length of output vectors
`MPRE`	Number of implicit zeros prefixed to each row of the input matrix
`NPRE`	Number of implicit zeros prefixed to each column of the input matrix
`MPOST`	`MAX(0,MZ-MYC)`
`NPOST`	`MAX(0,NZ-NYC)`
`MYC`	`MPRE` + `MPOST` + `MYC_INIT`, where `MYC_INIT` depends upon filter and input matrices, as shown in TABLE 5-10
`NYC`	`NPRE` + `NPOST` + `NYC_INIT`, where `NYC_INIT` depends upon filter and input matrices, as shown in TABLE 5-10

MYC_INIT and NYC_INIT depend upon the following, where X is the filter matrix and Y is the input matrix.


	Y		Transpose(Y)
	X	Transpose(X)	X	Transpose(X)
`MYC_INIT`	`MAX(MX,MY)`	`MAX(NX,MY)`	`MAX(MX,NY)`	`MAX(NX,NY)`
`NYC_INIT`	`MAX(NX,NY)`	`MAX(MX,NY)`	`MAX(NX,MY)`	`MAX(MX,MY)`

The values assigned to the minimum work array size is shown in TABLE 5-11.


Routine	Minimum Work Array Size (`WORK`)	Type
`SCNVCOR, DCNVCOR`	`4*(MAX(NX,NPRE+NY)` `+ MAX(0,NZ-NY))`	`REAL, REAL*8`
`CCNVCOR, ZCNVCOR`	`2*(MAX(NX,NPRE+NY)` `+ MAX(0,NZ-NY)))`	`COMPLEX, COMPLEX*16`
`SCNVCOR2`^{^[3]}`, DCNVCOR2`¹	`MY` `+` `NY` `+` `30`	`COMPLEX, COMPLEX*16`
`SCNVCOR2`^{^[3]}`, DCNVCOR2`¹	`MY` `+` `NY` `+` `30`	`COMPLEX, COMPLEX*16`
`CCNVCOR2`¹`, ZCNVCOR2`¹	If `MY` = `NY`:`MYC + 8`If `MY` `NY: MYC + NYC + 16`	`COMPLEX, COMPLEX*16`

Sample Program: Convolution

CODE EXAMPLE 5-10 uses CCNVCOR to perform FFT convolution of two complex vectors.

my_system% `cat con_ex20.f` PROGRAM TEST C INTEGER LWORK INTEGER N PARAMETER (N = 3) PARAMETER (LWORK = 4 * N + 15) COMPLEX P1(N), P2(N), P3(2N-1), WORK(LWORK) DATA P1 / 1, 2, 3 /, P2 / 4, 5, 6 / C EXTERNAL CCNVCOR C PRINT , 'P1:' PRINT 1000, P1 PRINT , 'P2:' PRINT 1000, P2 CALL CCNVCOR ('V', 'T', N, P1, 1, 1, N, 0, 1, P2, 1, 1, 1, $ 2 N - 1, 1, P3, 1, 1, 1, WORK, LWORK) C PRINT *, 'P3:' PRINT 1000, P3 C 1000 FORMAT (1X, 100(F4.1,' +',F4.1,'i ')) C END my_system% `f95 -dalign con_ex20.f -xlic_lib=sunperf` my_system% `a.out` P1: 1.0 + 0.0i 2.0 + 0.0i 3.0 + 0.0i P2: 4.0 + 0.0i 5.0 + 0.0i 6.0 + 0.0i P3: 4.0 + 0.0i 13.0 + 0.0i 28.0 + 0.0i 27.0 + 0.0i 18.0 + 0.0i

CODE EXAMPLE 5-10 One-Dimensional Convolution Using Fourier Transform Method and COMPLEX Data

my_system% cat con_ex20.f

      PROGRAM TEST

      INTEGER           LWORK

      INTEGER           N

      PARAMETER        (N = 3)

      PARAMETER        (LWORK = 4 * N + 15)

      COMPLEX           P1(N), P2(N), P3(2*N-1), WORK(LWORK)

      DATA P1 / 1, 2, 3 /,  P2 / 4, 5, 6 /

      EXTERNAL          CCNVCOR

      PRINT *, 'P1:'

      PRINT 1000, P1

      PRINT *, 'P2:'

      PRINT 1000, P2

      CALL CCNVCOR ('V', 'T', N, P1, 1, 1, N, 0, 1, P2, 1, 1, 1,

     $              2 * N - 1, 1, P3, 1, 1, 1, WORK, LWORK)

      PRINT *, 'P3:'

      PRINT 1000, P3

 1000 FORMAT (1X, 100(F4.1,' +',F4.1,'i  '))

END

my_system% f95 -dalign con_ex20.f -xlic_lib=sunperf

my_system% a.out

P1:

  1.0 + 0.0i   2.0 + 0.0i   3.0 + 0.0i

P2:

  4.0 + 0.0i   5.0 + 0.0i   6.0 + 0.0i

P3:

  4.0 + 0.0i  13.0 + 0.0i  28.0 + 0.0i  27.0 + 0.0i  18.0 + 0.0i

If any vector overlaps a writable vector, either because of argument aliasing or ill-chosen values of the various INC arguments, the results are undefined and can vary from one run to the next.

The most common form of the computation, and the case that executes fastest, is applying a filter vector X to a series of vectors stored in the columns of Y with the result placed into the columns of Z. In that case, INCX = 1, INC1Y = 1, INC2Y greater than or equal NY, INC1Z = 1, INC2Z NZ. Another common form is applying a filter vector X to a series of vectors stored in the rows of Y and store the result in the row of Z, in which case INCX = 1, INC1Y NY, INC2Y = 1, INC1Z NZ, and INC2Z = 1.

Convolution can be used to compute the products of polynomials. CODE EXAMPLE 5-11 uses SCNVCOR to compute the product of 1 + 2x + 3x² and 4 + 5x + 6x².

my_system% `cat con_ex21.f` PROGRAM TEST INTEGER LWORK, NX, NY, NZ PARAMETER (NX = 3) PARAMETER (NY = NX) PARAMETER (NZ = 2NY-1) PARAMETER (LWORK = 4NZ+32) REAL X(NX), Y(NY), Z(NZ), WORK(LWORK) C DATA X / 1, 2, 3 /, Y / 4, 5, 6 /, WORK / LWORK*0 / C PRINT 1000, 'X' PRINT 1010, X PRINT 1000, 'Y' PRINT 1010, Y CALL SCNVCOR ('V', 'T', NX, X, 1, 1, $NY, 0, 1, Y, 1, 1, 1, NZ, 1, Z, 1, 1, 1, WORK, LWORK) PRINT 1020, 'Z' PRINT 1010, Z 1000 FORMAT (1X, 'Input vector ', A1) 1010 FORMAT (1X, 300F5.0) 1020 FORMAT (1X, 'Output vector ', A1) END my_system% `f95 -dalign con_ex21.f -xlic_lib=sunperf` my_system% `a.out` Input vector X 1. 2. 3. Input vector Y 4. 5. 6. Output vector Z 4. 13. 28. 27. 18.

CODE EXAMPLE 5-11 One-Dimensional Convolution Using Fourier Transform Method and REAL Data

my_system% cat con_ex21.f

      PROGRAM TEST

      INTEGER     LWORK, NX, NY, NZ

      PARAMETER  (NX = 3)

      PARAMETER  (NY = NX)

      PARAMETER  (NZ = 2*NY-1)

      PARAMETER  (LWORK = 4*NZ+32)

      REAL        X(NX), Y(NY), Z(NZ), WORK(LWORK)

      DATA X / 1, 2, 3 /,  Y / 4, 5, 6 /, WORK / LWORK*0 /

      PRINT 1000, 'X'

      PRINT 1010, X

      PRINT 1000, 'Y'

      PRINT 1010, Y

      CALL SCNVCOR ('V', 'T', NX, X, 1, 1,

     $NY, 0, 1, Y, 1, 1, 1,   NZ, 1, Z, 1, 1, 1, WORK, LWORK)

      PRINT 1020, 'Z'

      PRINT 1010, Z

 1000 FORMAT (1X, 'Input vector ', A1)

 1010 FORMAT (1X, 300F5.0)

 1020 FORMAT (1X, 'Output vector ', A1)

END

my_system% f95 -dalign con_ex21.f -xlic_lib=sunperf

my_system% a.out

 Input vector X

    1.   2.   3.

 Input vector Y

    4.   5.   6.

 Output vector Z

    4.  13.  28.  27.  18.

Making the output vector longer than the input vectors, as in the example above, implicitly adds zeros to the end of the input. No zeros are actually required in any of the vectors, and none are used in the example, but the padding provided by the implied zeros has the effect of an end-off shift rather than an end-around shift of the input vectors.

CODE EXAMPLE 5-12 will compute the product between the vector [ 1, 2, 3 ] and the circulant matrix defined by the initial column vector [ 4, 5, 6 ].

my_system% `cat con_ex22.f` PROGRAM TEST C INTEGER LWORK, NX, NY, NZ PARAMETER (NX = 3) PARAMETER (NY = NX) PARAMETER (NZ = NY) PARAMETER (LWORK = 4NZ+32) REAL X(NX), Y(NY), Z(NZ), WORK(LWORK) C DATA X / 1, 2, 3 /, Y / 4, 5, 6 /, WORK / LWORK0 / C PRINT 1000, 'X' PRINT 1010, X PRINT 1000, 'Y' PRINT 1010, Y CALL SCNVCOR ('V', 'T', NX, X, 1, 1, $NY, 0, 1, Y, 1, 1, 1, NZ, 1, Z, 1, 1, 1, $WORK, LWORK) PRINT 1020, 'Z' PRINT 1010, Z C 1000 FORMAT (1X, 'Input vector ', A1) 1010 FORMAT (1X, 300F5.0) 1020 FORMAT (1X, 'Output vector ', A1) END my_system% `f95 -dalign con_ex22.f -xlic_lib=sunperf` my_system% `a.out` Input vector X 1. 2. 3. Input vector Y 4. 5. 6. Output vector Z 31. 31. 28.

CODE EXAMPLE 5-12 Convolution Used to Compute the Product of a Vector and Circulant Matrix

my_system% cat con_ex22.f

      PROGRAM TEST

      INTEGER     LWORK, NX, NY, NZ

      PARAMETER  (NX = 3)

      PARAMETER  (NY = NX)

      PARAMETER  (NZ = NY)

      PARAMETER  (LWORK = 4*NZ+32)

      REAL        X(NX), Y(NY), Z(NZ), WORK(LWORK)

      DATA X / 1, 2, 3 /,  Y / 4, 5, 6 /, WORK / LWORK*0 /

      PRINT 1000, 'X'

      PRINT 1010, X

      PRINT 1000, 'Y'

      PRINT 1010, Y

      CALL SCNVCOR ('V', 'T', NX, X, 1, 1,

     $NY, 0, 1, Y, 1, 1, 1,   NZ, 1, Z, 1, 1, 1,

     $WORK, LWORK)

      PRINT 1020, 'Z'

      PRINT 1010, Z

 1000 FORMAT (1X, 'Input vector ', A1)

 1010 FORMAT (1X, 300F5.0)

 1020 FORMAT (1X, 'Output vector ', A1)

END

my_system% f95 -dalign con_ex22.f -xlic_lib=sunperf

my_system% a.out

 Input vector X

    1.   2.   3.

 Input vector Y

    4.   5.   6.

 Output vector Z

   31.  31.  28.

The difference between this example and the previous example is that the length of the output vector is the same as the length of the input vectors, so there are no implied zeros on the end of the input vectors. With no implied zeros to shift into, the effect of an end-off shift from the previous example does not occur and the end-around shift results in a circulant matrix product.

my_system% `cat con_ex23.f` PROGRAM TEST C INTEGER M, N PARAMETER (M = 2) PARAMETER (N = 3) C INTEGER I, J COMPLEX P1(M,N), P2(M,N), P3(M,N) DATA P1 / 1, -2, 3, -4, 5, -6 /, P2 / -1, 2, -3, 4, -5, 6 / EXTERNAL CCNVCOR2 C PRINT , 'P1:' PRINT 1000, ((P1(I,J), J = 1, N), I = 1, M) PRINT , 'P2:' PRINT 1000, ((P2(I,J), J = 1, N), I = 1, M) C CALL CCNVCOR2 ('V', 'Direct', 'No Transpose X', 'No Overwrite X', $ 'No Transpose Y', 'No Overwrite Y', M, N, P1, M, $ M, N, 0, 0, P2, M, M, N, P3, M, 0, 0) C PRINT *, 'P3:' PRINT 1000, ((P3(I,J), J = 1, N), I = 1, M) C 1000 FORMAT (3(F5.1,' +',F5.1,'i ')) C END my_system% `f95 -dalign con_ex23.f -xlic_lib=sunperf` my_system% `a.out` P1: 1.0 + 0.0i 3.0 + 0.0i 5.0 + 0.0i -2.0 + 0.0i -4.0 + 0.0i -6.0 + 0.0i P2: -1.0 + 0.0i -3.0 + 0.0i -5.0 + 0.0i 2.0 + 0.0i 4.0 + 0.0i 6.0 + 0.0i P3: -83.0 + 0.0i -83.0 + 0.0i -59.0 + 0.0i 80.0 + 0.0i 80.0 + 0.0i 56.0 + 0.0i

CODE EXAMPLE 5-13 Two-Dimensional Convolution Using Direct Method

my_system% cat con_ex23.f

      PROGRAM TEST

      INTEGER           M, N

      PARAMETER        (M = 2)

      PARAMETER        (N = 3)

      INTEGER           I, J

      COMPLEX           P1(M,N), P2(M,N), P3(M,N)

      DATA P1 / 1, -2, 3, -4, 5, -6 /,  P2 / -1, 2, -3, 4, -5, 6 /

      EXTERNAL          CCNVCOR2

      PRINT *, 'P1:'

      PRINT 1000, ((P1(I,J), J = 1, N), I = 1, M)

      PRINT *, 'P2:'

      PRINT 1000, ((P2(I,J), J = 1, N), I = 1, M)

      CALL CCNVCOR2 ('V', 'Direct', 'No Transpose X', 'No Overwrite X',

     $   'No Transpose Y', 'No Overwrite Y', M, N, P1, M,

     $   M, N, 0, 0, P2, M, M, N, P3, M, 0, 0)

      PRINT *, 'P3:'

      PRINT 1000, ((P3(I,J), J = 1, N), I = 1, M)

 1000 FORMAT (3(F5.1,' +',F5.1,'i  '))

END

my_system% f95 -dalign con_ex23.f -xlic_lib=sunperf

my_system% a.out

P1:

  1.0 +  0.0i    3.0 +  0.0i    5.0 +  0.0i

 -2.0 +  0.0i   -4.0 +  0.0i   -6.0 +  0.0i

P2:

 -1.0 +  0.0i   -3.0 +  0.0i   -5.0 +  0.0i

  2.0 +  0.0i    4.0 +  0.0i    6.0 +  0.0i

P3:

-83.0 +  0.0i  -83.0 +  0.0i  -59.0 +  0.0i

 80.0 +  0.0i   80.0 +  0.0i   56.0 +  0.0i

References

For additional information on the DFT or FFT, see the following sources.

Briggs, William L., and Henson, Van Emden. The DFT: An Owner's Manual for the Discrete Fourier Transform. Philadelphia, PA: SIAM, 1995.

Brigham, E. Oran. The Fast Fourier Transform and Its Applications. Upper Saddle River, NJ: Prentice Hall, 1988.

Chu, Eleanor, and George, Alan. Inside the FFT Black Box: Serial and Parallel Fast Fourier Transform Algorithms. Boca Raton, FL: CRC Press, 2000.

Press, William H., Teukolsky, Saul A., Vetterling, William T., and Flannery, Brian P. Numerical Recipes in C: The Art of Scientific Computing. 2 ed. Cambridge, United Kingdom: Cambridge University Press, 1992.

Ramirez, Robert W. The FFT: Fundamentals and Concepts. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1985.

Swartzrauber, Paul N. Vectorizing the FFTs. In Rodrigue, Garry ed. Parallel Computations. New York: Academic Press, Inc., 1982.

Strang, Gilbert. Linear Algebra and Its Applications. 3 ed. Orlando, FL: Harcourt Brace & Company, 1988.

Van Loan, Charles. Computational Frameworks for the Fast Fourier Transform. Philadelphia, PA: SIAM, 1992.

Walker, James S. Fast Fourier Transforms. Boca Raton, FL: CRC Press, 1991.

^{1 (TableFootnote) When the lengths of the two sequences to be convolved are similar, the FFT method is faster than the direct method. However, when one sequence is much larger than the other, such as when convolving a large time-series signal with a small filter, the direct method performs faster than the FFT-based method.}

^{2 (TableFootnote) When the sizes of the two matrices to be convolved are similar, the FFT method is faster than the direct method. However, when one sequence is much larger than the other, such as when convolving a large data set with a small filter, the direct method performs faster than the FFT-based method.}

^{3 (TableFootnote) Memory will be allocated within the routine if the workspace size, indicated by LWORK, is not large enough.}

Forward and Inverse FFT Routines

Linear FFT Routines

Two-Dimensional FFT Routines

Three-Dimensional FFT Routines

Comments

Cosine and Sine Transforms

Fast Cosine and Sine Transform Routines

Fast Cosine Transforms

Fast Sine Transforms

Discrete Fast Cosine and Sine Transforms and Their Inverse

[D]COST: Forward and Inverse Fast Cosine Transform (FCT) of a Sequence

V[D]COST: Forward and Inverse Fast Cosine Transforms of Multiple Sequences (VFCT)

[D]COSQF: Forward FCT of a Quarter-Wave Even Sequence

[D]COSQB: Inverse FCT of a Quarter-Wave Even Sequence

V[D]COSQF: Forward FCT of One or More Quarter-Wave Even Sequences

V[D]COSQB: Inverse FCT of One or More Quarter-Wave Even Sequences

[D]SINT: Forward and Inverse Fast Sine Transform (FST) of a Sequence

V[D]SINT: Forward and Inverse Fast Sine Transforms of Multiple Sequences (VFST)

[D]SINQF: Forward FST of a Quarter-Wave Odd Sequence

[D]SINQB: Inverse FST of a Quarter-Wave Odd Sequence

V[D]SINQF: Forward FST of One or More Quarter-Wave Odd Sequences

V[D]SINQB: Inverse FST of One or More Quarter-Wave Odd Sequences

Fast Cosine Transform Examples

Fast Sine Transform Examples

Convolution and Correlation

Convolution

Correlation

Sun Performance Library Convolution and Correlation Routines

Arguments for Convolution and Correlation Routines

Work Array WORK for Convolution and Correlation Routines

Sample Program: Convolution

References

`[D]COST`: Forward and Inverse Fast Cosine Transform (FCT) of a Sequence

`V[D]COST`: Forward and Inverse Fast Cosine Transforms of Multiple Sequences (VFCT)

`[D]COSQF`: Forward FCT of a Quarter-Wave Even Sequence

`[D]COSQB`: Inverse FCT of a Quarter-Wave Even Sequence

`V[D]COSQF`: Forward FCT of One or More Quarter-Wave Even Sequences

`V[D]COSQB`: Inverse FCT of One or More Quarter-Wave Even Sequences

`[D]SINT`: Forward and Inverse Fast Sine Transform (FST) of a Sequence

`V[D]SINT`: Forward and Inverse Fast Sine Transforms of Multiple Sequences (VFST)

`[D]SINQF`: Forward FST of a Quarter-Wave Odd Sequence

`[D]SINQB`: Inverse FST of a Quarter-Wave Odd Sequence

`V[D]SINQF`: Forward FST of One or More Quarter-Wave Odd Sequences

`V[D]SINQB`: Inverse FST of One or More Quarter-Wave Odd Sequences

Work Array `WORK` for Convolution and Correlation Routines