C H A P T E R  6
 Using Sun Performance Library Signal Processing Routines

The discrete Fourier transform (DFT) has always been an important analytical tool in many areas in science and engineering. However, it was not until the development of the fast Fourier transform (FFT) that the DFT became widely used. This is because the DFT requires O(N2) computations, while the FFT only requires O(Nlog2N) operations.

Sun Performance Library contains a set of routines that computes the FFT, related FFT operations, such as convolution and correlation, and trigonometric transforms.

This chapter is divided into the following three sections.

• Forward and Inverse FFT Routines
• Sine and Cosine Transforms
• Convolution and Correlation

Each section includes examples that show how the routines might be used.

For information on the Fortran 95 and C interfaces and types of arguments used in each routine, see the section 3P man pages for the individual routines. For example, to display the man page for the SFFTC routine, type man -s 3P sfftc. Routine names must be lowercase. For an overview of the FFT routines, type man -s 3P fft.

## Forward and Inverse FFT Routines

TABLE 6-1 lists the names of the FFT routines and their calling sequence. Double precision routine names are in square brackets. See the individual man pages for detailed information on the data type and size of the arguments.

TABLE 6-1 FFT Routines and Their Arguments

Routine Name

Arguments

Linear Routines

CFFTS [ZFFTD]

(OPT, N1, SCALE, X, Y, TRIGS, IFAC, WORK, LWORK, ERR)

SFFTC [DFFTZ]

(OPT, N1, SCALE, X, Y, TRIGS, IFAC, WORK, LWORK, ERR)

CFFTSM [ZFFTDM]

(OPT, N1, N2, SCALE, X, LDX1, Y, LDY1, TRIGS, IFAC, WORK, LWORK, ERR)

SFFTCM [DFFTZM]

(OPT, N1, N2, SCALE, X, LDX1, Y, LDY1, TRIGS, IFAC, WORK, LWORK, ERR)

CFFTC [ZFFTZ]

(OPT, N1, SCALE, X, Y, TRIGS, IFAC, WORK, LWORK, ERR)

CFFTCM [ZFFTZM]

(OPT, N1, N2, SCALE, X, LDX1, Y, LDY1, TRIGS, IFAC, WORK, LWORK, ERR)

Two-Dimensional Routines

CFFTS2 [ZFFTD2]

(OPT, N1, N2, SCALE, X, LDX1, Y, LDY1, TRIGS, IFAC, WORK, LWORK, ERR)

SFFTC2 [DFFTZ2]

(OPT, N1, N2, SCALE, X, LDX1, Y, LDY1, TRIGS, IFAC, WORK, LWORK, ERR)

CFFTC2 [ZFFTZ2]

(OPT, N1, N2, SCALE, X, LDX1, Y, LDY1, TRIGS, IFAC, WORK, LWORK, ERR)

Three-Dimensional Routines

CFFTS3 [ZFFTD3]

(OPT, N1, N2, N3, SCALE, X, LDX1, LDX2, Y, LDY1, LDY2, TRIGS, IFAC, WORK, LWORK, ERR)

SFFTC3 [DFFTZ3]

(OPT, N1, N2, N3, SCALE, X, LDX1, LDX2, Y, LDY1, LDY2, TRIGS, IFAC, WORK, LWORK, ERR)

CFFTC3 [ZFFTZ3]

(OPT, N1, N2, N3, SCALE, X, LDX1, LDX2, Y, LDY1, LDY2, TRIGS, IFAC, WORK, LWORK, ERR)

Sun Performance Library FFT routines use the following arguments.

• OPT: Flag indicating whether the routine is called to initialize or to compute the transform.
• N1, N2, N3: Problem dimensions for one, two, and three dimensional transforms.
• X: Input array where X is of type COMPLEX if the routine is a complex-to-complex transform or a complex-to-real tranform. X is of type REAL for a real-to-complex transform.
• Y: Output array where Y is of type COMPLEX if the routine is a complex-to-complex transform or a real-to-complex tranform. Y is of type REAL for a complex-to-real transform.
• LDX1, LDX2 and LDY1, LDY2: LDX1 and LDX2 are the leading dimensions of the input array, and LDY1 and LDY2 are the leading dimensions of the output array. The FFT routines allow the output to overwrite the input, which is an in-place transform, or to be stored in a separate array apart from the input array, which is an out-of-place transform. In complex-to-complex tranforms, the input data is of the same size as the output data. However, real-to-complex and complex-to-real transforms have different memory requirements for input and output data. Care must be taken to ensure that the input array is large enough to acommodate the transform results when computing an in-place tranform.
• TRIGS: Array containing the trigonometric weights.
• IFAC: Array containing factors of the problem dimensions. The problem sizes are as follows:
• Linear FFT: Problem size of dimension N1
• Two-dimensional FFT: Problem size of dimensions N1 and N2
• Three-dimensional FFT: Problem size of dimensions N1, N2, and N3

While N1, N2, and N3 can be of any size, a real-to-complex or a complex-to-real transform can be computed most efficiently when

and a complex-to-complex transform can be computed most efficiently when

where p, q, r, s, t, u, and v are integers and p, q, r, s, t, u, v 0.

• WORK: Workspace whose size depends on the routine and the number of threads that are being used to compute the transform if the routine is parallelized.
• LWORK: Size of workspace. If LWORK is zero, the routine will allocate a workspace with the required size.

• SCALE: A scalar with which the output is scaled. Occasionally in literature, the inverse transform is defined with a scaling factor of
• for one-dimensional transforms, for two-dimensional transforms, and for three-dimensional transforms. In such case, the inverse transform is said to be normalized. If a normalized FFT is followed by its inverse FFT, the result is the original input data. The Sun Performance Library FFT routines are not normalized. However, normalization can be done easily by calling the inverse FFT routine with the appropriate scaling factor stored in SCALE.

• ERR: A flag returning a nonzero value if an error is encountered in the routine and zero otherwise.

### Linear FFT Routines

Linear FFT routines compute the FFT of real or complex data in one dimension only. The data can be one or more complex or real sequences. For a single sequence, the data is stored in a vector. If more than one sequence is being transformed, the sequences are stored column-wise in a two-dimensional array and a one-dimensional FFT is computed for each sequence along the column direction. The linear forward FFT routines compute

,

where , or expressed in polar form,

.

The inverse FFT routines compute

,

or in polar form,

.

With the forward transform, if the input is one or more complex sequences of size N1, the result will be one or more complex sequences, each consisting of N1 unrelated data points. However, if the input is one or more real sequences, each containing N1 real data points, the result will be one or more complex sequences that are conjugate symmetric. That is,

.

The imaginary part of X(0) is always zero. If N1 is even, the imaginary part of is also zero. Both zeros are stored explicitly. Because the second half of each sequence can be derived from the first half, only complex data points are computed and stored in the output array. Here and elsewhere in this chapter, integer division is rounded down.

With the inverse transform, if an N1-point complex-to-complex transform is being computed, then N1 unrelated data points are expected in each input sequence and N1 data points will be returned in the output array. However, if an N1-point complex-to-real transform is being computed, only the first complex data points of each conjugate symmetric input sequence are expected in the input, and the routine will return N1 real data points in each output sequence.

For each value of N1, either the forward or the inverse routine must be called to compute the factors of N1 and the trigonometric weights associated with those factors before computing the actual FFT. The factors and trigonometric weights can be reused in subsequent transforms as long as N1 remains unchanged.

TABLE 6-2 lists the single precision linear FFT routines and their purposes. For routines that have two-dimensional arrays as input and output, TABLE 6-2 also lists the leading dimension requirements. The same information applies to the corresponding double precision routines except that their data types are double precision and double complex. See TABLE 6-2 for the mapping. See the individual man pages for a complete description of the routines and their arguments.

TABLE 6-2 Single Precision Linear FFT Routines

Name

Purpose

Size and Type

of Input

Size and Type

of Output

In-place

Out-of-Place

SFFTC

OPT = 0 initialization

OPT = -1 real-to-complex forward linear FFT of a single vector

N1,

Real

,

Complex

SFFTC

OPT = 0 initialization

OPT = 1 complex-to-real inverse linear FFT of single vector

,

Complex

N1

Real

CFFTC

OPT = 0 initialization

OPT = -1 complex-to-complex forward linear FFT of a single vector

N1,

Complex

N1,

Complex

OPT = 1 complex-to-complex inverse linear FFT of a single vector

N1,

Complex

N1,

Complex

SFFTCM

OPT = 0 initialization

OPT = -1 real-to-complex forward linear FFT of M vectors

N1 × M,

Real

,

Complex

LDX1 = 2 × LDY1

LDX1 N1

CFFTSM

OPT = 0 initialization

OPT = 1 complex-to-real inverse linear FFT of M vectors

,

Complex

N1 × M,

Real

LDX1

LDY1=2 × LDX1

LDX1

LDY1 N1

CFFTCM

OPT = 0 initialization

OPT = -1 complex-to-complex forward linear FFT of M vectors

N1 × M,

Complex

N1 × M,

Complex

LDX1 N1

LDY1 N1

LDX1 N1

LDY1 N1

OPT = 1 complex-to-complex inverse linear FFT of M vectors

N1 × M,

Complex

N1 × M,

Complex

LDX1 N1

LDY1 N1

LDX1 N1

LDY1 N1

TABLE 6-2 Notes.

• LDX1 is the leading dimension of the input array.
• LDY1 is the leading dimension of the output array.
• N1 is the first dimension of the FFT problem.
• N2 is the second dimension of the FFT problem.
• When calling routines with OPT = 0 to initialize the routine, the only error checking that is done is to determine if N1 < 0

CODE EXAMPLE 6-1 shows how to compute the linear real-to-complex and complex-to-real FFT of a set of sequences.

   my_system% cat testscm.f   PROGRAM TESTSCM   IMPLICIT NONE   INTEGER :: LW, IERR, I, J, K, LDX, LDC   INTEGER,PARAMETER :: N1 = 3, N2 = 2, LDZ = N1,   $LDC = N1, LDX = 2*LDC   INTEGER, DIMENSION(:) :: IFAC(128)   REAL :: SCALE   REAL, PARAMETER :: ONE = 1.0   REAL, DIMENSION(:) :: SW(N1), TRIGS(2*N1)   REAL, DIMENSION(0:LDX-1,0:N2-1) :: X, V, Y   COMPLEX, DIMENSION(0:LDZ-1, 0:N2-1) :: Z  * workspace size LW = N1 SCALE = ONE/N1   WRITE(*,*)$ 'Linear complex-to-real and real-to-complex FFT of a sequence'   WRITE(*,*)   X = RESHAPE(SOURCE = (/.1, .2, .3,0.0,0.0,0.0,7.,8.,9.,   $0.0, 0.0, 0.0/), SHAPE=(/6,2/)) V = X   WRITE(*,*) 'X = '   DO I = 0,N1-1   WRITE(*,'(2(F4.1,2x))') (X(I,J), J = 0, N2-1)   END DO   WRITE(*,*)  * intialize trig table and compute factors of N1   CALL SFFTCM(0, N1, N2, ONE, X, LDX, Z, LDZ, TRIGS, IFAC,  $ SW, LW, IERR)   IF (IERR .NE. 0) THEN   PRINT*,'ROUTINE RETURN WITH ERROR CODE = ', IERR   STOP   END IF     * Compute out-of-place forward linear FFT.  * Let FFT routine allocate memory.   CALL SFFTCM(-1, N1, N2, ONE, X, LDX, Z, LDZ, TRIGS, IFAC,   $SW, 0, IERR)   IF (IERR .NE. 0) THEN   PRINT*,'ROUTINE RETURN WITH ERROR CODE = ', IERR   STOP   END IF   WRITE(*,*) 'out-of-place forward FFT of X:'   WRITE(*,*)'Z ='   DO I = 0, N1/2   WRITE(*,'(2(A1, F4.1,A1,F4.1,A1,2x))') ('(',REAL(Z(I,J)),  $ ',',AIMAG(Z(I,J)),')', J = 0, N2-1)   END DO   WRITE(*,*)  * Compute in-place forward linear FFT.  * X must be large enough to store N1/2+1 complex values   CALL SFFTCM(-1, N1, N2, ONE, X, LDX, X, LDC, TRIGS, IFAC,   $SW, LW, IERR)   IF (IERR .NE. 0) THEN   PRINT*,'ROUTINE RETURN WITH ERROR CODE = ', IERR   STOP   END IF   WRITE(*,*) 'in-place forward FFT of X:'   CALL PRINT_REAL_AS_COMPLEX(N1/2+1, N2, 1, X, LDC, N2)   WRITE(*,*)  * Compute out-of-place inverse linear FFT.   CALL CFFTSM(1, N1, N2, SCALE, Z, LDZ, X, LDX, TRIGS, IFAC,  $ SW, LW, IERR)   IF (IERR .NE. 0) THEN   PRINT*,'ROUTINE RETURN WITH ERROR CODE = ', IERR   STOP   END IF   WRITE(*,*) 'out-of-place inverse FFT of Z:'   DO I = 0, N1-1   WRITE(*,'(2(F4.1,2X))') (X(I,J), J = 0, N2-1)   END DO   WRITE(*,*)  * Compute in-place inverse linear FFT.   CALL CFFTSM(1, N1, N2, SCALE, Z, LDZ, Z, LDZ*2, TRIGS,   $IFAC, SW, 0, IERR)   IF (IERR .NE. 0) THEN   PRINT*,'ROUTINE RETURN WITH ERROR CODE = ', IERR   STOP   END IF     WRITE(*,*) 'in-place inverse FFT of Z:'   CALL PRINT_COMPLEX_AS_REAL(N1, N2, 1, Z, LDZ*2, N2)   WRITE(*,*)   END PROGRAM TESTSCM   SUBROUTINE PRINT_COMPLEX_AS_REAL(N1, N2, N3, A, LD1, LD2)   INTEGER N1, N2, N3, I, J, K   REAL A(LD1, LD2, *)   DO K = 1, N3   DO I = 1, N1   WRITE(*,'(5(F4.1,2X))') (A(I,J,K), J = 1, N2)   END DO   WRITE(*,*)   END DO   END   SUBROUTINE PRINT_REAL_AS_COMPLEX(N1, N2, N3, A, LD1, LD2)   INTEGER N1, N2, N3, I, J, K   COMPLEX A(LD1, LD2, *)   DO K = 1, N3   DO I = 1, N1   WRITE(*,'(5(A1, F4.1,A1,F4.1,A1,2X))') ('(',REAL(A(I,J,K)),  $ ',',AIMAG(A(I,J,K)),')', J = 1, N2)   END DO   WRITE(*,*)   END DO   END  my_system% f95 -dalign testscm.f -xlic_lib=sunperf  my_system% a.out  Linear complex-to-real and real-to-complex FFT of a sequence  X =  0.1 7.0  0.2 8.0  0.3 9.0  out-of-place forward FFT of X:  Z =  ( 0.6, 0.0) (24.0, 0.0)  (-0.2, 0.1) (-1.5, 0.9)  in-place forward FFT of X:  ( 0.6, 0.0) (24.0, 0.0)  (-0.2, 0.1) (-1.5, 0.9)  out-of-place inverse FFT of Z:  0.1 7.0  0.2 8.0  0.3 9.0     in-place inverse FFT of Z:  0.1 7.0  0.2 8.0  0.3 9.0    

CODE EXAMPLE 6-1 Notes:

The forward FFT of X is actually

 (0.6, 0.0) (24.0, 0.0) Z = (-0.2, 0.1) (-1.5, 0.9) (-0.2, -0.1) (-1.5, -0.9)

Because of symmetry, Z(2) is the complex conjugate of Z(1), and therefore only the first two complex values are stored. For the in-place forward transform, SFFTCMis called with real array X as the input and output. Because SFFTCMexpects the output array to be of type COMPLEX, the leading dimension of X as an output array must be as if X were complex. Since the leading dimension of real array X is LDX= 2 ×LDC, the leading dimension of X as a complex output array must be LDC. Similarly, in the in-place inverse transform, CFFTSMis called with complex array Z as the input and output. Because CFFTSMexpects the output array to be of type REAL, the leading dimension of Z as an output array must be as if Z were real. Since the leading dimension of complex array Z is LDZ, the leading dimension of Z as a real output array must be LDZ×2.

CODE EXAMPLE 6-2 shows how to compute the linear complex-to-complex FFT of a set of sequences.

### Three-Dimensional FFT Routines

Sun Performance Library includes routines that compute three-dimensional FFT. In this case, the FFT is computed along all three dimensions of a three-dimensional array. The forward FFT computes

,

k = 0, ..., N1 - 1

n = 0, ..., N2 - 1

m = 0, ..., N3 - 1

and the inverse FFT computes

,

j = 0, ..., N1 - 1

l = 0, ..., N2 - 1

h = 0, ..., N3 - 1

In the complex-to-complex transform, if the input problem is N1 ×N2 ×N3, a three-dimensional transform will yield a complex array that is also N1 ×N2 ×N3. When computing a real-to-complex three-dimensional transform, if the real input array is of dimensions N1 ×N2 ×N3, the result will be a complex array of dimensions . Conversely, when computing a complex-to-real FFT of dimensions N1 ×N2 ×N3, an complex array is required as input. As with the real-to-complex and complex-to-real linear FFT, because of conjugate symmetry, only the first complex data points need to be stored along the first dimension. The complex subarray can be obtained from as follows:

To compute a three-dimensional transform, an FFT routine must be called twice: Once to initialize and once more to actually compute the transform. The initialization includes computing the factors of N1, N2, and N3 and the trigonometric weights associated with those factors. In subsequent forward or inverse transforms, initialization is not necessary as long as N1, N2, and N3 remain unchanged.

IMPORTANT:Upon returning from a three-dimensional FFT routine, Y(0 : N - 1, :, :) contains the transform results and the original contents of Y(N:LDY1-1, :, :) is overwritten. Here, N = N1 in the complex-to-complex and complex-to-real transforms and N = in the real-to-complex transform.

TABLE 6-4 lists the single precision three-dimensional FFT routines and their purposes. The same information applies to the corresponding double precision routines except that their data types are double precision and double complex. See TABLE 6-4 for the mapping. See the individual man pages for a complete description of the routines and their arguments.

TABLE 6-4 Single Precision Three-Dimensional FFT Routines

Name

Purpose

Size, Type of Input

Size, Type of Output

In-place

Out-of-Place

SFFTC3

OPT = 0 initialization

OPT = -1 real-to-complex forward three-dimensional FFT

N1 × N2 × N3, Real

,

Complex

LDX1=2 × LDY1

LDX2 N2

LDY1

LDY2 = LDX2

LDX1 N1

LDX2 N2

LDY1

LDY2 N2

CFFTS3

OPT = 0 initialization

OPT = 1 complex-to-real inverse three-dimensional FFT

,

Complex

N1 × N2 × N3, Real

LDX1

LDX2 N2

LDY1=2 × LDX1

LDY2=LDX2

LDX1

LDX2 N2

LDY1 2 × LDX1

LDY1 is even

LDY2 N2

CFFTC3

OPT = 0 initialization

OPT = -1 complex-to-complex forward three-dimensional FFT

N1 × N2 × N3,

Complex

N1 × N2 × N3,

Complex

LDX1 N1

LDX2 N2

LDY1=LDX1

LDY2=LDX2

LDX1 N1

LDX2 N2

LDY1 N1

LDY2 N2

OPT = 1 complex-to-complex inverse three-dimensional FFT

N1 × N2 × N3,

Complex

N1 × N2 × N3,

Complex

LDX1 N1

LDX2 N2

LDY1=LDX1

LDY2=LDX2

LDX1 N1

LDX2 N2

LDY1 N1

LDY2 N2

TABLE 6-4 Notes:

• LDX1 is first leading dimension of input array.
• LDX2 is the second leading dimension of the input array.
• LDY1 is the first leading dimension of the output array.
• LDY2 is the second leading dimension of the output array.
• N1 is the first dimension of the FFT problem.
• N2 is the second dimension of the FFT problem.
• N3 is the third dimension of the FFT problem.
• When calling routines with OPT = 0 to initialize the routine, the only error checking that is done is to determine if N1, N2, N3 < 0.

CODE EXAMPLE 6-4 shows how to compute the three-dimensional real-to-complex FFT and complex-to-real FFT of a three-dimensional array.

### Fast Sine Transform Examples

In CODE EXAMPLE 6-8, SINT is called to compute the FST and the inverse transform of a real odd sequence. If the real sequence is of length 2N, only N - 1 input data points need to be stored and the number of resulting data points is also N - 1. The results are stored in the input array.

   my_system% cat sint.f   program sint   implicit none   integer,parameter :: len=4   real x(0:len-2),work(3*(len-1)+15), z(0:len-2), scale   integer i   call RANDOM_NUMBER(x(0:len-2))   z(0:len-2) = x(0:len-2)   scale = 1.0/(2.0*len)   write(*,'(a25,i1,a10,i1,a12)')'Input sequence of length ',   $len,' requires ', len-1,' data points'   write(*,'(3(f8.3,2x),/)')(x(i),i=0,len-2)   call sinti(len-1, work)   call sint(len-1, z, work)   write(*,*)'Forward fast sine transform'   write(*,'(3(f8.3,2x),/)')(z(i),i=0,len-2)     call sint(len-1, z, work)   write(*,*)$ 'Inverse fast sine transform (results scaled by 1/2*N)'   write(*,'(3(f8.3,2x),/)')(z(i)*scale,i=0,len-2)   end  my_system% f95 -dalign sint.f -xlic_lib=sunperf  my_system% a.out  Input sequence of length 4 requires 3 data points  0.557 0.603 0.210  Forward fast sine transform  2.291 0.694 -0.122  Inverse fast sine transform (results scaled by 1/2*N)  0.557 0.603 0.210    

In CODE EXAMPLE 6-9 VSINQF and VSINQB are called to compute the FST and inverse FST, respectively, of two real quarter-wave odd sequences. If the real sequence is of length 2N, only N input data points need to be stored and the number of resulting data points is also N. The results are stored in the input array.

   my_system% cat vsinq.f   program vsinq   implicit none   integer,parameter :: len=4, m = 2, ld = m+1   real x(ld,len),xt(ld,len),work(3*len+15), z(ld,len)   integer i, j   call RANDOM_NUMBER(x)   z = x   write(*,'(a27,i1)')' Input sequences of length ',len   do j = 1,m   write(*,'(a3,i1,a4,4(f5.3,2x),a1,/)')   $'seq',j,' = (',(x(j,i),i=1,len),')'   end do   call vsinqi(len, work)   call vsinqf(m,len, z, xt, ld, work)   write(*,*)  $ 'Forward fast sine transform for quarter-wave odd sequences'   do j = 1,m   write(*,'(a3,i1,a4,4(f5.3,2x),a1,/)')   $'seq',j,' = (',(z(j,i),i=1,len),')'   end do     call vsinqb(m,len, z, xt, ld, work)   write(*,*)  $ 'Inverse fast sine transform for quarter-wave odd sequences'   write(*,*)'(results are normalized)'   do j = 1,m   write(*,'(a3,i1,a4,4(f5.3,2x),a1,/)')   $'seq',j,' = (',(z(j,i),i=1,len),')'   end do   end  my_system% f95 vsinq.f -xlic_lib=sunperf  my_system% a.out  Input sequences of length 4  seq1 = (0.557 0.352 0.990 0.539 )  seq2 = (0.603 0.867 0.417 0.156 )  Forward fast sine transform for quarter-wave odd sequences  seq1 = (0.823 0.057 0.078 0.305 )  seq2 = (0.654 0.466 -.069 -.037 )  Inverse fast sine transform for quarter-wave odd sequences  (results are normalized)  seq1 = (0.557 0.352 0.990 0.539 )  seq2 = (0.603 0.867 0.417 0.156 )    ## Convolution and Correlation Two applications of the FFT that are frequently encountered especially in the signal processing area are the discrete convolution and discrete correlation operations. ### Convolution Given two functions x(t) and y(t), the Fourier transform of the convolution of x(t) and y(t), denoted as x y, is the product of their individual Fourier transforms: DFT(x y)=XYwhere denotes the convolution operation and denotes pointwise multiplication. Typically, x(t) is a continuous and periodic signal that is represented discretely by a set of N data points xj, j = 0, ..., N -1, sampled over a finite duration, usually for one period of x(t) at equal intervals. y(t) is usually a response that starts out as zero, peaks to a maximum value, and then returns to zero. Discretizing y(t) at equal intervals produces a set of N data points, yk, k = 0, ..., N -1. If the actual number of samplings in yk is less than N, the data can be padded with zeros. The discrete convolution can then be defined as (xy)j. The values of , are the same as those of but in the wrap-around order. The Sun Performance Library routines allow the user to compute the convolution by using the definition above with k = 0, ..., N -1, or by using the FFT. If the FFT is used to compute the convolution of two sequences, the following steps are performed: • Compute X = forward FFT of x • Compute Y = forward FFT of y • Compute Z= X • YDFT(xy) • Compute z= inverse FFT of Z; z = (x • y) One interesting characteristic of convolution is that the product of two polynomials is actually a convolution. A product of an m-term polynomial and an n-term polynomial has m + n - 1 coefficients that can be obtained by , where k = 0, ..., m + n - 2. ### Correlation Closely related to convolution is the correlation operation. It computes the correlation of two sequences directly superposed or when one is shifted relative to the other. As with convolution, we can compute the correlation of two sequences efficiently as follows using the FFT: • Compute the FFT of the two input sequences. • Compute the pointwise product of the resulting transform of one sequence and the complex conjugate of the transform of the other sequence. • Compute the inverse FFT of the product. The routines in the Performance Library also allow correlation to be computed by the following definition: . There are various ways to interpret the sampled input data of the convolution and correlation operations. The argument list of the convolution and correlation routines contain parameters to handle cases in which • The signal and/or response function can start at different sampling time • The user might want only part of the signal to contribute to the output • The signal and/or response function can begin with one or more zeros that are not explicitly stored. ### Sun Performance Library Convolution and Correlation Routines Sun Performance Library contains the convolution routines shown in TABLE 6-6. TABLE 6-6 Convolution and Correlation Routines Routine Arguments Function SCNVCOR, DCNVCOR, CCNVCOR, ZCNVCOR CNVCOR,FOUR,NX,X,IFX, INCX,NY,NPRE,M,Y,IFY, INC1Y,INC2Y,NZ,K,Z, IFZ,INC1Z,INC2Z,WORK, LWORK Convolution or correlation of a filter with one or more vectors SCNVCOR2, DCNVCOR2, CCNVCOR2, ZCNVCOR2 CNVCOR,METHOD,TRANSX, SCRATCHX,TRANSY, SCRATCHY,MX,NX,X,LDX, MY,NY,MPRE,NPRE,Y,LDY, MZ,NZ,Z,LDZ,WORKIN, LWORK Two-dimensional convolution or correlation of two matrices SWIENER, DWIENER N_POINTS,ACOR,XCOR, FLTR,EROP,ISW,IERR Wiener deconvolution of two signals The [S,D,C,Z]CNVCOR routines are used to compute the convolution or correlation of a filter with one or more input vectors. The [S,D,C,Z]CNVCOR2 routines are used to compute the two-dimensional convolution or correlation of two matrices. ### Arguments for Convolution and Correlation Routines The one-dimensional convolution and correlation routines use the arguments shown in TABLE 6-7. TABLE 6-7 Arguments for One-Dimensional Convolution and Correlation Routines SCNVCOR , DCNVCOR , CCNVCOR , and ZCNVCOR Argument Definition CNVCOR V' or v' specifies that convolution is computed. R' or r' specifies that correlation is computed. FOUR T' or t' specifies that the Fourier transform method is used. D' or d' specifies that the direct method is used, where the convolution or correlation is computed from the definition of convolution and correlation. [1] NX Length of filter vector, where NX 0. X Filter vector IFX Index of first element of X, where NX IFX 1 INCX Stride between elements of the vector in X, where INCX > 0. NY Length of input vectors, where NY 0. NPRE Number of implicit zeros prefixed to the Y vectors, where NPRE 0. M Number of input vectors, where M 0. Y Input vectors. IFY Index of the first element of Y, where NY IFY 1 INC1Y Stride between elements of the input vectors in Y, where INC1Y > 0. INC2Y Stride between input vectors in Y, where INC2Y > 0. NZ Length of the output vectors, where NZ 0. K Number of Z vectors, where K 0. If K < M, only the first K vectors will be processed. If K > M, all input vectors will be processed and the last M-K output vectors will be set to zero on exit. Z Result vectors IFZ Index of the first element of Z, where NZ IFZ 1 INC1Z Stride between elements of the output vectors in Z, where INCYZ > 0. INC2Z Stride between output vectors in Z, where INC2Z > 0. WORK Work array LWORK Length of work array The two-dimensional convolution and correlation routines use the arguments shown in TABLE 6-8. TABLE 6-8 Arguments for Two-Dimensional Convolution and Correlation Routines SCNVCOR2 , DCNVCOR2 , CCNVCOR2 , and ZCNVCOR2 Argument Definition CNVCOR V' or v' specifies that convolution is computed. R' or r' specifies that correlation is computed. METHOD T' or t' specifies that the Fourier transform method is used. D' or d' specifies that the direct method is used, where the convolution or correlation is computed from the definition of convolution and correlation. [2] TRANSX N' or n' specifies that X is the filter matrix T' or t' specifies that the transpose of X is the filter matrix SCRATCHX N' or n' specifies that X must be preserved S' or s' specifies that X can be used for scratch space. The contents of X are undefined after returning from a call where X is used for scratch space. TRANSY N' or n' specifies that Y is the input matrix T' or t' specifies that the transpose of Y is the input matrix SCRATCHY N' or n' specifies that Y must be preserved S' or s' specifies that Y can be used for scratch space. The contents of X are undefined after returning from a call where Y is used for scratch space. MX Number of rows in the filter matrix X, where MX 0 NX Number of columns in the filter matrix X, where NX 0 X Filter matrix. X is unchanged on exit when SCRATCHX is N' or n' and undefined on exit when SCRATCHX is S' or s'. LDX Leading dimension of array containing the filter matrix X. MY Number of rows in the input matrix Y, where MY 0. NY Number of columns in the input matrix Y, where NY 0 MPRE Number of implicit zeros prefixed to each row of the input matrix Y vectors, where MPRE 0. NPRE Number of implicit zeros prefixed to each column of the input matrix Y, where NPRE 0. Y Input matrix. Y is unchanged on exit when SCRATCHY is N' or n' and undefined on exit when SCRATCHY is S' or s'. LDY Leading dimension of array containing the input matrix Y. MZ Number of output vectors, where MZ 0. NZ Length of output vectors, where NZ 0. Z Result vectors LDZ Leading dimension of the array containing the result matrix Z, where LDZ MAX(1,MZ). WORKIN Work array LWORK Length of work array ### Work Array WORK for Convolution and Correlation Routines The minimum dimensions for the WORK work arrays used with the one-dimensional and two-dimensional convolution and correlation routines are shown in TABLE 6-11. The minimum dimensions for one-dimensional convolution and correlation routines depend upon the values of the arguments NPRE, NX, NY, and NZ. The minimum dimensions for two-dimensional convolution and correlation routines depend upon the values of the arguments shown TABLE 6-9. TABLE 6-9 Arguments Affecting Minimum Work Array Size for Two-Dimensional Routines: SCNVCOR2 , DCNVCOR2 , CCNVCOR2 , and ZCNVCOR2 Argument Definition MX Number of rows in the filter matrix MY Number of rows in the input matrix MZ Number of output vectors NX Number of columns in the filter matrix NY Number of columns in the input matrix NZ Length of output vectors MPRE Number of implicit zeros prefixed to each row of the input matrix NPRE Number of implicit zeros prefixed to each column of the input matrix MPOST MAX(0,MZ-MYC) NPOST MAX(0,NZ-NYC) MYC MPRE + MPOST + MYC_INIT, where MYC_INIT depends upon filter and input matrices, as shown in TABLE 6-10 NYC NPRE + NPOST + NYC_INIT, where NYC_INIT depends upon filter and input matrices, as shown in TABLE 6-10 MYC_INIT and NYC_INIT depend upon the following, where X is the filter matrix and Y is the input matrix. TABLE 6-10 MYC_INIT and NYC_INIT Dependencies Y Transpose(Y) X Transpose(X) X Transpose(X) MYC_INIT MAX(MX,MY) MAX(NX,MY) MAX(MX,NY) MAX(NX,NY) NYC_INIT MAX(NX,NY) MAX(MX,NY) MAX(NX,MY) MAX(MX,MY) The values assigned to the minimum work array size is shown in TABLE 6-11. TABLE 6-11 Minimum Dimensions and Data Types for WORK Work Array Used With Convolution and Correlation Routines Routine Minimum Work Array Size (WORK) Type SCNVCOR, DCNVCOR 4*(MAX(NX,NPRE+NY) + MAX(0,NZ-NY)) REAL, REAL*8 CCNVCOR, ZCNVCOR 2*(MAX(NX,NPRE+NY) + MAX(0,NZ-NY))) COMPLEX, COMPLEX*16 SCNVCOR2[3], DCNVCOR21 MY + NY + 30 COMPLEX, COMPLEX*16 CCNVCOR21, ZCNVCOR21 If MY = NY: MYC + 8 If MY NY: MYC + NYC + 16 COMPLEX, COMPLEX*16 ### Sample Program: Convolution CODE EXAMPLE 6-10 uses CCNVCOR to perform FFT convolution of two complex vectors.    my_system% cat con_ex20.f   PROGRAM TEST  C   INTEGER LWORK   INTEGER N   PARAMETER (N = 3)   PARAMETER (LWORK = 4 * N + 15)   COMPLEX P1(N), P2(N), P3(2*N-1), WORK(LWORK)   DATA P1 / 1, 2, 3 /, P2 / 4, 5, 6 /  C   EXTERNAL CCNVCOR  C   PRINT *, 'P1:'   PRINT 1000, P1   PRINT *, 'P2:'   PRINT 1000, P2     CALL CCNVCOR ('V', 'T', N, P1, 1, 1, N, 0, 1, P2, 1, 1, 1,  $ 2 * N - 1, 1, P3, 1, 1, 1, WORK, LWORK)  C   PRINT *, 'P3:'   PRINT 1000, P3  C   1000 FORMAT (1X, 100(F4.1,' +',F4.1,'i '))  C   END  my_system% f95 -dalign con_ex20.f -xlic_lib=sunperf  my_system% a.out   P1:   1.0 + 0.0i 2.0 + 0.0i 3.0 + 0.0i   P2:   4.0 + 0.0i 5.0 + 0.0i 6.0 + 0.0i   P3:   4.0 + 0.0i 13.0 + 0.0i 28.0 + 0.0i 27.0 + 0.0i 18.0 + 0.0i    

If any vector overlaps a writable vector, either because of argument aliasing or ill-chosen values of the various INC arguments, the results are undefined and can vary from one run to the next.

The most common form of the computation, and the case that executes fastest, is applying a filter vector X to a series of vectors stored in the columns of Y with the result placed into the columns of Z. In that case, INCX = 1, INC1Y = 1, INC2Y NY, INC1Z = 1, INC2Z NZ. Another common form is applying a filter vector X to a series of vectors stored in the rows of Y and store the result in the row of Z, in which case INCX = 1, INC1Y NY, INC2Y = 1, INC1Z NZ, and INC2Z = 1.

Convolution can be used to compute the products of polynomials. CODE EXAMPLE 6-11 uses SCNVCOR to compute the product of 1 + 2x + 3x2 and 4 + 5x + 6x2.

   my_system% cat con_ex21.f   PROGRAM TEST   INTEGER LWORK, NX, NY, NZ   PARAMETER (NX = 3)   PARAMETER (NY = NX)   PARAMETER (NZ = 2*NY-1)   PARAMETER (LWORK = 4*NZ+32)   REAL X(NX), Y(NY), Z(NZ), WORK(LWORK)  C   DATA X / 1, 2, 3 /, Y / 4, 5, 6 /, WORK / LWORK*0 /  C   PRINT 1000, 'X'   PRINT 1010, X   PRINT 1000, 'Y'   PRINT 1010, Y   CALL SCNVCOR ('V', 'T', NX, X, 1, 1,   NY, 0, 1, Y, 1, 1, 1, NZ, 1, Z, 1, 1, 1, WORK, LWORK)   PRINT 1020, 'Z'   PRINT 1010, Z   1000 FORMAT (1X, 'Input vector ', A1)   1010 FORMAT (1X, 300F5.0)   1020 FORMAT (1X, 'Output vector ', A1)   END  my_system% f95 -dalign con_ex21.f -xlic_lib=sunperf  my_system% a.out   Input vector X   1. 2. 3.   Input vector Y   4. 5. 6.   Output vector Z   4. 13. 28. 27. 18.    Making the output vector longer than the input vectors, as in the example above, implicitly adds zeros to the end of the input. No zeros are actually required in any of the vectors, and none are used in the example, but the padding provided by the implied zeros has the effect of an end-off shift rather than an end-around shift of the input vectors. CODE EXAMPLE 6-12 will compute the product between the vector [ 1, 2, 3 ] and the circulant matrix defined by the initial column vector [ 4, 5, 6 ].    my_system% cat con_ex22.f   PROGRAM TEST  C   INTEGER LWORK, NX, NY, NZ   PARAMETER (NX = 3)   PARAMETER (NY = NX)   PARAMETER (NZ = NY)   PARAMETER (LWORK = 4*NZ+32)   REAL X(NX), Y(NY), Z(NZ), WORK(LWORK)  C   DATA X / 1, 2, 3 /, Y / 4, 5, 6 /, WORK / LWORK*0 /  C   PRINT 1000, 'X'   PRINT 1010, X   PRINT 1000, 'Y'   PRINT 1010, Y   CALL SCNVCOR ('V', 'T', NX, X, 1, 1,  NY, 0, 1, Y, 1, 1, 1, NZ, 1, Z, 1, 1, 1,   WORK, LWORK)   PRINT 1020, 'Z'   PRINT 1010, Z  C   1000 FORMAT (1X, 'Input vector ', A1)   1010 FORMAT (1X, 300F5.0)   1020 FORMAT (1X, 'Output vector ', A1)   END  my_system% f95 -dalign con_ex22.f -xlic_lib=sunperf  my_system% a.out   Input vector X   1. 2. 3.   Input vector Y   4. 5. 6.   Output vector Z   31. 31. 28.    The difference between this example and the previous example is that the length of the output vector is the same as the length of the input vectors, so there are no implied zeros on the end of the input vectors. With no implied zeros to shift into, the effect of an end-off shift from the previous example does not occur and the end-around shift results in a circulant matrix product.    my_system% cat con_ex23.f   PROGRAM TEST  C   INTEGER M, N   PARAMETER (M = 2)   PARAMETER (N = 3)  C   INTEGER I, J   COMPLEX P1(M,N), P2(M,N), P3(M,N)   DATA P1 / 1, -2, 3, -4, 5, -6 /, P2 / -1, 2, -3, 4, -5, 6 /   EXTERNAL CCNVCOR2  C   PRINT *, 'P1:'   PRINT 1000, ((P1(I,J), J = 1, N), I = 1, M)   PRINT *, 'P2:'   PRINT 1000, ((P2(I,J), J = 1, N), I = 1, M)  C   CALL CCNVCOR2 ('V', 'Direct', 'No Transpose X', 'No Overwrite X',   'No Transpose Y', 'No Overwrite Y', M, N, P1, M,   \$ M, N, 0, 0, P2, M, M, N, P3, M, 0, 0)  C   PRINT *, 'P3:'   PRINT 1000, ((P3(I,J), J = 1, N), I = 1, M)  C   1000 FORMAT (3(F5.1,' +',F5.1,'i '))  C   END  my_system% f95 -dalign con_ex23.f -xlic_lib=sunperf  my_system% a.out   P1:   1.0 + 0.0i 3.0 + 0.0i 5.0 + 0.0i   -2.0 + 0.0i -4.0 + 0.0i -6.0 + 0.0i   P2:   -1.0 + 0.0i -3.0 + 0.0i -5.0 + 0.0i   2.0 + 0.0i 4.0 + 0.0i 6.0 + 0.0i   P3:  -83.0 + 0.0i -83.0 + 0.0i -59.0 + 0.0i   80.0 + 0.0i 80.0 + 0.0i 56.0 + 0.0i    

## References

For additional information on the DFT or FFT, see the following sources.

Briggs, William L., and Henson, Van Emden. The DFT: An Owner's Manual for the Discrete Fourier Transform. Philadelphia, PA: SIAM, 1995.

Brigham, E. Oran. The Fast Fourier Transform and Its Applications. Upper Saddle River, NJ: Prentice Hall, 1988.

Chu, Eleanor, and George, Alan. Inside the FFT Black Box: Serial and Parallel Fast Fourier Transform Algorithms. Boca Raton, FL: CRC Press, 2000.

Press, William H., Teukolsky, Saul A., Vetterling, William T., and Flannery, Brian P. Numerical Recipes in C: The Art of Scientific Computing. 2 ed. Cambridge, United Kingdom: Cambridge University Press, 1992.

Ramirez, Robert W. The FFT: Fundamentals and Concepts. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1985.

Swartzrauber, Paul N. Vectorizing the FFTs. In Rodrigue, Garry ed. Parallel Computations. New York: Academic Press, Inc., 1982.

Strang, Gilbert. Linear Algebra and Its Applications. 3 ed. Orlando, FL: Harcourt Brace & Company, 1988.

Van Loan, Charles. Computational Frameworks for the Fast Fourier Transform. Philadelphia, PA: SIAM, 1992.

Walker, James S. Fast Fourier Transforms. Boca Raton, FL: CRC Press, 1991.

1 (TableFootnote) When the lengths of the two sequences to be convolved are similar, the FFT method is faster than the direct method. However, when one sequence is much larger than the other, such as when convolving a large time-series signal with a small filter, the direct method performs faster than the FFT-based method.
2 (TableFootnote) When the sizes of the two matrices to be convolved are similar, the FFT method is faster than the direct method. However, when one sequence is much larger than the other, such as when convolving a large data set with a small filter, the direct method performs faster than the FFT-based method.
3 (TableFootnote) Memory will be allocated within the routine if the workspace size, indicated by LWORK, is not large enough.