## 1. Introduction

The discrete Fourier transform (DFT) of a discrete signal comprising N samples is: Sn = 1N∑k = 0N-1ukexp-j2πnkN (1)

For the application of this transformation to the case of periodic functions, see Discrete Fourier transform: Fourier series. The application to any signal is explained in Discrete Fourier transform: Fourier transform.

This document presents the fast discrete Fourier transform (FFT) algorithm then a prototype implementation in python. Finally, an implementation for two-dimensional FFT is proposed for a graphics processor, with the OpenGL programming interface.

## 2. FFT algorithm

This involves calculating the following N sums (n varying from 0 to N-1): Fn = ∑k = 0N-1ukexp-j2πnkN = ∑k = 0N-1ukWNnk (2)

The N uk samples are complex and of course the sum is also complex. The direct computation of these sums has a complexity in N2, very penalizing when N is large. The fast discrete Fourier transform algorithm is based on the decomposition of the previous sum obtained by grouping the even uk terms and the odd terms. It requires a number of power samples of two N = 2q. Here is the decomposition: Fn = ∑k = 0N2-1u2kexp-j2πn2kN + exp-j2πnN∑k = 0N2-1u2k + 1exp-j2πn2kN (3) = Fnp + WNnFni (4)

Fnp is the DFT of N / 2 even terms. Its period is N / 2. Likewise, Fni is the TFD of odd terms. As N2 = 2q-1 is divisible by two, each of these TFD is in turn decomposed into TFD of even and odd terms: Fnp = Fnpp + WN / 2nFnpi (5) Fni = Fnip + WN / 2nFnii (6)

This decomposition is carried out recursively until one-term DFTs are obtained. The TFD of uk is simply uk. We will have, for example for q = 4 and for the ipii succession: Fnipii = um (7)

where m is an index between 0 and N-1. To determine the value of this index, we notice that the separation in even and odd terms in equation (4) amounts to testing the least significant bit (bit 0) of k. If this bit is 0, we are in the even group. In the second decomposition, sorting is done according to bit 1 of k, and so on. For example, for the sequence ipii these bits are 1011. By inverting the order of the bits, we therefore obtain the binary representation of m, that is here 1101.

To calculate the DFT recursively (from top to bottom, starting from the N-term DFT), it would be necessary at each step to reorganize the even and odd terms into separate tables, before applying the lower-ranked DFT to them. It is much more efficient to start with the TFD at the end. To do this, we must start by arranging the samples uk in order of their index obtained by inverting the order of the bits. For example, the term 5 of binary representation 0101 will be ordered to the rank of binary representation 1010, that is to say 10.

Note xh list of samples and stored. The index h a binary representation for the ipii succession (not inverted).

The first step is to obtain the two-term DFTs. To return to the previous example, we must calculate for the index m the terms Fnipi with n = 0 and n = 1 (the two-term DFT is of period 2). The index in the initial list has the binary representation of ipii (after sorting with inversion of the order of the bits). The index h is therefore odd. The TFD has two terms is therefore obtained by: Fnipi = Fnipip + W2nFnipii = xh-1 + W2nxh (8)

There are two values of n to apply (0 and 1). More generally, for each even index h, let [h / 2] be the binary representation of h / 2 (obtained by removing the last bit). We calculate: Fn [h / 2] = xh + W2nxh + 1 (n = 0.1) (9)

For this first step, the values in the list are therefore modified as follows: xh ← xh + W20xh + 1 (10) xh + 1 ← xh + W21xh + 1 (11)

The second step consists in calculating the 4-term DFTs. For that, we apply the following transformation to each packet of 4 consecutive terms: xh ← xh + W40xh + 2 (12) xh + 1 ← xh + 1 + W41xh + 3 (13) xh + 2 ← xh + W42xh + 2 ( 14) xh + 3 ← xh + 1 + W43xh + 3 (15)

Note that this transformation requires an intermediate storage array.

We then proceed to the calculation of the DFT with 23 = 8 terms, and so on until the DFT with N terms.

Finally, let’s see how step e is carried out, which consists of going from 2nd-1 term DFTs to 2nd term DFTs. The first step is e = 1, the last e = q (N = 2q). Let n (e) = 2e and n (e-1) = 2e-1. Each packet of n (e) consecutive terms of the list xk must undergo the same computation. For the first half of each packet, the transformation is: xh + k ← xh + k + Wn (e) kxh + k + n (e-1) (k = 0 ⋯ n (e-1) -1) (16 )

For the second half, the transformation is: xh + k ← xh + k-n (e-1) + Wn (e) kxh + k (k = n (e-1) ⋯ n (e) -1) (17)

The following figure shows how the indices are combined in the case n (e) = 8. The top line represents two 4-term DFTs, the bottom line an 8-term DFT. The index h is a multiple of n (e).

## 3. python implementation

The different stages of the FFT are easier to implement if there are two tables, one for input, one for output. For implementation on GPU (see below), this condition is imperative. By convention, it should be noted x the input table and the output table there.

import math import numpy

The first step consists in arranging the elements in reverse order of the index bits. The following function performs the inversion of bits for an index. The number of bits q must also be provided.

def inversionBits(q,k): m = 0 i = k for b in range(q): m = m|(i&1) m = m<<1 i = i>>1 m = m>>1 return m

Example :

print(inversionBits(4,5)) --> 10

def rangerInversionBits(q,x,y): N = 2**q for k in range(N): m = inversionBits(q,k) y[m] = x[k]

x = numpy.array([0,1,2,3]) y = numpy.array([0,0,0,0]) rangerInversionBits(2,x,y)

To apply the next step, you must exchange the table references :

z = x x = y y = z

print(x[0]) --> 0

print(x[1]) --> 2

print(x[2]) --> 1

print(x[3]) --> 3

Let’s see the writing of the function which performs step e of the FFT. We must calculate the following n (e) coefficients: Wn (e) k = exp-j2πkn (e) = cos2πkn (e) + jsin2πkn (e) (18)

This involves calculating a cosine and a sine for k varying from 0 to n (e) -1. Note however that: Wn (e) 0 = 1 (19) Wn (e) n (e) / 2 = -1 (20) Wn (e) n (e) -k = cos2πkn (e) -jsin2πkn (e ) (21)

So there is in fact n (e) / 2-1 cosine and sine to calculate.

The following function performs step e. The main loop is done on the index k. The cases k = 0 and k = n (e-1) are treated separately.

def etapeFFT(x,y,q,e): ne = 2**e nem1 = ne/2 for k in range(1,nem1): phi = 2*math.pi*k/ne W = math.cos(phi)+math.sin(phi)*1j for i in range(2**(q-e)): # loop on packets of no terms h = i*ne # first index of the packet y[h+k] = x[h+k]+W*x[h+k+nem1] y[h+ne-k] = x[h+ne-k-nem1]+W.conjugate()*x[h+ne-k] for i in range(2**(q-e)): h = i*ne y[h] = x[h]+x[h+nem1] y[h+nem1] = x[h]-x[h+nem1]

The following function computes the discrete Fourier transform.

def fft1(u,q): x = numpy.array(u,dtype=complex) y = numpy.zeros(2**q,dtype=complex) rangerInversionBits(q,x,y) z = x x = y y = z for e in range(1,q+1): etapeFFT(x,y,q,e) z = x x = y y = z return x

Example: trigonometric function sampled over a period.

q = 4 N = 2**q u = numpy.zeros(N,dtype=complex) i = numpy.zeros(N,dtype=int) for k in range(N): i[k] = k u[k] = math.sin(2*math.pi*k/N)+0.5*math.sin(4*math.pi*k/N)+0.25*math.cos(10*math.pi*k/N) tfd = fft1(u,q)

from pylab import * spectre = numpy.absolute(tfd) figure(figsize=(10,4)) stem(i,spectre,'r')

## 4. Two-dimensional FFT for GPUs

### 4.a. Prototype

The two-dimensional FFT makes it possible to calculate the Fourier transform of an image. The DFT of an image is obtained by first performing the DFT on its rows, then by applying the DFT on the columns. An implementation of two-dimensional FFT on a graphics processor (GPU) makes it possible to perform the Fourier transform of images very quickly.

The graphics processor performs parallel calculations on computing units. Whenever possible, calculations on different units should be independent of each other. We choose a distribution of tasks consisting in assigning an index of the table to be processed (at each step) per unit of calculation. This is not the optimal solution, but it has the advantage of being easy to implement with the OpenGL programming interface. For a more efficient distribution of tasks, it is necessary to use OpenCL or CUDA.

Let us first see a prototype in python of the function which will be executed by each calculation unit. Each calculation unit must calculate the trigonometric term W corresponding to its index.

def etapeFFT_unite(x,y,q,e,indice): ne = 2**e nem1 = ne/2 k = indice%ne # indice local dans le paquet de ne termes phi = 2*math.pi*k/ne W = math.cos(phi)+math.sin(phi)*1j a = k/nem1 # premiere ou seconde moitie du paquet if a==0: y[indice] = x[indice]+W*x[indice+nem1] else: y[indice] = x[indice-nem1]+W*x[indice]

Here is the fft function which uses this method:

def fft2(u,q): N = 2**q x = numpy.array(u,dtype=complex) y = numpy.zeros(2**q,dtype=complex) rangerInversionBits(q,x,y) z = x x = y y = z for e in range(1,q+1): for indice in range(N): etapeFFT_unite(x,y,q,e,indice) z = x x = y y = z return x

Example :

tfd = fft2(u,q) spectre = numpy.absolute(tfd) figure(figsize=(10,4)) stem(i,spectre,'r')

### 4.b. Shaders openGL ES

With the programming interface OpenGL, the graphics processor is programmed using shaders. Here we consider the language OpenGL ES Shading Language (used in WebGL and on mobile devices).

The FFT is performed on a floating point texture, in several steps. Each step consists of modifying all the points of the texture in parallel. The texture should be square, with a size to the power of 2.

The Vertex Shader is responsible for transmitting the positions of the vertices. We simply draw a square made up of two triangles.

attribute vec2 aVertexPosition; void main(void){ gl_Position = vec4(aVertexPosition,0.0,1.0); }

The first step is to keep only layer 0 of the texture (the red layer) and cancel layer 1 (green layer). The first will be used to store the real part, the second to store the imaginary part. Here is the Fragment Shader which performs this operation.

precision mediump float; uniform sampler2D uSampler; uniform float uRowSize; void main() { float x = gl_FragCoord.x/uRowSize; float y = gl_FragCoord.y/uRowSize;, vec4 v = texture2D(uSampler,vec2(x,y)); gl_FragColor = vec4(v.r,0.0,0.0,1.0); }

The second step is to sort the rows by reversing the bits of the indices. The number of bits of the size of the texture must be hard-coded (9 in the example below, for a texture of size 512).

precision mediump float; uniform sampler2D uSampler; uniform float uRowSize; void main() { float index=0.0; float i = gl_FragCoord.x; float j = gl_FragCoord.y; float i2,b; float s = uRowSize/2.0; for (int k=0; k<9; k++) { // nombre de bits à coder en dur i2 = floor(i/2.0); b = i-i2*2.0; if (b>0.0) index += s; s /= 2.0; i = i2; } float x = index/uRowSize; float y = j/uRowSize; vec4 v = texture2D(uSampler,vec2(x,y)); gl_FragColor = vec4(v.r,v.g,0.0,1.0); }

A step of the FFT on the lines is carried out by the following Fragment Shader. You must pass the number n (e) and the number n (e-1) to the shader.

precision mediump float; uniform sampler2D uSampler; uniform float uRowSize; uniform float uNe; uniform float uNem1; uniform float uF; uniform float uNorm; float i = gl_FragCoord.x; float j = gl_FragCoord.y; void main(){ float k = mod(i,float(uNe)); float phi = uF*k; float cos = cos(phi); float sin = sin(phi); int a = int(k)/int(uNem1); vec4 v; float re1,im1,re2,im2; if (a==0) { v = texture2D(uSampler,vec2(i/uRowSize,j/uRowSize)); re1 = v.r; im1 = v.g; v = texture2D(uSampler,vec2((i+uNem1)/uRowSize,j/uRowSize)); re2 = v.r; im2 = v.g; gl_FragColor = vec4((re1+re2*cos-im2*sin)*uNorm,(re2*sin+im2*cos+im1)*uNorm,0.0,1.0); }, else {, v = texture2D(uSampler,vec2((i-uNem1)/uRowSize,j/uRowSize)); re1 = v.r; im1 = v.g; v = texture2D(uSampler,vec2(i/uRowSize,j/uRowSize)); re2 = v.r; im2 = v.g; gl_FragColor = vec4((re1+re2*cos-im2*sin)*uNorm,(re2*sin+im2*cos+im1)*uNorm,0.0,1.0); } }

For a texture 512 pixels wide, the steps must be applied with n (e) varying from 1 to 9. Once the FFT has been performed on the rows, the columns must be sorted by inverting the bits of the indices then apply the steps of the FFT on the columns. The corresponding shaders are analogous to the previous ones.

To display the spectrum, it may be necessary to order the DFT so as to place the zero frequency in the center of the image. This is done by the following shader, to which we must provide the size of the texture and its half:

precision mediump float; uniform sampler2D uSampler; uniform float uRowSize; uniform float uRowSized2; void main(){ float i = gl_FragCoord.x-0.5; float j = gl_FragCoord.y-0.5; if (i<=uRowSized2) i = uRowSized2-1.0+i; else i = -uRowSized2-1.0+i; if (j<=uRowSized2) j = uRowSized2-1.0+j; else j = -uRowSized2-1.0+j; gl_FragColor = vec4(texture2D(uSampler,vec2((i+0.5)/uRowSize,(j+0.5)/uRowSize))); }

To perform the reverse transformation:

precision mediump float; uniform sampler2D uSampler; uniform float uRowSize; uniform float uRowSized2; void main(){ float i = gl_FragCoord.x-0.5; float j = gl_FragCoord.y-0.5; if (i<uRowSized2-1.0) i = uRowSized2+1.0+i; else i = -uRowSized2+1.0+i; if (j<uRowSized2-1.0) j = uRowSized2+1.0+j; else j = -uRowSized2+1.0+j; gl_FragColor = vec4(texture2D(uSampler,vec2((i+0.5)/uRowSize,(j+0.5)/uRowSize))); }