FFT – Fast Fourier Transform

Good resources

FFT Software Libraries

Kiss FFT

Using FFT For An Audio Spectrum Analyzer

Special terms:

'Nyquist frequency' = half of the sampling rate of a discrete signal processing system

'Bin width' = the width of each frequency bin in Hz (the sample rate divided by the number of samples per frame)

Basic FFT Decisions

The sample rate, which determines the "Nyquist frequency".

The number of samples per frame (the frame size), which together with the sample rate determines the "bin width".
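As a rough illustration of how these two decisions fix the Nyquist frequency and the bin width, here is a minimal Python sketch. The function names are made up for this example and do not come from any FFT library.

```python
def nyquist_frequency(sample_rate_hz):
    """Highest frequency the system can represent: half the sample rate."""
    return sample_rate_hz / 2.0


def bin_width(sample_rate_hz, frame_size):
    """Width of each frequency bin in Hz: sample rate / samples per frame."""
    return sample_rate_hz / frame_size


# Example values: CD-quality audio analysed in 1024-sample frames.
print(nyquist_frequency(44100))   # Nyquist frequency in Hz
print(bin_width(44100, 1024))     # bin width in Hz
```

With 44100 Hz and 1024 samples this gives a Nyquist frequency of 22050 Hz and a bin width of roughly 43 Hz.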

If we have a bin width of 43 Hz (the result of dividing the sample rate by the FFT frame size, for example 44,100 Hz / 1024), then we have bins from 0 Hz to 43 Hz, 43 Hz to 86 Hz, 86 Hz to 129 Hz, and so on. The problem with this from an audio point of view is that the human ear responds to frequency logarithmically, not linearly. At low frequencies, 43 Hz is quite a wide interval (the jump from 43 Hz to 86 Hz is a whole octave), but at higher frequencies 43 Hz is a tiny interval (perceptually, less than a minor second). So, measured in musical pitch, the FFT has very fine high-frequency resolution but very poor low-frequency resolution.
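To make the linear-versus-logarithmic mismatch concrete, the sketch below counts how many bin centre frequencies fall inside a given octave. The helper name and the octave start frequencies (55 Hz, 440 Hz, 3520 Hz) are arbitrary choices for illustration only.

```python
import math


def bins_in_octave(octave_start_hz, bin_width_hz):
    """Count bins whose centre k * bin_width lies in [start, 2 * start)."""
    first = math.ceil(octave_start_hz / bin_width_hz)
    last = math.ceil(2 * octave_start_hz / bin_width_hz) - 1
    return max(0, last - first + 1)


BIN_WIDTH_HZ = 44100 / 1024  # about 43 Hz, as in the text above

for start_hz in (55, 440, 3520):
    count = bins_in_octave(start_hz, BIN_WIDTH_HZ)
    print(f"octave {start_hz}-{2 * start_hz} Hz: {count} bin(s)")
```

Every octave sounds like the same musical distance, yet the low octave gets a single bin while the high one gets dozens.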

The design trade-off decision for an audio FFT

A trade-off has to be made between frequency resolution and time resolution. The more accurately you want to measure the frequency content of a signal, the more samples you have to analyze in each frame of the FFT. However, the cost of expanding the frame size is that the larger the frame, the less you know about when events take place within that frame.

A good audio approach

1024 samples is a common frame size for an audio FFT at a sample rate of 44.1 kHz. This gives a frequency resolution of about 43 Hz.

At a sample rate of 44.1 kHz, 1024 samples is about 0.023 seconds of audio. All of the audio which takes place within that 0.023 seconds will be lumped together and analyzed as one event. Because of the nature of the FFT, this "event" is actually treated as if it were an infinitely repeating periodic waveform. The amplitudes of the frequency components of all the sonic events in that time frame will be averaged, and these averages will end up in the frequency bins.
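One way to see the "lumping together" is to compare the magnitude spectrum of a click near the start of a frame with that of the same click near the end. The sketch below uses a slow, naive DFT written out by hand (not an FFT library) and a tiny 64-sample frame so that it runs with the Python standard library only. The helper names are invented for this example.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Naive DFT: magnitude of each bin from 0 Hz up to the Nyquist bin."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        magnitudes.append(abs(total))
    return magnitudes


def click_frame(position, n):
    """A silent frame of n samples with a single click at `position`."""
    frame = [0.0] * n
    frame[position] = 1.0
    return frame


N = 64  # deliberately small; a real audio frame would be 1024 or more
early = dft_magnitudes(click_frame(5, N))
late = dft_magnitudes(click_frame(50, N))

# The two magnitude spectra match bin for bin: the position of the click
# inside the frame only changes the phases, not the magnitudes.
print(all(abs(a - b) < 1e-9 for a, b in zip(early, late)))
```

A spectrum analyzer that displays only bin magnitudes therefore cannot tell where inside the frame the click happened.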

If you need finer frequency resolution than the 43 Hz that a 1024-sample FFT gives you, you need to use a bigger frame size. However, a bigger frame size means that even more samples will be lumped together, giving you worse time resolution. With the next frame size up, 2048 samples, you get a frequency resolution of about 21.5 Hz, but a time resolution of about 0.046 seconds (roughly 1/20 of a second). That may still seem fast, but in audio a lot can happen in 1/20 of a second.
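The trade-off can be tabulated for a few frame sizes. This is only a sketch: the function name is made up, and the list of frame sizes is an arbitrary selection of powers of two.

```python
def frame_resolution(sample_rate_hz, frame_size):
    """Return (bin width in Hz, frame duration in seconds) for one frame."""
    bin_width_hz = sample_rate_hz / frame_size
    frame_duration_s = frame_size / sample_rate_hz
    return bin_width_hz, frame_duration_s


SAMPLE_RATE_HZ = 44100

for frame_size in (512, 1024, 2048, 4096):
    bin_width_hz, frame_duration_s = frame_resolution(SAMPLE_RATE_HZ, frame_size)
    print(f"{frame_size:5d} samples: {bin_width_hz:6.2f} Hz per bin, "
          f"{frame_duration_s * 1000:6.1f} ms per frame")
```

Each doubling of the frame size halves the bin width and doubles the frame duration, so the product of the two always stays at 1. You cannot improve one without giving up the other.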

If you construct an FFT with a timeSize of 1024 and a sampleRate of 44100 Hz, the spectrum will contain values for frequencies up to 22050 Hz, which is the Nyquist frequency (half the sample rate). If you read the value of band 5, it will correspond to a frequency band centered on 5/1024 * 44100 = 0.0048828125 * 44100 ≈ 215 Hz. The width of that frequency band is equal to 2/1024, expressed as a fraction of the total bandwidth of the spectrum. The total bandwidth of the spectrum is equal to the Nyquist frequency, which in this case is 22050 Hz, so the band is about 43 Hz wide. Because every band has this same width, much of the FFT data is effectively spent on recording high-frequency information very accurately, at the expense of the low-frequency information which is generally more useful in a musical context.
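The band arithmetic above can be written out as a short Python sketch. The parameters time_size and sample_rate_hz mirror the timeSize and sampleRate values in the text. The function names are hypothetical and are not the API of any particular FFT library.

```python
def band_center_hz(band, time_size, sample_rate_hz):
    """Centre frequency of a band: band / timeSize * sampleRate."""
    return band / time_size * sample_rate_hz


def band_width_hz(time_size, sample_rate_hz):
    """Band width: 2 / timeSize of the total bandwidth (the Nyquist frequency)."""
    nyquist_hz = sample_rate_hz / 2.0
    return 2.0 / time_size * nyquist_hz


def nearest_band(frequency_hz, time_size, sample_rate_hz):
    """Index of the band whose centre is closest to the given frequency."""
    return round(frequency_hz * time_size / sample_rate_hz)


print(band_center_hz(5, 1024, 44100))   # centre of band 5 in Hz
print(band_width_hz(1024, 44100))       # width of every band in Hz
print(nearest_band(440.0, 1024, 44100)) # band that holds concert A
```

Going the other way, from a frequency to a band index, is handy when you want to read the level at a particular musical pitch.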
