Spectral calculations
Basic calculations:
- FFT spectrum: a plain FFT is applied to the signal. Before the FFT is
computed, the portion of the signal being processed is padded with zeros.
- cepstrally smoothed spectrum:
- cepstrum (by FFT + modulus + log + inverse FFT)
- liftering (see below)
- final FFT.
- LPC spectrum: a plain LPC analysis followed by an FFT, which computes
the spectrum from the LPC coefficients.
- selective LPC spectrum: uses LPC to compute the spectrum between two
frequencies (referred to as the lower and upper frequencies).
- true envelope: computes the cepstrum iteratively so that the difference
between the smoothed spectrum and the FFT spectrum is minimized. For
further details see: P. Halle, “Techniques cepstrales améliorées pour
l'extraction d'enveloppe spectrale et la détection du pitch”, Actes du
séminaire “Traitement du signal de parole”, pp. 83-93, Paris, 1983.
- critical bands: computes the filtering performed by critical-band (Bark)
and mel filters, with or without mel liftering. This simulates the front
end of an automatic speech recognition system and displays the spectral
information that such a system uses.
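To illustrate the three steps listed above for the cepstrally smoothed spectrum (cepstrum, liftering, final FFT), here is a minimal Python sketch using only the standard library. It is not WinSnoori's code: the function names, the linear shape of the taper between the two liftering boundaries, the small floor added before taking the log, and the assumption that the frame is already windowed and zero-padded to a power-of-2 length are all illustrative choices.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def lifter_weight(quefrency_ms, begin_ms, end_ms):
    """1 before begin_ms, 0 beyond end_ms, linear taper between (assumed shape)."""
    if quefrency_ms <= begin_ms:
        return 1.0
    if quefrency_ms >= end_ms:
        return 0.0
    return (end_ms - quefrency_ms) / (end_ms - begin_ms)


def cepstrally_smoothed_spectrum(frame, sample_rate, begin_ms=1.25, end_ms=2.5):
    """Return the smoothed log-magnitude spectrum (natural log), bins 0..n/2."""
    n = len(frame)
    # Cepstrum: FFT + modulus + log + inverse FFT (1e-12 avoids log(0)).
    log_mag = [math.log(abs(c) + 1e-12) for c in fft(frame)]
    cepstrum = [c.real / n for c in fft(log_mag, inverse=True)]
    # Liftering: the real cepstrum is symmetric, so indices i and n - i
    # correspond to the same quefrency and receive the same weight.
    liftered = []
    for i in range(n):
        q_ms = 1000.0 * min(i, n - i) / sample_rate
        liftered.append(cepstrum[i] * lifter_weight(q_ms, begin_ms, end_ms))
    # Final FFT: back to the (now smoothed) log spectrum.
    return [c.real for c in fft(liftered)][: n // 2 + 1]
```

At a sampling rate of 8000 Hz, the default boundaries of 1.25 ms and 2.5 ms correspond to cepstral indices 10 and 20. Setting both boundaries beyond the frame length disables the liftering, in which case the function simply returns the log FFT spectrum.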
Some definitions:
- FFT: Fast Fourier Transform. The FFT is the simplest method for
calculating a spectrum.
- FFT window: this gives the order (size) of the FFT used. It is always a
power of 2 (e.g., 64, 128, 256, etc.).
- signal window: the number of signal samples processed, padded with
zeros if necessary for the FFT calculation. A Hamming window is always
applied to the signal window.
- pre-emphasis: pre-emphasizing the signal s[i] means using s[i]-s[i-1]
in the calculations instead, which amounts to multiplying the amplitude of
each frequency component by a coefficient that grows with frequency
(approximately proportional to it at low frequencies). This is done before
applying the Hamming window. By default, all WinSnoori calculations
involve pre-emphasis.
- liftering: in cepstral smoothing, the initial portion of the cepstrum
carries the acoustic characteristics of the vocal tract and is followed by
the voicing characteristics (with a peak located between 5 and 20
milliseconds). Liftering preserves the part of the cepstrum before the
lower liftering boundary (around 1.25 ms), deletes the part beyond the
upper liftering boundary (about 2.5 ms) and progressively attenuates the
portion in between. The two boundaries are referred to as the beginning
and end of liftering, respectively.
- LPC: Linear Predictive Coding.
- order of LPC: the number of coefficients to
be derived by the LPC.
- frequencies of the selective LPC: the two frequencies (lower and upper)
between which the spectrum is calculated.
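The definitions of the signal window, pre-emphasis and FFT window above can be combined into one short processing chain. The Python sketch below is only an illustration under stated assumptions, not WinSnoori's implementation: the function names are hypothetical, the first sample is left unchanged by the pre-emphasis, the Hamming window uses the usual 0.54/0.46 coefficients, and the result is expressed in dB with a small floor to avoid log(0).

```python
import cmath
import math


def pre_emphasize(signal):
    """Replace s[i] with s[i] - s[i-1]; the first sample is kept (assumption)."""
    return [signal[0]] + [signal[i] - signal[i - 1] for i in range(1, len(signal))]


def hamming(n):
    """Standard Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_spectrum_db(signal, fft_window):
    """Pre-emphasis, Hamming window, zero-padding, FFT, magnitude in dB."""
    if (fft_window < 1 or fft_window & (fft_window - 1)
            or not 0 < len(signal) <= fft_window):
        raise ValueError("fft_window must be a power of 2 >= the signal window")
    emphasized = pre_emphasize(signal)           # done before windowing
    window = hamming(len(emphasized))            # applied to the signal window
    frame = [s * w for s, w in zip(emphasized, window)]
    frame += [0.0] * (fft_window - len(frame))   # complete with zeros
    spectrum = fft(frame)[: fft_window // 2 + 1]
    return [20 * math.log10(abs(c) + 1e-12) for c in spectrum]
```

For example, a signal window of 200 samples with an FFT window of 256 is padded with 56 zeros and yields 129 spectral values, from 0 Hz up to half the sampling frequency.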