This
spectrum is calculated by applying a critical band analysis. The filters are
either Bark (see H. Hermansky, "Perceptual
linear predictive (PLP) analysis of speech", JASA, Vol. 87, 1990) or Mel. The
dialog window lets the user adjust the size of the signal window, the FFT order,
the number of bands, and their distribution.
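
The critical band analysis described above can be sketched as follows. This is a minimal illustration and not WinSnoori's actual code: the triangular filter shape, the Hamming window, the Mel formula (2595 log10(1 + f/700)) and all function names are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f_hz):
    # One common Mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_bands, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the Mel scale."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_bands + 2))
    bank = np.zeros((n_bands, n_bins))
    for b in range(n_bands):
        lo, mid, hi = edges_hz[b], edges_hz[b + 1], edges_hz[b + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        bank[b] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def band_spectrum(frame, n_fft, n_bands, sample_rate):
    """Window one signal frame, take its power spectrum, apply the filters."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    return mel_filterbank(n_bands, n_fft, sample_rate) @ power
```

The four arguments of `band_spectrum` correspond to the dialog settings: the frame length is the size of the signal window, `n_fft` follows from the FFT order, and `n_bands` is the number of bands.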
After the application
of the critical band or Mel filters, the user can apply a linear
prediction to get a PLP spectrum, a liftering to
get a Mel cepstrally smoothed spectrum, or no further processing.
In the last case the spectrum displayed is simply the result of the critical
band filtering.
When cepstral smoothing
is selected, a preliminary linear cepstral smoothing (applied to the spectrum before the critical band
filters) can be used in order to remove the effect of the
fundamental frequency.
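
A linear cepstral smoothing of this kind can be sketched as below. The sketch assumes a log-magnitude spectrum in the layout returned by a real FFT (length n_fft/2 + 1); the function name and the parameter `n_keep` are hypothetical. In practice the cutoff is chosen below the pitch period expressed in samples, so that the harmonic ripple due to the fundamental frequency is removed while the spectral envelope is kept.

```python
import numpy as np

def cepstrally_smooth(log_spectrum, n_keep):
    """Low-pass lifter a log-magnitude spectrum (rfft layout)."""
    n_fft = 2 * (len(log_spectrum) - 1)
    # The inverse FFT of the log spectrum is the real, symmetric cepstrum.
    cepstrum = np.fft.irfft(log_spectrum, n_fft)
    lifter = np.zeros(n_fft)
    lifter[:n_keep] = 1.0                # low quefrencies: spectral envelope
    lifter[n_fft - n_keep + 1:] = 1.0    # their mirror image
    # Back to the frequency domain: the smoothed log spectrum.
    return np.fft.rfft(cepstrum * lifter).real
```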
When the cepstrum liftering
is applied, WinSnoori displays the Mel cepstrally smoothed spectrum and not the Mel cepstral coefficients. This spectrum is therefore not the data
fed into an automatic speech recognition (ASR) system, but it gives a good idea of the
spectral information used in ASR.
When the PLP is used, the user can set the prediction order and activate the
intensity conversion, i.e. raising the outputs of the critical band filters
to the power 1/3 (cube-root compression).
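
The PLP step can be sketched as follows, assuming the band outputs are treated as samples of a power spectrum between 0 and the Nyquist frequency. The autocorrelation is obtained by an inverse FFT and the predictor by the Levinson-Durbin recursion. This is an illustration only: the function names are hypothetical, and the equal-loudness pre-emphasis of Hermansky's method is left out because the text above does not mention it.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations; returns (a, error), a[0] == 1."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / error                      # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        error *= 1.0 - k * k
    return a, error

def plp_spectrum(band_energies, order, compress=True, n_points=256):
    """All-pole (linear prediction) model of a critical band spectrum."""
    bands = np.asarray(band_energies, dtype=float)
    if compress:
        bands = np.cbrt(bands)                # intensity conversion: power 1/3
    # Inverse real FFT of the (compressed) spectrum gives the autocorrelation.
    r = np.fft.irfft(bands)
    a, error = levinson_durbin(r, order)
    # Evaluate error / |A(e^{jw})|^2 at n_points frequencies from 0 to pi.
    denom = np.abs(np.fft.rfft(a, 2 * (n_points - 1))) ** 2
    return error / denom
```

The `order` argument plays the role of the prediction order set in the dialog, and `compress` that of the intensity conversion switch. The order must stay below the number of bands for the autocorrelation to be long enough.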
The
distribution of bands can be linear in Hz ("linear in frequency"), to
display a spectrum or a spectrogram, or linear in Bark or Mel (called "ASR
like"). The first choice should be used only with a reasonably high number
of points, because too small a number of points gives a very poor
representation of the low frequencies and consequently display artefacts.
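
The difference between the two distributions can be illustrated by counting how many band centres fall below 1 kHz. The sketch below is hypothetical: the function name, the Mel formula and the 16 kHz sampling rate are assumptions made for the example.

```python
import numpy as np

def band_centres(n_bands, sample_rate, scale="mel"):
    """Centre frequencies (Hz) of bands spaced linearly in Hz or in Mel."""
    nyquist = sample_rate / 2.0
    if scale == "hz":
        edges = np.linspace(0.0, nyquist, n_bands + 2)
    else:
        mel_max = 2595.0 * np.log10(1.0 + nyquist / 700.0)
        mel_edges = np.linspace(0.0, mel_max, n_bands + 2)
        edges = 700.0 * (10.0 ** (mel_edges / 2595.0) - 1.0)
    return edges[1:-1]

for scale in ("hz", "mel"):
    centres = band_centres(24, 16000, scale)
    print(scale, "bands below 1 kHz:", int(np.sum(centres < 1000.0)))
```

With 24 bands at 16 kHz, the linear distribution places 3 centres below 1 kHz against 8 for the Mel distribution, which is why the linear choice needs a larger number of points to represent the low frequencies properly.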
The dialog window checks the overall consistency of parameters.
Example: Here is the dialog window to apply a Mel cepstrum
analysis with 24 bands and 12 Mel coefficients. The spectrum displayed is not
the vector of the Mel coefficients but the Mel cepstrally
smoothed spectrum, i.e. an IDCT has been applied after
the DCT of the Mel cepstrum computation.
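
The DCT followed by an IDCT can be sketched as below for 24 bands and 12 coefficients. This is a minimal illustration under stated assumptions: an orthonormal DCT-II is used, the 12 coefficients kept are taken to be c0 to c11 (whether WinSnoori counts c0 is not stated here), and the function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform (IDCT)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    mat[0] /= np.sqrt(2.0)
    return mat

def mel_cepstral_smoothing(band_energies, n_coeffs):
    """Log band energies -> DCT -> keep n_coeffs -> IDCT."""
    log_bands = np.log(np.maximum(band_energies, 1e-12))
    dct = dct_matrix(len(log_bands))
    cepstrum = dct @ log_bands      # Mel cepstral coefficients
    cepstrum[n_coeffs:] = 0.0       # liftering: discard the higher coefficients
    return dct.T @ cepstrum         # IDCT: smoothed log band spectrum

# 24 band energies (random values stand in for real filter outputs).
bands = np.random.default_rng(0).uniform(0.1, 1.0, 24)
smoothed = mel_cepstral_smoothing(bands, 12)
```

The result still has 24 values, one per band, which is what makes it displayable as a spectrum, whereas the 12 retained coefficients are what an ASR front end would use.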