November 15, 2010
This page is not intended to present WinSnoori exhaustively, but to demonstrate some of the facilities that make it interesting, or at least that can be exploited to solve concrete problems. We will therefore present:
The directory where WinSnoori is installed is given by the value of the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Wsno.
The initialization file Wsno.ini in the Windows directory allows you to adapt WinSnoori's messages, phonetic symbols, font and audio player.
Windows 7: wsno.ini is in the virtual Windows folder "AppData\Local\VirtualStore\Windows" of the user folder, e.g. C:\Users\laprie\AppData\Local\VirtualStore\Windows for the author. This Windows 7 mechanism is intended to prevent users from writing to the true Windows folder.
You can launch WinSnoori from an HTML page by adding the .wsn extension to any valid sound file.
Here is an example.
More generally, you can run WinSnoori with the following command:
Phonetic annotations are drawn from the set of phonetic symbols used by WinSnoori. It can be useful to choose another set of symbols, possibly not phonetic, to annotate some speech files. For this purpose, the user can change both the set of phonetic symbols and the font.
Several sets of symbols are already provided in "c:\Program Files\Wsno\format":
These files contain the list of symbols, the hierarchical phonetic menu, the phonetic classes, the hierarchical menu for classes (this menu is used during exploration of annotated corpora) and the first two formant frequencies of vowels (these frequencies are used to plot formants in the F1-F2 plane). You can create a new file following the pattern of these files. To incorporate symbols that do not belong to the default font, open "charmap" with the font you want to use (the "IPAPhon" font, for instance), select the symbols there and paste them into the new file. These characters will not appear correctly if you use "notepad" to create the new phonetic file, but WinSnoori will display the correct symbols. Note that the way you organize phonetic symbols into phonetic classes influences the automatic exploration of annotated corpora: take care to define classes that account for the manner and the place of articulation of sounds.
Once this file is created, modify the file "c:\Winnt\wsno.ini" (or the one in "windows" or "windows95") accordingly. Assuming that the new file is xxxipa.pho, replace the old line "LanguageFile=old.pho" by "LanguageFile=xxxipa.pho". Do not forget to change the font as well so that the symbols appear correctly in WinSnoori.
The first step is to check that the font is installed. If it is not (especially for the IPAPhon or IPAKiel fonts), install the font on your PC. Then modify the "wsno.ini" file by specifying the new phonetic font. Assuming that the new font is "IPAPhon", replace the old line "PhoneticFont=Arial" by "PhoneticFont=IPAPhon".
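If you switch symbol sets often, the two substitutions above can be scripted. The Python sketch below rewrites `Key=value` lines in the text of an ini file; the helper name `set_ini_key` is hypothetical, and it assumes each key sits on its own `Key=value` line, as in the examples above. It does not locate the file for you (under Windows 7 the copy actually used is the one in VirtualStore).

```python
import re


def set_ini_key(ini_text: str, key: str, value: str) -> str:
    """Return ini_text with every 'key=...' line set to 'key=value'.

    Hypothetical helper: the key is matched case-insensitively, as Windows
    ini files are, and its original spelling is kept.
    """
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*=).*$",
                         re.MULTILINE | re.IGNORECASE)
    new_text, count = pattern.subn(lambda m: m.group(1) + value, ini_text)
    if count == 0:
        raise KeyError(f"{key!r} not found")
    return new_text


if __name__ == "__main__":
    sample = "LanguageFile=old.pho\nPhoneticFont=Arial\n"
    sample = set_ini_key(sample, "LanguageFile", "xxxipa.pho")
    sample = set_ini_key(sample, "PhoneticFont", "IPAPhon")
    print(sample, end="")
```

Read the real file in text mode, apply the helper, and write it back only after keeping a copy of the original.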
The time and frequency coordinates of the mouse pointer are shown in the welcome window. Once F0 has been calculated, it is displayed as well. In addition to the coordinates, the duration of the highlighted region is given in ms and in s⁻¹. This information can be used, for instance, to evaluate F0 by hand.
Suppose you have zoomed in on the region displayed in this spectrogram and want to evaluate F0. Simply select a region that clearly corresponds to one period of F0. When you point the mouse in this region you obtain:
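The arithmetic behind this reading is simply F0 = 1 / period. The small sketch below (the function name is hypothetical) also shows a common refinement: selecting several consecutive periods and dividing their number by the total duration averages out small pointing errors.

```python
def f0_from_selection(duration_ms: float, n_periods: int = 1) -> float:
    """Estimate F0 in Hz from a selection covering n_periods glottal periods.

    With n_periods = 1 this corresponds to the s^-1 value (1 / duration).
    """
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    if n_periods < 1:
        raise ValueError("select at least one period")
    return n_periods * 1000.0 / duration_ms


print(f0_from_selection(5.0))       # one 5 ms period: 200.0 Hz
print(f0_from_selection(21.0, 4))   # four periods in 21 ms: about 190.5 Hz
```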
The reassignment algorithm proposed by F. Auger and P. Flandrin ("Improving the readability of time-frequency and time-scale representations by the reassignment method", IEEE Transactions on Signal Processing, vol. 43, no. 5, pp. 1068-1089, 1995) and used in speech processing by Plante et al. (F. Plante, G. Meyer and W.A. Ainsworth, "Improvement of speech spectrogram accuracy by the method of reassignment", IEEE Transactions on Speech and Audio Processing, vol. 6, no. 3, pp. 282-287, 1998; see this paper for further details) yields sharper resolution in time or in frequency. With a short window (4 ms) the time resolution is increased, which enables the detection of glottal closure instants. With a longer window (32 ms, for instance) very fine harmonics are obtained. The first image is a narrow band spectrogram of a female voice; the second is the corresponding reassigned spectrogram.
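To make the idea concrete, here is a minimal pure-Python sketch of the Auger-Flandrin reassignment for a single analysis frame. It is not WinSnoori's code: it assumes a Hann window and evaluates the DFT directly (slow but dependency-free). Each bin's energy is moved from the frame centre and the bin frequency to a local centre of gravity computed from three transforms of the frame: with the window h, with the time-weighted window n·h, and with the window derivative dh/dn. A full reassigned spectrogram repeats this for every frame and accumulates the power at the reassigned (time, frequency) points.

```python
import cmath
import math


def reassign_frame(x, start, win_len, fs):
    """Reassigned (time_s, freq_hz, power) for each DFT bin of one Hann-windowed frame.

    Conventions (ours): n is measured in samples from the frame centre c, and
    X_g(w) = sum_n x[c + n] * g[n] * exp(-1j * w * n) for a window g.
    With these, t_hat = c + Re(X_Th / X_h) and w_hat = w - Im(X_Dh / X_h),
    where Th[n] = n * h[n] and Dh[n] = dh/dn.
    """
    if win_len < 3 or start < 0 or start + win_len > len(x):
        raise ValueError("frame must lie inside the signal")
    span = win_len - 1
    centre = start + span / 2.0
    offsets = [k - span / 2.0 for k in range(win_len)]
    h = [0.5 - 0.5 * math.cos(2 * math.pi * k / span) for k in range(win_len)]
    dh = [(math.pi / span) * math.sin(2 * math.pi * k / span) for k in range(win_len)]
    frame = x[start:start + win_len]
    result = []
    for b in range(win_len // 2 + 1):
        w = 2 * math.pi * b / win_len                  # bin frequency, rad/sample
        xh = xth = xdh = 0j
        for n, s, hk, dk in zip(offsets, frame, h, dh):
            e = cmath.exp(-1j * w * n)
            xh += s * hk * e
            xth += s * n * hk * e
            xdh += s * dk * e
        power = abs(xh) ** 2
        if power == 0.0:                               # nothing to reassign
            result.append((centre / fs, w * fs / (2 * math.pi), 0.0))
            continue
        t_hat = centre + (xth / xh).real               # samples
        w_hat = w - (xdh / xh).imag                    # rad/sample
        result.append((t_hat / fs, w_hat * fs / (2 * math.pi), power))
    return result
```

With win_len = 64 at 16 kHz (4 ms) the time estimates are the useful ones; with win_len = 512 (32 ms) the frequency estimates sharpen the harmonics, in line with the text above. A practical implementation would also discard bins with negligible energy, whose ratios are unreliable.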
The "Critical bands" option in the spectrogram menu, and the corresponding option for spectral slices, give you a better idea of the spectral data that automatic speech recognition systems use. The next image shows the spectrogram of a female voice processed by mel filters and displayed on the mel frequency scale. Note that no DCT has been performed, which explains why the first harmonics remain visible. Conversely, high frequencies are strongly smoothed and contribute less than they would on a linear frequency scale.
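The filter bank behind such a display can be sketched as follows. This is an illustration, not WinSnoori's code: it assumes the widely used mel formula mel = 2595·log10(1 + f/700) and triangular filters whose edges are equally spaced on the mel scale, each filter rising from the centre of its left neighbour to its own centre and falling to the centre of its right neighbour. Because the spacing is uniform in mels, the filters get wider in Hz as frequency increases, which is why high frequencies are strongly smoothed while individual low harmonics can survive.

```python
import math


def hz_to_mel(f_hz):
    """Widely used mel formula (assumed; WinSnoori's exact formula is not given)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, fs):
    """n_filters triangular weight vectors over n_bins bins spanning 0..fs/2."""
    mel_top = hz_to_mel(fs / 2.0)
    # n_filters + 2 points equally spaced in mel: filter j peaks at point j
    # and reaches zero at points j - 1 and j + 1.
    points = [mel_to_hz(mel_top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [b * (fs / 2.0) / (n_bins - 1) for b in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = points[j - 1], points[j], points[j + 1]
        weights = []
        for f in bin_hz:
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def filterbank_energies(bank, power_spectrum):
    """Weighted sum of the power spectrum under each triangular filter."""
    return [sum(w * p for w, p in zip(weights, power_spectrum)) for weights in bank]


bank = mel_filterbank(24, 257, 16000.0)  # 24 filters, 512-point FFT at 16 kHz
nonzero = [sum(1 for w in weights if w > 0) for weights in bank]
print(nonzero[0], nonzero[-1])  # bins with non-zero weight: first vs last filter
```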
The following image shows the spectral slice obtained with mel cepstral smoothing. Unlike standard mel cepstral analysis, an inverse DCT (IDCT-III) has been applied to the mel cepstra to recover a spectrum. Two extra triangular filters (one at 0 Hz and one at Fs/2) have been added to preserve the energy of the spectrum. This image corresponds to the standard parameters used in automatic speech recognition, i.e. 24 filters and 12 coefficients.
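The smoothing step can be sketched as a truncated cosine transform of the log filter-bank energies: a DCT-II gives the mel cepstrum, only the first `n_keep` coefficients are kept, and a DCT-III (the inverse of the DCT-II, up to scaling) maps them back to a smoothed log spectrum on the filter centres. This is an illustrative sketch under assumptions: the text does not say whether WinSnoori's 12 coefficients include c0 (here `n_keep` counts c0), and the input is any list of log energies, for example 24 filters plus the two extra edge filters, i.e. 26 values.

```python
import math


def cepstral_smooth(log_energies, n_keep):
    """Return (cepstrum, smoothed log energies) keeping n_keep DCT-II coefficients.

    DCT-II:  c_k = sum_j e_j cos(pi k (j + 0.5) / M)
    DCT-III: e_j = c_0 / M + (2 / M) sum_{k>=1} c_k cos(pi k (j + 0.5) / M)
    which inverts the DCT-II exactly when all M coefficients are kept.
    """
    m = len(log_energies)
    if not 1 <= n_keep <= m:
        raise ValueError("n_keep must be between 1 and the number of filters")
    ceps = [sum(e * math.cos(math.pi * k * (j + 0.5) / m)
                for j, e in enumerate(log_energies))
            for k in range(n_keep)]
    smoothed = []
    for j in range(m):
        value = ceps[0] / m
        for k in range(1, n_keep):
            value += (2.0 / m) * ceps[k] * math.cos(math.pi * k * (j + 0.5) / m)
        smoothed.append(value)
    return ceps, smoothed


# 24 filters plus the two extra edge filters: 26 log energies (synthetic values).
log_e = [math.log(1.0 + 10.0 * math.sin(0.7 * j) ** 2) for j in range(26)]
ceps, smooth = cepstral_smooth(log_e, 12)
```

Keeping fewer coefficients removes the fast ripples across filters (the harmonic structure) and keeps the slow spectral envelope.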
The editor of WinSnoori is not intended for modifying large portions of signal, since the window is limited, but for editing the signal while taking into account the acoustic effects of the editing commands. We describe two situations:
Note: the editor of WinSnoori can handle files of any size. Nevertheless, the window duration is limited to 4 seconds in order to minimize the memory required by WinSnoori. This means that if you want to apply editing commands to large portions of a file, you may have to repeat the command on successive small portions of the file.
When you cut a region, you run the risk of breaking the periodic structure of speech, which gives rise to discontinuities. These discontinuities appear as vertical spectral bars in the spectrogram and are heard as "clicks" when you listen to the signal. The following figure exhibits such a discontinuity. You can listen to this discontinuity here.
You can suppress, or at least reduce, the influence of this discontinuity by restoring the periodic structure of speech. First, zoom in on the signal in order to localize the discontinuity (at the center of the green circle). Choose a zoom level at which the periodic structure is clearly visible.
Then cut a portion of the signal so that the periodic structure is artificially restored. Recompute the spectrogram if needed.
This figure shows the final signal. Here is the new signal without the click.
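The manual procedure above amounts to choosing the end of the cut so that the samples following the splice continue the waveform that preceded it. One simple way to get a first guess automatically, shown here as a sketch rather than a WinSnoori function, is to keep the cut start `a` fixed and search the cut end `b` in a range for the position whose next samples best match the samples starting at `a`, i.e. the ones that would have followed. On voiced speech this tends to make `b - a` a multiple of the pitch period. The function name and the match length are illustrative assumptions.

```python
import math


def best_cut_end(x, a, b_min, b_max, match_len):
    """Return b in [b_min, b_max] minimising sum((x[a+i] - x[b+i])**2).

    Removing x[a:b] joins x[a-1] to x[b]; if x[b:b+match_len] resembles
    x[a:a+match_len], the waveform continues smoothly across the splice.
    """
    if b_min <= a or b_max + match_len > len(x):
        raise ValueError("search range must lie after a and inside the signal")
    best_b, best_err = b_min, math.inf
    for b in range(b_min, b_max + 1):
        err = sum((x[a + i] - x[b + i]) ** 2 for i in range(match_len))
        if err < best_err:
            best_b, best_err = b, err
    return best_b


if __name__ == "__main__":
    period = 80  # samples, i.e. 100 Hz at 8 kHz
    x = [math.sin(2 * math.pi * n / period) + 0.5 * math.sin(4 * math.pi * n / period)
         for n in range(3000)]
    b = best_cut_end(x, a=1000, b_min=1100, b_max=1300, match_len=160)
    print(b, (b - 1000) % period)  # the remainder should be 0
```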
Cutting a signal portion at a voicing onset can also create an artificial burst. In this case you can remove the artificial burst by "damping" the signal: use "damp left" when the burst appears at the voicing onset, or "damp right" when it appears at the end of a voiced region.
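Damping can be pictured as a gain ramp applied to the selected region. The sketch below is an assumption about what such a command does, not WinSnoori's implementation: it uses a linear ramp (the actual shape is not documented here), with the "left" variant rising from 0 to 1 across the region so that a burst at a voicing onset is attenuated, and the "right" variant falling from 1 to 0 at the end of a voiced region. The function name `damp` is hypothetical.

```python
def damp(samples, start, end, side):
    """Return a copy of samples with a linear gain ramp on samples[start:end].

    side="left":  gain rises 0 -> 1 (attenuates the start of the region).
    side="right": gain falls 1 -> 0 (attenuates the end of the region).
    Samples outside the region are unchanged.
    """
    if not 0 <= start < end <= len(samples):
        raise ValueError("invalid region")
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    out = list(samples)
    length = end - start
    for i in range(length):
        ramp = i / (length - 1) if length > 1 else 1.0
        gain = ramp if side == "left" else 1.0 - ramp
        out[start + i] = samples[start + i] * gain
    return out
```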
The "Editing tools" window, which can be called from the "Edit" menu, offers attenuation (left, right, middle), the addition of noise, and filtering with FIR and OLA filters. You choose among these possibilities; here, OLA filtering has been selected. As shown below, it enables filtering along time-frequency trajectories.
Suppose you want to investigate the importance of some cue (a burst or a formant trajectory, for instance). One solution is to remove this cue from the signal by filtering.
Here is an example with the file c:\Program Files\Wsno\Examples\English\404.wav. Suppose we want to remove the F2 information from the vowel "ah" of "language". Call "FIR Filtering" from the Editing tools window (Edit menu), select "Stop band" instead of "Pass band", set the filter order to 61 and the attenuation to 35 dB. Then define the region to be filtered by dragging with the left button, and click the "Apply" button. As filtering may create a small discontinuity, you will probably have to remove it (this is the case in this example). Here are the original and the final spectrograms.
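To reproduce this kind of filtering outside WinSnoori, here is a sketch of a linear-phase band-stop FIR filter designed by the windowed-sinc method with a Kaiser window, whose parameter beta is derived from the desired attenuation by Kaiser's empirical formula. WinSnoori's own design method is not stated in the text. Mapping its "order 61" to 61 coefficients is an assumption (a linear-phase band-stop filter needs an odd number of coefficients), and the 1000-2500 Hz band in the usage lines is only an illustrative guess, not a value from the example.

```python
import cmath
import math


def bessel_i0(x):
    """Modified Bessel function I0, by its power series."""
    total, term, k = 1.0, 1.0, 1
    while term > 1e-12 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total


def kaiser_beta(atten_db):
    """Kaiser's empirical window parameter for a given attenuation in dB."""
    if atten_db > 50:
        return 0.1102 * (atten_db - 8.7)
    if atten_db >= 21:
        return 0.5842 * (atten_db - 21) ** 0.4 + 0.07886 * (atten_db - 21)
    return 0.0


def ideal_lowpass(k, fc, fs):
    """Ideal low-pass impulse response (cutoff fc Hz) at offset k from the centre."""
    x = 2.0 * fc / fs
    return x if k == 0 else math.sin(math.pi * x * k) / (math.pi * k)


def bandstop_fir(num_taps, f_lo, f_hi, fs, atten_db):
    """Linear-phase band-stop FIR: (all-pass minus ideal band-pass) times a Kaiser window."""
    if num_taps < 3 or num_taps % 2 == 0:
        raise ValueError("use an odd number of taps (>= 3)")
    m = (num_taps - 1) // 2
    beta = kaiser_beta(atten_db)
    taps = []
    for n in range(num_taps):
        k = n - m
        band_pass = ideal_lowpass(k, f_hi, fs) - ideal_lowpass(k, f_lo, fs)
        ideal = (1.0 if k == 0 else 0.0) - band_pass
        window = bessel_i0(beta * math.sqrt(1.0 - (k / m) ** 2)) / bessel_i0(beta)
        taps.append(ideal * window)
    return taps


def gain_at(taps, f, fs):
    """Magnitude of the frequency response at f Hz."""
    return abs(sum(h * cmath.exp(-2j * math.pi * f * n / fs) for n, h in enumerate(taps)))


taps = bandstop_fir(61, 1000.0, 2500.0, 16000.0, 35.0)  # illustrative band only
print(round(gain_at(taps, 200.0, 16000.0), 3), round(gain_at(taps, 1750.0, 16000.0), 3))
```

With only 61 coefficients the transition bands are several hundred hertz wide at 16 kHz, so the stop band has to be chosen generously around the formant.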
Original (sound and spectrogram)
Suppose you want to lower, remove, raise or keep only formants or harmonics to modify speech. One solution is to use OLA filtering from the Editing tools (Edit menu). Here, we present one example with formants and another with harmonics.
The file is c:\Program Files\Wsno\Examples\English\307.wav and the modifications are applied to the beginning of the file. The first step is to draw formants and to lock them on LPC roots. The locking algorithm is very simple: it searches for a linear prediction root close to the frequency of the current point of the filter trajectory. If no root has a bandwidth smaller than 700 Hz and lies closer than 500 Hz, the trajectory point is not corrected. This explains the jumps when energy is weak or when linear prediction roots conflict. Once the trajectories are locked on linear prediction roots, it can be useful to correct some points. To do so, Ctrl+click on the trajectory to be corrected, then move any point while keeping Ctrl and the left mouse button pressed.
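The locking rule described above can be written down directly. In this sketch (not WinSnoori's code), each linear prediction root z is converted to a frequency and a bandwidth with the usual formulas f = arg(z)·fs/(2π) and B = -ln|z|·fs/π; a trajectory point is then moved to the closest root whose bandwidth is below 700 Hz and which lies within 500 Hz, and is left unchanged otherwise. Computing the roots themselves (LPC analysis and polynomial root finding) is assumed to be done elsewhere; the function names are hypothetical.

```python
import cmath
import math


def root_to_formant(z, fs):
    """Frequency and bandwidth (Hz) of an LPC root z in the upper half-plane."""
    freq = cmath.phase(z) * fs / (2 * math.pi)
    bandwidth = -math.log(abs(z)) * fs / math.pi
    return freq, bandwidth


def lock_point(freq, roots, fs, max_bw=700.0, max_dist=500.0):
    """Move freq to the closest acceptable root frequency, or keep it unchanged."""
    best, best_dist = freq, max_dist
    for z in roots:
        if z.imag <= 0:          # keep one root of each conjugate pair
            continue
        f, bw = root_to_formant(z, fs)
        dist = abs(f - freq)
        if bw < max_bw and dist < best_dist:
            best, best_dist = f, dist
    return best


def lock_trajectory(points, roots_per_frame, fs):
    """points[i] is the trajectory frequency at frame i; roots_per_frame[i] its LPC roots."""
    return [lock_point(f, roots, fs) for f, roots in zip(points, roots_per_frame)]
```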
This figure shows the trajectories drawn for the first three formants F1, F2 and F3 after they have been locked on linear prediction roots. The width has been set to 800 Hz and the gain to -48 dB; these parameters are suited to the filtering of formants. This is the original signal.
Here is the result when the "Pass" option has been chosen: only the contributions of the formants have been kept. This is the corresponding signal. The "Pass" option means that the gain factor applies everywhere except within the width corresponding to the filter trajectories. In case of conflict between gains, the gain with the highest modulus is used.
Here is the result when the "Stop" option has been selected: the contributions of formants F1, F2 and F3 have been removed. This is the corresponding signal. Here the gain factors were -48 dB. The effect of filtering can also be evaluated by displaying a narrow band spectrogram.
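The "Pass" and "Stop" options can be pictured as a gain mask over frequency, computed at each frame and applied to the short-time spectrum before overlap-add resynthesis. The sketch below builds such a mask for one frame. It reflects our reading of the text, not WinSnoori's code: "width" is taken as the total bandwidth centred on each trajectory (±400 Hz for 800 Hz), gains are in dB, and where several trajectories compete for the same bin the gain with the largest modulus is kept, as stated above. The function names are hypothetical.

```python
def gain_mask_db(bin_freqs, centres, gains_db, width, mode):
    """Per-bin gain (dB) for one frame.

    centres[i], gains_db[i]: frequency (Hz) and gain of trajectory i at this frame.
    mode "stop": trajectory gains apply inside each band, 0 dB elsewhere.
    mode "pass": 0 dB inside any band, trajectory gains apply elsewhere.
    Conflicts are resolved by keeping the gain with the largest modulus.
    """
    if mode not in ("pass", "stop"):
        raise ValueError("mode must be 'pass' or 'stop'")
    half = width / 2.0
    mask = []
    for f in bin_freqs:
        inside = [g for c, g in zip(centres, gains_db) if abs(f - c) <= half]
        if mode == "stop":
            mask.append(max(inside, key=abs, default=0.0))
        else:
            mask.append(0.0 if inside else max(gains_db, key=abs, default=0.0))
    return mask


def apply_mask(spectrum, mask_db):
    """Scale complex STFT bins by the dB mask (one frame of an OLA scheme)."""
    return [x * 10.0 ** (g / 20.0) for x, g in zip(spectrum, mask_db)]
```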
Harmonics can be edited in the same manner as presented above. The only difference is that trajectories are locked on harmonics rather than on linear prediction roots. As drawing filter trajectories superimposed onto harmonics is difficult, the highest frequency of the spectrogram can be changed ("Upper disp. frequency" under "Options" in the "Spectro" menu). When the highest frequency of the spectrogram is lowered, it is worth increasing the order of the Fourier transform to obtain a better frequency resolution.
This figure shows the three filter trajectories drawn on the narrow band spectrogram for the first second of the file c:\Program Files\Wsno\Examples\English\307.wav. Trajectories have been locked on harmonics and slightly corrected by hand. Here is the original signal.
Using Wav files is probably the easiest way of organizing speech signals and annotations. When a Wav file is open, WinSnoori saves phonetic and orthographic annotations in "chunks" (see the Multimedia documentation for further details). The chunk containing phonetic information is called "phon", and the one containing orthographic information is called "word".
In each chunk, every annotation consists of its left boundary (in samples) and its text (a phoneme or a word). The next left boundary is the right boundary of the current annotation.
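Because only left boundaries are stored, the interval covered by each label is recovered from the next entry. The sketch below does this conversion on annotations already decoded into (left boundary, text) pairs. The binary layout of the chunk payload is not described here, so decoding it is left out, and taking the end of the signal as the right boundary of the last label is our assumption; the labels in the example are made up.

```python
from typing import List, Tuple


def to_intervals(entries: List[Tuple[int, str]], n_samples: int) -> List[Tuple[int, int, str]]:
    """Convert (left_boundary, label) pairs into (start, end, label) triples.

    Each label ends where the next one starts; the last label is assumed
    to end at n_samples (the end of the signal).
    """
    entries = sorted(entries)
    intervals = []
    for i, (start, label) in enumerate(entries):
        end = entries[i + 1][0] if i + 1 < len(entries) else n_samples
        intervals.append((start, end, label))
    return intervals


print(to_intervals([(0, "#"), (1200, "b"), (1850, "a")], 4000))
# [(0, 1200, '#'), (1200, 1850, 'b'), (1850, 4000, 'a')]
```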
The program chunk.cpp in C:\Program Files\Wsno\Chunk shows how chunks can be read. You can type "chunk.exe <wavfile>" to display chunks created by WinSnoori.
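If you prefer not to compile chunk.cpp, a similar inspection can be sketched in Python with the standard library. The hypothetical function below walks the RIFF structure of a Wav file and returns the raw payload of every chunk, so you can check for "phon" and "word"; it does not decode the annotation payload, whose exact binary layout is not given here.

```python
import struct
import sys


def read_riff_chunks(data: bytes) -> dict:
    """Map chunk id (e.g. 'fmt ', 'data', 'phon', 'word') to its raw payload.

    If an id occurs several times, the last occurrence wins.
    """
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = {}
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4].decode("ascii", errors="replace")
        (size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        chunks[chunk_id] = data[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)   # chunks are padded to an even length
    return chunks


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for cid, payload in read_riff_chunks(f.read()).items():
            print(cid, len(payload))
```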
The automatic exploration of annotated corpora allows you to extract from the files all the occurrences of one or several sequences of phonemes. You can specify up to 5 sequences. Phonemes and phonetic classes are entered by clicking buttons.
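The matching behind the exploration can be sketched as follows: a query is a sequence whose elements are either phonemes or names of phonetic classes, and a class matches any phoneme it contains. The class names and members below are made-up examples (in WinSnoori they come from the phonetic file described earlier), and the function name is hypothetical. The sketch searches one annotated file, given as a list of labels, and returns the start indices of the matches.

```python
from typing import Dict, List, Set


def find_sequences(labels: List[str], query: List[str],
                   classes: Dict[str, Set[str]]) -> List[int]:
    """Return every index i where labels[i:i+len(query)] matches query."""
    def matches(label: str, item: str) -> bool:
        # A query item is either a class name or a literal phoneme.
        return label in classes[item] if item in classes else label == item

    n = len(query)
    return [i for i in range(len(labels) - n + 1)
            if all(matches(labels[i + k], query[k]) for k in range(n))]


classes = {"STOP": {"p", "t", "k", "b", "d", "g"}, "VOWEL": {"a", "e", "i", "o", "u"}}
labels = ["#", "b", "a", "t", "o", "k", "i", "#"]
print(find_sequences(labels, ["STOP", "VOWEL"], classes))   # [1, 3, 5]
```

This is also why the organization of classes matters: a query such as STOP VOWEL finds exactly what the class definitions allow.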
The corpus and annotation directories are set by clicking the "Modify domain" button. This dialogue window allows both the audio files and the annotation files to be specified. Actually, the dialogue has a more general purpose, which consists in specifying the format of the files read by WinSnoori; here only "speech file" and "phoneme file" are used. The file names must be understood as examples, not as the only files to be explored: WinSnoori uses them to derive the directory and the extensions. When using Wav files containing both the speech signal and the annotations, you have to specify the Wav file as both "speech file" and "phoneme file" (in this particular case the "Format" field is useless).
Once the phoneme sequences and the corpus are set, clicking the "ok" button starts the exploration, which produces a speech file with phonetic annotations and a text file listing all the occurrences found. The speech file is opened automatically (its path is "c:\Program Files\Wsno\temp\srchphon.wa$"). This speech file is here, and the text file listing all the occurrences (its path is "c:\Program Files\Wsno\temp\results.txt") is here.
Indicate the directory to search and the type of files. In this example, speech files are the same as annotation files since they are Wav files containing chunks for annotations.
This interface provides a complete toolbox for editing parameter files. The synthesizer is independent of WinSnoori and can therefore be replaced by any formant synthesizer that uses the same parameters. The current synthesizer is the GPL version developed by Jon Iles and Nick Ing-Simmons (its complete sources are in Wsno\Klatt). Note that we added some parameters: a time stamp at the beginning of each frame, extra parameters for the LF source, and extra formants to represent peaks that cannot be captured with F1 to F6.
You can draw parameter trajectories with the left mouse button while holding down the <Shift> key. Once a trajectory is created, you can move it close to the nearest spectrogram peaks by "registering" it with linear prediction or cepstral smoothing. This means that each point of the trajectory is moved towards the nearest linear prediction root (resp. the nearest peak of the cepstrally smoothed spectrum), provided that the root (resp. the peak) is acceptably close to the initial point. This allows you to draw rough trajectories and then to correct them according to the real spectrogram. You can also smooth trajectories with a B-spline algorithm. Note that registering is not meaningful in regions where the energy is very weak.
Trajectories are saved in text form. Each line gives the parameters for the formant synthesizer (see c:\Program Files\Wsno\Klatt\klatt.html for a complete description of the synthesizer and its parameters). The old format is still available for compatibility's sake.
You can now investigate the acoustic consequences of various transformations of formant parameters (removing a formant, changing amplitudes...). Here is a longer example of copy synthesis for the French story "La bise et le soleil..." and the synthesized waveform. Note that these two examples were not corrected by hand after copy synthesis. It is advisable to edit amplitudes by hand to obtain smoother amplitude trajectories.
You can modify the speech rate and the F0 contour of speech signals. The "Time scale modification" menu allows you to modify the speech rate and the F0 level for the whole sentence. Parameters greater than one decrease the speech rate and/or the F0 level. With the following choice:
and this original sentence, the speech is sped up and F0 is lowered (result).
Here is another example, for a female voice, with the same modification parameters: original and modified. Note that the original signal is slightly noisy. More interestingly for oral comprehension, speech can be slowed down: original and slowed down (the F0 level is kept unchanged and the time factor is set to 2).
You can also modify the F0 contour with the "F0 contour modifications" function. First, you have to import the F0 contour, either by reading it from a file or by calculating it for the current signal. Note that even if only part of the signal is displayed in the window, the whole signal is processed. The following figure exhibits the original F0 contour superimposed onto the spectrogram for the sentence: "Another experiment required subjects to read lists of
Second, you have to select "display F0 to modify it". You can draw a new F0 contour with the left mouse button while holding down the <Shift> key. The new F0 contour keeps the voiced/unvoiced nature of speech. If you move in the signal, you have to select "display F0 to modify it" again so that the F0 contour becomes visible. The following figure exhibits the new F0 contour.
You can then re-synthesize the sentence with this new contour. Here is the original sentence, and here is the re-synthesized signal. Note that we also changed the F0 contour at the end of the sentence!
When automatically calculated F0 values are to be used by another program, it can be useful to correct them first to avoid problems. You can use the "F0 contour modifications" functions for that purpose by following exactly the same strategy as described just above (import the F0 contour from a file, display the contour to modify it, and finally save this contour). You can substantially simplify this task by choosing an upper display frequency ("Options" in the "Spectro" menu) that corresponds to a multiple of the F0 scale.
For "Female" and "Any" speakers the highest F0 value is 600 Hz, and for "Male" speakers it is 500 Hz. Therefore, by choosing 1800 Hz as the upper display frequency for female speakers, you just have to follow the third harmonic (1800/600 = 3). In the same way, by choosing 2000 Hz as the upper display frequency for male speakers, you just have to follow the fourth harmonic (2000/500 = 4).
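The arithmetic behind this tip: if the spectrogram's frequency axis runs up to `upper_hz` while the F0 axis runs up to `f0_max_hz`, the harmonic of rank upper_hz / f0_max_hz is drawn at the same height as F0, so following that harmonic follows F0. The tiny helper below (hypothetical) makes the check explicit.

```python
def harmonic_to_follow(upper_hz: float, f0_max_hz: float) -> int:
    """Rank k of the harmonic drawn at the same height as F0 on the display."""
    k = upper_hz / f0_max_hz
    if k < 1 or abs(k - round(k)) > 1e-9:
        raise ValueError("choose an upper display frequency that is a multiple of the F0 maximum")
    return round(k)


print(harmonic_to_follow(1800, 600))   # female or "Any" speaker: 3
print(harmonic_to_follow(2000, 500))   # male speaker: 4
```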
Do not forget that you can also modify the voicing nature of points by moving them individually with the left mouse button while holding down the CTRL key.
It can be useful to exploit the results of WinSnoori with other software. Most of the results of WinSnoori are therefore put in text form and can be added to the log file (some results that would produce huge files of little interest, spectrograms for instance, are not saved). The functions "show results", "append results to log" and "show log" in the "Miscellaneous" menu respectively display the text file for the very last command, append this file to the log file, and display the log file.
When you drag the mouse to calculate and display a spectral slice, all the spectra are saved in text form (use "show results" to see these values). This file contains all the spectra; it can therefore be more convenient to display only one spectrum by turning off the colored buttons of the spectra you do not want displayed. You can then analyze the results with other software.