A brief description of WinSnoori

Yves Laprie

November 15, 2010







This page is not intended to present WinSnoori exhaustively, but to demonstrate some of the features that make it useful, or at least that can be exploited to solve concrete problems. We will therefore present:

Changing the initialization file

The directory where WinSnoori is installed is stored as the value of the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Wsno.

The initialization file Wsno.ini in the Windows directory lets you adapt WinSnoori's messages, phonetic symbols and font, and audio player.

Under Windows 7, wsno.ini is located in the virtualized Windows folder "AppData\Local\VirtualStore\Windows" inside the user folder, e.g. C:\Users\laprie\AppData\Local\VirtualStore\Windows. Windows 7 uses this virtualization to prevent users from writing to the true Windows folder.

Using WinSnoori from an HTML page

You can launch WinSnoori from an HTML page by adding the .wsn extension to any valid sound file.
Here is an example.
More generally, you can run WinSnoori with the following command:
wsno <file>
 

Adapting the set of phonemes

Phonetic annotations are drawn from the set of phonetic symbols used by WinSnoori. It can be useful to choose another set of symbols, possibly not phonetic, to annotate some speech files. For this purpose the user can change the set of phonetic symbols as well as the font.

Changing the set of symbols

Several sets of symbols are already provided in "c:\Program Files\Wsno\format":

These files contain the list of symbols, the hierarchical phonetic menu, the phonetic classes, the hierarchical menu for classes (this menu is used during the exploration of annotated corpora) and the first two formant frequencies for vowels (these frequencies are used to plot formants in the F1-F2 plane). You can create a new file following the pattern indicated in these files. To incorporate symbols that do not belong to the default font, use "charmap" with the font you want to use (the "IPAPhon" font, for instance), select the symbols with "charmap" and paste them into the new file. Note that these characters do not appear correctly in the new file if you are using "notepad" to create it; WinSnoori will nevertheless display the correct symbols. Note also that the way you organize phonetic symbols into phonetic classes influences the automatic exploration of annotated corpora. Take care to define classes that take into account the manner and place of articulation of sounds.

Once this file is created, modify the file "c:\Winnt\wsno.ini" (or the one in "windows" or "windows95") accordingly. Assuming that the new file is xxxipa.pho, replace the old line "LanguageFile=old.pho" with "LanguageFile=xxxipa.pho". Do not forget to modify the font so that the symbols appear correctly in WinSnoori.
 

Changing the phonetic font

The first step is to check that the font is installed. If it is not (especially for the IPAPhon or IPAKiel font), install the font on your PC. Then modify the "wsno.ini" file by specifying the new phonetic font. Assuming that the new font is "IPAPhon", replace the old line "PhoneticFont=Arial" with "PhoneticFont=IPAPhon".
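After both changes, the relevant entries of wsno.ini would look like the following sketch (only the two keys discussed above are shown; the rest of the file is left unchanged):

    LanguageFile=xxxipa.pho
    PhoneticFont=IPAPhon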
 

How to measure time and frequency on the spectrogram

The time and frequency coordinates pointed to by the mouse are shown in the welcome window. Once F0 has been computed, it is displayed as well. Besides the coordinates, the duration of the highlighted region is given in ms and as a frequency in s-1. This information can be used to estimate F0 by hand, for instance.
 

Suppose you have zoomed in on the region displayed in this spectrogram and you want to evaluate F0. You simply select a region that clearly corresponds to one period of F0. When you point the mouse in this region you obtain:
517 ms [11407]   -> time coordinate where the mouse points
f0 = 253 Hz      -> F0 extracted automatically
4 ms             -> duration of the highlighted region
1/251.3 sec      -> frequency (s-1 or Hz) corresponding to that region
The two values 253 Hz and 251.3 Hz are in good agreement.
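In general the fundamental frequency can be estimated by hand as the reciprocal of one period, F0 = 1/T; here 1/251.3 Hz is about 3.98 ms, which presumably appears as 4 ms because the displayed duration is rounded to the nearest millisecond.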



How to get sharper harmonics in the spectrogram

The reassignment algorithm proposed by F. Auger and P. Flandrin ("Improving the readability of time-frequency and time-scale representations by the reassignment method," IEEE Transactions on Signal Processing, vol. 43, no. 5, pp. 1068-1089, 1995), applied to speech processing by Plante et al. (F. Plante, G. Meyer and W. A. Ainsworth, "Improvement of speech spectrogram accuracy by the method of reassignment," IEEE Transactions on Speech and Audio Processing, vol. 6, no. 3, pp. 282-287, 1998), enables a sharper resolution in time or frequency. If the window is short (4 ms) the time resolution is increased, which enables the detection of glottal closure instants. If the window is longer (32 ms for instance) very fine harmonics are obtained. The first image is a narrow band spectrogram of a female voice; the second is the corresponding reassigned spectrogram.

Image 1: narrow band spectrogram of a female voice.

Image 2: corresponding reassigned spectrogram.
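For readers who want to experiment with reassignment outside WinSnoori, here is a rough sketch in Python of the Auger-Flandrin reassignment operators. It is not WinSnoori's implementation; the window, hop size and sign conventions are assumptions tied to the frame-centered STFT used in the code.

    import numpy as np

    def reassigned_spectrogram(x, fs, win_len=512, hop=64):
        # Besides the usual STFT with window h, compute STFTs with t*h(t) and
        # dh/dt, and use them to relocate each time-frequency bin onto its
        # local "center of gravity" (Auger & Flandrin, 1995).
        n = np.arange(win_len) - win_len / 2.0
        h = np.hanning(win_len)          # analysis window
        th = (n / fs) * h                # t * h(t), local time in seconds
        dh = np.gradient(h) * fs         # dh/dt approximated by finite differences

        starts = np.arange(0, len(x) - win_len, hop)
        frames = np.array([x[s:s + win_len] for s in starts])
        centers = (starts + win_len / 2.0) / fs        # frame centers in seconds

        Xh = np.fft.rfft(frames * h, axis=1)
        Xth = np.fft.rfft(frames * th, axis=1)
        Xdh = np.fft.rfft(frames * dh, axis=1)

        denom = np.abs(Xh) ** 2 + 1e-12
        freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)

        # Reassigned time and frequency of each bin (the signs follow from the
        # frame-centered STFT convention used above)
        t_hat = centers[:, None] + np.real(Xth * np.conj(Xh)) / denom
        f_hat = freqs[None, :] - np.imag(Xdh * np.conj(Xh)) / denom / (2.0 * np.pi)
        return np.abs(Xh), t_hat, f_hat

Accumulating the energy of each bin at the reassigned coordinates (t_hat, f_hat) instead of the nominal grid gives the reassigned spectrogram: with a short window the energy concentrates around glottal closure instants, with a long one it concentrates on the harmonics, as described above.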

What do automatic speech recognition systems see?

The "Critical bands" option in the spectrogram menu and the corresponding option spectral slice allows you to get a better idea of what spectral data automatic speech recognition systems use. The next image shows the spectrogram of a female voice processed by mel filters displayed in the mel frequency scale. Note that the DCT has not been performed. This explains why first harmonics remains. On the contrary, there is a strong smoothing in high frequencies and their contribution is weaker than with a linear frequency scale.

Image: spectrogram of a female voice processed by mel filters (mel frequency scale).

The following image shows the spectral slice obtained with mel cepstral smoothing. Unlike standard mel cepstral analysis, an IDCT-III transform has been applied to the mel cepstra to recover a spectrum. Two extra triangular filters (one at zero Hz and one at Fs/2) have been added to preserve the energy of the spectrum. This image corresponds to the standard parameters used in automatic speech recognition, i.e. 24 filters and 12 coefficients.
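As an illustration of this processing chain (a sketch only, not WinSnoori's code: the triangular filter shapes, the handling of the two extra filters and the DCT normalisation are assumptions), here is how a mel-cepstrally smoothed spectrum with 24 filters and 12 coefficients could be computed in Python:

    import numpy as np
    from scipy.fft import dct, idct

    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    def mel_smoothed_spectrum(frame, fs, n_filters=24, n_ceps=12, n_fft=512):
        # Power spectrum of one windowed frame
        spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
        freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)

        # Triangular filters equally spaced on the mel scale between 0 Hz and fs/2
        hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
        fbank = np.zeros((n_filters, len(freqs)))
        for i in range(n_filters):
            lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
            rising = (freqs - lo) / (ctr - lo)
            falling = (hi - freqs) / (hi - ctr)
            fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)

        log_mel = np.log(fbank @ spec + 1e-12)               # 24 log filter-bank energies
        ceps = dct(log_mel, type=2, norm='ortho')[:n_ceps]   # keep 12 cepstral coefficients

        # Inverse DCT of the truncated cepstrum: the smoothed log spectrum on the mel scale
        smoothed = np.zeros(n_filters)
        smoothed[:n_ceps] = ceps
        return idct(smoothed, type=2, norm='ortho')

The returned values are log energies on the 24 mel bands; mapping them back to a linear frequency axis, as in the spectral slice shown here, is omitted from the sketch.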



How to edit speech files

The WinSnoori editor is not intended for modifying large portions of signal, since the window size is limited, but for editing the signal while taking into account the acoustic effects of the editing commands. We describe two situations:

Note: The editor of WinSnoori can handle files of any size. Nevertheless, the duration of the window is limited to 4 seconds in order to minimize the memory required by WinSnoori. This means that if you want to perform editing commands on large portions of a file, you may have to repeat the command on successive small portions of the file to be modified.
 

Suppressing a discontinuity (spectral bar or click)

 

When you cut a region you run the risk of breaking the periodic structure of speech, which gives rise to discontinuities. These discontinuities correspond to spectral bars in the spectrogram and to "clicks" when you are listening to the signal. The following figure exhibits such a discontinuity. You can listen to this discontinuity here.



You can suppress, or at least reduce, the influence of this discontinuity by restoring the periodic structure of speech. First, zoom in on the signal in order to localize the discontinuity (at the center of the green circle). Choose a reasonable zoom level so that the periodic structure is visible.



Then cut a portion of the signal so that an artificial periodic structure is restored. Recompute the spectrogram if necessary.



This figure shows the final signal. Here is the new signal without the "click".



You can also create artificial bursts when you cut a signal portion at a voicing onset. In this case you can remove the artificial burst by "damping" the signal. Use "damp left" when the burst appears at the voice onset, or "damp right" when the burst appears at the end of a voiced region.
 

Editing tools

 

The "Editing tools" window enables attenuations (left, right, middle), the addition of noise, filtering through FIR and OLA filters. You have to choose among these several possibilities. Here, the OLA filtering has been selected. As it is shown below its enables filtering along time x frequency trajectories. This window can be called from the "Edit" menu.

Filtering a time x frequency region

Suppose you want to investigate the importance of some cue (a burst or a formant trajectory, for instance). One solution could be to remove this cue from the signal by filtering.
Here is an example on the file c:\Program Files\Wsno\Examples\English\404.wav. Suppose we want to remove the F2 information in the sound "ah" of "language". Call "FIR Filtering" from the Editing tools window (Edit menu), select "Stop band" instead of "Pass band", set the filter order to 61 and the attenuation to 35 dB. Then define the region to be filtered by dragging with the left button and click the "Apply" button. As filtering may create a small discontinuity, you will probably have to remove it (as is the case in this example). A sketch of this kind of band-stop filtering is given after the figures below.
Here are the original and the final spectrograms.
 

Original (sound and spectrogram)



Result (sound and spectrogram)
The region where the filtering was carried out is shown by a red ellipse.
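To reproduce this kind of manipulation outside WinSnoori, a band-stop FIR filter of order 61 with roughly 35 dB of attenuation can be designed and applied to the selected time interval only. The sketch below uses SciPy; the cut-off frequencies, the Kaiser window and the way the filtered segment is spliced back are illustrative assumptions rather than WinSnoori's algorithm.

    import numpy as np
    from scipy.signal import firwin, kaiser_beta, lfilter

    def stopband_filter_region(x, fs, t_start, t_end, f_low, f_high,
                               order=61, atten_db=35.0):
        # Band-stop FIR filter rejecting the band [f_low, f_high]
        beta = kaiser_beta(atten_db)
        taps = firwin(order, [f_low, f_high], window=('kaiser', beta),
                      pass_zero='bandstop', fs=fs)

        # Filter only the selected time region and splice it back; as noted
        # above, the splice may create a small discontinuity to be removed.
        i0, i1 = int(t_start * fs), int(t_end * fs)
        y = np.asarray(x, dtype=float).copy()
        y[i0:i1] = lfilter(taps, 1.0, y[i0:i1])
        return y

For the example above, f_low and f_high would be chosen around the F2 region of the "ah" and (t_start, t_end) around that vowel.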


 

Lowering or raising formants and harmonics by OLA filtering

Suppose you want to lower, remove, raise or keep only formants or harmonics in order to modify speech. One solution is to use the OLA filtering from the Editing tools window (Edit menu).
Here we present one example with formants and another with harmonics.

 Editing formant amplitudes

The file is c:\Program Files\Wsno\Examples\English\307.wav and the modifications are applied to the beginning of the file. The first step is to draw formants and to lock them on LPC roots. The locking algorithm is very simple: it searches for a linear prediction root close to the frequency of the current point of the filter trajectory. If there is no root whose bandwidth is smaller than 700 Hz and whose frequency is closer than 500 Hz, the trajectory point is not corrected. This explains jumps when the energy is weak or when there are conflicting linear prediction roots. Once the trajectories are locked on linear prediction roots, it can be useful to correct some points. For that purpose, Ctrl+click on the trajectory to be corrected, then move any point while keeping Ctrl and the left mouse button pressed.
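The locking rule described above can be sketched as follows (the (frequency, bandwidth) representation of the roots is an assumption for illustration, not WinSnoori's actual data layout):

    def lock_point_to_lpc_root(point_freq, roots):
        # roots: list of (frequency_hz, bandwidth_hz) pairs for the analysis frame.
        # A candidate root must have a bandwidth below 700 Hz and lie within
        # 500 Hz of the trajectory point; otherwise the point is left unchanged.
        best_freq, best_dist = None, 500.0
        for freq, bandwidth in roots:
            dist = abs(freq - point_freq)
            if bandwidth < 700.0 and dist < best_dist:
                best_freq, best_dist = freq, dist
        return best_freq if best_freq is not None else point_freq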

This figure shows the trajectories drawn for the first three formants F1, F2 and F3 after they have been locked on linear prediction roots. The width has been set to 800 Hz and the gain to -48 dB. These parameters are suited to the filtering of formants. This is the original signal.

The second step is to apply the filters corresponding to these trajectories.
 

Here is the result when the "Pass" option has been chosen. Only the contributions of formants has been kept. This is the corresponding signal.The "Pass" option means that the gain factor applies everywhere but on the width corresponding to the filter trajectories. In case of conflict betwwen gains, the highest gain in modulus is accepted.


 

Here is the result when the "Stop" option has been selected. The contributions of formants F1, F2 and F3 have been removed. This is the corresponding signal.Here the gain factors were -48 dB. The effect of filtering can also be evaluated by displaying a narrow band spectrogram. 
Note that if you want to enhance formants you just have to set the gain to a positive value. Nevertheless, you have to check that the gain is not too high, otherwise the modified signal won't be correct.
A 6 dB gain corresponds to multiplying the signal by 2.
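More generally, a gain of G dB multiplies the signal amplitude by 10^(G/20): +6 dB gives a factor of about 2, and the -48 dB used above gives a factor of about 0.004.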

 Editing harmonic amplitudes

Harmonics can be edited in the same manner as presented above. The only difference concerns the locking of trajectories on harmonics rather than on linear prediction roots. As drawing filter trajectories superimposed onto harmonics is difficult, the highest frequency of the spectrogram can be changed ("Upper disp. frequency" in the "Options" menu of "Spectro"). When the highest frequency of the spectrogram is lowered, it is worth changing the order of the Fourier transform to obtain a better frequency resolution.
 

This figure shows the three filter trajectories drawn on the narrow band spectrogram for the first second of the file c:\Program Files\Wsno\Examples\English\307.wav. Trajectories have been locked on harmonics and slightly corrected by hand.  Here is the original signal.


 

This figure shows the result of the filtering.  Here is the signal after harmonics 1, 3 and 5 have been removed.

How to use wav files with chunks of annotations

Using Wav files is probably the easiest way of organizing speech signals and annotations. When a Wav file is open, WinSnoori saves phonetic and orthographic annotations in "chunks" (see the Multimedia documentation for further details). The chunk which contains phonetic information is called "phon", and the one containing orthographic information is called "word".
In each chunk the annotations are organized as follows:
left boundary in samples and text (phoneme or word).
The next left boundary is the right boundary of the current annotation.
The program chunk.cpp in C:\Program Files\Wsno\Chunk shows how chunks can be read. You can type "chunk.exe <wavfile>" to display the chunks created by WinSnoori.
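As a complement to chunk.cpp, here is a rough sketch in Python of how the "phon" and "word" chunks can be located in the RIFF structure of a wav file. Only the generic chunk walking is shown; parsing the payload itself (left boundary in samples followed by the annotation text) is left out, since its exact binary layout is documented by chunk.cpp rather than here.

    import struct

    def read_annotation_chunks(path):
        # Walk the RIFF chunks of a wav file and return the raw contents of the
        # "phon" and "word" chunks written by WinSnoori.
        chunks = {}
        with open(path, "rb") as f:
            riff, _, wave = struct.unpack("<4sI4s", f.read(12))
            if riff != b"RIFF" or wave != b"WAVE":
                raise ValueError("not a wav file")
            while True:
                header = f.read(8)
                if len(header) < 8:
                    break
                ident, length = struct.unpack("<4sI", header)
                data = f.read(length)
                if length % 2:              # chunk bodies are padded to an even size
                    f.read(1)
                if ident in (b"phon", b"word"):
                    chunks[ident.decode("ascii")] = data
        return chunks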

Note that you can save annotations in another file with the "save as" option in the "Phonemes" or "Words" menu, even when working with a wav file.

Automatic exploration of annotated corpora

Searching for a phoneme sequence

 

The automatic exploration of annotated corpora allows you to extract from the files all the occurrences of one or several sequences of phonemes. You can specify up to 5 sequences of phonemes. Phonemes and phonetic classes are selected by clicking buttons.



The corpus and the annotation directory are set by clicking the "Modify domain" button. This dialogue window allows the audio files as well as the annotation files to be specified. Actually, the dialogue has a more general purpose, which is to specify the format of the files read by WinSnoori. In this case only "speech file" and "phoneme file" are used. The file names must be understood as examples and not as the only files which are explored; WinSnoori uses them to derive the directory and the extensions. When using wav files containing both the speech signal and the annotations, you have to specify the wav file both as "speech file" and as "phoneme file" (in this particular case the "Format" field is unused).



Once the phoneme sequences and the corpus are set, clicking the "ok" button starts the exploration, which produces a speech file with phonetic annotations and a text file listing all the occurrences found. The speech file is automatically opened (its path is "c:\Program Files\Wsno\temp\srchphon.wa$"). This speech file is here and the text file listing all the occurrences (its path is "c:\Program Files\Wsno\temp\results.txt") is here.



Searching for the occurrences of a word

 

As for phonetic exploration, you specify the word to be searched for, "another" for instance.



Then you indicate the directory where to search and the type of files. In this example the speech files are the same as the annotation files, since the files are Wav files containing annotation chunks.
Once the words and the corpus are set, clicking the "ok" button starts the exploration, which produces a speech file with orthographic annotations and a text file listing all the occurrences found. The speech file is automatically opened (its path is "c:\Program Files\Wsno\temp\srchword.wa$"). This speech file is here and the text file listing all the occurrences (its path is "c:\Program Files\Wsno\temp\results.txt") is here.



Graphical interface for the Klatt synthesizer

The Klatt synthesizer is an invaluable tool for studying acoustics and perception of speech. The graphical Klatt synthesizer interface is the last item of the "Formant" menu.

This interface provides you with a complete toolbox for editing parameter files. The synthesizer is independent of WinSnoori and can therefore be replaced by any formant synthesizer, provided that the parameters are the same. The current synthesizer is the GPL version developed by Jon Iles and Nick Ing-Simmons (the complete sources of the synthesizer are in Wsno\Klatt). Note that we added some parameters: a time stamp at the beginning of each frame, extra parameters for the LF source, and extra formants to represent peaks that cannot be captured with F1 to F6.

You can draw parameter trajectories by using the left button while keeping the <Shift> key pressed. Once a trajectory is created you can bring it close to the nearest spectrogram peaks by "registering" it with linear prediction or cepstral smoothing. This means that each point of the trajectory is moved towards the nearest linear prediction root (resp. the nearest peak of the cepstrally smoothed spectrum) provided that the root (resp. the peak) is acceptably close to the initial point.
This allows you to draw rough trajectories and then correct them according to the real spectrogram. You can also smooth trajectories with a B-spline algorithm. Note that registering is not meaningful in regions where the energy is very weak.
Trajectories are saved in text form. Each line gives the parameters for the formant synthesizer (see c:\Program Files\Wsno\Klatt\klatt.html for a complete description of the synthesizer and its parameters). The old format is still available for compatibility's sake.

Suppose you want to copy the word "acoustic" in c:\Program Files\Wsno\Examples\English\404.wav. Here is a simple example of what you can do:

  1. First, compute LPC roots on this word and use "Keep decoration" in the Spectro menu to attach the display of LPC roots to the spectrogram image. This first step is necessary if you don't want to use the automatic formant tracking.
  2. Enter the graphical interface (opening this window takes a few seconds because F0 is computed for the whole file).
  3. Here you have the choice of getting formants by automatic formant tracking or by drawing them by hand.
    1. Automatic formant tracking: launch the automatic formant tracking (select the region you want to process and use "Copy synthesis/Formant tracking/Tracking from scratch"). Automatic formant tracking only tracks F1 to F4. At this point you should obtain this result.
      You have to draw F5 and F6 by hand if you want to add them (see next item).
    2. By hand: draw formant trajectories F1, F2, F3, F4 everywhere, and F5, F6 for fricative segments. You are not obliged to draw very accurate trajectories, since you can automatically put the trajectories close to the LPC tracks. At this point you should obtain such a result.
      The synthesized signal looks awful, at least because the prosody has not been incorporated and the trajectories are too rough. The next step consists in registering the trajectories. For that purpose select the trajectories to be registered (CTRL+left click on these trajectories) and use "Register with cepstral smoothing" in the "Copy synthesis" menu. Trajectories can be edited in order to remove chaotic points in low energy regions or to correct trajectories where two formants are close together or do not correspond to any spectral peak.
  4. Then get the F0 information (Copy synthesis/Get F0) and go back to the frequency scale (Scale/Frequency) to see formant trajectories.
  5. Then adjust the formant amplitudes by using "Copy synthesis" from the "Copy synthesis" menu. Amplitudes are adjusted only for the selected formant trajectories; if you want to adjust all the trajectories, use "Select/All" before the copy synthesis. This function adjusts only the amplitudes of the formants of the parallel branch. Note that the resulting curves are not smoothed; smooth them to eliminate small jumps. Note also that the first harmonics are modified by using the nasal formant, as proposed by J. N. Holmes (Speech Communication, vol. 2, pp. 251-273, 1983). With this simple scenario you should obtain these formant trajectories (Klatt file) and this synthesized waveform.

You can now investigate the acoustic consequences of various transformations of the formant parameters (removing a formant, changing amplitudes...). Here is a longer copy-synthesis example for the French story "La bise et le soleil..." together with the synthesized waveform. Note that these two examples were not corrected by hand after the copy synthesis. It is more reasonable to edit the amplitudes by hand in order to generate smoother amplitude trajectories.

Modifying the speech rate and the fundamental frequency contour

You can modify the speech rate and the F0 contour of speech signals. The "Time scale modification" menu allows you to modify the speech rate and the F0 level for the whole sentence. Parameters greater than one decrease the speech rate and/or the F0 level. With the following choice of parameters applied to this original sentence, the speech is sped up and F0 is lowered (result).
Here is another example for a female voice with the same modification parameters: original and modified. Note that the original signal is slightly noisy. More interestingly for oral comprehension, speech can be slowed down: original and slowed down (the F0 level is kept unchanged and the time factor is set to 2).

You can also modify the F0 contour with the "F0 contour modifications" function. First you have to import the F0 contour by reading it from a file or by calculating it for the current signal. Note that even if only a part of the signal is displayed in the window the whole signal is processed. The following figure exhibits the original F0 contour superimposed onto the spectrogram for the sentence: "Another experiment required subjects to read lists of monosyllables aloud".

 

Second, you have to display F0 "to modify it". You can draw a new F0 contour by using the left button while keeping the <Shift> key pressed. The new F0 contour keeps the voiced-unvoiced nature of speech. If you move within the signal, you have to "display F0 to modify it" again so that the F0 contour becomes visible. The following figure exhibits the new F0 contour.


 

Finally, you can re-synthesize the sentence with this new contour. Here is the original sentence and here is the re-synthesized signal. Note that we also changed the F0 contour at the end of the sentence!
 

Correcting F0 values by hand

When F0 values calculated automatically are to be used by another program, it can be useful to correct them to avoid any problem. You can use the "F0 contour modifications" functions for that purpose, following exactly the same strategy as described just above (import the F0 contour from a file, display the contour to modify it, and finally save this contour). You can substantially simplify this task by choosing an upper display frequency ("Options" in the "Spectro" menu) that corresponds to a multiple of the F0 scale.
For "Female" and "Any" speakers the highest F0 value is 600 Hz, and for "Male" speakers it is 500 Hz. Therefore, by choosing 1800 Hz as the upper display frequency for female speakers, you just have to follow the third harmonic (1800/600). In the same way, by choosing 2000 Hz as the upper frequency for male speakers, you just have to follow the fourth (2000/500) harmonic.
Do not forget that you can also modify the voicing nature of individual points by moving them with the left button of the mouse while keeping the CTRL key pressed.

Data journaling

It can be useful to exploit the results of WinSnoori with other software. Most of the results of WinSnoori are therefore produced in text form and can be added to the log file (some results which would produce huge files of little interest, spectrograms for instance, are not saved). The functions "show results", "append results to log" and "show log" in the "Miscellaneous" menu respectively display the text file produced by the very last command, append this file to the log file, and display the log file.
Example:
When you drag the mouse to calculate and display a spectral slice, all the spectra are saved in text form (use "show results" to see these values). Since this file contains all the spectra, it can be more convenient to display only one spectrum by turning off the colored buttons of the spectra that should not be displayed.
You can then analyze the results in other software.