A brief description of WinSnoori
Yves Laprie
November 4, 2002
This page is not intended to present WinSnoori exhaustively, but to demonstrate some of the facilities that make it interesting, or at least that can be exploited to solve some concrete problems. We will therefore present:
- how to change the initialization file to adapt WinSnoori to your system configuration,
- how to use WinSnoori in an HTML page,
- how to adapt the set of phonemes to another language, and how to organize them so that annotated corpora can be explored efficiently,
- how to measure time and frequency on the spectrogram,
- how to edit speech files,
- the advanced editing tools that can be used to prepare perception stimuli,
- how to lower or raise formant and harmonic amplitudes with OLA filtering,
- how to use wav files with chunks of annotations,
- the automatic exploration of annotated corpora (phonetic or orthographic exploration),
- how the Klatt synthesizer is interfaced,
- how to modify the speech rate and the fundamental frequency contour,
- how to correct the fundamental frequency contour easily,
- data journaling.
Changing the initialization file
The directory where WinSnoori is installed is the value of the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Wsno.
The initialization file Wsno.ini in the Windows directory lets you adapt WinSnoori's messages, phonetic symbols, phonetic font and audio player.
- SndRecorder is the player used by WinSnoori to play back signals synthesized by the prosody editing tools. SndRecorder=C:\Winnt\system32\sndrec32.exe is the default value. SndRecorder must be set according to the Windows system (Windows NT, Windows 95 or Windows 98).
- CountryFile contains the messages displayed by WinSnoori. CountryFile=english.str is the default value, which corresponds to the file c:\Program Files\Wsno\Format\english.str. You can change the messages by editing this file or by replacing it with another file.
- LanguageFile is the file describing the set of phonemes. See Adapting the set of phonemes for further details.
- PhoneticFont is the font used to display phonetic symbols. It should be chosen according to the set of phonetic symbols (see Changing the phonetic font).
Using WinSnoori in an HTML page
You can launch WinSnoori from an HTML page by adding the .wsn extension to any valid sound file.
Here is an example.
More generally, you can run WinSnoori with the following command:
wsno <file>
Adapting the set of phonemes
Phonetic annotations are drawn from the set of phonetic symbols used by WinSnoori. It can be useful to choose another set of symbols, possibly not phonetic, to annotate some speech files. To this end, the user can change the set of phonetic symbols as well as the font.
Changing the set of symbols
There are already several sets of symbols in "c:\Program Files\Wsno\format":
- english.pho (TIMIT symbols for English)
- englipa.pho (IPA symbols for English and the IPAPhon font)
- francais.pho (SAM symbols for French)
- franipa.pho (IPA symbols for French and the IPAPhon font)
- frankiel.pho (IPA symbols for French and the IPAKiel font)
These files contain the list of symbols, the hierarchical phonetic menu, the phonetic classes, the hierarchical menu for classes (this menu is used during the exploration of annotated corpora) and the first two formant frequencies of vowels (these frequencies are used to plot formants in the F1-F2 plane). You can create a new file following the pattern of these files. To incorporate symbols that do not belong to the default font, open "charmap" with the font you want to use (the "IPAPhon" font, for instance), select the symbols in "charmap" and paste them into the new file. Note that these characters are not displayed correctly if you use "notepad" to create the new phonetic file; WinSnoori will nevertheless display the correct symbols. Note also that the way you organize phonetic symbols into phonetic classes influences the automatic exploration of annotated corpora. Take care to define classes that account for the manner and the place of articulation of sounds.
Once this file is created, modify the file "c:\Winnt\wsno.ini" (or the "wsno.ini" file in the "windows" or "windows95" directory) accordingly. Assuming that the new file is xxxipa.pho, replace the old line "LanguageFile=old.pho" with "LanguageFile=xxxipa.pho". Do not forget to change the font so that the symbols appear correctly in WinSnoori.
Changing the phonetic font
The first step is to check that the font is installed. If it is not (which is likely for the IPAPhon or IPAKiel fonts), install the font on your PC. Then modify the "wsno.ini" file to specify the new phonetic font. Assuming that the new font is "IPAPhon", replace the old line "PhoneticFont=Arial" with "PhoneticFont=IPAPhon".
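Both changes above amount to rewriting one "key=value" line of wsno.ini. The Python sketch below shows that edit; set_ini_value is a hypothetical helper, not part of WinSnoori, and it works on lines textually so that it makes no assumption about the section names used in wsno.ini.

```python
import re


def set_ini_value(ini_text: str, key: str, value: str) -> str:
    """Return ini_text with the value of every "key=..." line replaced.

    Hypothetical helper: keys are matched case-insensitively; sections,
    comments, other keys and CRLF line endings are left untouched.
    If the key is absent, the text is returned unchanged.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*=)[^\r\n]*",
        re.IGNORECASE | re.MULTILINE,
    )
    # A function replacement keeps backslashes in Windows paths literal.
    return pattern.sub(lambda m: m.group(1) + value, ini_text)


# Example: the two changes described above.
ini = "LanguageFile=old.pho\r\nPhoneticFont=Arial\r\n"
ini = set_ini_value(ini, "LanguageFile", "xxxipa.pho")
ini = set_ini_value(ini, "PhoneticFont", "IPAPhon")
print(ini)
```

In practice you would read wsno.ini, pass its text through the helper and write it back, keeping a backup copy of the original file.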
How to measure time and frequency on the spectrogram
The time and frequency coordinates of the mouse pointer are shown in the welcome window. Once F0 has been calculated, it is also displayed. Besides the coordinates, the duration of the highlighted region is given in ms and in s-1. This information can be used, for instance, to evaluate F0 by hand.
Suppose you have zoomed in on the region displayed in this spectrogram and want to evaluate F0. Simply select a region that clearly corresponds to one F0 period. When you point the mouse in this region you obtain:
517 ms [11407] -> time coordinate where the mouse points
f0 = 253 Hz -> F0 extracted automatically
4 ms -> duration of the highlighted region
1/251.3 sec -> frequency (s-1 or Hz) corresponding to that region
The two values, 253 Hz and 251.3 Hz, are in good agreement.
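The arithmetic behind this readout is simply f = 1/T. The short Python sketch below reproduces it with the values of the example (the function names are illustrative, not part of WinSnoori). It also shows why the s-1 readout is useful: the rounded "4 ms" alone would give 250 Hz.

```python
def period_to_frequency(period_s: float) -> float:
    """Frequency (Hz, i.e. s-1) whose period equals the highlighted duration."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


def relative_difference(a: float, b: float) -> float:
    """Relative difference between two positive estimates."""
    return abs(a - b) / max(a, b)


# Readouts from the example above.
manual_f0 = period_to_frequency(1 / 251.3)  # the "1/251.3 sec" line
rounded_f0 = period_to_frequency(0.004)     # the rounded "4ms" line
auto_f0 = 253.0                             # the "f0 = 253 Hz" line

print(f"manual F0 {manual_f0:.1f} Hz, from rounded duration {rounded_f0:.1f} Hz")
print(f"relative difference to automatic F0: {relative_difference(manual_f0, auto_f0):.2%}")
```

The remaining gap between the manual and automatic estimates is about 0.7%.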
How to edit speech files
The WinSnoori editor is not intended for modifying large portions of signal, since its window is limited, but for editing the signal while taking into account the acoustic effects of the editing commands. We describe two situations:
- you have created an artificial discontinuity in the signal and want to remove it,
- you want to filter the signal to remove some acoustic cue.
Note: the WinSnoori editor can handle files of any size. Nevertheless, the duration of the window is limited to 4 seconds to minimize the memory required by WinSnoori. This means that if you want to apply editing commands to large portions of a file, you may have to repeat the command on successive small portions of the file.
Suppressing a discontinuity (spectral bar or click)
When you cut a region, you run the risk of breaking the periodic structure of speech, which gives rise to discontinuities. These discontinuities appear as spectral bars in the spectrogram and are heard as "clicks" when you listen to the signal. The following figure shows such a discontinuity. You can listen to this discontinuity here.
You can suppress, or at least reduce, the influence of this discontinuity by restoring the periodic structure of speech. First, zoom in on the signal to locate the discontinuity (at the center of the green circle). Choose a zoom level at which the periodic structure is visible.
Then cut a portion of the signal so that an artificial periodic structure is restored. Recompute the spectrogram if needed.
This figure shows the final signal. Here is the new signal without the "click".
Cutting a signal portion at a voicing onset can also create artificial bursts. In this case you can remove the artificial burst by "damping" the signal: use "damp left" when the burst appears at the voicing onset, or "damp right" when it appears at the end of a voiced region.
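The shape of WinSnoori's damping curve is not documented here. As an assumption, the Python sketch below uses a linear ramp and takes "damp left" to mean a fade-in over the selection and "damp right" a fade-out, which matches their use at a voicing onset and at the end of a voiced region.

```python
from typing import List, Sequence


def damp_left(samples: Sequence[float]) -> List[float]:
    """Fade in (assumed linear): gain rises from 0 at the left edge to 1 at the right edge."""
    n = len(samples)
    if n < 2:
        return [0.0] * n
    return [s * i / (n - 1) for i, s in enumerate(samples)]


def damp_right(samples: Sequence[float]) -> List[float]:
    """Fade out (assumed linear): gain falls from 1 at the left edge to 0 at the right edge."""
    n = len(samples)
    if n < 2:
        return [0.0] * n
    return [s * (n - 1 - i) / (n - 1) for i, s in enumerate(samples)]


# Apply to the selected region only (signal values and indices are hypothetical).
signal = [0.8, -0.9, 1.0, -1.0, 0.9, -0.8, 0.5, -0.4]
start, end = 0, 5
signal[start:end] = damp_left(signal[start:end])
```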
Editing tools
The "Editing tools" window provides attenuation (left, right, middle), noise addition, and filtering with FIR and OLA filters; you choose one of these operations. Here, OLA filtering has been selected. As shown in the figure, it enables filtering along time x frequency trajectories. This window is opened from the "Edit" menu.
Filtering a time x frequency region
Suppose you want to investigate the importance of some cue (a burst or a formant trajectory, for instance). One solution is to remove this cue from the signal by filtering.
Here is an example on the file c:\Program Files\Wsno\Examples\English\404.wav. Suppose we want to remove the F2 information in the "ah" sound of the word "language". Call "FIR Filtering" from the Editing tools window (Edit menu), select "Stop band" instead of "Pass band", set the filter order to 61 and the attenuation to 35 dB. Then define the region to be filtered by dragging with the left button, and click the "Apply" button. As filtering may create a small discontinuity, you will probably have to remove it (this is the case in this example). Here are the original and the final spectrograms.
Original (sound and spectrogram)
Result (sound and spectrogram)
The region where the filtering has been carried out is shown by a red ellipse.
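The stop-band FIR filtering above can be approximated with the Python sketch below. It is not WinSnoori's implementation: it assumes that "order 61" means 61 coefficients and uses a windowed-sinc design with a fixed Hamming window, so it has no attenuation parameter comparable to the 35 dB setting (a Hamming design typically reaches about 50 dB). The 16 kHz sampling rate and the 800-2000 Hz stop band are hypothetical values chosen for illustration.

```python
import math
from typing import List, Sequence


def bandstop_fir(num_taps: int, f_low: float, f_high: float, fs: float) -> List[float]:
    """Linear-phase band-stop FIR designed by the windowed-sinc method (Hamming window).

    The ideal band-stop response (all-pass minus band-pass) is truncated to
    num_taps coefficients and windowed. num_taps must be odd: an even-length
    symmetric filter has a forced zero at fs/2 and cannot pass the upper band.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for a band-stop filter")
    if not 0 < f_low < f_high < fs / 2:
        raise ValueError("need 0 < f_low < f_high < fs / 2")
    mid = num_taps // 2
    w_low = 2 * math.pi * f_low / fs
    w_high = 2 * math.pi * f_high / fs
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 1.0 - (w_high - w_low) / math.pi
        else:
            ideal = (math.sin(w_low * k) - math.sin(w_high * k)) / (math.pi * k)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * hamming)
    return taps


def gain_db(taps: Sequence[float], freq: float, fs: float) -> float:
    """Magnitude response of the filter at one frequency, in dB."""
    w = 2 * math.pi * freq / fs
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = -sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return 20 * math.log10(max(math.hypot(re, im), 1e-12))


def apply_fir(taps: Sequence[float], x: Sequence[float]) -> List[float]:
    """Direct-form convolution; output has len(x) samples, delayed by len(taps) // 2."""
    y = []
    for i in range(len(x)):
        acc = 0.0
        for j, h in enumerate(taps):
            if j > i:
                break
            acc += h * x[i - j]
        y.append(acc)
    return y


# Hypothetical settings: 16 kHz sampling, stop band 800-2000 Hz, 61 coefficients.
FS = 16000.0
taps = bandstop_fir(61, 800.0, 2000.0, FS)
```

To restrict filtering to a time region, you would filter only the selected samples (for instance signal[start:end] = apply_fir(taps, signal[start:end])) and compensate the 30-sample delay; the abrupt change at the region boundaries can produce a small discontinuity like the one mentioned above.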
Lowering or raising formants and harmonics by OLA filtering
Suppose you want to lower, remove, raise or keep only formants or harmonics to modify speech. One solution is to use OLA filtering from the Editing tools (Edit menu).
Here, we present one example with formants and another with harmonics.
Editing formant amplitudes
The file is c:\Program Files\Wsno\Examples\English\307.wav and the modifications are applied to the beginning of the file. The first step is to draw the formants and to lock them on LPC roots. The locking algorithm is very simple. It searches for a linear prediction root close to the frequency of the current point of the filter trajectory. If there is no root whose bandwidth is smaller than 700 Hz and whose frequency is within 500 Hz of the point, the trajectory point is not corrected. This explains jumps when the energy is weak or when there are conflicting linear prediction roots. Once the trajectories are locked on linear prediction roots, it can be useful to correct some points. To do so, Ctrl+click on the trajectory to be corrected, then move any point while keeping Ctrl and the left mouse button pressed.
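The locking rule just described can be sketched in Python as follows. The 700 Hz and 500 Hz thresholds come from the text; choosing the closest root when several qualify, and the bandwidth formula (the usual -ln|z| * fs / pi for a pole z), are assumptions of this sketch rather than a description of WinSnoori's code. When no root qualifies the point is left unchanged, which is what produces the jumps mentioned above.

```python
import cmath
import math
from typing import List, Optional, Sequence, Tuple

Root = Tuple[float, float]  # (frequency_hz, bandwidth_hz)


def root_to_formant(root: complex, fs: float) -> Root:
    """Frequency and bandwidth (Hz) of one complex root of the LPC polynomial."""
    freq = abs(cmath.phase(root)) * fs / (2 * math.pi)
    bandwidth = -math.log(abs(root)) * fs / math.pi
    return freq, bandwidth


def lock_point(freq: float, roots: Sequence[Root],
               max_bandwidth: float = 700.0, max_distance: float = 500.0) -> float:
    """Move a trajectory point to the closest acceptable root; keep it if none qualifies."""
    best: Optional[float] = None
    for root_freq, bandwidth in roots:
        distance = abs(root_freq - freq)
        if bandwidth < max_bandwidth and distance < max_distance:
            if best is None or distance < abs(best - freq):
                best = root_freq
    return freq if best is None else best


def lock_trajectory(points: Sequence[float],
                    roots_per_frame: Sequence[Sequence[Root]]) -> List[float]:
    """Lock every point of a trajectory, frame by frame."""
    return [lock_point(f, roots) for f, roots in zip(points, roots_per_frame)]
```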
This figure shows the trajectories drawn for the first three formants F1, F2 and F3 after they have been locked on linear prediction roots. The width has been set to 800 Hz and the gain to -48 dB; these parameters are suited to filtering formants. This is the original signal.
The second step is to apply the filters corresponding to these trajectories.
Here is the result when the "Pass" option has been chosen: only the contributions of the formants have been kept. This is the corresponding signal. The "Pass" option means that the gain factor applies everywhere except within the width around the filter trajectories. In case of conflict between gains, the gain with the largest modulus is applied.
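The Python sketch below shows one way the per-frame gains could be computed for the "Pass" and "Stop" options. It rests on assumptions: the width is taken as a total band centered on the trajectory (plus or minus 400 Hz for 800 Hz), and in "Pass" mode bins inside any trajectory band are left at 0 dB while all other bins receive the gain, with the largest-modulus rule resolving conflicts. An OLA filter would then multiply each short-time spectrum by 10**(gain/20) before overlap-adding the frames; that resynthesis step is omitted, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Trajectory = Tuple[float, float, float]  # (centre_hz, width_hz, gain_db) at this frame


def frame_gains_db(bin_freqs: Sequence[float], trajectories: Sequence[Trajectory],
                   mode: str) -> List[float]:
    """Gain in dB for each frequency bin of one analysis frame.

    "stop": a trajectory's gain applies inside its band.
    "pass": bins inside any band are kept (0 dB); the gains apply everywhere else.
    When several gains apply to a bin, the one with the largest modulus wins.
    """
    if mode not in ("pass", "stop"):
        raise ValueError("mode must be 'pass' or 'stop'")
    gains = []
    for f in bin_freqs:
        inside = [g for centre, width, g in trajectories if abs(f - centre) <= width / 2]
        if mode == "stop":
            candidates = inside
        else:
            candidates = [] if inside else [g for _, _, g in trajectories]
        gains.append(max(candidates, key=abs) if candidates else 0.0)
    return gains


def db_to_linear(gain_db: float) -> float:
    return 10 ** (gain_db / 20)


# Width 800 Hz and gain -48 dB as in the example; the centre frequencies are hypothetical.
formants = [(500.0, 800.0, -48.0), (1500.0, 800.0, -48.0), (2500.0, 800.0, -48.0)]
bins = [0.0, 500.0, 1000.0, 1500.0, 2500.0, 3500.0]
stop_gains = frame_gains_db(bins, formants, "stop")
pass_gains = frame_gains_db(bins, formants, "pass")
```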
Here is the result when the "Stop" option has been selected: the contributions of formants F1, F2 and F3 have been removed. This is the corresponding signal. Here the gain factors were -48 dB. The effect of filtering can also be evaluated by displaying a narrow-band spectrogram.
Note that if you want to enhance formants, you just have to set the gain to a positive value. Nevertheless, check that the gain is not too high, otherwise the modified signal will not be correct. A 6 dB gain corresponds to multiplying the signal amplitude by about 2.
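The dB values above translate into amplitude factors with g_lin = 10^(g_dB/20). The Python sketch below makes the conversions explicit and adds a rough headroom check, which assumes 16-bit samples (full scale 32767). Comparing a planned boost with the headroom is only a heuristic, since boosting one band does not necessarily scale the peak of the whole waveform by the same factor.

```python
import math


def db_to_amplitude_factor(gain_db: float) -> float:
    """Amplitude factor for a gain in dB (20 log10 convention)."""
    return 10 ** (gain_db / 20)


def amplitude_factor_to_db(factor: float) -> float:
    return 20 * math.log10(factor)


def headroom_db(peak: int, full_scale: int = 32767) -> float:
    """How many dB the current peak can be raised before reaching full scale (16-bit assumed)."""
    return amplitude_factor_to_db(full_scale / peak)


print(f"+6 dB  -> x{db_to_amplitude_factor(6):.3f}")
print(f"-48 dB -> x{db_to_amplitude_factor(-48):.5f}")
print(f"x2     -> {amplitude_factor_to_db(2):.2f} dB")
```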
Editing harmonic amplitudes
Harmonics can be edited in the same manner as presented above. The only difference is that trajectories are locked on harmonics rather than on linear prediction roots. As drawing filter trajectories superimposed onto harmonics is difficult, the highest frequency of the spectrogram can be changed ("Upper disp. frequency" in the "Options" menu of "Spectro"). When the highest frequency of the spectrogram is lowered, it is worth changing the order of the Fourier transform to obtain a better frequency resolution.
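The text does not detail how trajectories are locked on harmonics. Assuming a frame-wise F0 estimate is available, a natural rule is to snap each point to the nearest multiple of F0, as in this Python sketch; the optional distance limit mirrors the one used for LPC roots and is likewise an assumption. The second helper shows why raising the Fourier transform order helps: the spacing between bins is fs/N, so doubling N halves it.

```python
from typing import Optional


def lock_to_harmonic(freq: float, f0: float, max_distance: Optional[float] = None) -> float:
    """Move a trajectory point to the nearest harmonic k * F0 (k >= 1).

    Unvoiced frames (f0 <= 0) and points farther than max_distance from the
    nearest harmonic are left unchanged.
    """
    if f0 <= 0:
        return freq
    k = max(1, round(freq / f0))
    target = k * f0
    if max_distance is not None and abs(target - freq) > max_distance:
        return freq
    return target


def fft_bin_spacing(fs: float, n_fft: int) -> float:
    """Frequency resolution (Hz between bins) of an n_fft-point transform."""
    return fs / n_fft
```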
This figure shows the three filter trajectories drawn on the narrow-band spectrogram for the first second of the file c:\Program Files\Wsno\Examples\English\307.wav. The trajectories have been locked on harmonics and slightly corrected by hand. Here is the original signal.
This figure shows the result of the filtering. Here is the signal after harmonics 1, 3 and 5 have been removed.
How to use wav files with chunks of annotations
Using wav files is probably the easiest way of organizing speech signals and annotations. When a wav file is open, WinSnoori saves phonetic and orthographic annotations in "chunks" (see the Multimedia documentation for further details). The chunk that contains phonetic information is called "phon", and the one containing orthographic information is called "word".
In each chunk, every annotation is stored as its left boundary (in samples) followed by its text (a phoneme or words). The left boundary of the next annotation is the right boundary of the current one.
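Since each annotation only stores its left boundary, intervals are recovered by pairing every boundary with the next one. The Python sketch below does this on already-decoded (boundary, label) pairs. How the last annotation is closed is not described here, so the sketch takes an optional end-of-signal sample count and otherwise leaves the last right boundary as None; the labels in the example are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[int, Optional[int], str]  # (start_sample, end_sample, label)


def to_segments(entries: Sequence[Tuple[int, str]],
                end_of_signal: Optional[int] = None) -> List[Segment]:
    """Turn (left_boundary, label) entries into (start, end, label) segments.

    The right boundary of each annotation is the left boundary of the next one.
    """
    ordered = sorted(entries, key=lambda e: e[0])
    segments: List[Segment] = []
    for i, (start, label) in enumerate(ordered):
        end = ordered[i + 1][0] if i + 1 < len(ordered) else end_of_signal
        segments.append((start, end, label))
    return segments


def segment_duration_ms(segment: Segment, sample_rate: float) -> Optional[float]:
    """Duration of a segment in ms, or None if its right boundary is unknown."""
    start, end, _ = segment
    return None if end is None else 1000.0 * (end - start) / sample_rate


segs = to_segments([(0, "h#"), (2400, "sh"), (4100, "iy")], end_of_signal=6000)
```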
The program chunk.cpp
in C:\Program Files\Wsno\Chunk shows how chunks can be read. You can type
"chunk.exe <wavfile>" to display chunks created by WinSnoori.
Note that you can save annotations in another file with the "save as" option in the "Phonemes" or "Words" menu, even when working with a wav file.
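chunk.cpp remains the reference for reading these chunks. As an illustration only, the Python sketch below walks the top-level chunks of a RIFF/WAVE file with the standard struct module and returns the raw bytes of every chunk, including "phon" and "word". It assumes the annotation chunks sit at the top level of the file and does not decode their contents, whose byte layout is not given here.

```python
import struct
from typing import Dict, List


def riff_chunks(data: bytes) -> Dict[str, List[bytes]]:
    """Raw payload of every top-level chunk in a RIFF/WAVE byte string, keyed by chunk ID."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks: Dict[str, List[bytes]] = {}
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack("<4sI", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + size]
        chunks.setdefault(chunk_id.decode("latin-1"), []).append(payload)
        pos += 8 + size + (size & 1)  # chunk payloads are padded to an even length
    return chunks


def annotation_chunks(path: str) -> Dict[str, List[bytes]]:
    """Return only the "phon" and "word" chunks of a wav file."""
    with open(path, "rb") as f:
        chunks = riff_chunks(f.read())
    return {cid: chunks[cid] for cid in ("phon", "word") if cid in chunks}
```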
Automatic exploration of annotated corpora
Searching for a phoneme sequence
The automatic exploration of annotated corpora allows you to extract all the occurrences of one or several phoneme sequences from the files. You can specify up to 5 sequences. Phonemes and phonetic classes are selected by clicking buttons.
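The kind of matching the exploration performs can be pictured with this Python sketch, in which each item of a searched sequence is either a phoneme symbol or the name of a phonetic class that matches any of its members. The class names and memberships are hypothetical; in WinSnoori they come from the phoneme file (.pho). The sketch also illustrates why the class organization matters: a search on a class only finds what that class contains.

```python
from typing import Dict, List, Sequence, Set


def find_sequence(labels: Sequence[str], pattern: Sequence[str],
                  classes: Dict[str, Set[str]]) -> List[int]:
    """Start indices where `pattern` matches consecutive phoneme labels.

    A pattern item that names a class matches any phoneme of that class;
    any other item must equal the phoneme symbol exactly.
    """
    if not pattern:
        return []

    def matches(label: str, item: str) -> bool:
        return label in classes[item] if item in classes else label == item

    n = len(pattern)
    return [i for i in range(len(labels) - n + 1)
            if all(matches(labels[i + j], pattern[j]) for j in range(n))]


# Hypothetical annotation and classes (real classes are defined in the .pho file).
EXAMPLE_LABELS = ["h#", "dh", "ah", "n", "ah", "dh", "er"]
EXAMPLE_CLASSES = {"VOWEL": {"ah", "er", "iy"}, "NASAL": {"m", "n", "ng"}}
vowel_nasal = find_sequence(EXAMPLE_LABELS, ["VOWEL", "NASAL"], EXAMPLE_CLASSES)
dh_vowel = find_sequence(EXAMPLE_LABELS, ["dh", "VOWEL"], EXAMPLE_CLASSES)
```

Keeping class names distinct from phoneme symbols avoids ambiguity in such a scheme.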
The corpus and the annotation directory are set by clicking the "Modify domain" button. This dialogue window lets you specify the audio files as well as the annotation files. Actually, the dialogue has a more general purpose, which is to specify the format of the files read by WinSnoori; in this case only "speech file" and "phoneme file" are used. The file names must be understood as examples and not as the only files that are explored: WinSnoori uses them to derive the directory and the extensions. When using wav files containing both the speech signal and the annotations, you have to specify the wav file as both "speech file" and "phoneme file" (in this particular case the "Format" setting is irrelevant).
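The rule that example file names only supply a directory and an extension can be sketched as follows. The helpers are hypothetical and use Python's pathlib with Windows path semantics; the directory listing is passed in explicitly (in practice it could come from os.listdir) so that the example runs on any system.

```python
from pathlib import PureWindowsPath
from typing import Iterable, List, Tuple


def derive_domain(example_file: str) -> Tuple[str, str]:
    """Directory and (lower-case) extension implied by an example file name."""
    path = PureWindowsPath(example_file)
    return str(path.parent), path.suffix.lower()


def files_to_explore(example_file: str, directory_listing: Iterable[str]) -> List[str]:
    """Every file of the example's directory that has the example's extension."""
    directory, extension = derive_domain(example_file)
    return sorted(
        str(PureWindowsPath(directory) / name)
        for name in directory_listing
        if PureWindowsPath(name).suffix.lower() == extension
    )


example = r"c:\Program Files\Wsno\Examples\English\404.wav"
corpus = files_to_explore(example, ["404.wav", "307.WAV", "notes.txt"])
```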
Once the phoneme sequences and the corpus are set, clicking the "ok" button starts the exploration, which produces a speech file with phonetic annotations and a text file listing all the occurrences found. The speech file is automatically opened (its path is "c:\Program Files\Wsno\temp\srchphon.wa$"). This speech file is here and the text file listing all the occurrences (its path is "c:\Program Files\Wsno\temp\results.txt") is here.
Searching for the occurrences of a word
As for phonetic exploration, you specify the word to be searched for, "another" for instance.
Then you indicate the directory to search and the type of files. In this example the speech files are the same as the annotation files, since they are wav files containing annotation chunks.
Once the words and the corpus are set, clicking the "ok" button starts the exploration, which produces a speech file with orthographic annotations and a text file listing all the occurrences found. The speech file is automatically opened (its path is "c:\Program Files\Wsno\temp\srchword.wa$"). This speech file is here and the text file listing all the occurrences (its path is "c:\Program Files\Wsno\temp\results.txt") is here.
Graphical interface for the Klatt synthesizer
The formant editor of WinSnoori 1.2 has been replaced by a graphical interface to the Klatt synthesizer. The Klatt synthesizer is an invaluable tool for studying the acoustics and perception of speech. The graphical Klatt synthesizer interface is the last item of the "Formant" menu.
This interface provides a complete toolbox for editing parameter files. The synthesizer is independent of WinSnoori and can therefore be replaced by any formant synthesizer that uses the same parameters. The current synthesizer is the GPL version developed by Jon Iles and Nick Ing-Simmons (the Windows version can be downloaded from http://www.loria.fr/equipes/parole/Html/klatt.html).
You can draw parameter trajectories with the left button while holding down <Shift>. Once a trajectory is created, you can bring it close to the nearest spectrogram peaks by "registering" it with linear prediction or cepstral smoothing. This means that each point of the trajectory is moved towards the nearest linear prediction root (resp. the nearest peak of the cepstrally smoothed spectrum), provided that the root (resp. the peak) is acceptably close to the initial point. This allows you to draw rough trajectories and then correct them according to the real spectrogram. You can also smooth trajectories with a B-spline algorithm. Note that registering is not meaningful in regions where the energy is very weak.
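The text does not say which B-spline variant WinSnoori uses. One simple possibility, sketched here in Python as an assumption rather than WinSnoori's code, treats the trajectory points as control points of a uniform cubic B-spline and samples the curve at its knots, which reduces to the weights (1, 4, 1)/6; the end points are kept fixed.

```python
from typing import List, Sequence


def bspline_smooth(values: Sequence[float], passes: int = 1) -> List[float]:
    """Smooth a trajectory by sampling a uniform cubic B-spline at its knots.

    Each interior point becomes (p[i-1] + 4 p[i] + p[i+1]) / 6; the first and
    last points are kept unchanged.
    """
    out = list(values)
    for _ in range(passes):
        if len(out) < 3:
            break
        out = ([out[0]]
               + [(out[i - 1] + 4 * out[i] + out[i + 1]) / 6 for i in range(1, len(out) - 1)]
               + [out[-1]])
    return out
```

Repeated passes (passes=2, 3, ...) smooth more strongly.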
Trajectories are saved in text form. Each line gives the parameters for the formant synthesizer (see c:\Program Files\Wsno\Klatt\klatt.html for a complete description of the synthesizer and its parameters). The old format is still available for the sake of compatibility.
Suppose you want to copy-synthesize the word "acoustic" in c:\Program Files\Wsno\Examples\English\404.wav.
Here is an example of what you can do:
- First, compute LPC roots on this word and use "Keep decoration" in the Spectro menu to attach the display of LPC roots to the spectrogram image.
- Enter the graphical interface (opening this window takes a few seconds because F0 is computed for the whole file) and draw formant trajectories F1, F2, F3, F4 everywhere, and F5, F6 for fricative segments. You do not need to draw very accurate trajectories, since you can automatically bring them close to the LPC tracks. At this point you should obtain a result like this one.
- At this point the synthesized signal sounds awful, partly because the prosody has not been incorporated and the trajectories are too rough. The next step consists in registering the trajectories. To do so, select the trajectories to be registered (Ctrl+left click on these trajectories) and use "Register with LPC" in the "Copy synthesis" menu. Trajectories can then be edited to remove erratic points in low-energy regions and, more importantly, to set the bandwidths ("Bandwidth parallel" in the "Scale" menu) to constant values, because the values obtained after registration are those calculated by LPC and are therefore highly erratic. Set the bandwidths between 60 Hz and 200 Hz depending on the formant. Then add the F0 information with "Get F0" in "Copy synthesis". The synthesized signal is now somewhat better, but there is still no frication noise!
- Then add frication noise by drawing "frication trajectories" (select the "Amplitude (Cascade)" scale) where there is frication noise, and adjust the formant amplitudes (select the "Amplitude (Parallel)" scale). With this simple scenario you should obtain these formant trajectories (Klatt file) and this synthesized waveform.
You can now investigate the acoustic consequences of various transformations of the formant parameters (removing a formant, changing amplitudes...).
Modifying the speech rate and the fundamental frequency contour
You can modify the speech rate and the F0 contour of speech signals. The "Time scale modification" menu allows you to modify the speech rate and the F0 level for the whole sentence. Parameters greater than one decrease the speech rate and/or the F0 level. With the parameter choice shown in the dialogue and this original sentence, the speech is sped up and F0 is lowered (result).
Here is another example for a female voice with the same modification parameters: original and modified.
Note that the original signal is slightly noisy. More interestingly for oral comprehension, speech can be slowed down: original and slowed down (the F0 level is kept unchanged and the time factor is set to 2).
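WinSnoori's time-scale algorithm is not described here; pitch-synchronous methods such as PSOLA are the usual choice for speech. To illustrate only what a time factor does, the Python sketch below implements plain overlap-add (OLA) time stretching: Hann-windowed frames are read every Ha samples and written every Hs = factor x Ha samples, so a factor of 2 roughly doubles the duration while the local waveform, and hence F0, is largely preserved. Plain OLA introduces phase artifacts that pitch-synchronous methods avoid, and this sketch cannot change the F0 level.

```python
import math
from typing import List, Sequence


def ola_time_stretch(x: Sequence[float], factor: float, frame_len: int = 512) -> List[float]:
    """Plain overlap-add time-scale modification (factor > 1 slows speech down).

    Hann-windowed frames are read every analysis_hop samples and written every
    synthesis_hop = factor * analysis_hop samples; the output is normalized by the
    summed window. Trailing input shorter than one hop is dropped, and factors
    above 4 leave gaps between frames (output zeros there).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    analysis_hop = frame_len // 4
    synthesis_hop = max(1, round(analysis_hop * factor))
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_frames = max(1, (len(x) - frame_len) // analysis_hop + 1)
    out_len = (n_frames - 1) * synthesis_hop + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len
    for k in range(n_frames):
        a, s = k * analysis_hop, k * synthesis_hop
        for n in range(frame_len):
            sample = x[a + n] if a + n < len(x) else 0.0  # zero-pad a short input
            out[s + n] += window[n] * sample
            norm[s + n] += window[n]
    return [v / w if w > 1e-8 else 0.0 for v, w in zip(out, norm)]
```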
You can also modify the F0 contour with the "F0 contour modifications" function. First, import the F0 contour by reading it from a file or by calculating it for the current signal. Note that even if only part of the signal is displayed in the window, the whole signal is processed. The following figure shows the original F0 contour superimposed onto the spectrogram for the sentence: "Another experiment required subjects to read lists of monosyllables aloud".
Second, display F0 "to modify it". You can draw a new F0 contour with the left button while holding down <Shift>. The new F0 contour keeps the voiced/unvoiced pattern of the original speech. If you move within the signal, you have to "display F0 to modify it" again so that the F0 contour becomes visible. The following figure shows the new F0 contour.
Finally, you can re-synthesize the sentence with this new contour. Here is the original sentence, and here is the re-synthesized signal. Note that we also changed the F0 contour at the end of the sentence!
Correcting F0 values by hand
When automatically calculated F0 values are to be used by another program, it can be useful to correct them first. You can use the "F0 contour modifications" functions for that purpose by following exactly the same strategy as described just above (import the F0 contour from a file, display the contour to modify it, and finally save this contour). You can substantially simplify this task by choosing an upper display frequency ("Options" in the "Spectro" menu) that is a multiple of the top of the F0 scale.
For "Female" and "Any" speakers the highest F0 value is 600 Hz; for "Male" speakers it is 500 Hz. Therefore, by choosing 1800 Hz as the upper display frequency for female speakers, you just have to follow the third harmonic (1800/600 = 3), since an F0 value f is then drawn at the same height as the harmonic 3f. In the same way, by choosing 2000 Hz as the upper display frequency for male speakers, you just have to follow the fourth harmonic (2000/500 = 4).
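The choice of upper display frequency can be checked with a few lines of Python. This sketch assumes, as the text implies, that the F0 curve is drawn on its own scale (from 0 to the highest F0 value) over the full height of the spectrogram; the speaker-type limits are those given above, and the function names are hypothetical.

```python
import math
from typing import Optional

# Highest F0 value of the F0 scale for each speaker type (values from the text).
F0_SCALE_MAX_HZ = {"Female": 600.0, "Any": 600.0, "Male": 500.0}


def height_fraction(freq_hz: float, top_hz: float) -> float:
    """Vertical position of a frequency on a 0..top_hz axis, as a fraction of the height."""
    return freq_hz / top_hz


def harmonic_under_f0_curve(speaker: str, upper_display_hz: float) -> Optional[int]:
    """Harmonic number that coincides with the F0 curve, or None if the ratio is not an integer."""
    ratio = upper_display_hz / F0_SCALE_MAX_HZ[speaker]
    k = round(ratio)
    return k if k >= 1 and math.isclose(ratio, k) else None


# An F0 of 220 Hz on the female scale sits where its 3rd harmonic appears with a 1800 Hz display.
f0 = 220.0
same_height = math.isclose(height_fraction(f0, 600.0), height_fraction(3 * f0, 1800.0))
```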
Do not forget that you can also modify the voicing status of points by moving them individually with the left mouse button while holding down Ctrl.
Data journaling
It can be useful to exploit the results of WinSnoori with other software. Most of the results of WinSnoori are therefore put in text form and can be added to the log file (some results that would produce huge files of little interest, spectrograms for instance, are not saved). The functions "show results", "append results to log" and "show log" in the "Miscellaneous" menu respectively display the text file for the very last command, append this file to the log file, and display the log file.
Example:
When you drag the mouse to calculate and display a spectral slice, all the spectra are saved in text form (use "show results" to see these values). As this file contains all the spectra, it can be more convenient to display only one spectrum by turning off the colored buttons of the spectra you do not want displayed. You can then analyze the results with other software.