Graphical interface for the Klatt synthesizer
This interface allows users
to display and edit Klatt parameters. Most of this
interface consists of editing facilities and tools to produce a synthetic copy
of original speech. The Klatt synthesizer
itself is not incorporated into WinSnoori but
generates speech signals from files of Klatt
parameters edited in WinSnoori. The Klatt synthesizer is the one written by Jon Iles and Nick Ing-Simmons (see here for a
detailed description of this free Klatt synthesizer).
This synthesizer has been complemented by the three parameters (Ra, Rk and Rg) of the LF source and
three extra formants (F1N, F2N, F3N) which can be used to either represent
nasal formants or represent extra spectral peaks, those of burst transients for
instance. Note that the default nasal pole is used as the glottal formant in
the automatic copy synthesis procedure.
Two modes for editing Klatt parameters have to be distinguished:
Besides editing facilities
and more specific tools, users can examine the synthesized signal by zooming on
or playing some signal regions, and more importantly by displaying spectral
slices by moving the pointer in either the main window or the interface window.
This last possibility enables the comparison of original and synthetic spectra
and can guide users to improve Klatt parameters.
The overall task of users
consists of drawing parameter trajectories with the mouse (Shift + left button
pressed). Users can edit a trajectory by clicking on it (Ctrl + left button)
and then moving its points (Ctrl + left button
pressed). To avoid discontinuities, which can lead to synthesis
“accidents”, users should merge neighbouring trajectories corresponding to
the same parameter (“Merge” in the Edit menu of the Klatt window).
Parameters are organized
according to their dimension and values:
This organization is intended
to reduce the number of trajectories displayed simultaneously on the
spectrogram. The “Scale” menu in the Klatt window allows
the dimension to be selected. In addition, the user can choose which parameters are displayed with the “Edit/Tool window” (see below in the “Edit Menu” section). This prevents too many curves from being displayed simultaneously and producing a confusing image.
For the same reason parameters set to a default value are not displayed. This is the case for most of the numeral parameters unless the user modifies them explicitly. Default values of these parameters can be set by selecting “Setup/Default parameters” in the menu.
The default scale is that
of formant frequencies (from 0 to half the sampling frequency). This offers a
“naive” use of the Klatt synthesizer since only
formant frequencies are visible. However, switching to another scale allows
users to edit all the Klatt parameters. In order to
reduce the number of trajectories displayed, only those different from default
values are displayed in the form of trajectories. “Default parameters” in the
“Setup” menu of the Klatt window allows these values
to be modified.
There are two cases for
editing formant parameters:
Note that the Klatt synthesizer is completely independent of WinSnoori. This means that no “semantic” check is
performed on the parameters before a waveform is synthesized. Users are therefore strongly
advised to read carefully the help of the Klatt
synthesizer and the papers of Dennis Klatt about his
synthesizer (C:\Program Files\Wno\Klatt\klatt.exe):
The current synthesizer is
the GPL version developed by Jon Iles and Nick Ing-Simmons (the Windows version can be downloaded at http://www.loria.fr/equipes/parole/Html/klatt.html).
When parameters are not correct,
synthesis fails and stops, so that only a partial waveform is generated,
possibly an empty one if the failure happens
at the beginning. A floating point exception can occur if the Klatt parameters lead to huge values of synthetic samples.
In these cases the spectrogram is partially or completely white because the
signal does not exist.
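A partial or empty result can also be detected outside WinSnoori by comparing the duration of the synthesized wav file with the time span covered by the parameters. The following is only a sketch using the standard Python `wave` module; the function names, the expected duration and the tolerance are assumptions made for illustration and are not part of WinSnoori or of the Klatt synthesizer.

```python
import wave


def synthesized_duration_ms(wav_path):
    """Return the duration of a wav file in milliseconds."""
    with wave.open(wav_path, "rb") as wav:
        return 1000.0 * wav.getnframes() / wav.getframerate()


def synthesis_status(wav_path, expected_ms, tolerance_ms=20.0):
    """Classify a synthesis result as 'empty', 'partial' or 'complete'.

    expected_ms is the time span covered by the Klatt parameters.
    tolerance_ms is an arbitrary margin chosen for this sketch.
    """
    duration = synthesized_duration_ms(wav_path)
    if duration == 0:
        return "empty"
    if duration < expected_ms - tolerance_ms:
        return "partial"
    return "complete"
```

A result other than "complete" suggests that some parameter values should be checked before synthesizing again.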
The menu commands
File Menu
Open: reads and displays a file of Klatt parameters previously saved with the “Save as”
command. “.kla” files are for Klatt
parameters without the three additional LF parameters. “.klf”
files are for Klatt parameters including the control
of the LF source. Synthesis parameters are time stamped. This means that they
are attached to the original signal. Their location can be changed when they
are read from a parameter file. For this purpose check the “Relocation offset”
radio button of the bottom dialog (see image below) and give the offset to be
used for reading. This offset can be negative or positive.
Save as: saves parameters in a specified
file. As three parameters have been added to control the LF source, this
function now saves 44 parameters. The file extension is .klf.
“Save as (Klatt with 40 param)”
saves only 41 parameters, i.e. without the three additional
parameters for the LF source.
Paste from file: overwrites Klatt
parameters with those of the parameter file. The destination time can be specified
through the “Relocation offset” in order to compose new files from existing
ones. Parameters are overwritten only if existing and read parameters are
defined for the same time region.
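The effect of the relocation offset and of pasting can be pictured with a small sketch. A trajectory is modelled here as a dictionary mapping a time stamp in milliseconds to a value; this representation and the function names are assumptions made for illustration and do not describe the .kla or .klf file format. The overwrite rule implemented is one possible reading of the description above: where both trajectories define a value for the same time stamp, the pasted value wins.

```python
def relocate(trajectory, offset_ms):
    """Shift every time stamp of a trajectory by offset_ms (may be negative)."""
    return {time_ms + offset_ms: value for time_ms, value in trajectory.items()}


def paste(existing, pasted, offset_ms=0):
    """Paste a relocated trajectory over an existing one.

    Where both trajectories define a value for the same time stamp,
    the pasted value overwrites the existing one; all other values are kept.
    """
    merged = dict(existing)
    merged.update(relocate(pasted, offset_ms))
    return dict(sorted(merged.items()))
```

For instance, pasting `{0: 700, 8: 710}` with an offset of 8 ms over `{0: 500, 8: 510, 16: 520}` keeps the value at 0 ms and replaces those at 8 and 16 ms.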
Synthesize: synthesizes the signal
corresponding to the current parameters. The command line arguments (see
C:\Program Files\Wsno\Klatt\klatt.htm) can be set with “Settings” in the
“Setup” menu.
Save synthesize speech: saves the synthesized speech to a wav file. Only the wav format is
supported.
Save formants: saves formant trajectories only (to
ensure the compatibility with earlier versions of WinSnoori).
Open formants: reads formant trajectories (saved
with “Save formants” or with the formant editor of earlier versions of WinSnoori).
Save as (Klatt with 40 param): saves parameters in a specified
file without the three additional parameters for the LF source.
Play: plays the region selected in the
interface window, or the whole synthesized signal if no region is selected.
Zoom in: zooms in on the selected region.
Zoom out: cancels the zoom.
Select: selects or de-selects one or more
parameter trajectories.
F1 to F6: selects all the trajectories of F1
(respectively F2, F3, F4, F5, F6, FN).
All: selects all the parameter trajectories.
None: de-selects all the parameter
trajectories.
Copy synthesis Menu
Formant tracking: tracks
formant frequencies on the highlighted region. If no region is selected the
entire current speech window (the signal displayed) is processed.
The algorithm implemented is that described in Y. Laprie, “A concurrent curve strategy for formant
tracking”, in Interspeech
2004, International Conference on Spoken Language Processing, Jeju, South Korea, Oct. 2004. Note that the fourth formant F4 serves
more to constrain the highest frequency that F3 is allowed to reach
than to provide reliable information about F4. The tracking algorithm
uses a spectrogram called “Support image”
to deform initial curves. The default spectrogram is obtained by linear “Cepstral smoothing”. Slightly better results
can be obtained with a “True envelope”
(S. Imai and Y. Abe, “Spectral envelope extraction by improved cepstral method”, Trans.
IECE, Vol. J62-A(4), pp. 217-223, 1979). A linear
prediction spectrogram (LPC) is
also possible, although results are not as good. The support image can be
chosen by checking one of these three spectrogram calculations and saved with
the option “Save image (PGM)”.
The principle of the tracking is to deform
initial rough estimates of formant tracks through active curves called snakes.
Snake parameters can be chosen with “Snake
parameters”. This dialog window allows users to modify the “Number of iterations”, the “Discretization step” (a step of 3 means that one
point out of 3 is kept in the snake calculations), the “Spectrogram
weight”, i.e. the weight of the spectrogram relative to the
internal energy of snakes, “Alpha”,
i.e. the weight of the first derivative of the curve, “Beta”, i.e. the weight of the second derivative of the curve, “Gamma”, i.e. the inertia of the curve
(a small value, less than one, enables fast evolutions), and “Repulsion”, i.e. the repulsion between two formant curves. In
addition, intermediate results can be displayed when “Display intermediate results” is checked.
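To give an intuition of how these parameters interact, the sketch below deforms a single curve over a support image with a textbook explicit snake iteration. It is not the concurrent curve algorithm implemented in WinSnoori: repulsion between curves and the discretization step are left out, and the data layout (a list of frames, each a list of energies per frequency bin) and all function names are assumptions made for illustration.

```python
def second_difference(values):
    """Discrete second derivative, with both end points set to zero."""
    out = [0.0] * len(values)
    for t in range(1, len(values) - 1):
        out[t] = values[t - 1] - 2.0 * values[t] + values[t + 1]
    return out


def image_gradient(image, t, position):
    """Central difference of the support image along frequency at frame t."""
    row = image[t]
    k = min(max(int(round(position)), 1), len(row) - 2)
    return 0.5 * (row[k + 1] - row[k - 1])


def snake_step(curve, image, alpha, beta, gamma, weight):
    """One iteration: internal smoothing plus attraction towards high energy."""
    d2 = second_difference(curve)   # weighted by alpha (elasticity)
    d4 = second_difference(d2)      # approximate fourth difference, weighted by beta
    new_curve = []
    for t, position in enumerate(curve):
        force = (alpha * d2[t] - beta * d4[t]
                 + weight * image_gradient(image, t, position))
        # A small gamma (inertia) gives large moves, hence fast evolution.
        moved = position + force / gamma
        new_curve.append(min(max(moved, 0.0), len(image[t]) - 1.0))
    return new_curve


def deform(curve, image, iterations, alpha, beta, gamma, weight):
    """Apply snake_step a fixed number of times and return the final curve."""
    for _ in range(iterations):
        curve = snake_step(curve, image, alpha, beta, gamma, weight)
    return curve
```

In this sketch the curve holds one frequency bin position per frame. Raising `weight` makes the curve follow spectral ridges more closely, while raising `alpha` or `beta` favours smoother curves.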
“Initial
curves” can be derived from “LPC
roots” by default or from “True
envelope peaks”. The advantage of LPC roots is to be less sensitive to
spurious peaks but with the risk of committing errors in case of nasal sounds
with formants abnormally low in energy. Conversely, the “true envelope”
algorithm presents the advantage of being closer to the harmonics and of
detecting small peaks. It presents the disadvantage of merging formants close
together.
“Tracking” can be started from scratch
(“From scratch”). In this case both
the determination of initial estimates together with the deformation step are
performed. “From existing formants”
means that the deformation step will be applied to existing formants in the
highlighted region. “From existing
selected formants” means that the deformation step will be applied only on
existing selected formants and on the highlighted region. Other formants remain
unchanged.
Register: registers the selected formant
trajectories. Registration replaces every point by the spectral peak closest to
it (LPC root or the peak of the cepstrally smoothed
spectrum). The registered formants are sampled at the rate specified in the
“Step” menu (see below). The registration can be performed onto the roots of
linear prediction coding (with LPC roots) or peaks of cepstrally smoothed spectra (with
cepstral smoothing).
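Registration can be pictured as snapping each point to the nearest peak of its frame. The sketch below assumes that the peaks (LPC roots or peaks of the cepstrally smoothed spectrum) have already been computed elsewhere; the data layout and the function name are hypothetical.

```python
def register(trajectory, peaks_per_frame):
    """Snap each trajectory point to the closest spectral peak of its frame.

    trajectory: list of (time_ms, frequency_hz) points.
    peaks_per_frame: dict mapping time_ms to a list of peak frequencies in Hz.
    Points whose frame has no peak are left unchanged.
    """
    registered = []
    for time_ms, frequency in trajectory:
        peaks = peaks_per_frame.get(time_ms, [])
        if peaks:
            frequency = min(peaks, key=lambda peak: abs(peak - frequency))
        registered.append((time_ms, frequency))
    return registered
```

A point drawn at 480 Hz in a frame whose peaks are at 500, 1450 and 2500 Hz is thus moved to 500 Hz.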
Get F0: constructs an F0 contour by
extracting F0 from the original speech signal. The trajectory covers the time
span over which formant trajectories are defined. This means that no F0 trajectory is
created if no formant trajectory exists yet.
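The dependence of the F0 contour on existing formant trajectories can be sketched as follows. The overall span (from the earliest to the latest formant point) is used here for simplicity, which is an assumption; the function names and the list-of-points layout are hypothetical as well.

```python
def formant_time_span(formant_trajectories):
    """Return (start_ms, end_ms) covered by the formant trajectories, or None."""
    times = [time_ms for trajectory in formant_trajectories
             for time_ms, _ in trajectory]
    if not times:
        return None
    return min(times), max(times)


def restrict_f0(f0_contour, formant_trajectories):
    """Keep only the F0 points inside the span covered by formant trajectories."""
    span = formant_time_span(formant_trajectories)
    if span is None:
        return []  # no formant trajectory, hence no F0 trajectory
    start_ms, end_ms = span
    return [(time_ms, f0) for time_ms, f0 in f0_contour
            if start_ms <= time_ms <= end_ms]
```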
Copy synthesis: adjusts the amplitudes of formants (FNP, F1 to F6) to approximate the initial
signal. Formant trajectories and F0 must be specified. The natural way to copy
a signal is thus to draw formants, register them onto spectral maxima (Register), get the F0 contour (Get F0) and finally adjust the
amplitudes (Copy synthesis). This
procedure produces rough amplitude trajectories that can be edited by hand to
smooth them and remove small jumps. One of the weak points of the Klatt synthesizer is that the source cannot be adjusted
independently for each formant. The frication source (AF) thus has to be
adjusted approximately (not too weak, to enable noisy formants in voiced
fricatives for instance, but not too strong, to keep F2 and/or F3 voiced). The
trajectory proposed is a compromise and should be edited to get the expected
result.
The copy strategy keeps
constant bandwidths for formants. The first harmonics below F1 are controlled
by using the nasal formant. This enables a slightly better auditory quality.
Scale Menu
The Klatt
synthesizer uses 43 parameters, which cannot all be displayed simultaneously on
the spectrogram. Furthermore, these parameters share neither the same dimension
(Hz, dB, numeral) nor the same value range (F0 is between 50 and 800 Hz while
formant frequencies are between 300 and 8000 Hz). Parameters are therefore
organized according to their dimension and range. In addition to this
organization, the “Tool window” in the “Edit” menu allows users to activate only some
of these parameters to prevent overcrowded diagrams, especially for formant
amplitudes.
Frequency: formant frequency domain, from 0 to half
the sampling frequency. Concerns formants F1 to F6, FNZ (nasal zero) and
FNP (nasal pole).
Bandwidth (cascade): formant bandwidth domain for the
cascade branch, from 0 to 700 Hz. Concerns formants F1 to F6 and FNZ.
Bandwidth (parallel): formant bandwidth domain for the
parallel branch, from 0 to 700 Hz. Concerns formants F1 to F6 and FNP.
Amplitude (cascade): amplitude domain for F0, Aspiration,
Aturb, TiltdB, Friction,
Bypass and Gain (see Klatt help for further details),
from -20 dB to 100 dB.
Amplitude (parallel): amplitude domain for formants from
F0 to F6 and FNP, from -20 dB to 100 dB.
Frequency (F0): F0 frequency domain, from 0 to 800 Hz.
Numeral: domain for Kopen,
Skewness, Rax1000, Rkx100 and Rgx50, from 0 to 100.
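The ranges of the scales listed above can be summarized in a small table and used to spot values that fall outside the displayed domain. This is only a sketch: the dictionary keys and function names are hypothetical. The multiplying factors for the LF parameters are inferred from the names Rax1000, Rkx100 and Rgx50 and should be regarded as an assumption.

```python
def scale_ranges(sampling_frequency_hz):
    """Value ranges of the display scales, as listed in the Scale menu."""
    return {
        "frequency": (0.0, sampling_frequency_hz / 2.0),  # Hz
        "bandwidth_cascade": (0.0, 700.0),                # Hz
        "bandwidth_parallel": (0.0, 700.0),               # Hz
        "amplitude_cascade": (-20.0, 100.0),              # dB
        "amplitude_parallel": (-20.0, 100.0),             # dB
        "f0": (0.0, 800.0),                               # Hz
        "numeral": (0.0, 100.0),                          # no unit
    }


def out_of_range(scale, values, sampling_frequency_hz):
    """Return the values that fall outside the range of the given scale."""
    low, high = scale_ranges(sampling_frequency_hz)[scale]
    return [value for value in values if not low <= value <= high]


# Assumed from the parameter names: displayed value = LF parameter * factor.
LF_DISPLAY_FACTORS = {"Ra": 1000, "Rk": 100, "Rg": 50}


def lf_to_numeral(name, value):
    """Convert an LF parameter to its assumed value on the numeral scale."""
    return value * LF_DISPLAY_FACTORS[name]
```

With a sampling frequency of 16000 Hz, the frequency scale runs from 0 to 8000 Hz, and an F0 value of 900 Hz would be reported as out of range.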
Edit Menu
Merge: merges the selected trajectories. Merging
two close trajectories prevents the synthesizer from producing clicks and
signal discontinuities.
Smooth: smooths the selected trajectories.
Smoothing (by B-spline regularization) removes major
trajectory irregularities. The smoothed trajectories are sampled at the period
specified in the “Step” menu (see below).
Delete: deletes the selected trajectories.
Remove first point: removes the first point of
the selected trajectories.
Remove last point: removes the last point of
the selected trajectories.
Displays in F1-F2 plane: displays the F1 and F2 trajectories of
the selected region in the F1-F2 plane.
Tool window: this window allows users to activate
or de-activate parameters and to move trajectories. This is intended to
simplify the display and consequently to improve the interaction with users.
Each parameter has three states:
A parameter is activated or
de-activated by clicking its text button (three clicks restore the original
state).
Besides activation of
parameters, this window allows users to move trajectories upwards or downwards.
Once a trajectory is selected (Ctrl + left button), it can be moved by
a “large” increment (double up or down arrow) or a “small” increment (single up
or down arrow).
Step: gives the sampling period for
parameter trajectories (4, 8 or 16 ms).
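The role of the sampling period can be illustrated by resampling a trajectory at a regular step. The sketch below uses plain linear interpolation, which is an assumption made to keep the example short; it is not the B-spline regularization used by the “Smooth” command, and the function names are hypothetical.

```python
def interpolate(trajectory, time_ms):
    """Linearly interpolate a trajectory of (time_ms, value) points."""
    for (t0, v0), (t1, v1) in zip(trajectory, trajectory[1:]):
        if t0 <= time_ms <= t1:
            if t1 == t0:
                return v0
            return v0 + (v1 - v0) * (time_ms - t0) / (t1 - t0)
    raise ValueError("time outside the trajectory")


def resample(trajectory, step_ms):
    """Resample a trajectory sorted by time at a regular period.

    step_ms is the sampling period, e.g. 4, 8 or 16 ms.
    """
    if step_ms <= 0:
        raise ValueError("step_ms must be positive")
    if len(trajectory) < 2:
        return list(trajectory)
    start_ms, end_ms = trajectory[0][0], trajectory[-1][0]
    count = int((end_ms - start_ms) // step_ms) + 1
    return [(start_ms + i * step_ms,
             interpolate(trajectory, start_ms + i * step_ms))
            for i in range(count)]
```

A smaller step follows rapid parameter changes more closely at the cost of more points to edit.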
Setup Menu
Spectral analysis parameters: allows
users to choose the spectral analysis parameters used for the registration of formant
trajectories.
Default parameters: allows users to change the default
value of Klatt parameters. Note that default
trajectories are not displayed until users draw trajectories with values
different from the default values.
Synthesizer command options: allows
users to set the options used when the Klatt synthesizer
command is run (see C:\Program Files\Wsno\Klatt\klatt.htm for further
details). The source can be set to that of Klatt80 or to LF.
Display mode Menu
Squares: trajectories are displayed in the
form of squares not connected with each other.
Lines: trajectories are displayed in the
form of lines.
?:
Mouse help: displays the appropriate items in
the WinSnoori online help.