Graphical interface for the Klatt synthesizer

This interface allows users to display and edit Klatt parameters. Most of it consists of editing facilities and tools for producing a synthetic copy of original speech. The Klatt synthesizer itself is not incorporated into WinSnoori; it is a separate program that generates speech signals from files of Klatt parameters edited in WinSnoori. The synthesizer is the one written by Jon Iles and Nick Ing-Simmons (see here for a detailed description of this free Klatt synthesizer). It has been extended with the three parameters (Ra, Rk and Rg) of the LF source and three extra formants (F1N, F2N, F3N), which can represent either nasal formants or extra spectral peaks, those of burst transients for instance. Note that the default nasal pole is used as the glottal formant in the automatic copy synthesis procedure.

Two modes for editing Klatt parameters have to be distinguished:

Besides editing facilities and more specific tools, users can examine the synthesized signal by zooming in on or playing signal regions and, more importantly, by displaying spectral slices while moving the pointer in either the main window or the interface window. This last possibility makes it possible to compare original and synthetic spectra and can guide users in improving Klatt parameters.

The overall task of users consists of drawing parameter trajectories with the mouse (Shift + left button pressed). Users can edit a trajectory by clicking on it (Ctrl + left button) and then moving its points (Ctrl + left button pressed). To avoid discontinuities, which can lead to synthesis “accidents”, users should merge (“merge” in the Edit menu of the Klatt window) neighbouring trajectories corresponding to the same parameter.

Parameters are organized according to their dimension and values:

This organization is intended to reduce the number of trajectories displayed simultaneously on the spectrogram. The “Scale” menu in the Klatt window allows the dimension to be selected. In addition, the user can choose which parameters are displayed with the “Edit/Tool window” (see below in the “Edit Menu” section). This prevents too many curves from being displayed simultaneously and producing a confusing image.

For the same reason parameters set to a default value are not displayed. This is the case for most of the numeral parameters unless the user modifies them explicitly. Default values of these parameters can be set by selecting “Setup/Default parameters” in the menu.

The default scale is that of formant frequencies (from 0 to half the sampling frequency). This offers a “naive” use of the Klatt synthesizer since only formant frequencies are visible. However, switching to another scale allows users to edit all the Klatt parameters. To reduce the number of trajectories displayed, only parameters that differ from their default values are drawn as trajectories. “Default parameters” in the “Setup” menu of the Klatt window allows these default values to be modified.

There are two cases for editing formant parameters:

Note that the Klatt synthesizer is completely independent of WinSnoori. This means that no “semantic” check is performed on the parameters before a waveform is synthesized. Users are therefore strongly advised to read carefully the help of the Klatt synthesizer and the papers of Dennis Klatt about his synthesizer (C:\Program Files\Wno\Klatt\klatt.exe):

The current synthesizer is the GPL version developed by Jon Iles and Nick Ing-Simmons (the Windows version can be downloaded at http://www.loria.fr/equipes/parole/Html/klatt.html).

When parameters are incorrect, synthesis fails and stops, so that only a partial waveform is generated, possibly an empty one if the failure happens at the beginning. A floating point exception can occur if the Klatt parameters lead to huge values of synthetic samples. In these cases the spectrogram is partially or completely white because the signal does not exist.

The menu commands

File Menu

Open: reads and displays a file of Klatt parameters previously saved with the “save as” command. “.kla” files are for Klatt parameters without the three additional LF parameters. “.klf” files are for Klatt parameters including the control of the LF source. Synthesis parameters are time stamped. This means that they are attached to the original signal. Their location can be changed when they are read from a parameter file. For this purpose check the “Relocation offset” radio button of the bottom dialog (see image below) and give the offset to be used for reading. This offset can be negative or positive.
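The effect of the relocation offset can be pictured with a minimal sketch. The in-memory layout (a list of time-stamped frames), the function name `relocate` and the use of milliseconds are assumptions made for illustration only; they are not the actual file format or WinSnoori code.

```python
# Hypothetical representation: a list of (time_ms, {parameter: value}) frames.
def relocate(frames, offset_ms):
    """Return a copy of the frames with every time stamp shifted by offset_ms.

    The offset may be negative or positive, as in the "Relocation offset" field.
    """
    return [(t + offset_ms, dict(values)) for t, values in frames]


frames = [(100, {"F1": 500}), (108, {"F1": 520})]
shifted = relocate(frames, -40)   # move the parameters 40 ms earlier
```

The original frames are left untouched, so the same file can be read several times at different locations.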

Save as: saves parameters in a specified file. As three parameters have been added to control the LF source, this function now saves 44 parameters. The file extension is .klf. The “Save as (Klatt with 40 param)” command saves only 41 parameters, i.e. without the three additional parameters for the LF source.

Paste from file: overwrites Klatt parameters with those of the parameter file. The destination time can be specified through the “Relocation offset” in order to compose new files from existing ones. Parameters are overwritten only where existing and read parameters are defined for the same time region.
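One possible reading of this overwrite rule is sketched below. The dictionary layout, the function name `paste` and the millisecond unit are hypothetical, and the actual behaviour of WinSnoori may differ in detail (for instance in how it treats times that exist only in the pasted file).

```python
# Hypothetical representation: {time_ms: {parameter: value}}.
def paste(existing, pasted, offset_ms=0):
    """Merge pasted frames into existing ones; pasted values win where both exist."""
    merged = {t: dict(values) for t, values in existing.items()}
    for t, values in pasted.items():
        # Relocate the pasted frame, then overwrite only the parameters it defines.
        merged.setdefault(t + offset_ms, {}).update(values)
    return dict(sorted(merged.items()))


existing = {0: {"F1": 500, "F2": 1500}, 8: {"F1": 510, "F2": 1500}}
pasted = {0: {"F1": 300}}
result = paste(existing, pasted, offset_ms=8)
```

In this example only F1 at 8 ms is replaced; F2 and the frame at 0 ms keep their existing values.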

Synthesize: synthesizes the signal corresponding to the current parameters. The command line arguments (see C:\Program Files\Wsno\Klatt\klatt.htm) can be set with “Settings” in the “Setup” menu.

Save synthesize speech: saves the synthesized speech to a wav file. Only wav files are allowed.

Save formants: saves formant trajectories only (to ensure the compatibility with earlier versions of WinSnoori).

Open formants: reads formant trajectories (saved with “Save formants” or with the formant editor of earlier versions of WinSnoori).

Save as (Klatt with 40 param): saves parameters in a specified file, without the three additional parameters for the LF source (see “Save as” above).

Play: plays the region selected in the interface window, or the whole synthesized signal if no region is selected.

Zoom in: zooms in on the selected region.

Zoom out: cancels the zoom.

Select selects or de-selects one or more parameter trajectories.

F1 to F6 each entry selects all the trajectories of the corresponding formant (F1, F2, F3, F4, F5, F6 or FN).

All selects all the parameter trajectories.

None de-selects all the parameter trajectories.

Copy synthesis Menu

Formant tracking: tracks formant frequencies on the highlighted region. If no region is selected the entire current speech window (the signal displayed) is processed.

The algorithm implemented is that described in (Y. Laprie, “A concurrent curve strategy for formant tracking”, in: Interspeech 2004, International Conference on Spoken Language Processing, Jeju, South Korea, October 2004). Note that the fourth formant F4 serves more to constrain the highest frequency that F3 is allowed to reach than to provide reliable information about F4. The tracking algorithm uses a spectrogram called the “Support image” to deform initial curves. The default spectrogram is obtained by linear “Cepstral smoothing”. Slightly better results can be obtained with a “True envelope” (S. Imai and Y. Abe, “Spectral envelope extraction by improved cepstral method”, Trans. IECE, Vol. J62-A(4), pp. 217-223, 1979). A linear prediction spectrogram (LPC) is also possible, even if results are not as good. The support image can be chosen by checking one of these three spectrogram calculations and saved with the option “Save image (PGM)”.

The principle of the tracking is to deform initial rough estimates of formant tracks through active curves called snakes. Snake parameters can be chosen with “Snake parameters”. This dialog window allows users to modify the “Number of iterations”; the “Discretization step” (a step of 3 means that one point out of 3 is kept in the snake calculations); the “Spectrogram weight”, i.e. the weight of the spectrogram relative to the internal energy of snakes; “Alpha”, i.e. the weight of the first derivative of the curve; “Beta”, i.e. the weight of the second derivative of the curve; “Gamma”, i.e. the inertia of the curve (a small value, less than one, enables fast evolutions); and “Repulsion”, i.e. the repulsion between two formant curves. In addition, intermediate results can be displayed when “Display intermediate results” is checked.
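To give an idea of what the derivative weights and the discretization step control, here is a minimal sketch based on the usual active contour formulation, in which one weight multiplies the squared first derivative of the curve and another the squared second derivative. The function names are hypothetical, and the exact energy used by WinSnoori (including the spectrogram, inertia and repulsion terms) is not reproduced here.

```python
def subsample(curve, step):
    """Keep one point out of `step` (a step of 3 keeps indices 0, 3, 6, ...)."""
    return curve[::step]


def internal_energy(curve, alpha, beta):
    """Discrete internal energy of a curve given as a list of frequencies (Hz).

    alpha weights the squared first differences (stretching),
    beta weights the squared second differences (bending).
    """
    first = [b - a for a, b in zip(curve, curve[1:])]
    second = [c - 2 * b + a for a, b, c in zip(curve, curve[1:], curve[2:])]
    return alpha * sum(d * d for d in first) + beta * sum(d * d for d in second)


smooth = [500, 510, 520, 530]   # steady rise: no bending energy
jagged = [500, 530, 500, 530]   # zigzag: large stretching and bending energy
e_smooth = internal_energy(smooth, alpha=1.0, beta=1.0)
e_jagged = internal_energy(jagged, alpha=1.0, beta=1.0)
```

A deformation that minimizes such an energy favours smooth tracks, which is why larger weights make the curves stiffer and less able to follow rapid formant movements.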

“Initial curves” can be derived from “LPC roots” by default or from “True envelope peaks”. The advantage of LPC roots is that they are less sensitive to spurious peaks, at the risk of errors for nasal sounds whose formants are abnormally low in energy. Conversely, the “true envelope” algorithm has the advantage of being closer to the harmonics and of detecting small peaks, and the disadvantage of merging formants that are close together.

“Tracking” can be started from scratch (“From scratch”). In this case both the determination of initial estimates and the deformation step are performed. “From existing formants” means that the deformation step is applied to the existing formants in the highlighted region. “From existing selected formants” means that the deformation step is applied only to the existing selected formants in the highlighted region. Other formants remain unchanged.

Register: registers the selected formant trajectories. Registration replaces every point by the closest spectral peak, which is either an LPC root (with LPC roots) or a peak of the cepstrally smoothed spectrum (with cepstral smoothing). The registered formants are sampled at the rate specified in the “Step” menu (see below).
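The registration step can be sketched as a nearest-peak search. The data layout and the name `register` are hypothetical, and the spectral peaks are assumed to have been computed beforehand (by LPC or cepstral smoothing), which this sketch does not do.

```python
def register(points, peaks_by_time):
    """Replace each (time_ms, freq_hz) point by the closest spectral peak.

    peaks_by_time maps a time stamp to the list of peak frequencies found there.
    """
    registered = []
    for t, f in points:
        peaks = peaks_by_time.get(t)
        # Keep the point unchanged if no peak is available at that time.
        new_f = min(peaks, key=lambda p: abs(p - f)) if peaks else f
        registered.append((t, new_f))
    return registered


drawn = [(0, 480), (8, 1450)]
peaks = {0: [510, 1500, 2500], 8: [520, 1490, 2480]}
snapped = register(drawn, peaks)
```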

Get F0: constructs an F0 contour by extracting F0 from the original speech signal. The contour covers the time span on which formant trajectories are defined, so no F0 trajectory is created if no formant trajectory exists yet.
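The dependence of the F0 contour on existing formant trajectories can be illustrated as follows. This is only a sketch: the function name is hypothetical, and it assumes the contour spans the overall extent of all formant trajectories, which the text does not state precisely.

```python
def f0_time_span(formant_trajectories):
    """Return (start_ms, end_ms) covered by the formant trajectories, or None.

    Each trajectory is a list of (time_ms, freq_hz) points.
    """
    times = [t for trajectory in formant_trajectories for t, _ in trajectory]
    return (min(times), max(times)) if times else None


f1 = [(100, 500), (140, 520)]
f2 = [(120, 1500), (180, 1480)]
span = f0_time_span([f1, f2])
no_span = f0_time_span([])      # no formants drawn yet: nothing to extract
```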

Copy synthesis: adjusts the amplitudes of formants (FNP, F1 to F6) to approximate the initial signal. Formant trajectories and F0 must be specified. The natural way to copy a signal is thus to draw formants, register them (Register) onto spectral maxima, get the F0 contour (Get F0) and finally adjust the amplitudes (Copy synthesis). This procedure produces rough amplitude trajectories that can be edited by hand to smooth them and remove small jumps. One of the weak points of the Klatt synthesizer is that the source cannot be adjusted independently for each formant. The frication source (AF) therefore has to be adjusted approximately (not too weak, to enable noisy formants in voiced fricatives for instance, but not too strong, to keep F2 and/or F3 voiced). The trajectory proposed is a compromise and should be edited to get the expected result.

The copy strategy keeps constant bandwidths for formants. The first harmonics below F1 are controlled by using the nasal formant. This enables a slightly better auditory quality.

Scale Menu

The Klatt synthesizer uses 43 parameters, which cannot all be displayed simultaneously on the spectrogram. Furthermore, these parameters share neither the same dimension (Hz, dB, numeral) nor the same value range (F0 is between 50 and 800 Hz while formant frequencies are between 300 and 8000 Hz). Parameters are therefore organized according to their dimension and range. In addition to this organization, the “Tool window” in the “Edit” menu allows users to activate only some of these parameters to prevent overcrowded diagrams, especially for formant amplitudes.

Frequency formant frequency domain: 0 – half the sampling frequency. Concerns formants from F1 to F6, FNZ (nasal zero) and FNP (nasal pole).

Bandwidth (cascade) formant bandwidth domain for the cascade branch: 0 – 700 Hz. Concerns formants F1 to F6 and FNZ.

Bandwidth (parallel) formant bandwidth domain for the parallel branch: 0 – 700 Hz. Concerns formants F1 to F6 and FNP.

Amplitude (cascade) amplitude domain for F0, Aspiration, Aturb, TiltdB, Friction, Bypass and Gain (see Klatt help for further details): -20 dB to 100 dB.

Amplitude (parallel) amplitude domain for formants from F1 to F6 and FNP: -20 dB to 100 dB.

Frequency (F0) F0 frequency domain: 0 – 800 Hz.

Numeral domain for Kopen, Skewness, Rax1000, Rkx100 and Rgx50: 0 – 100
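The scales listed above can be summarized as a small lookup table. The sketch below is only an illustration: the function names, the default sampling frequency and the idea of clamping a value to the range of its scale are assumptions, not a description of how WinSnoori handles out-of-range values.

```python
def scale_ranges(sampling_hz):
    """Value range (low, high) of each scale of the Scale menu."""
    return {
        "Frequency": (0, sampling_hz / 2),      # Hz
        "Bandwidth (cascade)": (0, 700),        # Hz
        "Bandwidth (parallel)": (0, 700),       # Hz
        "Amplitude (cascade)": (-20, 100),      # dB
        "Amplitude (parallel)": (-20, 100),     # dB
        "Frequency (F0)": (0, 800),             # Hz
        "Numeral": (0, 100),                    # dimensionless
    }


def clamp(scale, value, sampling_hz=16000):
    """Restrict a value to the range of its scale (hypothetical behaviour)."""
    low, high = scale_ranges(sampling_hz)[scale]
    return max(low, min(high, value))


f0 = clamp("Frequency (F0)", 950)
f3 = clamp("Frequency", 9000, sampling_hz=16000)
```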

Edit Menu

Merge merges the selected trajectories. Merging two close trajectories prevents the synthesizer from producing clicks and signal discontinuities.

Smooth smooths the selected trajectories. Smoothing (by B-spline regularization) removes major trajectory irregularities. The smoothed trajectories are sampled at the rate specified in the “Step” menu (see below).

Delete destroys the selected trajectories.

Remove first point removes the first point of the selected trajectories.

Remove last point removes the last point of the selected trajectories.

Displays in F1-F2 plane displays F1 and F2 trajectories of the selected region in the F1-F2 plane.

Tool window this window allows users to activate or de-activate parameters and to move trajectories. This is intended to simplify the display and consequently to improve the interaction with users. Each parameter has three states:

  1. The parameter can be edited (the coloured square is filled). The trajectory is displayed in the form of a coloured bold line.
  2. The parameter is displayed but cannot be edited (one solid line is displayed in the square). The trajectory is displayed in the form of a coloured thin line.
  3. The parameter is hidden (a white square with a coloured border). The trajectory is hidden.

A parameter is activated or de-activated by clicking its text button; each click moves to the next state, so three clicks restore the original state.
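The three-state button behaves like a small cyclic state machine, which can be sketched as follows. The state names and the order in which the states follow each other are assumptions chosen to match the numbered list above.

```python
# Hypothetical state names, in the order of the numbered list above.
STATES = ("editable", "displayed", "hidden")


def next_state(state):
    """Return the state reached after one click on the parameter's text button."""
    return STATES[(STATES.index(state) + 1) % len(STATES)]


state = "editable"
history = []
for _ in range(3):              # three clicks give back the original state
    state = next_state(state)
    history.append(state)
```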

Besides the activation of parameters, this window allows users to move trajectories upwards or downwards. Once a trajectory is selected (Ctrl + left button), it can be moved by a “large” increment (double up or down arrow) or a “small” increment (single up or down arrow).

Step gives the sampling period for parameter trajectories (4, 8 or 16 ms).
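As an illustration of what sampling a trajectory at a given step means, the sketch below resamples hand-drawn points onto a regular time grid. Linear interpolation is an assumption made for simplicity (WinSnoori may proceed differently), the function name is hypothetical, and the points are assumed to have strictly increasing time stamps.

```python
def resample(points, step_ms):
    """Linearly interpolate (time_ms, value) points onto a grid of step_ms."""
    if step_ms not in (4, 8, 16):
        raise ValueError("step must be 4, 8 or 16 ms")
    points = sorted(points)
    if len(points) < 2:
        return list(points)
    out = []
    i = 0
    t = points[0][0]
    while t <= points[-1][0]:
        # Advance to the segment [points[i], points[i + 1]] that contains t.
        while i < len(points) - 2 and points[i + 1][0] < t:
            i += 1
        (t0, v0), (t1, v1) = points[i], points[i + 1]
        out.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
        t += step_ms
    return out


drawn = [(0, 500), (16, 580)]
sampled = resample(drawn, 8)
```

A smaller step follows rapid parameter changes more closely at the cost of more points to edit.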

Setup Menu

Spectral analysis parameters allows users to choose spectral analysis parameters used for registration of formant trajectories.

Default parameters allows users to change the default value of Klatt parameters. NOTE that default trajectories are not displayed until users draw trajectories with values different from the default values.

Synthesizer command options allows users to set options used when the Klatt synthesizer command is used (see C:\Program Files\Wsno\Klatt\klatt.htm for further details). The source can be set to that of Klatt80 or to LF.

Display mode Menu

Squares trajectories are displayed in the form of squares not connected with each other.

Lines trajectories are displayed in the form of lines.

?:

mouse help: displays the appropriate items in the WinSnoori online help.