Simulation of alveolar trill consonants using self-oscillating models of the tongue tip



Presentation

Alveolar trills, or "rolled R" sounds, are characterized by a periodic modulation of the amplitude of the acoustic signal, as shown in Fig. 1. This amplitude modulation is caused by the oscillation of the tongue tip in the alveolar region. In natural speech, alveolar trills exhibit 2 or 3 periods [1], and slightly more in the case of geminates. The frequency of the tongue tip oscillation lies between 25 and 30 Hz [1]. Trills are commonly described as hard-to-produce sounds [2], since they require very specific conditions in the vicinity of the tongue tip to initiate self-oscillation. My work proposes a mechanical model, intended for use in articulatory synthesis, to investigate these specific conditions. For that purpose, I model the tongue tip with a lumped two-mass system, similar to those classically used for the vocal folds (e.g. [3-4]), and connect it to the recently developed Extended Single-Matrix Formulation (ESMF) of the vocal tract [5]. The ESMF accounts for realistic time-varying geometries of the vocal tract, including the side cavities and the incomplete closure of the glottis.


Fig. 1 - Acoustic waveform of an alveolar trill between two vowels /a/, uttered by a native Spanish female speaker.
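
To make the figures quoted above concrete, here is a minimal Python sketch of an amplitude-modulated signal: a carrier tone whose amplitude dips once per tongue-tip cycle. The carrier frequency, modulation depth, sampling rate and raised-cosine envelope are illustrative assumptions, not values taken from the recordings; only the 2 to 3 periods and the 25 to 30 Hz range come from the text.

```python
import math


def trill_duration_s(n_periods, trill_hz):
    """Duration (s) of a trill made of n_periods tongue-tip cycles."""
    return n_periods / trill_hz


def modulated_signal(n_periods=3, trill_hz=27.5, carrier_hz=200.0,
                     depth=0.8, fs=16000):
    """Carrier tone whose amplitude dips once per tongue-tip cycle.

    carrier_hz, depth and fs are hypothetical values chosen for illustration.
    """
    n_samples = round(trill_duration_s(n_periods, trill_hz) * fs)
    samples = []
    for n in range(n_samples):
        t = n / fs
        # Envelope falls from 1 to (1 - depth) and back once per trill period.
        envelope = 1.0 - depth * 0.5 * (1.0 - math.cos(2.0 * math.pi * trill_hz * t))
        samples.append(envelope * math.sin(2.0 * math.pi * carrier_hz * t))
    return samples
```

With 2 to 3 periods at 25 to 30 Hz, `trill_duration_s` gives trills lasting roughly 67 to 120 ms.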



Acoustic modeling

Two-mass model of the tongue tip

The model is presented in Fig. 2. The positions of the upstream and downstream masses are computed at each time step by solving a matrix differential equation. The airflow in the oral channel is assumed to be incompressible, and viscous effects are taken into account through a Poiseuille correction term.
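
As a rough illustration of what such a viscous correction looks like, the sketch below adds a Poiseuille term to a Bernoulli pressure drop for a single rectangular channel section. The rectangular geometry, the absence of pressure recovery, and the air constants are assumptions made for this example; the actual model uses smooth contours and is described in the papers listed further down.

```python
AIR_DENSITY = 1.2        # kg/m^3, assumed value
AIR_VISCOSITY = 1.8e-5   # Pa.s, assumed value


def bernoulli_term(flow, height, width):
    """Inertial pressure drop (Pa) for a volume flow (m^3/s) entering a
    rectangular section of the given height and width (m)."""
    velocity = flow / (width * height)
    return 0.5 * AIR_DENSITY * velocity ** 2


def poiseuille_term(flow, height, width, length):
    """Viscous pressure drop (Pa) of plane Poiseuille flow along a
    rectangular channel of the given length (m)."""
    return 12.0 * AIR_VISCOSITY * length * flow / (width * height ** 3)


def pressure_drop(flow, height, width, length):
    """Total pressure drop: Bernoulli term plus Poiseuille correction."""
    return bernoulli_term(flow, height, width) + poiseuille_term(flow, height, width, length)
```

Because the viscous term scales with the inverse cube of the height, it becomes dominant as the constriction closes, which is when the correction matters most.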



Fig. 2 - A classical $2 \times 2$ mass model with smooth contours adapted to the tongue tip.
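
The time-stepping of a two-mass system can be sketched as follows. This is a minimal example under stated assumptions, not the model of the papers: the masses, stiffnesses and damping are hypothetical values, the integration scheme (semi-implicit Euler) is my choice for the example, and a constant force stands in for the aerodynamic pressure forces. With a constant force the system does not self-oscillate; it only shows how the matrix equation M x'' + R x' + K x = F is advanced in time.

```python
def step_two_mass(x, v, force, m, r, k, kc, dt):
    """Advance positions x = [x1, x2] and velocities v by one time step.

    Matrix form: M x'' + R x' + K x = F, with M = m*I, R = r*I and
    K = [[k + kc, -kc], [-kc, k + kc]], where kc couples the two masses.
    """
    a1 = (force[0] - r * v[0] - (k + kc) * x[0] + kc * x[1]) / m
    a2 = (force[1] - r * v[1] - (k + kc) * x[1] + kc * x[0]) / m
    v_new = [v[0] + dt * a1, v[1] + dt * a2]
    # Semi-implicit Euler: positions are updated with the new velocities.
    x_new = [x[0] + dt * v_new[0], x[1] + dt * v_new[1]]
    return x_new, v_new


def simulate(force, n_steps=10000, m=1e-4, r=0.02, k=3.0, kc=1.0, dt=1e-4):
    """Integrate from rest under a constant force (hypothetical parameters)."""
    x, v = [0.0, 0.0], [0.0, 0.0]
    for _ in range(n_steps):
        x, v = step_two_mass(x, v, force, m, r, k, kc, dt)
    return x
```

In a self-oscillating version, `force` would be recomputed at every step from the pressure acting on each mass, which itself depends on the current positions.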

Incomplete closure of the vocal tract during tongue-palate contacts

Experimental data on natural alveolar trills suggest that the airflow is not completely stopped during the closing phase of the tongue oscillation: the acoustic waveform does not vanish (see Fig. 1), and neither does the airflow measured at the mouth [1]. If linguopalatal contact occurs during the closing phase, it therefore does not lead to a complete occlusion of the vocal tract, and the airflow probably passes around the edge of the tongue tip. I propose to model this lateralization of the airflow by adding a parallel acoustic branch connected to both the upstream and downstream parts of the tongue (see Fig. 3).



Fig. 3 - Modeling of the airflow lateralization
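
The effect of a parallel branch can be illustrated with a quasi-static, purely resistive analogy: two channels share the same pressure difference, and the flow in each is set by its own viscous resistance. This is a simplification assumed for the example (the model itself uses an acoustic branch within the ESMF), and all dimensions below are hypothetical.

```python
import math


def viscous_resistance(height, width, length, viscosity=1.8e-5):
    """Poiseuille resistance (Pa.s/m^3) of a rectangular channel.

    A closed channel (height <= 0) is given an infinite resistance.
    """
    if height <= 0.0:
        return math.inf
    return 12.0 * viscosity * length / (width * height ** 3)


def split_flow(delta_p, r_main, r_lateral):
    """Flows (m^3/s) through the main and lateral branches, which are in
    parallel and therefore see the same pressure difference delta_p (Pa)."""
    return delta_p / r_main, delta_p / r_lateral
```

When the main channel is closed, its resistance is infinite and its flow is zero, yet the lateral branch still carries flow, which is the behaviour the parallel branch is meant to capture.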


Some results

Static configuration

Fig. 4 shows an example simulation of an alveolar trill in a static configuration. The area function is the average of the area functions extracted from a cineMRI [6] acquisition of alveolar trills uttered by a 35-year-old male speaker. Note that, thanks to the lateralization, there is still airflow $U_m$ at the mouth and acoustic pressure $P_{Out}$ during the closed phase of the lingual constriction, whose height is $h_t$.


Fig. 4 - Area function (top) and some waveforms obtained from the simulation. $P_{Out}$ is the acoustic waveform radiated at the lips, $h_t$ is the height of the lingual constriction at the upstream (solid line) and downstream (dashed line) sections, $U_m$ is the volume velocity of the airflow at the mouth, and $h_g$ is the glottal opening at the upstream (solid line) and downstream (dashed line) parts of the vocal folds.

Running speech synthesis

Below is an example of running speech synthesis of an alveolar trill in an intervocalic context: the pseudoword /ara/. Fig. 5 shows the spectrograms and audio signals of the original utterance, produced by a 35-year-old male speaker, and of its simulated copy.
Fig. 5 - Wide-band spectrogram of the pseudoword /ara/ containing an alveolar trill. Top is the original utterance, and bottom is the simulated utterance. The $y$-axis represents the frequency in kHz.
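
For readers who want to compute a comparable display, a wide-band spectrogram is a short-time Fourier transform with a short analysis window. The following standard-library sketch is an assumption-laden illustration, not the tool used to produce Fig. 5: the 5 ms Hann window and 2.5 ms hop are conventional choices of mine, and the direct DFT is only practical for short signals.

```python
import cmath
import math


def wideband_spectrogram(signal, fs, win_s=0.005, hop_s=0.0025):
    """Magnitude spectrogram with a short (wide-band) analysis window.

    Returns one list of magnitudes per frame; bin k corresponds to the
    frequency k * fs / win, where win is the window length in samples.
    """
    win = round(win_s * fs)
    hop = round(hop_s * fs)
    hann = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (win - 1)) for n in range(win)]
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        seg = [signal[start + n] * hann[n] for n in range(win)]
        mags = []
        for kbin in range(win // 2 + 1):
            # Direct DFT of the windowed segment at bin kbin.
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * kbin * n / win)
                      for n in range(win))
            mags.append(abs(acc))
        frames.append(mags)
    return frames
```

The short window trades frequency resolution for time resolution, which is what lets the individual closures of a trill show up as separate events along the time axis.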

Audio files: Original, Copy, Both

Additional information is available in my papers:

[J1] Elie B., and Laprie Y. "Simulating alveolar trills using a two-mass model of the tongue tip". J. Acoust. Soc. Am. 142(5), pp. 3245-3256 (2017).
[P1] Elie B., and Laprie Y. "Self-oscillating models of the tongue tip for simulating alveolar trills". 24th International Congress on Sound and Vibration (ICSV), London 2017.

REFERENCES

[1] M.-J. Solé, "Aerodynamic characteristics of trills and phonological patterning," J. of Phon., 30, pp. 655–688 (2002).
[2] B. C. Jimenez, "Acquisition of Spanish consonants in children aged 3–5 years, 7 months," Language, Speech, and Hearing Services in Schools, 18(4), pp. 357–363 (1987).
[3] N. J. C. Lous, G. C. J. Hofmans, R. N. J. Veldhuis, A. Hirschberg, "A symmetrical two-mass vocal-fold model coupled to vocal tract and trachea, with application to prosthesis design," Acta Acustica, 84, pp. 1135–1150 (1998).
[4] X. Pelorson, A. Hirschberg, R. R. van Hassel, A. P. J. Wijnands, Y. Auregan, "Theoretical and experimental study of quasisteady-flow separation within the glottis during phonation. Application to a modified two-mass model," J. Acoust. Soc. Am., 96(6) (1994).
[5] B. Elie, Y. Laprie, "Extension of the single-matrix formulation of the vocal tract: consideration of bilateral channels and connection of self-oscillating models of the vocal folds with a glottal chink," Speech Comm., 82, pp. 85–96 (2016).
[6] B. Elie, Y. Laprie, P.-A. Vuissoz, F. Odille, "High spatiotemporal cineMRI films using compressed sensing for acquiring articulatory data," EUSIPCO, Budapest (2016).

Last modification: March 1st, 2017