Acoustic-to-articulatory inversion
Abstract
Acoustic-to-articulatory inversion consists in recovering articulatory
data, or the geometry of the vocal tract, from an audio recording of
the speaker. The inverse problem may be formulated as follows: let
$\mathbf{s}$ be the acoustic vector containing the acoustic features
observed in the speech signal (e.g. the formant frequencies),
$\mathbf{p}$ the articulatory vector to be recovered, containing
parameters that define the geometry of the vocal tract (e.g. the area function), and
$\mathcal{L}$ an operator that gives the acoustic vector as a function
of the articulatory vector, so that
$$\mathbf{s}=\mathcal{L}(\mathbf{p}).$$
Then, acoustic-to-articulatory inversion consists in recovering
$\mathbf{p}$ from the observation of $\mathbf{s}$.
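Since several vocal tract shapes can produce the same acoustic features,
$\mathcal{L}$ is in general not invertible in the strict sense. One common
way to state such a problem, given here as a generic sketch and not as the
specific formulation of [1], is a regularized least-squares fit. The penalty
function $R$ and its weight $\lambda \geq 0$ are illustrative symbols
introduced for this sketch only:
$$\hat{\mathbf{p}}=\arg\min_{\mathbf{p}}\ \left\|\mathbf{s}-\mathcal{L}(\mathbf{p})\right\|^{2}+\lambda R(\mathbf{p}).$$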
In [1], I proposed a method to quickly estimate the area function and
length function of the vocal tract of a speaker from the formant
frequencies of the speech signal uttered by the speaker. The method is
iterative: it relies on the sensitivity functions of the vocal tract [2]
and on weighted penalty terms for better regularization.
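To give a feel for how an iterative, sensitivity-driven inversion with a
penalty term can proceed, here is a minimal Python sketch on a deliberately
simple one-parameter model. It is not the method of [1]: the forward operator
is a uniform tube closed at the glottis and open at the lips, the
sensitivities are estimated by finite differences, and the update is a plain
Gauss-Newton step. The names forward, sensitivities and invert, the prior
length, the penalty weight and the speed of sound are all assumptions made
for illustration.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for this sketch


def forward(length_m, n_formants=3):
    """Toy forward operator: resonances (Hz) of a uniform tube,
    closed at one end and open at the other."""
    return [(2 * n - 1) * SPEED_OF_SOUND / (4.0 * length_m)
            for n in range(1, n_formants + 1)]


def sensitivities(length_m, step=1e-6):
    """Central finite-difference estimate of dF_n/dp for each formant."""
    up = forward(length_m + step)
    down = forward(length_m - step)
    return [(u - d) / (2.0 * step) for u, d in zip(up, down)]


def invert(targets, prior_length=0.14, penalty_weight=1e-3,
           max_iter=50, tol=1e-10):
    """Gauss-Newton iteration on relative formant errors, with a quadratic
    penalty that pulls the estimate towards prior_length (hypothetical)."""
    p = prior_length
    for _ in range(max_iter):
        residuals = [t - f for t, f in zip(targets, forward(p))]
        grads = sensitivities(p)
        # Descent direction of the penalized cost (common factor 2 dropped).
        num = sum(g * r / t ** 2 for g, r, t in zip(grads, residuals, targets))
        num -= penalty_weight * (p - prior_length)
        # Gauss-Newton approximation of the curvature.
        den = sum((g / t) ** 2 for g, t in zip(grads, targets)) + penalty_weight
        update = num / den
        p += update
        if abs(update) < tol:
            break
    return p


# Synthetic target: formants of a 17 cm uniform tube.
estimated = invert(forward(0.17))
print(estimated)
```

With three formants and a single unknown, this toy problem is
overdetermined, so the penalty weight is kept small and serves only to show
where such a term enters the update. In a real inversion, with many area and
length parameters, the penalty terms carry much more weight in selecting a
plausible solution.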
A few examples
Simulation: vowel /i/
Real speaker: French vowels (/i/, /e/, /a/, /u/)
You can download the Matlab code here for acoustic-to-articulatory
inversion of oral vowels.
In this archive, you will find a file test.m, which contains two
examples. To run the inversion of an /a/, select "load Library/a"; to
run the inversion of an /i/, select "load Library/i".
In the Library folder, the constantterms.m file contains the constant
terms used to compute the transfer function corresponding to the
current area and length functions, using the chain matrix paradigm by
Sondhi and Schroeter [2]. The default file contains the parameters
defined by the authors. Feel free to change them. Updates are coming
soon.
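The idea of computing vocal tract resonances from area and length functions
with chain matrices can be sketched as follows. This is a simplified,
lossless Python illustration, not the content of the archive: it does not
use the constants of constantterms.m, it omits the loss and wall terms of
the model in [2], and it assumes an ideal open end at the lips (zero
pressure). Each section relates pressure and volume velocity on its glottis
side to those on its lips side. The volume-velocity transfer function is
then 1/D, where D is the lower-right term of the product of all section
matrices, so the formants are the zeros of D. All names and constant values
below are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)
AIR_DENSITY = 1.2       # kg/m^3 (assumed)


def section_matrix(area, length, freq):
    """Lossless chain matrix of one tube section, stored as real numbers
    (a, b, c, d) standing for the complex matrix [[a, j*b], [j*c, d]]."""
    kl = 2.0 * math.pi * freq * length / SPEED_OF_SOUND
    z0 = AIR_DENSITY * SPEED_OF_SOUND / area  # characteristic impedance
    return (math.cos(kl), z0 * math.sin(kl), math.sin(kl) / z0, math.cos(kl))


def multiply(m1, m2):
    """Product of two matrices in the (a, b, c, d) representation above."""
    a1, b1, c1, d1 = m1
    a2, b2, c2, d2 = m2
    return (a1 * a2 - b1 * c2,
            a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2,
            d1 * d2 - c1 * b2)


def d_term(areas, lengths, freq):
    """Lower-right term D of the glottis-to-lips chain matrix."""
    total = (1.0, 0.0, 0.0, 1.0)  # identity
    for area, length in zip(areas, lengths):
        total = multiply(total, section_matrix(area, length, freq))
    return total[3]


def formants(areas, lengths, f_max=4000.0, df=5.0):
    """Zeros of D(f) up to f_max: coarse scan, then bisection."""
    found = []
    f_prev, d_prev = 0.0, d_term(areas, lengths, 0.0)
    for i in range(1, int(f_max / df) + 1):
        f = i * df
        d_cur = d_term(areas, lengths, f)
        if d_prev * d_cur < 0.0:  # sign change: a zero lies in between
            lo, hi, d_lo = f_prev, f, d_prev
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                d_mid = d_term(areas, lengths, mid)
                if d_lo * d_mid <= 0.0:
                    hi = mid
                else:
                    lo, d_lo = mid, d_mid
            found.append(0.5 * (lo + hi))
        f_prev, d_prev = f, d_cur
    return found


# Uniform tube: 17 sections of 1 cm, 4 cm^2 each. Its resonances should be
# close to odd multiples of c / (4 L).
uniform = formants([4e-4] * 17, [0.01] * 17)
print(uniform)
```

A non-uniform vocal tract is described by passing different values in the
areas and lengths lists, one pair per section, ordered from glottis to lips.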
Please do not hesitate to report any suggestion, malfunction, or
unexpected result to benjamin.elie(at)inria.fr.
[1] B. Elie and Y. Laprie, "Audiovisual to area and length functions
inversion of human vocal tract", EUSIPCO, Lisbon, 2014.
[2] M. M. Sondhi and J. Schroeter, "A hybrid time-frequency domain
articulatory speech synthesizer", IEEE Trans. Acoust. Speech Sig.
Process. 35(7), 955-967 (1987)
Last modification: June 23, 2016