{"id":581,"date":"2020-11-11T13:07:49","date_gmt":"2020-11-11T11:07:49","guid":{"rendered":"http:\/\/members.loria.fr\/SOuni\/?page_id=581"},"modified":"2020-11-11T15:46:35","modified_gmt":"2020-11-11T13:46:35","slug":"stage-1-m2-deep-tongue","status":"publish","type":"page","link":"https:\/\/members.loria.fr\/SOuni\/stage-1-m2-deep-tongue\/","title":{"rendered":"Stage 1 &#8211; M2 &#8211; Deep Tongue"},"content":{"rendered":"<h2><span style=\"color: #ff6600\"><strong>Mod\u00e9lisation dynamique d\u2019une langue 3D d\u2019un avatar par des r\u00e9seaux profonds\u00a0<\/strong><\/span><\/h2>\n<ul>\n<li><strong>Laboratoire\u00a0<\/strong>: Inria Nancy Grand Est &#8211; LORIA<\/li>\n<li>\u00a0<strong>Ville\u00a0<\/strong>: Nancy, France.<\/li>\n<li><strong>\u00c9quipe\u00a0<\/strong>: Multispeech<\/li>\n<li><b>Th\u00e9matique :\u00a0<\/b>Intelligence artificielle \/ Interaction \/ Traitement de la parole multimodale<\/li>\n<li><strong>Contact<\/strong> : Slim Ouni \u00a0<strong>(Slim.Ouni@loria.fr<\/strong>)<\/li>\n<\/ul>\n<p><em><span style=\"color: #ff0000\">(For English Version, see below)<\/span><\/em><\/p>\n<h3><strong>Pr\u00e9sentation \u00a0<\/strong><\/h3>\n<p>La langue joue un r\u00f4le important dans la production de la parole. Elle participe \u00e0 l\u2019articulation de plusieurs sons et sa position est critique pour certains phon\u00e8mes. L\u2019\u00e9tude des gestes articulatoires permet de mieux comprendre les m\u00e9canismes\u00a0de\u00a0production de la parole avec des implications directes sur l&rsquo;apprentissage\u00a0des langues et la\u00a0r\u00e9\u00e9ducation orthophonique.<\/p>\n<p>Dans le cadre de nos travaux sur la t\u00eate parlante 3D (un avatar parlant), nous avons d\u00e9velopp\u00e9 un syst\u00e8me d\u2019animation d\u2019un avatar \u00e0 partir de la parole permettant d\u2019animer finement la bouche. Nous souhaitons augmenter ce syst\u00e8me par un mod\u00e8le de langue 3D qui accro\u00eet consid\u00e9rablement l\u2019intelligibilit\u00e9 globale de l\u2019articulation visuelle.<\/p>\n<p>La langue est un organe complexe, tr\u00e8s flexible, extensible et compressible qui peut \u00eatre courb\u00e9 et qui permet de r\u00e9aliser des degr\u00e9s d\u2019articulation tr\u00e8s fine. Les aspects dynamiques de l\u2019articulation de la langue (y compris la coarticulation, c\u2019est-\u00e0-dire l\u2019interaction entre les phon\u00e8mes leurs influences mutuelles) sont \u00e9galement importants. Plusieurs approches de mod\u00e9lisation de langue existent. Elles sont soit purement g\u00e9om\u00e9triques, soit bas\u00e9es sur des images IRM et des donn\u00e9es \u00e9lectromagn\u00e9tographiques (EMA). En effet, il est possible d\u2019observer la d\u00e9formation de la langue et de mesurer son \u00e9volution temporelle en utilisant ces techniques, qui sont d\u2019ailleurs utilis\u00e9es dans plusieurs \u00e9tudes en production de la parole pour acqu\u00e9rir un corpus de donn\u00e9es 3D de la langue.<\/p>\n<h3><strong>Objectifs du stage<\/strong><\/h3>\n<p>L\u2019objectif de ce travail est de coordonner les mouvements de la langue avec le signal de parole.\u00a0 Il s\u2019agit donc de contr\u00f4ler le mouvement d\u2019une langue 3D \u00e0 partir de la parole. Le mod\u00e8le 3D de la langue doit permettre d\u2019avoir\u00a0un compromis entre une structure tr\u00e8s flexible qui permet de r\u00e9aliser des gestes complexes et une repr\u00e9sentation simple contr\u00f4l\u00e9e par un petit nombre de param\u00e8tres. Il s\u2019agit de partir d\u2019un mod\u00e8le 3D g\u00e9n\u00e9rique de langue qui sera contr\u00f4l\u00e9 par les donn\u00e9es 2D ou 3D acquises par un articulographe ou IRM. Un corpus de donn\u00e9es articulatoires est disponible et sera utilis\u00e9 dans cette \u00e9tude pour entrainer un syst\u00e8me de r\u00e9seau de neurones utilisant des techniques d\u2019apprentissage profond pour estimer les mouvements de la langue \u00e0 partir de la parole (Biasutto\u2013Lervat, et Ouni, 2018). Le syst\u00e8me de contr\u00f4le de la langue sera \u00e9valu\u00e9 et int\u00e9gr\u00e9 dans une t\u00eate parlante anim\u00e9e.<\/p>\n<p>N\u2019h\u00e9sitez pas \u00e0 contacter le responsable du stage pour tout compl\u00e9ment d\u2019information.<\/p>\n<h3><strong>Comp\u00e9tences esp\u00e9r\u00e9es<\/strong><\/h3>\n<p>De bonnes connaissances informatiques et en <em>machine learning<\/em> sont n\u00e9cessaires. Une premi\u00e8re exp\u00e9rience avec l\u2019utilisation d\u2019une librairie de r\u00e9seaux de neurones (comme PyTorch ou TensorFlow, \u2026) est appr\u00e9ci\u00e9e.<\/p>\n<h3><span style=\"color: #0000ff\">Bourse d&rsquo;excellence<\/span><\/h3>\n<p>Le laboratoire propose un nombre limit\u00e9 de bourses d&rsquo;excellence pour des candidats excellents (un bon parcours acad\u00e9mique) qui sont des \u00e9tudiants\u00a0 fran\u00e7ais (hors de la r\u00e9gion Grand-Est) ou \u00e9trangers, qui souhaitent faire une th\u00e8se par la suite. Cette bourse couvre la mobilit\u00e9 \u00e0 une hauteur de <strong>1000\u20ac<\/strong> et une indemnit\u00e9 de <strong>1000\u20ac<\/strong> par mois. Pour candidater \u00e0 ce financement, il faut r\u00e9pondre \u00e0 cette offre de stage en m&rsquo;envoyant votre CV avant le <strong>26\/11\/2020<\/strong><\/p>\n<h3><strong>Bibliographie<\/strong><\/h3>\n<ul>\n<li>T. Biasutto\u2013Lervat, and S. Ouni. \u00ab\u00a0Phoneme-to-Articulatory mapping using bidirectional gated RNN.\u00a0\u00bb Interspeech 2018.<\/li>\n<li>Y. Jun, C. Jiang, R. Li, C.W. Luo, &amp; Z.F. Wang (2016). Real-Time 3-D Facial Animation: From Appearance to Internal Articulators. IEEE Transactions on Circuits and Systems for Video Technology, 28(4), 920-932.<\/li>\n<li>Li, R., &amp; Yu, J. (2017, October). An audio-visual 3D virtual articulation system for visual speech synthesis. In 2017 IEEE International Symposium on Haptic, Audio and Visual Environments and Games (HAVE) (pp. 1-6). IEEE.<\/li>\n<li>Bian, J., Li, S., Wang, Y., Chen, J., &amp; Xiao, H. (2017, December). A survey of tongue modeling methods in speech visualization. In 2017 2nd International Conference on Robotics and Automation Engineering (ICRAE) (pp. 431-435). IEEE.<\/li>\n<li>O. Engwall, (2003). \u201cCombining MRI, EMA &amp; EPG measurements in a three-dimensional tongue model,\u201d Speech Communication, vol. 41, no. 2-3, pp. 303\u2013329.<\/li>\n<li>W. Fernandez, P. Mithraratne, S. F. Thrupp, M. H. Tawhai, and P. J. Hunter (2004) \u00ab\u00a0Anatomically based geometric modelling of the musculo-skeletal system and other organs,\u00a0\u00bb Biomechanics and Modeling in Mechanobiology, vol. 2, no. 3, pp. 139-155, 2004.<\/li>\n<li>S.A. King and R.E. Parent, \u00ab\u00a0A 3D Parametric Tongue Model for Animated Speech,\u00a0\u00bb JVCA, vol. 12, no. 3, pp. 107-115, 2001.<\/li>\n<li>X.B. Lu, C.W. Thorpe, K. Foster and P. Hunter, (2009) \u00ab\u00a0From experiments to articulatory motion: a three-dimensional talking head model\u00a0\u00bb Interspeech 2009, Brighton.<\/li>\n<li>Articulographe AG501, Carstens, http:\/\/www.articulograph.de<strong><br \/>\n<\/strong><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2><span style=\"color: #ff6600\"><strong>Dynamic modeling of an avatar&rsquo;s 3D tongue by deep networks<\/strong><\/span><\/h2>\n<h3><strong>Presentation<\/strong><\/h3>\n<p>Tongue plays an important role in speech production. It participates in the articulation of several sounds and its position is critical for several phonemes. The study of articulatory gestures makes it possible to better understand the mechanisms of speech production with direct implications on language learning and speech therapy.<\/p>\n<p>In our work on a 3D talking head (a talking avatar), the increase of the latter by a 3D tongue model considerably increases the overall intelligibility of the visual articulation.<\/p>\n<p>As part of our work on 3D talking head (a speaking avatar), we have developed a system for animating an avatar from speech to finely animate the mouth. We are considering augmenting this system with a 3D language model that considerably increases the overall intelligibility of the visual articulation.<\/p>\n<p>The tongue is a complex, highly flexible, extensible and compressible organ that can be curved and allows very fine degrees of articulation to be achieved. The dynamic aspects of language articulation (including coarticulation) are also important. Several approaches to language modeling exist. They are either purely geometric or based on MRI images and electromagnetic data (EMA). Indeed, it is possible to observe the deformation of the tongue and to measure its temporal evolution using these techniques, that are also used in several speech production studies to acquire a 3D data set of the language.<\/p>\n<h3><strong>\u00a0<\/strong><strong>Objective of the Internship<\/strong><\/h3>\n<p>The objective of this work is to coordinate the movements of the tongue with the speech signal.\u00a0 Therefore, we need to control the movement of a 3D tongue from speech. The 3D model of the tongue must allow a compromise between a very flexible structure that allows complex gestures to be made and a simple 3D representation controlled by a small number of parameters. A generic 3D tongue model that will be controlled by 3D data acquired by an articulography or MRI. A corpus of articulatory data is available and will be used in this study to train a neural network system using deep learning techniques to estimate tongue movements from speech. The tongue control system will be evaluated and integrated into an animated talking head.<\/p>\n<p>Feel free to contact the internship supervisor for any further information.<\/p>\n<h3><strong>\u00a0<\/strong><strong>Skills and profile<\/strong><\/h3>\n<p>Appropriate candidates would have strong background in computer science and machine learning. A first experience with the use of a neural network library (such as PyTorch or TensorFlow,&#8230;) is appreciated.<\/p>\n<h3><strong>Context<\/strong><\/h3>\n<p>The work will be done within a dynamic research team Multispeech research team), at the research center Inria Nancy Grand Est (LORIA) and you will integrate a team composed of both experienced and young researchers (PhD, postdocs and engineers) and closely supervised by a senior researcher. We have a motion capture facilities and an articulograph in the laboratory that can be used to acquire data in this project. \u00a0Several speech processing tools are available in the team.<\/p>\n<p>This internship can be a great opportunity to discover research in the field of spoken communication and 3D avatar animation using automatic learning techniques.<\/p>\n<h3><span style=\"color: #0000ff\"><strong>Excellence Internship<\/strong> <\/span><em>(Bourse d\u2019excellence)<\/em><\/h3>\n<p>It is possible to apply for a highly competitive internship funding for outstanding candidates (with\u00a0good academic background) who are French students (outside the Grand-Est region) or foreign students, who are interested \u00a0to pursue doctoral thesis in the lab. This funding will cover mobility expenses (limited to <strong>1000\u20ac<\/strong> ) and \u00a0<strong>1000\u20ac<\/strong> per month. To apply for this funding, you need contact me and send me your CV before <strong>26\/11\/2020<\/strong>.<\/p>\n<h3><strong>Bibliography<\/strong><\/h3>\n<ul>\n<li>T. Biasutto\u2013Lervat, and S. Ouni. \u00ab\u00a0Phoneme-to-Articulatory mapping using bidirectional gated RNN.\u00a0\u00bb Interspeech 2018.<\/li>\n<li>Y. Jun, C. Jiang, R. Li, C.W. Luo, &amp; Z.F. Wang (2016). Real-Time 3-D Facial Animation: From Appearance to Internal Articulators. IEEE Transactions on Circuits and Systems for Video Technology, 28(4), 920-932.<\/li>\n<li>Li, R., &amp; Yu, J. (2017, October). An audio-visual 3D virtual articulation system for visual speech synthesis. In 2017 IEEE International Symposium on Haptic, Audio and Visual Environments and Games (HAVE) (pp. 1-6). IEEE.<\/li>\n<li>Bian, J., Li, S., Wang, Y., Chen, J., &amp; Xiao, H. (2017, December). A survey of tongue modeling methods in speech visualization. In 2017 2nd International Conference on Robotics and Automation Engineering (ICRAE) (pp. 431-435). IEEE.<\/li>\n<li>O. Engwall, (2003). \u201cCombining MRI, EMA &amp; EPG measurements in a three-dimensional tongue model,\u201d Speech Communication, vol. 41, no. 2-3, pp. 303\u2013329.<\/li>\n<li>W. Fernandez, P. Mithraratne, S. F. Thrupp, M. H. Tawhai, and P. J. Hunter (2004) \u00ab\u00a0Anatomically based geometric modelling of the musculo-skeletal system and other organs,\u00a0\u00bb Biomechanics and Modeling in Mechanobiology, vol. 2, no. 3, pp. 139-155, 2004.<\/li>\n<li>S.A. King and R.E. Parent, \u00ab\u00a0A 3D Parametric Tongue Model for Animated Speech,\u00a0\u00bb JVCA, vol. 12, no. 3, pp. 107-115, 2001.<\/li>\n<li>X.B. Lu, C.W. Thorpe, K. Foster and P. Hunter, (2009) \u00ab\u00a0From experiments to articulatory motion: a three-dimensional talking head model\u00a0\u00bb Interspeech 2009, Brighton.<\/li>\n<li>Articulographe AG501, Carstens, http:\/\/www.articulograph.de<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Mod\u00e9lisation dynamique d\u2019une langue 3D d\u2019un avatar par des r\u00e9seaux profonds\u00a0<\/p>\n<ul>\n<li>Laboratoire\u00a0: Inria Nancy Grand Est &#8211; LORIA<\/li>\n<li>\u00a0Ville\u00a0: Nancy, France.<\/li>\n<li>\u00c9quipe\u00a0: Multispeech<\/li>\n<li>Th\u00e9matique :\u00a0Intelligence artificielle \/ Interaction \/ Traitement de la parole multimodale<\/li>\n<li>Contact : Slim Ouni \u00a0(Slim.Ouni@loria.fr)<\/li>\n<\/ul>\n<p><em>(For English Version, see below)<\/em><\/p>\n<p>Pr\u00e9sentation \u00a0<\/p>\n<p>La langue joue un r\u00f4le important dans la production de la parole. Elle participe \u00e0 l\u2019articulation de plusieurs sons et sa position est critique pour certains phon\u00e8mes. L\u2019\u00e9tude des gestes articulatoires permet de mieux comprendre les m\u00e9canismes\u00a0de\u00a0production de la parole avec des implications directes sur l&rsquo;apprentissage\u00a0des langues et la\u00a0r\u00e9\u00e9ducation orthophonique.<\/p>\n","protected":false},"author":116,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-581","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/members.loria.fr\/SOuni\/wp-json\/wp\/v2\/pages\/581","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/members.loria.fr\/SOuni\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/members.loria.fr\/SOuni\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/members.loria.fr\/SOuni\/wp-json\/wp\/v2\/users\/116"}],"replies":[{"embeddable":true,"href":"https:\/\/members.loria.fr\/SOuni\/wp-json\/wp\/v2\/comments?post=581"}],"version-history":[{"count":5,"href":"https:\/\/members.loria.fr\/SOuni\/wp-json\/wp\/v2\/pages\/581\/revisions"}],"predecessor-version":[{"id":586,"href":"https:\/\/members.loria.fr\/SOuni\/wp-json\/wp\/v2\/pages\/581\/revisions\/586"}],"wp:attachment":[{"href":"https:\/\/members.loria.fr\/SOuni\/wp-json\/wp\/v2\/media?parent=581"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}