We have set up a hardware platform to acquire multimodal data in a speech communication context. The system is composed of a Carstens AG501 articulograph (acquired as part of the EQUIPEX ORTOLANG), 4 Vicon cameras (a motion capture system), 8 Optitrack cameras (our new motion capture system, funded by CPER LCHN), an Intel RealSense depth camera (acquired as part of the CORExp project, co-funded by Inria and the region), a video camera and a microphone. With such heterogeneous hardware, synchronization is essential; it is achieved through a trigger device. We have used the system to acquire multimodal data for a collaborative STIC-AmSud project and to acquire a first exploratory expressive multimodal corpus.
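To illustrate why a shared trigger makes it possible to align devices that run at different rates, here is a minimal sketch. It is not the platform's actual software: it assumes that each device records the sample (or frame) index at which it observed the trigger pulse, and the `Stream` class, the sampling rates and the trigger indices are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    """One recorded modality (all values used below are hypothetical)."""
    name: str
    rate_hz: float       # sampling or frame rate of the device
    trigger_index: int   # sample/frame at which the trigger pulse was seen

    def time_of(self, index: int) -> float:
        """Time of a sample in seconds, relative to the shared trigger."""
        return (index - self.trigger_index) / self.rate_hz

    def index_at(self, t: float) -> int:
        """Index of the sample closest to time t (seconds after the trigger)."""
        return self.trigger_index + round(t * self.rate_hz)


def align(source: Stream, index: int, target: Stream) -> int:
    """Map a sample index in `source` to the closest sample in `target`."""
    return target.index_at(source.time_of(index))


# Hypothetical streams: rates and trigger positions are assumptions.
ema = Stream("articulograph", rate_hz=250.0, trigger_index=120)
mocap = Stream("motion_capture", rate_hz=100.0, trigger_index=37)
audio = Stream("audio", rate_hz=16000.0, trigger_index=4800)

# Articulograph sample 370 lies 1.0 s after the trigger.
print(align(ema, 370, mocap))  # 137
print(align(ema, 370, audio))  # 20800
```

Because every stream is expressed relative to the same trigger instant, no device needs to know another device's clock; only its own rate and its own trigger position are required. This sketch ignores clock drift between devices, which a real acquisition would have to handle.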
Our work in audiovisual speech relies on acquiring data and processing it. Both the acquisition and the processing can be time-consuming and costly in terms of effort. This effort is nevertheless unavoidable if we are to make further progress in modeling the processes related to human communication.
We are continuously working on improving the acquisition techniques and investigating methods to make the process easier.