Speech technologies for mental health: from corpora to use in real-life situations
1/ Corpora
Corpora design and collection
I have designed two corpora (unfortunately not available for sharing due to legal restrictions):
- The MSLTc: the Multiple Sleep Latency Test corpus was recorded at the Sleep Department of the Bordeaux University Hospital (France). It contains recordings of 135 patients with complaints of hypersomnolence undergoing an MSLT (the gold standard reference test in sleep medicine). Before each iteration of the MSLT, the patients are recorded reading a text aloud. More details are available in previous publications [Martin et al. 2020, Martin et al. 2021].
- The SOMVOICE corpus: this corpus was recorded with healthy (screened) participants at the SANPSY lab. Similarly, each participant reads a text aloud before each iteration of an MSLT. They undergo this procedure twice, in randomized order: after a normal night (control) and after a night of total sleep deprivation. The analysis of this corpus is ongoing.
Critical analysis of corpora labels regarding mental health
Since all speech-based machine learning models are trained on datasets, the way these datasets are collected and labeled strongly affects how well such models generalize.
- In collaboration with J.-L. Rouas (Univ. Bordeaux), I have recently written an article about the limitations of annotating speech corpora with diagnoses. Instead, I promote labeling speech corpora with symptoms.
- In collaboration with Pr. J.-A. Micoulaud-Franchi (Univ. Bordeaux), we have written an article about the importance of semiology in clinical reasoning in psychiatry and proposed a beautiful visualization (available here).
In the same vein, I am pursuing my work on psychiatric and sleep semiology [e.g. Martin et al. 2023, Martin et al. 2024] in collaboration with Pr. J.-A. Micoulaud-Franchi (project NUITs, funded by the SFRMS, in collaboration with HP2, Grenoble, France).
2/ Speech analysis
Unlike the current trend towards end-to-end models for speech (or even foundation models) or the use of abstract representations (e.g. wav2vec) to detect health dimensions, my epistemological approach is based on explainability by design (cf. [Rudin 2019]). While limiting, this strong constraint makes it possible to link the extracted features with speech production mechanisms, and so to bridge the gap between observed speech behaviors and their underlying mechanisms. In other words, instead of bare, unexplainable features extracted from speech signals, I focus on speech measures of mental phenomena [Liss 2024].
As such, I have studied the following speech measures:
- Acoustic measures (e.g. [Martin 2019])
- Reading mistakes [Martin 2020] and their automation using Automatic Speech Recognition errors [Martin 2021]
- Reading pauses: their number and duration, but also their location in the read text (‘naturalness’) [Martin 2022]
- Phonemic characteristics, e.g. the number of schwas (optional central vowel of French) [Beaumard 2024]
- I am currently investigating vocalic triangles
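As an illustration of what an explainable measure can look like, the reading-pause item above (number, duration, location) can be sketched in a few lines. This is only a minimal sketch under stated assumptions, not the method used in the cited publications: it assumes word-level timestamps are already available (for instance from a forced aligner), the `Word` class and the `find_pauses` and `pause_stats` functions are hypothetical names, the 0.2 s threshold is an arbitrary choice, and "pause located right after a punctuation mark" is used here as a crude stand-in for the ‘naturalness’ of a pause location.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One word of the read text with its timestamps (hypothetical structure)."""
    text: str     # word as written in the text, including trailing punctuation
    start: float  # start time in seconds
    end: float    # end time in seconds


def find_pauses(words, min_pause=0.2):
    """Return (index of preceding word, gap duration) for each gap >= min_pause.

    min_pause is an assumed threshold in seconds, not a value from the source.
    """
    pauses = []
    for i in range(len(words) - 1):
        gap = words[i + 1].start - words[i].end
        if gap >= min_pause:
            pauses.append((i, gap))
    return pauses


def pause_stats(words, min_pause=0.2, punctuation=".,;:!?"):
    """Summarize pauses: count, total and mean duration, share after punctuation."""
    pauses = find_pauses(words, min_pause)
    durations = [duration for _, duration in pauses]
    # Assumption: a pause following a punctuation mark counts as 'expected'.
    after_punct = sum(
        1 for i, _ in pauses if words[i].text.endswith(tuple(punctuation))
    )
    count = len(pauses)
    return {
        "count": count,
        "total_duration": sum(durations),
        "mean_duration": sum(durations) / count if count else 0.0,
        "share_at_punctuation": after_punct / count if count else 0.0,
    }


if __name__ == "__main__":
    # Toy example with made-up timestamps.
    reading = [
        Word("Hello,", 0.0, 0.5),
        Word("world", 1.0, 1.4),
        Word("again", 1.5, 1.9),
        Word("now.", 2.4, 2.8),
    ]
    print(pause_stats(reading))
```

Each value returned here maps directly onto something a clinician can hear in the recording, which is the point of preferring such measures over opaque embeddings.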
3/ Use in real-life situations
Finally, I’m interested in the implementation of digital devices in real clinical environments.
To this end, I am conducting two parallel research activities:
- a theoretical approach, centered on the sociology of science and technology, anthropology, and ethnographic fieldwork, to observe and analyze “science in the making” [Latour 1981].
- an experimental approach, for which I work on modeling the acceptability of health devices using networks (in psychiatry in collaboration with Dr. S. Mouchabac (APHP), in sleep medicine with S. Bailly (Univ. Grenoble)).
An interdisciplinary view is also developed in the ENNACs project, involving epistemology, sociology, anthropology, theology, …