DinG Corpus

DinG (Dialogues in Games) is a corpus of manual transcriptions of real-life, oral, spontaneous multi-party dialogues between French-speaking players of Catan (Copyright ©2017 CATAN Studio, Inc. and CATAN GmbH, all rights reserved). Catan is a board game for three or four players in which each participant's main goal is to make their settlement prosper and grow using scarce resources. Bargaining over these resources is a major part of the gameplay and constitutes the core of DinG’s data. The corpus has been designed partly to showcase the SLAM corpus.

Dialogues from DinG are unconstrained: the players don’t have to follow any rule or specific guideline apart from playing the game. Because bargaining over resources is part of the gameplay, the players have to speak in order to play, so the dialogues are the ones that naturally occur in this particular setting. Since the conversation stays focused on the game, the players do not discuss personal subjects outside the game setting, which makes it possible to completely anonymize the corpus by removing the players’ names (de-identification).

The corpus is available on GitLab: https://gitlab.inria.fr/semagramme-public-projects/resources/ding/. It is distributed under the Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).

Publications:

A Multi-Party Dialogue Ressource in French, Maria Boritchev, Maxime Amblard, to appear in The Thirteenth International Conference on Language Resources and Evaluation (LREC 2022), June 2022, Marseille, France.

DinG — a corpus of transcriptions of real-life, oral, spontaneous multi-party dialogues between French-speaking players of Catan, Maria Boritchev, Maxime Amblard, Journées LIFT 2021 – Linguistique informatique, formelle et de terrain, GDR LIFT, Dec 2021, Grenoble, France. Available on HAL (PDF and BibTeX).