The Adventures of Robi the Robot 2019

In 2019, I continued to use Robi the Robot to demonstrate to young audiences some of the science behind robot audition and artificial intelligence. Robi is essentially a Microsoft Kinect mounted on a Turtlebot. It looks like this:

With the help of team LARSEN, I developed a sound source localisation demo for it using the Kinect's four microphones, making Robi spin towards sounds in its environment. After presenting it at the Serbia Science Festival 2018, I took it to a number of classes and events in 2019:

  • 5 different classes from 5th grade (CM2) to 12th grade (Terminale) in primary schools, collèges and lycées of the Nancy area in January. This emerged from a partnership with the theatre La Manufacture around the play Robots.
  • The Nuit des Chercheurs of Belgrade (Serbia).
  • 4 different school classes in different cities of Serbia (Novi Sad, Zrenjanin, Belgrade).
  • The Fête de la Science 2019 in Nancy (October 11th and 12th).
  • The special edition of Nuit des Chercheurs in Nancy celebrating the 80th anniversary of CNRS.
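For readers curious how a demo like this works: with a known spacing between two microphones, the direction of a sound can be estimated from the time delay between the channels, typically via the GCC-PHAT cross-correlation. Below is a minimal sketch of that idea — not Robi's actual code; the sample rate, microphone spacing and test signal are illustrative assumptions:

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Delay of `sig` relative to `ref` (in seconds) via GCC-PHAT."""
    n = sig.size + ref.size
    # Cross-power spectrum, whitened so only the phase (i.e. the delay) remains
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-12
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

def azimuth_deg(tau, spacing, c=343.0):
    """Far-field azimuth from the delay between two mics `spacing` metres apart."""
    return np.degrees(np.arcsin(np.clip(c * tau / spacing, -1.0, 1.0)))

# Synthetic check: one channel is the other delayed by 4 samples
rng = np.random.default_rng(0)
fs, d = 16000, 4
x = rng.standard_normal(4096)
tau = gcc_phat(np.concatenate((np.zeros(d), x[:-d])), x, fs)
theta = azimuth_deg(tau, spacing=0.1)   # direction the robot would turn towards
```

With more than two microphones, such pairwise delay estimates can be combined to resolve a full direction — which is what makes the Kinect's four-channel array useful here.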

Here are some pictures of my adventures with Robi:

Signal Processing Cup 2019

I had the pleasure and honour of initiating and coordinating the IEEE Signal Processing Cup 2019 on the theme « Search & Rescue with Drone-Embedded Sound Source Localization ». The SPCup is an international competition that promotes real-world applications of signal processing among undergraduate students. It ran from November 14th, 2018 to May 13th, 2019.

The three finalist teams of the SPCup 2019 at ICASSP, Brighton, UK.

The goal was for participating teams to build a system capable of localizing a sound source based on audio recordings made with a microphone array embedded in an unmanned aerial vehicle (UAV). The first phase of the competition was open to teams of undergraduate students and ended on March 11th, 2019, while the final took place at ICASSP (Brighton, UK) on May 13th, 2019 between the three finalist teams. The data used in both phases was based on the DREGON dataset, which we recently released.
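While each team's approach was its own, a common baseline for this kind of task is steered-response power with PHAT weighting (SRP-PHAT): scan candidate directions and pick the one where the phase-aligned channels add up most coherently. A minimal far-field sketch follows — the square array geometry, frequency range and single-frame input are illustrative assumptions, not the DREGON setup:

```python
import numpy as np

def srp_phat_azimuth(X, mic_pos, freqs, c=343.0, n_dirs=72):
    """Azimuth (degrees) maximizing the steered response power, PHAT-weighted.

    X: (n_mics, n_freqs) complex spectra of one frame; mic_pos: (n_mics, 3) in metres.
    """
    az = np.deg2rad(np.arange(n_dirs) * 360.0 / n_dirs)
    u = np.stack((np.cos(az), np.sin(az), np.zeros_like(az)), axis=1)  # (n_dirs, 3)
    delays = mic_pos @ u.T / c                       # per-mic delays, (n_mics, n_dirs)
    Xw = X / (np.abs(X) + 1e-12)                     # PHAT: keep phase only
    # Compensate each candidate direction's delays, then sum channels coherently
    steer = np.exp(-2j * np.pi * freqs[None, :, None] * delays[:, None, :])
    power = np.abs((Xw[:, :, None] * steer).sum(axis=0)) ** 2  # (n_freqs, n_dirs)
    return np.degrees(az[np.argmax(power.sum(axis=0))])

# Synthetic check: a plane wave arriving from 90 degrees on a 4-mic square array
mic_pos = np.array([[0.05, 0.05, 0], [0.05, -0.05, 0],
                    [-0.05, 0.05, 0], [-0.05, -0.05, 0]])
freqs = np.linspace(100.0, 1500.0, 64)
tau = mic_pos @ np.array([0.0, 1.0, 0.0]) / 343.0    # true per-mic delays
rng = np.random.default_rng(1)
S = np.exp(2j * np.pi * rng.random(freqs.size))       # random source spectrum
X = S[None, :] * np.exp(2j * np.pi * freqs[None, :] * tau[:, None])
est = srp_phat_azimuth(X, mic_pos, freqs)
```

The hard part of the SPCup task was, of course, that real UAV recordings bury the source under broadband rotor egonoise, which simple steering alone does not handle.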

The competition data, including the ground truth, evaluation scripts and baseline, are now publicly available on the DREGON website.

In addition, the website now hosts UAV-embedded recordings of drone egonoise contributed by participants. Annotated noise-only recordings from 11 different drones, using microphone arrays of 1 to 16 channels, are freely available there. Go check them out!

Serbia Science Festival 2018

From November 29th to December 1st, I had the great joy and honour of introducing a young audience to some of the science behind robots, artificial intelligence and sound at the 12th edition of the Serbia Science Festival (Festival Nauke). I gave four lectures, each in front of 450 people, most of them pupils between 5 and 15 years old. The slides of the presentation can be found here.

The DREGON dataset

Martin Strauss, Pol Mordel, Victor Miguet and I just released the DREGON dataset. DREGON stands for DRone EGonoise and localizatiON. It consists of sounds recorded with an 8-channel microphone array embedded in a quadrotor UAV (unmanned aerial vehicle). The recordings are annotated with the precise 3D position of the sound source relative to the drone, as well as additional internal characteristics of the drone's state such as motor speeds and inertial measurements. The dataset aims to promote research in UAV-embedded sound source localization, notably for the application of semi-autonomous search and rescue with drones.

The VAST project

VAST stands for Virtual Acoustic Space Traveling and is a new paradigm for learning-based sound source localization and audio scene geometry estimation. Most existing methods that estimate the position of a sound source, or other geometrical properties of an audio scene, are either based on an approximate physical model (physics-driven) or on a specific-purpose calibration set (data-driven). With VAST, the idea is to learn a mapping from audio features to the desired geometrical properties using a massive dataset of simulated room impulse responses. The dataset is designed to be maximally representative of the audio scenes the considered system may encounter, while remaining reasonably compact. The aim is to demonstrate that mappings learned on virtual datasets generalize well to real-world data, and to provide a useful tool for research teams interested in sound source localization.
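To illustrate the paradigm in miniature (this is not the actual VAST pipeline): densely sample a virtual dataset of feature/azimuth pairs from a simulator, then map a new observation to geometry by regression. The sketch below uses a toy two-microphone feature model and a plain nearest-neighbour look-up in place of real room simulations and a learned regressor; all constants are illustrative assumptions:

```python
import numpy as np

C, D = 343.0, 0.18                     # speed of sound, assumed mic spacing (m)
FREQS = np.linspace(200.0, 1400.0, 16)

def features(azimuth_rad, rng=None, noise=0.0):
    """Toy interaural features: per-frequency phase differences, far field, no room."""
    tau = D * np.sin(azimuth_rad) / C               # interaural time difference
    phi = 2 * np.pi * FREQS * tau
    if rng is not None:
        phi = phi + noise * rng.standard_normal(phi.shape)
    return np.concatenate((np.sin(phi), np.cos(phi)))  # wrap-safe encoding

# "Virtual dataset": densely sampled simulated scenes over the frontal half-plane
train_az = np.deg2rad(np.linspace(-90.0, 90.0, 361))
train_X = np.stack([features(a) for a in train_az])

def localize(x):
    """Map features to azimuth (degrees) by nearest neighbour in the virtual dataset."""
    i = np.argmin(np.linalg.norm(train_X - x, axis=1))
    return np.degrees(train_az[i])

# A noisy "real-world" observation from 30 degrees is recovered from virtual training data
rng = np.random.default_rng(2)
est = localize(features(np.deg2rad(30.0), rng=rng, noise=0.02))
```

The point of the paradigm is precisely that the training set is simulated: the expensive part (annotated acoustic measurements) is replaced by a compact but representative virtual dataset.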


Clément Gaultier, Saurabh Kataria, Diego Di Carlo and I are working on the release of datasets for VAST. Two binaural datasets are already available on the project website. We co-authored two publications demonstrating this paradigm for binaural 3D sound source localization and wall absorption estimation using these datasets.