PROJET: Projection-based separation of spatial audio

published in the IEEE Transactions on Audio, Speech and Language Processing

by D. Fitzgerald, A. Liutkus and R. Badeau.

Abstract

We propose a method to unmix multichannel audio signals into their different constitutive spatial objects. To achieve this, we characterize an audio object through both a spatial and a spectro-temporal modelling. The particularity of the spatial model we pick is that it neither assumes an object has only one underlying source point, nor does it attempt to model the complex room acoustics. Instead, it focuses on a listener perspective, and takes each object as the superposition of many contributions with different incoming directions and inter-channel delays. Our spectro-temporal probabilistic model is based on the recently proposed α-harmonisable processes, which are adequate for signals with large dynamics, such as audio. Then, the main originality of this work is to provide a new way to estimate and exploit inter-channel dependences of an object for the purpose of demixing. In the Gaussian α = 2 case, previous research focused on covariance structures. This approach is no longer valid for α < 2 where covariances are not defined. Instead, we show how simple linear combinations of the mixture channels can be used to learn the model parameters, and the method we propose consists in pooling the estimates based on many projections to correctly account for the original multichannel audio. Intuitively, each such downmix of the mixture provides a new perspective where some objects are cancelled or enhanced. Finally, we also explain how to recover the different spatial audio objects when all parameters have been computed. Performance of the method is illustrated on the separation of stereophonic music signals.
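The intuition that a downmix can cancel or enhance an object can be illustrated with a small NumPy sketch. This is not the PROJET algorithm and is not taken from the released implementations. It is a toy example under strong simplifying assumptions: each object is a single point source, panned with constant gains and no inter-channel delay, whereas the paper's model allows many contributions and delays per object. The helper names `pan_vector` and `project`, the pan angles and the random test signals are all hypothetical choices made for illustration.

```python
import numpy as np


def pan_vector(theta):
    """Unit-norm stereo gains (left, right) for a pan angle theta.

    Hypothetical helper: constant-power panning, no inter-channel delay.
    """
    return np.array([np.cos(theta), np.sin(theta)])


def project(mixture, phi):
    """Downmix a (2, N) stereo signal to mono along direction phi.

    The output is cos(phi) * left + sin(phi) * right, i.e. a simple
    linear combination of the two mixture channels.
    """
    return pan_vector(phi) @ mixture


rng = np.random.default_rng(0)
n = 1000
s1 = rng.standard_normal(n)  # toy object 1
s2 = rng.standard_normal(n)  # toy object 2
theta1, theta2 = np.pi / 6, np.pi / 3  # assumed pan angles

# Instantaneous stereo mixture of the two panned objects, shape (2, n).
mixture = np.outer(pan_vector(theta1), s1) + np.outer(pan_vector(theta2), s2)

# A projection orthogonal to object 1's pan direction cancels object 1:
# only a scaled copy of object 2 remains, with gain sin(theta2 - theta1).
cancel_1 = project(mixture, theta1 + np.pi / 2)

# A projection along object 1's pan direction keeps object 1 at full gain
# and attenuates object 2 by cos(theta2 - theta1).
enhance_1 = project(mixture, theta1)
```

Sweeping `phi` over many values gives a family of mono downmixes, each with a different balance between the objects. This is the sense in which many projections offer complementary views of the same multichannel mixture.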

Full text

The full text of this paper is available here.

Referencing PROJET

The BibTeX entry for the TASLP paper (including delays) is:

@article{fitzgerald:hal-01260588,

TITLE = {{Projection-based demixing of spatial audio}},

AUTHOR = {D. Fitzgerald and A. Liutkus and R. Badeau},

JOURNAL = {{IEEE Transactions on Audio, Speech and Language Processing}},

PUBLISHER = {{Institute of Electrical and Electronics Engineers}},

YEAR = {2016},

MONTH = May,

}

The BibTeX entry for the original ICASSP paper is:

@inproceedings{fitzgerald:hal-01248014,

TITLE = {{PROJET - Spatial Audio Separation Using Projections}},

AUTHOR = {D. Fitzgerald and A. Liutkus and R. Badeau},

BOOKTITLE = {{41st International Conference on Acoustics, Speech and Signal Processing (ICASSP)}},

ADDRESS = {Shanghai, China},

PUBLISHER = {{IEEE}},

YEAR = {2016},

}

Implementations of PROJET

Download a Matlab implementation of PROJET (original) here. Its license is BSD.

Download a Matlab implementation of PROJET (with delays) here. Its license is BSD.

Download a Python implementation of PROJET (original and with delays) here. Its license is BSD.

Contact

derry (dot) fitzgerald (at) cit (dot) ie

antoine (dot) liutkus (at) inria (dot) fr

Examples of separation on popular excerpts

Multitrack HTML player by binarymind. You can freely download and use it yourself here.