Determined and over-determined speech and music mixtures


We propose to repeat Lucas Parra's (external link) and Kenneth Hild's evaluations in order to gather results from more participants. All recordings have been downsampled to the same rate (16 kHz) and cut to the same duration (10 s), so as to ease handling of the datasets and comparison of the results with other datasets.

Results


See the results webpage (external link)

Test data


Download parra.zip (external link) (23 MB)
Download hild.zip (external link) (0.5 MB)
These files are licensed by their authors for research use only (see the list of authors below).

The dataset "parra" contains 21 4-channel recordings of 2 to 4 speech sources in 7 different recording conditions. The data consist of 4-channel WAV audio files, that can be imported in Matlab using the wavread command. These files are named room<r>_<J>sources_mix.wav, where <r> is a character identifying the room and <J> is the number of sources. The authors of the dataset are Hiroshi Sawada (rooms 4 and 5), Mads Dyrholm (rooms 1, 2, 3, C and O) and Lucas Parra. The music used for recordings in rooms 1, 2 and 3 was taken from "Germ Germ" by Das Böse Ding and has kindly been approved for public presentation by Jan Klare of Das Böse Ding in the name of research.

Room 1: Chamber with cushioned walls, 1.5 × 2 m floor area, 2.5 m high.
Scenario: The sources were placed randomly in the room, either on the floor or on a table. The microphones were placed randomly approx. 50 cm from the walls, at different heights.
Equipment: Behringer ECM8000 omnidirectional microphones. SM Pro Audio PR4V microphone preamp. Standard desktop mono PC speakers were used for sources. Audiotrak Maya44USB soundcard for 4-channel recording.

Room 2: Medium-sized conference room, 10 × 8 m floor area, 3 m high.
Scenario: The sources were placed randomly in the room, either on the floor or on a table or stool, with an average distance of 2 meters between any two sources.
The microphones were placed along the wall closest to the sources, approx. 1 meter from the wall, at different heights, uniformly spaced approx. 1 meter apart.
Equipment: Behringer ECM8000 omnidirectional microphones. SM Pro Audio PR4V microphone preamp. Standard desktop mono PC speakers were used for sources. Audiotrak Maya44USB soundcard for 4-channel recording.

Room 3: Medium-sized office room, 3 × 3 m floor area, 2.5 m high.
Scenario: The sources were placed randomly in the room, either on the floor or on a table or stool, with an average distance of 1.5 meters between any two sources. The microphones were placed at different heights, uniformly spaced approx. 1 meter apart.
Equipment: Behringer ECM8000 omnidirectional microphones. SM Pro Audio PR4V microphone preamp. Standard desktop mono PC speakers were used for sources. Audiotrak Maya44USB soundcard for 4-channel recording.

Room 4: Chamber with cushioned walls, 3.55 × 4.45 m floor area, 2.5 m high.
Scenario: All four microphones were placed in a 3-dimensional arrangement around the center of the room, at a height of around 125 cm. The maximum distance between any two microphones was 5.7 cm. The first three sources were placed at the same height as the microphones; the last source was placed at a different height. Source distances from the microphones: around 100 cm.
Equipment: Sony ECM-77B omnidirectional microphones. Yamaha HA8 microphone preamp. Bose 101MM speakers with a 1705II power amplifier were used for sources. Dasbox model-500 for A/D and D/A conversion.

Room 5: Same as room 4.
Scenario: All four microphones were placed in a 3-dimensional arrangement around the center of the room, at a height of around 125 cm. The maximum distance between any two microphones was 5.7 cm. The first three sources were placed at the same height as the microphones; the last source was placed at a different height. Source distances from the microphones: around 180 cm.
Equipment: Sony ECM-77B omnidirectional microphones. Yamaha HA8 microphone preamp. Bose 101MM speakers with a 1705II power amplifier were used for sources. Dasbox model-500 for A/D and D/A conversion.

Room C: Same as room 2.
Scenario: Similar to room 2, but the exact placement of the microphones and sources, as well as their amplitudes, differ.
Equipment: Behringer ECM8000 omnidirectional microphones. SM Pro Audio PR4V microphone preamp. Standard desktop mono PC speakers were used for sources. Audiotrak Maya44USB soundcard for 4 channel recording.

Room O: Same as room 3.
Scenario: Similar to room 3, but the exact placement of the microphones and sources, as well as their amplitudes, differ.
Equipment: Behringer ECM8000 omnidirectional microphones. SM Pro Audio PR4V microphone preamp. Standard desktop mono PC speakers were used for sources. Audiotrak Maya44USB soundcard for 4 channel recording.

The dataset "hild" contains 1 stereo recording of 2 speech sources. This recording is a stereo WAV audio file named iliad_mix.wav, that can be imported in Matlab using the wavread command. The author of the dataset is Kenneth Hild.

Room info: 3.7 × 4.4 m (lab containing several desks and chairs; the instantaneous power of the impulse response decayed by 35 dB relative to the peak power in 270 ms).
Scenario: The signals representing both speakers are actually the same person quoting from different parts of the epic Iliad, as translated by Samuel Butler. The microphones are placed on either side of a dummy head. The distances from the speakers to the center of the head (the microphone array) are 161 cm and 137 cm.
Equipment: Studio Projects B3 pressure gradient transducer microphones with cardioid pickup pattern. Audio Buddy 2-channel preamplifier with phantom power.

Tasks

Based on the outcomes of the panel discussion at ICA'07, the source separation problem has been split into three tasks:
  1. source counting (estimate the number of sources)
  2. source signal estimation (estimate the mono source signals)
  3. source spatial image estimation (estimate the contribution of each source to all mixture channels)
In practice, reference mono source signals are not available, so the results of task 2 will be evaluated with respect to the contribution of each source to the first mixture channel.

Submission

Each participant is asked to submit the results of his/her algorithm for task 2 or task 3, as preferred, over all or part of the two test sets. Algorithms using a limited number of mixture channels (e.g. only the first two channels) are also welcome.

The results for task 1 may also be submitted if possible. When available, they will help diagnose the performance of the various parts of the algorithm.

In addition, each participant is asked to provide basic information about his/her algorithm (e.g. number of channels used, bibliographical reference) and to declare its average running time, expressed in seconds per test excerpt and per GHz of CPU.
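
As an illustration of this normalization, consider a hypothetical algorithm that separates all 22 test excerpts (21 "parra" + 1 "hild") in a total of 660 s on a 3 GHz CPU:

  % Hypothetical figures, for illustration only
  total_runtime = 660;  % total running time in seconds over all excerpts
  n_excerpts    = 22;   % 21 "parra" recordings + 1 "hild" recording
  cpu_speed     = 3;    % CPU clock rate in GHz
  declared_time = total_runtime / (n_excerpts * cpu_speed)  % = 10 s per excerpt per GHz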

Note that the submitted audio files will be made available on a website under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 (external link) license.

Evaluation criteria

We plan to use the same evaluation criteria as for the under-determined speech and music mixtures dataset, so that results are comparable.

The estimated source signals will be evaluated via the criteria defined in the BSS_EVAL (external link) toolbox. These criteria allow an arbitrary filtering between the estimated source and the true source and measure interference and artifacts distortion separately. All source orderings are tested and the ordering leading to the best SIR is selected.

Similarly, the estimated spatial source image signals will be evaluated via the criteria used for the Stereo Audio Source Separation Evaluation Campaign (external link). These criteria distinguish spatial (or filtering) distortion, interference and artifacts. All source orderings are tested and the ordering leading to the best SIR is selected.

The above performance criteria are respectively implemented in the BSS_EVAL toolbox and in the evaluation software used for the Stereo Audio Source Separation Evaluation Campaign, both linked above.
Note: the computation of these criteria may take some time, due to the need to compute the best ordering and the actual filter distortion between the estimated sources and the true sources.
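
For illustration, a sketch of how the task 2 criteria could be computed with the BSS_EVAL toolbox, assuming its bss_eval_sources interface and hypothetical variable names:

  % se: estimated source signals; s: reference signals (contributions of each
  % source to the first mixture channel); both stored as nsrc x nsampl matrices
  [SDR, SIR, SAR, perm] = bss_eval_sources(se, s);
  % SDR, SIR, SAR: signal-to-distortion, -interference and -artifacts ratios (dB)
  % perm: source ordering selected by the toolbox (the one leading to the best SIR)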

Potential participants

If you are considering participating, please add your name and email address here and sign up (external link) for the mailing list to receive further announcements:
  • Lucas Parra (parra (a) ccny_cuny_edu)
  • Kenneth E. Hild II (k.hild (a) ieee_org)
  • Robert Johnson (rjohnson (a) fmrib.ox.ac.uk)

Task proposed by: Lucas Parra, Ken Hild, Emmanuel Vincent
