Under-determined speech and music mixtures


We propose to repeat last year's Stereo Audio Source Separation Evaluation Campaign with fresh data.

Results


See the results over test and development data.

Test data


Download test.zip (22 MB)
These files are made available under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 license. The authors are Glen Phillips, Mark Engelberg, Psycho Voyager, Nine Inch Nails and Ali Farka Touré for music source signals and Shoko Araki and Emmanuel Vincent for mixture signals.

The test data contains three types of stereo mixtures:
  • instantaneous mixtures (static sources scaled by positive gains)
  • synthetic convolutive mixtures (static sources filtered by synthetic room impulse responses simulating a pair of omnidirectional microphones via the Roomsim toolbox)
  • live recordings (static sources played through loudspeakers in a meeting room, recorded one at a time by a pair of omnidirectional microphones and subsequently added together)
The room dimensions are the same for the synthetic convolutive mixtures and the live recordings (4.45 x 3.55 x 2.5 m). The reverberation time is set to either 130 ms or 250 ms and the distance between the two microphones to either 5 cm or 1 m, resulting in 9 mixing conditions overall (the instantaneous condition plus 2 x 2 conditions each for the synthetic convolutive mixtures and the live recordings).

For each mixing condition, 6 mixture signals have been generated from different sets of source signals placed at different spatial positions:
  • 4 male speech sources
  • 4 female speech sources
  • 3 male speech sources
  • 3 female speech sources
  • 3 non-percussive music sources
  • 3 music sources including drums
The source directions of arrival vary between -60 degrees and +60 degrees with a minimum spacing of 15 degrees, and the distances between the sources and the center of the microphone pair vary between 80 cm and 1.20 m.

The data consist of stereo WAV audio files, which can be imported into Matlab using the wavread command. These files are named test_<srcset>_<mixtype> [ _<reverb>_<spacing> ] _mix.wav, where <srcset> is a shortcut for the set of source signals, <mixtype> a shortcut for the mixture type, <reverb> the reverberation time and <spacing> the microphone spacing.
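
As a minimal sketch, one such file can be loaded in Matlab as follows (the shortcut values in the filename are hypothetical and should be replaced by those of an actual downloaded file):

  % Minimal sketch: load one stereo test mixture
  fname = 'test_male3_inst_mix.wav';   % hypothetical <srcset> = male3, <mixtype> = inst
  [x, fs] = wavread(fname);            % x: samples x 2 channels, fs: sampling frequency in Hz
  fprintf('%d samples, %d channels, %d Hz\n', size(x, 1), size(x, 2), fs);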

Development data


Download dev1.zip (91 MB) (former development data of the Stereo Audio Source Separation Evaluation Campaign, complemented with new data for the additional mixing conditions considered above)
Download dev2.zip (47 MB) (former test data of the Stereo Audio Source Separation Evaluation Campaign)
These files are made available under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 license. The authors are Another Dreamer and Alex Q for music source signals and Hiroshi Sawada, Shoko Araki and Emmanuel Vincent for mixture signals.

The data consist of Matlab MAT-files and WAV audio files, which can be imported into Matlab using the load and wavread commands, respectively. These files are named as follows:
  • dev1_<srcset> [ _<mixtype>_<reverb> ] _src_<j>.wav: mono source signal
  • dev1_<srcset>_inst_matrix.mat: mixing matrix for instantaneous mixtures
  • dev1_<srcset>_<mixtype>_<reverb>_<spacing>_setup.txt: positions of the sources for convolutive mixtures
  • dev1_<srcset>_<mixtype>_<reverb>_<spacing>_filt.mat: mixing filter system for convolutive mixtures
  • dev1_<srcset>_<mixtype> [ _<reverb>_<spacing> ] _sim_<j>.wav: stereo contribution of a source signal to the two mixture channels
  • dev1_<srcset>_<mixtype> [ _<reverb>_<spacing> ] _mix.wav: stereo mixture signal
where <srcset> is a shortcut for the set of source signals, <mixtype> a shortcut for the mixture type, <reverb> the reverberation time, <spacing> the microphone spacing and <j> the source index.
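
As a hedged sketch, the main files of one instantaneous development mixture can be loaded as follows (the shortcut value 'male3' is hypothetical; the variables stored in the MAT-file can be listed with the fieldnames command):

  % Hedged sketch: load the development data for one instantaneous mixture
  [x, fs] = wavread('dev1_male3_inst_mix.wav');    % stereo mixture signal
  mixdata = load('dev1_male3_inst_matrix.mat');    % mixing matrix; inspect with fieldnames(mixdata)
  s1 = wavread('dev1_male3_src_1.wav');            % first mono source signal
  img1 = wavread('dev1_male3_inst_sim_1.wav');     % its stereo contribution to the mixture channels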

All mixture signals and source image signals have a duration of 10 s. Music source signals have a duration of 11 s to avoid border effects within the convolutive mixtures: the last 10 s are selected once the mixing system has been applied.
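
For instance, assuming fs denotes the sampling frequency and y a mixture (samples x channels) obtained by applying the mixing system to the 11 s music sources, the retained excerpt is simply:

  y10 = y(end-10*fs+1:end, :);   % keep the last 10 s, discarding the convolution onset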

Tasks and reference software

Based on the outcomes of the panel discussion at ICA'07, the source separation problem has been split into four tasks:
  1. source counting (estimate the number of sources)
  2. mixing system estimation (estimate the mixing matrix for instantaneous mixtures or the frequency-dependent mixing matrix for convolutive mixtures)
  3. source signal estimation (estimate the mono source signals)
  4. source spatial image estimation (estimate the stereo contribution of each source to the two mixture channels)

Participants are welcome to use some of the Matlab reference software below to build their own algorithms.

An example use of this software is given for instantaneous and convolutive mixtures in example_inst.m and example_conv.m, respectively.

Submission


Each participant is asked to submit the results of his/her algorithm for task 3 or 4, as preferred.
The results for tasks 1 and 2 may also be submitted if possible; when available, they will help diagnose the performance of the various parts of the algorithm.

In addition, each participant is asked to provide basic information about his/her algorithm (e.g. a bibliographical reference) and to declare its average running time, expressed in seconds per test excerpt and per GHz of CPU (for instance, an algorithm that processes one excerpt in 30 s on a 3 GHz CPU runs at 10 s per excerpt and per GHz).

Note that the submitted audio files will be made available on a website under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 license.

Evaluation criteria

So far, there is no agreed-upon criterion for the evaluation of estimated mixing systems for under-determined mixtures. We propose to use an SNR-like criterion that we call the Mixing Error Ratio (MER), expressed in decibels. This criterion is computed in each frequency bin between the estimated mixing matrix and the true mixing matrix, allowing an arbitrary scaling for each source, and averaged over frequency. All source orderings are tested and the ordering leading to the best MER is selected.
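
For illustration only, a minimal Matlab sketch of such a criterion for the instantaneous (single-matrix) case could look as follows; the exact definition used for the evaluation may differ in its details:

  % Hedged sketch of an MER-like criterion for an instantaneous mixing matrix.
  % Ae and A are the estimated and true 2 x n mixing matrices; each estimated column
  % is compared to the corresponding true column up to an arbitrary scaling, and the
  % ordering maximising the mean MER is retained. Save as mer_inst.m.
  function mer = mer_inst(Ae, A)
  n = size(A, 2);
  orderings = perms(1:n);                    % all candidate source orderings
  mer = -Inf;
  for p = 1:size(orderings, 1)
      Aep = Ae(:, orderings(p, :));
      m = zeros(1, n);
      for j = 1:n
          a = A(:, j); ae = Aep(:, j);
          coll = (a' * ae) / (a' * a) * a;   % component of ae collinear to the true column
          m(j) = 10 * log10(sum(abs(coll).^2) / sum(abs(ae - coll).^2));
      end
      mer = max(mer, mean(m));
  end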

We propose to evaluate the estimated source signals via the criteria defined in the BSS_EVAL toolbox. These criteria allow an arbitrary filtering between the estimated source and the true source and measure interference and artifact distortion separately. All source orderings are tested and the ordering leading to the best SIR is selected.
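
A typical call looks as follows (hedged: the exact function name and signature depend on the BSS_EVAL version; the call below follows a recent release):

  % se and s: [number of sources x number of samples] matrices of estimated and true
  % mono source signals; perm returns the source ordering selected by the toolbox
  [SDR, SIR, SAR, perm] = bss_eval_sources(se, s);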

Similarly, we propose to evaluate the estimated source spatial image signals via the criteria used for the Stereo Audio Source Separation Evaluation Campaign. These criteria distinguish spatial (or filtering) distortion, interference and artifacts. All source orderings are tested and the ordering leading to the best SIR is selected.
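
A similar hedged call applies to the spatial image criteria (again, the function name below is that of a recent BSS_EVAL release):

  % ime and im: [number of sources x samples x 2 channels] arrays of estimated and true
  % source spatial images
  [SDR, ISR, SIR, SAR, perm] = bss_eval_images(ime, im);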

Performance will be compared to that of ideal binary masking as a benchmark (i.e. binary masks providing maximum SDR), computed over an STFT or a cochleagram.
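
As a hedged sketch of the STFT variant, assigning each time-frequency bin to the source whose true image has the largest magnitude approximates the maximum-SDR binary mask (the actual benchmark code may differ, e.g. in its STFT parameters or in using a cochleagram; all variable names below are illustrative):

  % X: STFT of one mixture channel; S{j}: STFT of the true image of source j on that
  % channel, computed with the same analysis parameters
  n = numel(S);
  mag = zeros([size(X) n]);
  for j = 1:n
      mag(:, :, j) = abs(S{j});
  end
  [dummy, winner] = max(mag, [], 3);   % dominant source in each time-frequency bin
  Se = cell(1, n);
  for j = 1:n
      Se{j} = X .* (winner == j);      % masked mixture STFT = estimated source image
  end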

The above performance criteria and benchmarks are implemented in the Matlab reference software. Example uses are again given in example_inst.m and example_conv.m.
Note: the computation of these criteria may take some time, due to the need to compute the best source ordering and the actual filter distortion between the estimated sources and the true sources.

Potential participants

  • Dan Barry (dan.barry (a) dit_ie)
  • Pau Bofill (pau (a) ac_upc_edu)
  • Andreas Ehmann (aehmann (a) uiuc_edu)
  • Vikrham Gowreesunker (gowr0001 (a) umn_edu)
  • Matt Kleffner (kleffner (a) uiuc_edu)
  • Nikolaos Mitianoudis (n.mitianoudis (a) imperial_ac_uk)
  • Hiroshi Sawada (sawada (a) cslab_kecl_ntt_co_jp)
  • Emmanuel Vincent (emmanuel.vincent (a) irisa_fr)
  • Ming Xiao (xiaoming1968 (a) 163_com)
  • Ron Weiss (ronw (a) ee_columbia_edu)
  • Michael Mandel (mim (a) ee_columbia_edu)
  • Shoko Araki (shoko (a) cslab_kecl_ntt_co_jp)
  • Yosuke Izumi (izumi (a) hil_t_u-tokyo_ac_jp)
  • Taesu Kim (taesu (a) ucsd_edu)
  • Maximo Cobos (mcobos (a) iteam_upv_es)
  • John Woodruff (woodruff.95 (a) osu_edu)
  • Antonio Rebordao (antonio (a) gavo_t_u-tokyo_ac_jp)

Task proposed by: Emmanuel Vincent, Shoko Araki, Pau Bofill
