Improving Generalization of Speech Separation in Real-World Scenarios:
Strategies in Simulation, Optimization, and Evaluation

In this appendix, we present the hyperparameters, training configurations, and demos of models trained with different simulation and optimization methods:
(1) The hyperparameters and training configurations of the different separation backbones.
(2) Demos of the separation models.
(3) Source links to the real-world speech cases used for testing separation models with our proposed methods.
We recommend listening to these demos with headphones.

Training Hyperparameters and Configurations


As mentioned in Section 3.1 of the submission, we use three separation backbones (ConvTasNet, DPRNN, and Sepformer) to evaluate the effectiveness of our proposed AC-SIM and multi-loss training paradigms. Since the three backbones are trained with different hyperparameters and configurations in the prior and original works, we list this information below:

  1. ConvTasNet: we set N=512, L=32, B=128, H=512, Sc=128, P=3, X=8, R=3; the learning rate is 0.001
  2. DPRNN (Dual-Path RNN): we set N=256, L=16, B=64, H=256, K (chunk size)=250, LSTM hidden dimension=128; the learning rate is 0.001
  3. Sepformer: we set N=256, L=16, IntraT-InterT-N=2, Nintra=8, Ninter=8, Nhead=8, dffn=1024; the learning rate is 0.00001
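
For reference, the three settings above can be collected into a single lookup table. This is only a minimal sketch: the dictionary name, its layout, and the `lr` and `lstm_hidden` keys are illustrative assumptions and are not taken from the authors' code. The symbol names and values are copied verbatim from the list.

```python
# Hyperparameters copied from the list above. The container name and the
# "lr" / "lstm_hidden" keys are hypothetical, chosen only for illustration.
BACKBONE_CONFIGS = {
    "ConvTasNet": {
        "N": 512, "L": 32, "B": 128, "H": 512,
        "Sc": 128, "P": 3, "X": 8, "R": 3,
        "lr": 1e-3,
    },
    "DPRNN": {
        "N": 256, "L": 16, "B": 64, "H": 256,
        "K": 250,            # chunk size
        "lstm_hidden": 128,  # LSTM hidden dimension
        "lr": 1e-3,
    },
    "Sepformer": {
        "N": 256, "L": 16,
        "IntraT-InterT-N": 2,  # kept under the name used in the list
        "Nintra": 8, "Ninter": 8, "Nhead": 8, "dffn": 1024,
        "lr": 1e-5,
    },
}


def learning_rate(backbone: str) -> float:
    """Return the initial learning rate listed for the given backbone."""
    return BACKBONE_CONFIGS[backbone]["lr"]
```

Note that Sepformer starts from a learning rate two orders of magnitude smaller than the other two backbones.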

For all models, we use the Adam optimizer with β=(0.9, 0.99) and a scheduler that halves the learning rate if the average SI-SDR over the four validation sets (D-All, D-NE, D-NR, D-ON) has not improved after three successive epochs. We also apply gradient clipping to limit the L2 norm of the gradients to 5.
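
The scheduling and clipping rules can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the training code: the class and function names are hypothetical, "three successive epochs" is read as three consecutive epochs without a new best average, and the gradients are represented as a flat list of floats. In PyTorch, the corresponding pieces would typically be `torch.optim.Adam`, `torch.optim.lr_scheduler.ReduceLROnPlateau`, and `torch.nn.utils.clip_grad_norm_`, although the source does not state which framework or calls were used.

```python
import math

ADAM_BETAS = (0.9, 0.99)  # value stated in the text
VALIDATION_SETS = ("D-All", "D-NE", "D-NR", "D-ON")


class HalveOnPlateau:
    """Halve the learning rate when the mean validation SI-SDR stalls.

    Hypothetical helper: the patience convention (halve on the third
    consecutive non-improving epoch) is an assumption.
    """

    def __init__(self, lr, patience=3, factor=0.5):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.best = -math.inf
        self.bad_epochs = 0

    def step(self, si_sdr_by_set):
        # Average SI-SDR (higher is better) over the four validation sets.
        avg = sum(si_sdr_by_set[name] for name in VALIDATION_SETS) / len(VALIDATION_SETS)
        if avg > self.best:
            self.best = avg
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr


def clip_by_l2_norm(grads, max_norm=5.0):
    """Rescale a flat list of gradient values so its L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]
```

With this reading, a run whose average validation SI-SDR stays flat for three epochs after its best epoch moves from a learning rate of 0.001 to 0.0005, and a gradient vector with L2 norm 10 is scaled by 0.5 before the update.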

Separation Demos on Real-World Cases (on the Sepformer backbone) (No Target in this section)


[Audio demos: three examples. Each example contains the Mixture, followed by the Separation-1 outputs of Vanilla (Baseline), AC-SIM (proposed), and AC-SIM-ML (proposed), and then the Separation-2 outputs of the same three systems.]

Separation Demos on Synthetic Data (Noisy and Reverberant) (on the Sepformer backbone)


[Audio demos: three examples. Each example contains the Mixture; Target-1 followed by the Separation-1 outputs of Vanilla (Baseline), AC-SIM (proposed), and AC-SIM-ML (proposed); and Target-2 followed by the Separation-2 outputs of the same three systems.]

Separation Demos on Synthetic Data (Clean Acoustic Environment) (on the Sepformer backbone)


[Audio demos: three examples. Each example contains the Mixture; Target-1 followed by the Separation-1 outputs of Vanilla (Baseline), AC-SIM (proposed), and AC-SIM-ML (proposed); and Target-2 followed by the Separation-2 outputs of the same three systems.]

Separation Demos on Synthetic Data (Single-Speaker) (on the Sepformer backbone)


[Audio demos: two examples. Each example contains the Mixture; Target-1 followed by the Separation-1 outputs of Vanilla (Baseline), AC-SIM (proposed), and AC-SIM-ML (proposed); and Target-2 followed by the Separation-2 outputs of the same three systems.]