Tags: audio, machine-learning, pattern-recognition, audio-processing

Recognize a "ding-dong" sound


I'm building a sound recognition model to detect a "ding-dong" sound.

There are two phases: training and testing.

The training data are "ding-dong" sounds generated by one device.

The model can detect "ding-dong" sounds generated by that same device; it works well.

But when a new "ding-dong" sound is generated by a second device, the performance is bad.

I know a possible solution to this issue: record the "ding-dong" sound generated by the second device and add it to the training data.

But there will always be a new device with a new "ding-dong" sound.

What should I do?


Solution

  • You are facing an overfitting problem: your model has been trained to work well only on the specific cases in its training set. To overcome this, train your model on recordings from many devices so that it can interpolate between them; how well it interpolates depends on the model you use. (If you can only record a few devices, augmenting those recordings is one way to approximate this; see the first sketch at the end of this answer.)

    However, the advice above is quite general. In your case there may be a much easier approach; it all depends on how you define "ding-dong". If you can find a signature for the "ding-dong", that would be great. This signature should be invariant to every feature you don't care about.

    For example, should "Diiiiing-doooooong" be accepted? If yes, you need a signature that is invariant to the length of the audio clip. Is a "ding-dong" at a higher pitch acceptable? If yes, the signature should encode frequencies as ratios of one another rather than as absolute values, and so on (see the second sketch at the end of this answer).

    BTW, I'm sure you can google this and find many papers about your problem; they may be about "dang-dong" rather than "ding-dong", but you will still be able to benefit from them ;)
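
    First sketch: a minimal, hypothetical way to simulate device variation by augmenting the training clips with pitch shifts and time stretches. It assumes librosa is installed and that each training file is a mono recording of one "ding-dong"; the file name and the shift/stretch values below are illustrative guesses, not tuned settings.

        import librosa

        def augment(path):
            # load the clip at its native sample rate
            y, sr = librosa.load(path, sr=None, mono=True)
            variants = [y]  # keep the original clip
            # shift the pitch up and down a few semitones
            for n_steps in (-2, -1, 1, 2):
                variants.append(librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps))
            # play the clip slightly slower and slightly faster
            for rate in (0.9, 1.1):
                variants.append(librosa.effects.time_stretch(y, rate=rate))
            return sr, variants

        sr, clips = augment("ding_dong_device1.wav")  # hypothetical file name
        print(f"1 recording -> {len(clips)} training clips at {sr} Hz")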
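
    Second sketch: one hypothetical signature along those lines, using only numpy and scipy. It averages the magnitude spectrum over time (so the clip length doesn't matter) and keeps the strongest spectral peaks as ratios of the lowest peak frequency (so the absolute pitch doesn't matter either). The peak count, window size, and tolerance are illustrative guesses.

        import numpy as np
        from scipy.signal import stft, find_peaks

        def signature(y, sr, n_peaks=4):
            freqs, _, Z = stft(y, fs=sr, nperseg=2048)
            spectrum = np.abs(Z).mean(axis=1)      # time average: length-invariant
            peaks, props = find_peaks(spectrum, height=spectrum.max() * 0.1)
            top = peaks[np.argsort(props["peak_heights"])[-n_peaks:]]
            f = np.sort(freqs[top])
            if len(f) < 2:                         # not enough peaks to form ratios
                return np.array([])
            return f[1:] / f[0]                    # frequency ratios: pitch-invariant

        def is_ding_dong(y, sr, reference, tol=0.05):
            # compare against a reference signature taken from a clean recording
            s = signature(y, sr, n_peaks=len(reference) + 1)
            return len(s) == len(reference) and np.allclose(s, reference, rtol=tol)

    In practice you would compute the reference signature once from a known-good "ding-dong" and then test clips from any device against it.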