I'm trying to separate vocals from a song using a deep learning model. The output is not wrong, but some extra noises cause the signal to sound bad.
The following is 3 seconds of the output file where the noise exists (the areas with a rectangle are the noises):
How can I remove these noises from my output file? I can see that these parts have a different amplitude than the other parts of the songs I want. is there a way to filter the signal based on these amplitudes and only allow a specific amplitude range to exist in my signal?
thanks
UPDATE: Please look at the accepted answer and my code for the denoising algorithm that is working as expected!
'How can I remove these noises from my output file?
You could 'window' it out (multiply those parts of the signal with a step function at e.g. 0.001 for the noise, and at 1 for the signal). This would silence the noisy regions, and keep your regions of interest. It is however not generalisable - and will work only for a pre-specified audio segment, since the window will be fixed.
I can see that these parts have a different amplitude than the other parts of the songs I want. is there a way to filter the signal based on these amplitudes and only allow a specific amplitude range to exist in my signal
Here you could use two approaches 1) running-window to calculate energy (sum of X^{2} over N samples, where X is your audio signal) or 2) generate the Hilbert envelope for your signal, and smooth the envelope with a window of the appropriate length (perhaps 1-100's of milliseconds long). You can set a threshold based on either the energy or Hilbert envelope.