signal-processing speech-recognition telephony audio-processing

How to emulate telephone channel 8k speech given 16k microphone speech recording


I need to emulate 8k landline/cellular/VoIP speech audio from a 16k microphone recording of that speech. What are the main stages of such an emulation? I found this torchaudio tutorial on this kind of augmentation, and it has the most detailed instructions I have seen so far.

So far I see the following 16k mic -> 8k tel conversion pipeline:

  1. 16k -> 8k resampling
  2. Applying RIR (room impulse response to simulate reverberations) [OPTIONAL]
  3. Applying noise [OPTIONAL]
  4. Applying sox compand filter (is it needed? what other parameters might be used?)
  5. Applying codecs (GSM, g72*, SILK, OPUS, etc.)

What should be added? Equalization, some special filters, packet loss concealment emulation? Maybe there are existing Matlab scripts or libraries for such augmentation?


Solution

  • Assuming you have a wave file, you can band-pass filter it to the telephone band (300 to 3000 Hz here):

    
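    import numpy as np
    from scipy.io.wavfile import read, write
    from scipy.signal import butter, sosfilt
    
    def butter_params(low_freq, high_freq, fs, order=5):
        # Normalize the band edges by the Nyquist frequency, as butter() expects.
        nyq = 0.5 * fs
        low = low_freq / nyq
        high = high_freq / nyq
        # Second-order sections are numerically more robust than (b, a)
        # coefficients for higher-order band-pass filters.
        return butter(order, [low, high], btype='band', output='sos')
    
    def butter_bandpass_filter(data, low_freq, high_freq, fs, order=5):
        sos = butter_params(low_freq, high_freq, fs, order=order)
        # Filter along the time axis (axis 0) so that stereo files,
        # which are read as (samples, channels), are handled correctly.
        return sosfilt(sos, data, axis=0)
    
    def apply_telephony_effect(f1, f2):
        # Assumes f1 is a 16-bit PCM wave file.
        fs, audio = read(f1)
        low_freq = 300.0
        high_freq = 3000.0
        filtered_signal = butter_bandpass_filter(audio, low_freq, high_freq, fs, order=6)
        # Clip before casting so that filter overshoot does not wrap around in int16.
        clipped = np.clip(filtered_signal, -32768, 32767)
        write(f2, fs, clipped.astype(np.int16))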
    

    You can then call it like this:

    apply_telephony_effect('input.wav', 'output.wav')
    

    The output will sound like a telephone call.
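
The band-pass filter above keeps the input sample rate and does not model a codec. The following is a minimal sketch of the resampling and codec steps from the question, assuming mono float audio scaled to [-1, 1] and sampled at 16 kHz. The function names (`to_narrowband`, `mu_law_roundtrip`, `simulate_telephone`) are hypothetical, and the continuous mu-law formula only approximates the segmented encoding that G.711 actually uses.

```python
import numpy as np
from scipy.signal import butter, resample_poly, sosfilt

MU = 255.0  # mu-law constant; the continuous formula below approximates G.711 mu-law


def to_narrowband(audio_16k, low_hz=300.0, high_hz=3000.0, order=6):
    """Resample mono float audio from 16 kHz to 8 kHz and band-limit it."""
    # resample_poly applies its own anti-aliasing FIR filter while decimating by 2.
    audio_8k = resample_poly(audio_16k, up=1, down=2)
    # Same band edges as in the answer above, now designed for fs = 8000 Hz.
    sos = butter(order, [low_hz, high_hz], btype='band', fs=8000, output='sos')
    return sosfilt(sos, audio_8k)


def mu_law_roundtrip(x, bits=8):
    """Compress, quantize and expand a float signal in [-1, 1] with mu-law."""
    x = np.clip(x, -1.0, 1.0)
    # Logarithmic compression: small amplitudes get finer resolution.
    compressed = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
    # Uniform quantization of the compressed signal to 2**bits levels.
    levels = 2 ** bits - 1
    quantized = np.round((compressed + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0
    # Inverse of the compression curve.
    return np.sign(quantized) * np.expm1(np.abs(quantized) * np.log1p(MU)) / MU


def simulate_telephone(audio_16k):
    """16 kHz mono float audio in, 8 kHz telephone-like float audio out."""
    return mu_law_roundtrip(to_narrowband(audio_16k))
```

Other codecs from the question (GSM, SILK, Opus) are not covered by this sketch and would need an external encoder and decoder.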