python voip pjsua2

pjsua2 looped audio recording quality is horrible


I implemented a simple caller and listener script which works in the following way:

Notes:

The idea of the whole setup, in words, is: the caller plays an audio file and the listener loops back the audio to the caller, who finally stores the looped audio into an audio file. All I want is a simple audio loop, with no enhancements or changes to the original sound.

The issue is that the audio recorded on the caller is of horrible quality. The volume is lowered overall, and the first few seconds are even quieter than the rest. There is a slight echo throughout the recording, and it fades in and out for seemingly no reason (echo cancellation?). The fading is quick, so most of the recording is practically inaudible; only about four short stretches are audible, and even those are of abysmal quality.

I find the official pjsip/pjsua2 documentation absolutely useless for anything beyond the simplest example. I've tried disabling VAD, running the script both single-threaded and multi-threaded, and finally changing the EpConfig's MediaConfig.quality variable, but none of this helped.

The question is: What can I do to make sure that the original audio is transferred, looped-back and stored as-is, without any changes to the quality or characteristics of the recorded audio? This is a simple, short call which only loops a wav file, nothing more.

I tried changing some basic options available through the Python EpConfig interface. I tried switching up the playback and capture media devices.
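For reference, a minimal sketch of where those options live. The helper name is hypothetical, and it takes the config object as a parameter so the snippet does not import pjsua2 itself. Setting `ecTailLen` to 0 (which switches the echo canceller off) is an assumption added here for illustration; it is not one of the things listed as tried above.

```python
def apply_plain_audio_settings(ep_cfg, quality=10):
    """Set media options on a pjsua2 EpConfig-like object and return it.

    Hypothetical helper: it only touches attributes on ep_cfg.medConfig.
    """
    med = ep_cfg.medConfig
    med.noVad = True       # disable voice activity detection (silence suppression)
    med.ecTailLen = 0      # echo canceller tail length in ms; 0 disables it (assumption, untested here)
    med.quality = quality  # media/resampling quality, 0-10
    return ep_cfg
```

With the real library this would be called on a `pj.EpConfig()` instance before passing it to `libInit()` on the endpoint.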


Solution

  • I've solved the audio quality issue by not using the default media devices.

    Instead, I replaced the playback_device and capture_device on both ends with the call's own audio media (self.audio_media below), obtained as follows:

    # Inside the onCallState() method of caller and listener
    # which derive from pj.Call class...
    
    ci = self.getInfo()
    
    if ci.state == pj.PJSIP_INV_STATE_CONFIRMED:
        for i in range(0, ci.media.size()):
            if ci.media[i].type == pj.PJMEDIA_TYPE_AUDIO:
                # Here we fetch the call's own audio media device...
                # We use this object to directly manipulate the call's media,
                # avoiding whatever nonsense the default media devices were doing, apparently...
                self.audio_media = self.getAudioMedia(i)
    
    # We insert the caller/listener device transmit code here...
    

    Now, the caller can

    self.audio_media.startTransmit(self.recorder)
    self.player.startTransmit(self.audio_media)
    

    where self.recorder is a pj.AudioMediaRecorder writing to the final output audio file and self.player is a pj.AudioMediaPlayer playing the reference audio file (this audio is sent to the listener, who loops it back to the caller).

    Then, the listener can

    self.audio_media.startTransmit(self.audio_media)
    

    to loop the audio back to the caller.
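The wiring above can be condensed into a few small helpers. This is only a sketch: the function names are hypothetical, the `pj` module is passed in as a parameter so the snippet does not import pjsua2 itself, and the check for `PJSUA_CALL_MEDIA_ACTIVE` is an extra guard that is not in the code above.

```python
def find_call_audio(call, pj):
    """Return the call's first active audio media, or None if there is none."""
    ci = call.getInfo()
    for i in range(ci.media.size()):
        mi = ci.media[i]
        # Extra guard (assumption): only use media that is actually active.
        if mi.type == pj.PJMEDIA_TYPE_AUDIO and mi.status == pj.PJSUA_CALL_MEDIA_ACTIVE:
            return call.getAudioMedia(i)
    return None


def wire_caller(call_audio, player, recorder):
    """Caller side: play the reference file into the call, record what returns."""
    call_audio.startTransmit(recorder)  # audio received from the listener -> output file
    player.startTransmit(call_audio)    # reference file -> sent to the listener


def wire_listener(call_audio):
    """Listener side: feed the received audio straight back into the same call."""
    call_audio.startTransmit(call_audio)
```

Inside a `pj.Call` subclass, the caller would run something like `wire_caller(find_call_audio(self, pj), self.player, self.recorder)` once the call is confirmed, and the listener would run `wire_listener(find_call_audio(self, pj))`.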

    The final output audio isn't completely perfect regardless of the "quality" setting or VAD, but it is good enough for my purpose. There is only a slight glitch at the start of the recording, and the rest is at proper volume.

    It seems that the pjsip system tries to reduce call latency by changing the speed of the playback, and the filters introduce some changes to the transmitted audio (+ some noise is present) but again, these don't degrade the quality too much and I can accept the way that it works now.
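To put numbers on "proper volume" instead of judging by ear, the reference file and the recording can be compared window by window. The following is a rough sketch using only the standard library. It assumes both files are 16-bit PCM WAV, the function names are hypothetical, and it makes no attempt to compensate for call latency, so the windows of the two files will not line up exactly.

```python
import math
import struct
import wave


def window_rms(path, window_s=0.5):
    """Return a list of RMS levels, one per window, for a 16-bit PCM WAV file.

    Only the first channel is used for multi-channel files.
    """
    levels = []
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        channels = wf.getnchannels()
        frames_per_window = max(1, int(wf.getframerate() * window_s))
        while True:
            raw = wf.readframes(frames_per_window)
            if not raw:
                break
            # Unpack little-endian signed 16-bit samples, keep the first channel.
            samples = struct.unpack("<%dh" % (len(raw) // 2), raw)[::channels]
            levels.append(math.sqrt(sum(s * s for s in samples) / len(samples)))
    return levels


def to_dbfs(rms):
    """Convert a 16-bit RMS level to dB relative to full scale."""
    return 20 * math.log10(rms / 32768.0) if rms > 0 else float("-inf")
```

Printing `to_dbfs(level)` for each window of both files should make the symptoms from the question (low overall level, a quieter start, fading in and out) visible as dips relative to the reference.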