I implemented a simple caller and listener script which works in the following way:
1. The caller transmits a pj.AudioMediaPlayer instance into playback_media.
2. The listener transmits capture_media into playback_media.
3. The caller transmits capture_media into a pj.AudioMediaRecorder, outputting the looped sound into a wav file.
Notes:
- playback_media is obtained as pj.Endpoint.instance().audDevManager().getPlaybackDevMedia()
- capture_media is obtained as pj.Endpoint.instance().audDevManager().getCaptureDevMedia()
The idea of the whole setup, in words, is: the caller plays an audio file and the listener loops back the audio to the caller, who finally stores the looped audio into an audio file. All I want is a simple audio loop, with no enhancements or changes to the original sound.
The issue is that the audio recorded on the caller is of horrible quality. The volume is lowered overall, and the first few seconds are even quieter than the rest. There is a slight echo throughout the recording, and it fades in and out for seemingly no reason (echo cancellation?). The fading is quick and leaves most of the recording so quiet that it is practically inaudible, with only about 4 audible stretches, all of abysmal quality.
I find the official pjsip/pjsua2 documentation absolutely useless for anything beyond the simplest example. I've tried disabling VAD, running the script both single-threaded and multi-threaded, and finally changing the quality variable of EpConfig's MediaConfig, but none of this helped.
The question is: What can I do to make sure that the original audio is transferred, looped-back and stored as-is, without any changes to the quality or characteristics of the recorded audio? This is a simple, short call which only loops a wav file, nothing more.
I tried changing some basic options available through the Python EpConfig interface.
I tried switching up the playback and capture media devices.
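A minimal sketch of the kind of EpConfig tweaks described above. It is written against a stand-in object so it runs without pjsua2 installed; in a real script `ep_cfg` would be a `pj.EpConfig()` passed to `libInit()`. The helper name `apply_loopback_media_settings` is hypothetical. `noVad` and `quality` correspond to the options mentioned above, while `ecTailLen = 0` (which, as far as I know, switches the echo canceller off) is an extra assumption that the post does not say was tried.

```python
from types import SimpleNamespace


def apply_loopback_media_settings(ep_cfg, quality=10):
    """Set media options on an EpConfig-like object and return it."""
    med = ep_cfg.medConfig
    med.noVad = True       # turn off voice activity detection
    med.quality = quality  # media quality setting, higher is better
    med.ecTailLen = 0      # assumption: a zero tail length disables echo cancellation
    return ep_cfg


# Stand-in for pj.EpConfig() so the sketch is runnable on its own.
fake_cfg = SimpleNamespace(medConfig=SimpleNamespace())
apply_loopback_media_settings(fake_cfg)
```

With the real library, the same function would be called on the EpConfig before the endpoint is initialised.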
I've solved the audio quality issue by not using the default media devices.
Instead, I replaced playback_media and capture_media on both ends with the call's own audio media, audio_media, obtained as follows:
    # Inside the onCallState() method of caller and listener
    # which derive from pj.Call class...
    ci = self.getInfo()
    if ci.state == pj.PJSIP_INV_STATE_CONFIRMED:
        for i in range(ci.media.size()):
            if ci.media[i].type == pj.PJMEDIA_TYPE_AUDIO:
                # Here we fetch the call's own audio media device...
                # We use this object to directly manipulate the call's media,
                # avoiding whatever nonsense the default media devices were doing, apparently...
                self.audio_media = self.getAudioMedia(i)
                # We insert the caller/listener device transmit code here...
Now, the caller can do:
    self.audio_media.startTransmit(self.recorder)
    self.player.startTransmit(self.audio_media)
where self.recorder is a pj.AudioMediaRecorder writing to the final output audio file and self.player is a pj.AudioMediaPlayer playing the reference audio file (this audio is sent to the listener, who loops it back to the caller).
Then, the listener can do:
    self.audio_media.startTransmit(self.audio_media)
to loop the audio back to the caller.
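The routing on both ends can be summarised in a small sketch. `FakeMedia`, `wire_caller` and `wire_listener` are hypothetical names used only for illustration; `FakeMedia` stands in for the pjsua2 media objects so the example runs without the library, and it only records which sink each source transmits to. In the real script the same `startTransmit()` calls are made on the call's audio media, the `pj.AudioMediaPlayer` and the `pj.AudioMediaRecorder`.

```python
class FakeMedia:
    """Stand-in for a pjsua2 audio media object; records its sinks."""

    def __init__(self, name):
        self.name = name
        self.sinks = []

    def startTransmit(self, sink):
        self.sinks.append(sink)


def wire_caller(audio_media, player, recorder):
    # The reference audio goes out to the remote side through the call...
    player.startTransmit(audio_media)
    # ...and whatever comes back on the call is written to the output file.
    audio_media.startTransmit(recorder)


def wire_listener(audio_media):
    # The call's incoming audio is sent straight back out on the same call.
    audio_media.startTransmit(audio_media)


caller_call = FakeMedia("caller call media")
player = FakeMedia("player")
recorder = FakeMedia("recorder")
wire_caller(caller_call, player, recorder)

listener_call = FakeMedia("listener call media")
wire_listener(listener_call)
```

Note that neither side touches the default playback or capture devices any more; every connection goes through the call's own audio media.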
The final output audio isn't completely perfect regardless of the "quality" setting or VAD, but it is good enough for my purpose. There is only a slight glitch at the start of the recording, and the rest is of proper volume.
It seems that the pjsip system tries to reduce call latency by changing the playback speed, and the filters introduce some changes to the transmitted audio (some noise is also present), but again, these don't degrade the quality too much and I can accept the way it works now.
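For anyone who wants to check the volume observations above by numbers rather than by ear, here is a rough sketch that computes the RMS level of a wav file in fixed windows, using only the Python standard library. It is not part of the original scripts; the function name `windowed_rms` is hypothetical, and it assumes 16-bit PCM input.

```python
import array
import math
import sys
import wave


def windowed_rms(path, window_seconds=1.0):
    """Return one RMS value per window for a 16-bit PCM wav file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # wav data is little-endian
    # Window length in samples, counting all interleaved channels together.
    window = max(1, int(rate * window_seconds)) * channels
    levels = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        levels.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return levels
```

Running it on the reference file and on the recorded file and comparing the two lists window by window should make a quiet start or a general drop in level visible.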