python pyaudio loopback portaudio pulseaudio

Read from audio output in PyAudio through loopbacks [Python record system output]


I'm writing a program that records from my speaker output using pyaudio. I am on a Raspberry Pi. I built the program while using the audio jack to play audio through some speakers, but recently have switched to using the speakers in my monitor, through HDMI. Suddenly, the program records silence.

from pyaudio import PyAudio


p = PyAudio()

print(p.get_default_input_device_info()['index'], '\n')
print(*[p.get_device_info_by_index(i) for i in range(p.get_device_count())], sep='\n\n')

The above code outputs first the index of the default input device of pyaudio, then the available devices. See the results below.

Case A:

2

{'index': 0, 'structVersion': 2, 'name': 'bcm2835 Headphones: - (hw:2,0)', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 8, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0016099773242630386, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}

{'index': 1, 'structVersion': 2, 'name': 'pulse', 'hostApi': 0, 'maxInputChannels': 32, 'maxOutputChannels': 32, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': 0.008684807256235827, 'defaultHighInputLatency': 0.034807256235827665, 'defaultHighOutputLatency': 0.034807256235827665, 'defaultSampleRate': 44100.0}

{'index': 2, 'structVersion': 2, 'name': 'default', 'hostApi': 0, 'maxInputChannels': 32, 'maxOutputChannels': 32, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': 0.008684807256235827, 'defaultHighInputLatency': 0.034807256235827665, 'defaultHighOutputLatency': 0.034807256235827665, 'defaultSampleRate': 44100.0}

If I then go into the terminal, run sudo raspi-config and change the audio output to the headphone jack, I get an actual recording, not silence, and the code above prints different output.

Case B:

5

{'index': 0, 'structVersion': 2, 'name': 'vc4-hdmi-0: MAI PCM i2s-hifi-0 (hw:0,0)', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 2, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.005804988662131519, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}

{'index': 1, 'structVersion': 2, 'name': 'bcm2835 Headphones: - (hw:2,0)', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 8, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0016099773242630386, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}

{'index': 2, 'structVersion': 2, 'name': 'sysdefault', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 128, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.005804988662131519, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}

{'index': 3, 'structVersion': 2, 'name': 'hdmi', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 2, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.005804988662131519, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}

{'index': 4, 'structVersion': 2, 'name': 'pulse', 'hostApi': 0, 'maxInputChannels': 32, 'maxOutputChannels': 32, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': 0.008684807256235827, 'defaultHighInputLatency': 0.034807256235827665, 'defaultHighOutputLatency': 0.034807256235827665, 'defaultSampleRate': 44100.0}

{'index': 5, 'structVersion': 2, 'name': 'default', 'hostApi': 0, 'maxInputChannels': 32, 'maxOutputChannels': 32, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': 0.008684807256235827, 'defaultHighInputLatency': 0.034807256235827665, 'defaultHighOutputLatency': 0.034807256235827665, 'defaultSampleRate': 44100.0}

You can see in case B that I now have access to many more devices. I've attempted recording from all three devices listed in case A, and none of them works: #1 records silence just like the default (#2), and #0 raises OSError: [Errno -9998] Invalid number of channels. If you look closely at case A, you'll see that #0 has ['maxInputChannels'] = 0, which explains that error.
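
To make the input-channel check explicit, the device list can be filtered for entries that report at least one input channel. This is only a minimal sketch, assuming the dictionaries have the shape printed above; input_capable is a hypothetical helper name, and the sample list is a trimmed copy of the case A output.

```python
def input_capable(devices):
    """Return (index, name) for every device reporting at least one input channel."""
    return [
        (d['index'], d['name'])
        for d in devices
        if d.get('maxInputChannels', 0) > 0
    ]


# Trimmed copies of the three case A entries (only the fields used here).
case_a = [
    {'index': 0, 'name': 'bcm2835 Headphones: - (hw:2,0)', 'maxInputChannels': 0},
    {'index': 1, 'name': 'pulse', 'maxInputChannels': 32},
    {'index': 2, 'name': 'default', 'maxInputChannels': 32},
]

print(input_capable(case_a))
```

With a live PyAudio instance, the same function could be fed the list built from p.get_device_info_by_index(i) in the snippet above. It only shows which devices can be opened for input at all; it says nothing about whether they carry the speaker output.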

I've attempted to create loopback devices that read from the sound output and introduce another input to pass the audio back in. I would then record from that input, as it would have input channels. I've researched on this thread here, but the only solution is for Windows.

I have also attempted to create a loopback device using the pulseaudio utility pactl. This link here demonstrates what I have tried. Upon successfully creating a loopback, I'm unable to plug into it using pyaudio; it doesn't show up in the list of devices.

Does anybody know...

Thanks very much.


Solution

  • This problem took a while. It turns out pyaudio is pretty useless for recording system audio, so I switched to pasimple, which has all of the benefits of pyaudio and, gasp, actually works. By benefits, I mean it is A) simple and B) free of Python dependencies (it does require pulseaudio).

    Below you will find my Recorder object. Keep in mind that I am on a Raspberry Pi, so my way of finding the correct output device to listen in on may not work on other systems.

    pasimple works super well. Check out the documentation here. The tlength argument is worth looking into.

    import json
    import subprocess
    import wave
    from threading import Thread, Event
    
    import pasimple as pa
    
    
    class Recorder(Thread):
        def __init__(self) -> None:
            super().__init__()
            
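            # Ask PulseAudio which sink (output device) is currently the default.
            default_sink = subprocess.check_output('pactl get-default-sink', shell = True)
            
            # Each sink's output is exposed as a source named '<sink>.monitor'.
            self.device = '{}.monitor'.format(default_sink.decode().rstrip())
            
            # List all sinks as JSON and pick the one whose monitor matches.
            devices = json.loads(subprocess.check_output('pactl --format="json" list sinks', shell = True))
            
            device = [device for device in devices if device['monitor_source'] == self.device][0]
            
            # The specification is three space-separated fields: format, channels, rate.
            specs = device['sample_specification'].split()
            
            self.audio = {}
            
            self.audio['format'] = getattr(pa, 'PA_SAMPLE_{}'.format(specs[0].upper()))
            self.audio['channels'] = int(specs[1][:-2]) # strip the trailing 'ch'
            self.audio['rate'] = int(specs[2][:-2]) # strip the trailing 'Hz'
            
            # Bytes per sample for the chosen format.
            self.audio['sample-width'] = pa.format2width(self.audio['format'])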
            
            self.is_recording = Event()
            self.kill = Event()
        
        def _get_sample_length(self, seconds: int) -> int:
            return self.audio['channels'] * self.audio['sample-width'] * self.audio['rate'] * seconds
        
        def _read_audio_data(self, seconds: int) -> bytes:
            return self.stream.read(self._get_sample_length(seconds))
        
        def record_to_file(self, file: str, seconds: int) -> None:
            data = self._read_audio_data(seconds)
            
            with wave.open(file, 'wb') as f:
                f.setnchannels(self.audio['channels'])
                f.setsampwidth(self.audio['sample-width'])
                f.setframerate(self.audio['rate'])
                
                f.writeframes(data)
        
        def run(self) -> None:
            self.stream = pa.PaSimple(
                direction = pa.PA_STREAM_RECORD,
                format = self.audio['format'],
                channels = self.audio['channels'],
                rate = self.audio['rate'],
                device_name = self.device,
                stream_name = 'thingamajiggy'
            )
            
            self.is_recording.set() # change state upon stream initialisation
            self.kill.wait() # await program end
            
            self.stream.flush() # release resources
            self.stream.close()
    
    
    if __name__ == "__main__":
        recorder = Recorder()
        recorder.start()
        
        recorder.is_recording.wait() # wait for stream to be established
        
        recorder.record_to_file('example.wav', 10)
        
        recorder.kill.set() # kill thread, free resources
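
For anyone adapting this, the two bits of arithmetic in the class (parsing the sample specification and working out how many bytes to read) can be tried without any audio hardware. This is a rough sketch under assumptions: the spec string 's16le 2ch 44100Hz' is an example I picked rather than output copied from my Pi, and parse_sample_spec and buffer_bytes are hypothetical helper names that mirror what __init__ and _get_sample_length do.

```python
def parse_sample_spec(spec):
    """Split a spec such as 's16le 2ch 44100Hz' into (format, channels, rate)."""
    fmt, channels, rate = spec.split()
    # Drop the two-character unit suffixes ('ch' and 'Hz'), as the Recorder does.
    return fmt, int(channels[:-2]), int(rate[:-2])


def buffer_bytes(channels, sample_width, rate, seconds):
    """Number of bytes needed to hold `seconds` of raw PCM audio."""
    return channels * sample_width * rate * seconds


# Assumed example spec; a real one comes from `pactl --format="json" list sinks`.
fmt, channels, rate = parse_sample_spec('s16le 2ch 44100Hz')
print(fmt, channels, rate)

# s16le is signed 16-bit little-endian, i.e. 2 bytes per sample.
print(buffer_bytes(channels, 2, rate, 10))
```

Ten seconds of 16-bit stereo at 44100 Hz comes to about 1.7 MB, all of which record_to_file reads in one go and holds in memory, so for long recordings it may be worth reading in smaller pieces instead.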