
How to record audio each time user presses a key?


How can I record the user's audio for as long as the user holds the ctrl key, and shut down the recording loop when the user presses ctrl+c? So far, based on some online examples, I have built this script:

from pynput import keyboard
import time, os
import pyaudio
import wave
import sched
import sys
from playsound import playsound


CHUNK = 8192
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
WAVE_OUTPUT_FILENAME = "mic.wav"

p = pyaudio.PyAudio()
frames = []

def callback(in_data, frame_count, time_info, status):
    frames.append(in_data)
    return (in_data, pyaudio.paContinue)

class MyListener(keyboard.Listener):
    def __init__(self):
        super(MyListener, self).__init__(self.on_press, self.on_release)
        self.key_pressed = None
        self.wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
        self.wf.setnchannels(CHANNELS)
        self.wf.setsampwidth(p.get_sample_size(FORMAT))
        self.wf.setframerate(RATE)
    def on_press(self, key):

        try:
            if key.ctrl:
                self.key_pressed = True
            return True
        except AttributeError:
            sys.exit()





    def on_release(self, key):

        if key.ctrl:
            self.key_pressed = False
        return True


listener = MyListener()


listener.start()
started = False
stream = None

def recorder():
    global started, p, stream, frames

    while True:

        try:
            if listener.key_pressed and not started:
                # Start the recording
                try:
                    stream = p.open(format=FORMAT,
                                    channels=CHANNELS,
                                    rate=RATE,
                                    input=True,
                                    frames_per_buffer=CHUNK,
                                    stream_callback = callback)
                    print("Stream active:", stream.is_active())
                    started = True
                    print("start Stream")
                except KeyboardInterrupt:
                    print('\nRecording finished: ' + repr(WAVE_OUTPUT_FILENAME))
                    quit()

            elif not listener.key_pressed and started:

                print("Stop recording")
                listener.wf.writeframes(b''.join(frames))
                listener.wf.close()
                print("You should have a wav file in the current directory")
                print('-> Playing recorded sound...')
                playsound(str(os.getcwd())+'/mic.wav')
                os.system('python "/Users/user/rec.py"')

        except KeyboardInterrupt:
            print('\nRecording finished: ' + repr(WAVE_OUTPUT_FILENAME))
            quit()
        except AttributeError:
            quit()





print ("-> Press and hold the 'ctrl' key to record your audio")
print ("-> Release the 'ctrl' key to end recording")


recorder()

The problem is that it is really inefficient: for example, the computer starts heating up. The only way I found to keep the program running and recording further audio samples was with os.system('python "/Users/user/rec.py"'). To end the program, I tried either catching the exception with:

except AttributeError:
       sys.exit()

or with the user input:

if key.ctrl_c:
   sys.exit()

Based on the pynput docs, I tried to make effective use of the listeners. However, for this specific scenario, what is the recommended way of using those listeners?


Solution

  • As to the primary concern of your computer seeming to work terribly hard: that's because you use a while loop to constantly check whether the record key has been released. The computer goes around that loop as fast as it can without ever taking a break, which keeps a CPU core fully busy.

    A better solution is event-driven programming: you let the OS notify you when an event happens, and decide in a callback whether you want to do anything about it. This may sound complicated, but fortunately pynput does most of the hard work for you.
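
The difference between polling and waiting can be seen without any keyboard or audio library. The following is a minimal sketch using only the standard library: a `threading.Event` stands in for "the key was pressed", and a timer stands in for the OS delivering that event. The function names are made up for illustration and are not part of pynput.

```python
import threading
import time


def busy_wait(flag, timeout=0.2):
    """Poll a flag in a tight loop; returns how many times the loop ran."""
    spins = 0
    deadline = time.monotonic() + timeout
    while not flag.is_set() and time.monotonic() < deadline:
        spins += 1  # every pass uses CPU time without doing useful work
    return spins


def blocking_wait(flag, timeout=5.0):
    """Sleep until another thread sets the flag; returns True if it was set."""
    return flag.wait(timeout)  # the thread is suspended while it waits


def demo():
    flag = threading.Event()
    # A timer stands in for the OS delivering a key event a little later.
    threading.Timer(0.05, flag.set).start()
    spins = busy_wait(flag)

    flag.clear()
    threading.Timer(0.05, flag.set).start()
    woke = blocking_wait(flag)
    return spins, woke


if __name__ == "__main__":
    spins, woke = demo()
    print(f"busy loop iterations while waiting: {spins}")
    print(f"blocking wait woke up on the event: {woke}")
```

The busy loop runs a large number of iterations in those few milliseconds, while the blocking wait does nothing until the event arrives. A keyboard listener with callbacks behaves like the second case.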

    If you keep track of the state of the recording or playback, it is also fairly simple to start a new recording the next time a control key down event happens, without the "hack" of spawning an entire new process for each new recording. The event loop inside the keyboard listener keeps going until one of the callback functions returns False or raises self.StopException.

    I have created a simple listener class, similar to your initial attempt, that calls on a recorder or player instance (which I'll get to later) to start and stop. I also have to agree with Anwarvic that Ctrl+C is supposed to be reserved as an emergency way of stopping a script, so I have changed the stop command to the letter q.

    class listener(keyboard.Listener):
        def __init__(self, recorder, player):
            super().__init__(on_press = self.on_press, on_release = self.on_release)
            self.recorder = recorder
            self.player = player
        
        def on_press(self, key):
            if key is None: #unknown event
                pass
            elif isinstance(key, keyboard.Key): #special key event
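                #compare against the ctrl members; key.ctrl does not test which key was pressed
                if (key in (keyboard.Key.ctrl, keyboard.Key.ctrl_l, keyboard.Key.ctrl_r)
                        and self.player.playing == 0):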
                    self.recorder.start()
            elif isinstance(key, keyboard.KeyCode): #alphanumeric key event
                if key.char == 'q': #press q to quit
                    if self.recorder.recording:
                        self.recorder.stop()
                    return False #this is how you stop the listener thread
                if key.char == 'p' and not self.recorder.recording:
                    self.player.start()
                    
        def on_release(self, key):
            if key is None: #unknown event
                pass
            elif isinstance(key, keyboard.Key): #special key event
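                #compare against the ctrl members; key.ctrl does not test which key was released
                if key in (keyboard.Key.ctrl, keyboard.Key.ctrl_l, keyboard.Key.ctrl_r):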
                    self.recorder.stop()
            elif isinstance(key, keyboard.KeyCode): #alphanumeric key event
                pass
    
    if __name__ == '__main__':
        r = recorder("mic.wav")
        p = player("mic.wav")
        l = listener(r, p)
        print('hold ctrl to record, press p to playback, press q to quit')
        l.start() #keyboard listener is a thread so we start it here
        l.join() #wait for the thread to terminate so the program doesn't instantly close
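
The listener above depends on start and stop being safe to call more than once, because a held key can generate repeated press events (keyboard auto-repeat). The state tracking behind that can be sketched on its own, with no keyboard or audio library involved. The class `HoldToRecord` and its `start`/`stop` callables are hypothetical names used only for illustration.

```python
class HoldToRecord:
    """Tracks a hold-to-record key; start/stop are hypothetical callables."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.held = False

    def press(self):
        if self.held:       # repeated press while the key is still down
            return
        self.held = True
        self.start()

    def release(self):
        if not self.held:   # release without a matching press
            return
        self.held = False
        self.stop()


def demo():
    log = []
    state = HoldToRecord(lambda: log.append("start"),
                         lambda: log.append("stop"))
    # One hold with two auto-repeat presses, then a second, separate hold.
    for event in ("press", "press", "press", "release", "press", "release"):
        getattr(state, event)()
    return log


if __name__ == "__main__":
    print(demo())  # ['start', 'stop', 'start', 'stop']
```

Because the state lives in one object that survives between events, a second recording needs nothing more than a second press; no new process is required.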
    

    With that structure, we then need a recorder class whose start and stop functions are asynchronous, meaning they return immediately and do not block the listener thread from continuing to receive key events. The documentation for PyAudio gives a pretty good example for asynchronous output, so I simply applied it to an input. A little bit of rearranging, plus a flag to let our listener know when we're recording, and we have a recorder class:

    class recorder:
        def __init__(self, 
                     wavfile, 
                     chunksize=8192, 
                     dataformat=pyaudio.paInt16, 
                     channels=2, 
                     rate=44100):
            self.filename = wavfile
            self.chunksize = chunksize
            self.dataformat = dataformat
            self.channels = channels
            self.rate = rate
            self.recording = False
            self.pa = pyaudio.PyAudio()
    
        def start(self):
            #we call start and stop from the keyboard listener, so we use the asynchronous 
            # version of pyaudio streaming. The keyboard listener must regain control to 
            # begin listening again for the key release.
            if not self.recording:
                self.wf = wave.open(self.filename, 'wb')
                self.wf.setnchannels(self.channels)
                self.wf.setsampwidth(self.pa.get_sample_size(self.dataformat))
                self.wf.setframerate(self.rate)
                
                def callback(in_data, frame_count, time_info, status):
                    #file write should be able to keep up with audio data stream (about 1378 Kbps)
                    self.wf.writeframes(in_data) 
                    return (in_data, pyaudio.paContinue)
                
                self.stream = self.pa.open(format = self.dataformat,
                                           channels = self.channels,
                                           rate = self.rate,
                                           input = True,
                                           frames_per_buffer = self.chunksize, #use the configured chunk size
                                           stream_callback = callback)
                self.stream.start_stream()
                self.recording = True
                print('recording started')
        
        def stop(self):
            if self.recording:         
                self.stream.stop_stream()
                self.stream.close()
                self.wf.close()
                
                self.recording = False
                print('recording finished')
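
The callback above writes each chunk to the file as it arrives instead of collecting everything in a list first. That pattern can be tried without a microphone: the sketch below feeds chunks of silence to the standard `wave` module, with a plain loop standing in for successive callback calls. The constants mirror the recorder's defaults; the helper names are made up for illustration.

```python
import io
import wave

RATE = 44100        # frames per second, as in the recorder above
CHANNELS = 2
SAMPLE_WIDTH = 2    # bytes per sample for 16-bit audio


def bytes_per_second(rate=RATE, channels=CHANNELS, width=SAMPLE_WIDTH):
    """Raw data rate the file writes have to keep up with."""
    return rate * channels * width


def write_chunks(target, chunks):
    """Append each chunk to a WAV file as it arrives, as the callback does."""
    with wave.open(target, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(RATE)
        for chunk in chunks:    # stands in for successive callback calls
            wf.writeframes(chunk)


def demo():
    frames_per_chunk = 1024
    silence = bytes(frames_per_chunk * CHANNELS * SAMPLE_WIDTH)
    buffer = io.BytesIO()       # in-memory file so nothing touches the disk
    write_chunks(buffer, [silence] * 10)
    buffer.seek(0)
    with wave.open(buffer, "rb") as wf:
        return wf.getnframes(), wf.getframerate()


if __name__ == "__main__":
    print(demo())               # (10240, 44100)
    print(bytes_per_second())   # 176400
```

176400 bytes per second is 1,411,200 bits per second, which is where the "about 1378 Kbps" in the callback comment comes from when a kilobit is counted as 1024 bits.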
    

    Finally, we create an audio player for playback when you press p. I put the PyAudio example into a thread that is created every time you press the button, so multiple players can be created and overlap each other. We also keep track of how many players are playing so we don't try to record while the file is already in use by a player. (I have also included my imports at the top.)

    from threading import Thread, Lock
    from pynput import keyboard
    import pyaudio
    import wave
    
    class player:
        def __init__(self, wavfile):
            self.wavfile = wavfile
            self.playing = 0 #flag so we don't try to record while the wav file is in use
            self.lock = Lock() #mutex so incrementing and decrementing self.playing is safe
        
        #contents of the run function are processed in another thread so we use the blocking
        # version of pyaudio play file example: http://people.csail.mit.edu/hubert/pyaudio/#play-wave-example
        def run(self):
            with self.lock:
                self.playing += 1
            with wave.open(self.wavfile, 'rb') as wf:
                p = pyaudio.PyAudio()
                stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                                channels=wf.getnchannels(),
                                rate=wf.getframerate(),
                                output=True)
                data = wf.readframes(8192)
                while data != b'':
                    stream.write(data)
                    data = wf.readframes(8192)
    
                stream.stop_stream()
                stream.close()
                p.terminate()
                #no explicit wf.close() needed: the with block closes the file
            with self.lock:
                self.playing -= 1
            
        def start(self):
            Thread(target=self.run).start()
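
One thing to watch in run above: if playback raises (for example because the wav file does not exist yet), the counter is never decremented, and the listener would then refuse to record. A possible guard, shown here as a standalone sketch rather than a drop-in change, is to decrement in a finally block. `ActiveCounter` is a hypothetical stand-in for the player's `playing` bookkeeping.

```python
import threading


class ActiveCounter:
    """Counts running workers; hypothetical stand-in for player.playing."""

    def __init__(self):
        self.count = 0
        self._lock = threading.Lock()

    def run(self, work):
        with self._lock:
            self.count += 1
        try:
            work()
        finally:
            # Runs even if work() raises, so the count cannot get stuck.
            with self._lock:
                self.count -= 1


def demo():
    counter = ActiveCounter()
    started = threading.Barrier(4)      # three workers plus the main thread
    release = threading.Event()

    def work():
        started.wait(5)                 # signal that this worker is running
        release.wait(5)                 # stay "playing" until released

    def failing():
        raise RuntimeError("simulated playback error")

    threads = [threading.Thread(target=counter.run, args=(work,))
               for _ in range(3)]
    for t in threads:
        t.start()
    started.wait(5)
    during = counter.count              # all three workers are active here
    release.set()
    for t in threads:
        t.join()
    after = counter.count

    try:
        counter.run(failing)
    except RuntimeError:
        pass
    after_error = counter.count
    return during, after, after_error


if __name__ == "__main__":
    print(demo())  # (3, 0, 0)
```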
    

    I can't guarantee this is perfectly free of bugs, but if you have any questions on how it works / how to get it working, feel free to comment.