numpy audio pyaudio python-sounddevice

Extracting a numpy array from pyaudio results in noise and distortion


My goal is to record the audio output from my computer, process it in real time, and then react to it in real time. I am using PyAudio with a patch that allows loopback devices. I can record the audio to a file with no problem and it sounds perfect, but when I try to turn it into a numpy array for processing, the result is mostly noise, with the signal audible but extremely distorted.

This is the minimal example that produces the noisy output. Here the input is the microphone, which lets me leave out the input selection code, but the result is the same: an extremely noisy representation of my input.

import numpy as np
import pyaudiowpatch as pyaudio
import wave
import sounddevice as sd
from matplotlib import pyplot as plt

FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = 1048
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "file.wav"


p = pyaudio.PyAudio()


stream = p.open(format=FORMAT, channels=CHANNELS,
                    rate=RATE, input=True,
                    frames_per_buffer=CHUNK)
print("recording...")
frames = []
fulldata = np.empty(RATE*RECORD_SECONDS*CHANNELS)

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    print(i)
    data = stream.read(CHUNK)
    numpydata = np.frombuffer(data, dtype='int16')
    fulldata[i*CHUNK*CHANNELS:(i+1)*CHUNK*CHANNELS] = numpydata
    frames.append(data)
print("finished recording")
channel0 = fulldata[0::CHANNELS]

print('playing')
sd.play(channel0, samplerate=RATE, blocking=True)


# stop Recording
stream.stop_stream()
stream.close()
p.terminate()

waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(p.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()

To get to this point I followed other questions about getting numpy data out of pyaudio. The recorded file sounds perfect, so everything up to the conversion seems to work. The input has two channels and I am not sure how the data is structured, so I assumed that the samples alternate between the channels. That was also suggested in a different question, which is where the channel0 code comes from. I assume the problem lies somewhere between getting the data and playing it with sd. I even tried saving it to a file first and then loading it again with librosa, and that sounded perfect when played with sd. I did notice, though, that the sample rate is 48000 when recording, but the sr is 22050 when loading the wav file. I also looked into the writeframes function for any clue about this, but without luck. p.get_sample_size(FORMAT) returns 2, and since the wave file needs that value but numpy is never told it, I thought that might be causing a problem, but I have no idea how to investigate this.

I searched some more on Stack Overflow and found several questions about converting to numpy. They all say that the channels alternate and that this solution should work. Could it have to do with the latency? I don't really know how to set it.

Is there a different way to convert it to a np array or am I missing something else here? Any help with this is greatly appreciated.


Solution

  • The issue was in

    fulldata = np.empty(RATE*RECORD_SECONDS*CHANNELS)
    

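    where I didn't specify the data type, so numpy created a float64 array. The int16 samples were then stored as floats with magnitudes up to 32768, while sounddevice expects floating-point audio to lie between -1.0 and 1.0, so nearly every sample was clipped on playback, which led to the distortion. I changed the dtype to int16 and it now works.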
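For reference, here is a minimal sketch of the corrected buffer handling. It is not the exact code from the question: `fake_read` is a hypothetical stand-in for `stream.read(CHUNK)` so the snippet runs without any audio hardware, and `CHUNK = 1024` and the 440 Hz test tone are assumed values. It preallocates the buffer as int16, sizes it from the number of chunks actually read (so no uninitialised samples remain at the end), and shows one possible way to scale to float32 for code that wants floats in the -1.0 to 1.0 range.

```python
import numpy as np

RATE = 44100
CHANNELS = 2
CHUNK = 1024           # frames per read (assumed value for this sketch)
RECORD_SECONDS = 1


def fake_read(num_frames, start_frame):
    """Hypothetical stand-in for stream.read(): interleaved int16 bytes.

    Channel 0 carries a 440 Hz sine at half of full scale, channel 1 is silent.
    """
    t = (np.arange(num_frames) + start_frame) / RATE
    left = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
    right = np.zeros(num_frames, dtype=np.int16)
    interleaved = np.empty(num_frames * CHANNELS, dtype=np.int16)
    interleaved[0::CHANNELS] = left
    interleaved[1::CHANNELS] = right
    return interleaved.tobytes()


num_chunks = RATE * RECORD_SECONDS // CHUNK

# Explicit dtype: without it np.empty() would allocate float64.
# Sizing by num_chunks * CHUNK leaves no unfilled tail in the buffer.
fulldata = np.empty(num_chunks * CHUNK * CHANNELS, dtype=np.int16)

for i in range(num_chunks):
    data = fake_read(CHUNK, i * CHUNK)
    start = i * CHUNK * CHANNELS
    fulldata[start:start + CHUNK * CHANNELS] = np.frombuffer(data, dtype=np.int16)

# Samples are interleaved, so every CHANNELS-th value belongs to one channel.
channel0 = fulldata[0::CHANNELS]

# Optional: scale to float32 in [-1.0, 1.0) for float-based processing.
channel0_float = channel0.astype(np.float32) / 32768.0
```

With a real stream, `fake_read(CHUNK, i * CHUNK)` would be replaced by `stream.read(CHUNK)`, and either `channel0` (int16) or `channel0_float` (float32) can be passed to `sd.play`.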