I accidentally forgot to convert some NumPy arrays to bytes objects when using PyAudio, but to my surprise it still played audio, even if it sounded a bit off. I wrote a little test script (see below) for playing 1 second of a 440Hz tone, and it seems like writing a NumPy array directly to a PyAudio Stream cuts that tone short.
Can anyone explain why this happens? I thought a NumPy array was a contiguous sequence of bytes with some header information about its dtype and strides, so I would've predicted that PyAudio played the full second of the tone after some garbled audio from the header, not cut the tone off.
# script segment
import pyaudio
import numpy as np
RATE = 48000
p = pyaudio.PyAudio()
stream = p.open(format = pyaudio.paFloat32, channels = 1, rate = RATE, output = True)
TONE = 440
SECONDS = 1
t = np.arange(0, 2*np.pi*TONE*SECONDS, 2*np.pi*TONE/RATE)
sina = np.sin(t).astype(np.float32)
sinb = sina.tobytes()
# console commands segment
stream.write(sinb) # bytes object plays 1 second of 440Hz tone
stream.write(sina) # still plays 440Hz tone, but noticeably shorter than 1 second
The problem is more subtle than you describe. Your first call is passing a bytes object of length 192,000. The second call is passing a NumPy array of float32 values with length 48,000. PyAudio handles both of them, and passes the buffer to PortAudio to be played.
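The two lengths can be checked without any audio hardware. This is a minimal sketch that uses the standard library's array module as a stand-in for the NumPy float32 array (an assumption made only to keep the example dependency-free); the names samples and raw are illustrative and play the roles of sina and sinb.

```python
from array import array

RATE = 48000

# Stand-in for `sina`: one second of 4-byte floats (values do not matter here).
samples = array("f", [0.0] * RATE)
# Stand-in for `sinb`: the same data as a raw bytes object.
raw = samples.tobytes()

element_count = len(samples)  # len() of the array counts elements
byte_count = len(raw)         # len() of the bytes object counts bytes

print(element_count, byte_count, samples.itemsize)
```

The same memory is described by two different lengths, which is the mismatch the rest of this answer is about.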
However, when you opened the stream, you told PyAudio you were sending paFloat32 data, which has 4 bytes per sample. The PyAudio write handler takes the length of the object you gave it, and divides by the number of channels times the sample size to determine how many audio samples there are. In your second call, the length of the array is 48,000 (it counts elements, not bytes), which it divides by 4, and thereby tells PortAudio "there are 12,000 samples here". At 48,000 samples per second, that is only the first quarter second of the tone.
So everyone understood the format, but there was confusion about the size. If you change the second call to
stream.write(sina, 48000)
then the frame count is given explicitly, no one has to guess, and it works perfectly fine.
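The arithmetic above can be sketched as follows. The helpers inferred_frames and actual_frames are hypothetical names written for this illustration, not PyAudio's source; the first mirrors the length-based guess described in this answer, and the second derives the count from the buffer's size in bytes instead. The array module again stands in for the NumPy array.

```python
from array import array


def inferred_frames(data, channels, sample_width):
    """Frame count guessed from len(data), as described in the answer."""
    return len(data) // (channels * sample_width)


def actual_frames(data, channels, sample_width):
    """Frame count from the buffer's real size in bytes (buffer protocol)."""
    return memoryview(data).nbytes // (channels * sample_width)


RATE, CHANNELS, WIDTH = 48000, 1, 4  # WIDTH: bytes per paFloat32 sample

samples = array("f", [0.0] * RATE)   # stand-in for `sina`
raw = samples.tobytes()              # stand-in for `sinb`

guess_from_bytes = inferred_frames(raw, CHANNELS, WIDTH)      # full second
guess_from_array = inferred_frames(samples, CHANNELS, WIDTH)  # a quarter of it
explicit = actual_frames(samples, CHANNELS, WIDTH)            # full second again

print(guess_from_bytes, guess_from_array, explicit)
```

With the objects from the question, something like stream.write(sina, actual_frames(sina, 1, 4)) should give the same result as the hard-coded 48000, assuming the array is passed through the buffer protocol as it is here. Calling stream.write(sina.tobytes()) avoids the question entirely.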