python string numpy audio format-conversion

Audio data string format to numpy array


I am trying to convert the sample rate (from 44100 to 22050) of a numpy.array with 88200 samples that I have already processed (adding silence and converting to mono). I tried to convert this array with audioop.ratecv and it works, but it returns a str instead of a numpy array, and when I wrote that data with scipy.io.wavfile.write, half of the data was lost and the audio played twice as fast (instead of slower, which would at least make some sense). audioop.ratecv works fine with str data such as wave.open returns, but I don't know how to process those, so I tried to convert from numpy to str with numpy.array2string(data), pass that to ratecv to get correct results, and then convert back to numpy with numpy.fromstring(data, dtype). Now the length of the data is 8 samples. I think this is due to a mix-up of formats, but I don't know how to control it. I also haven't figured out what kind of str format wave.open returns, so I can't force that format here.

Here is this part of my code

def conv_sr(data, srold, fixSR, dType, chan = 1): 
    state = None
    width = 2 # numpy.int16
    print "data shape", data.shape, type(data[0]) # returns shape 88200, type int16
    fragments = numpy.array2string(data)
    print "new fragments len", len(fragments), "type", type(fragments) # return len 30 type str
    fragments_new, state = audioop.ratecv(fragments, width, chan, srold, fixSR, state)
    print "fragments", len(fragments_new), type(fragments_new[0]) # returns 16, type str
    data_to_return = numpy.fromstring(fragments_new, dtype=dType)
    return data_to_return

and I call it like this

data1 = numpy.array(data1, dtype=dType)
data_to_copy = numpy.append(data1, data2)
data_to_copy = data_to_copy.sum(axis = 1) / chan
data_to_copy = data_to_copy.flatten() # because its mono

data_to_copy = conv_sr(data_to_copy, sr, fixSR, dType) #sr = 44100, fixSR = 22050

scipy.io.wavfile.write(filename, fixSR, data_to_copy)

Solution

  • After a bit more research I found my mistake. Each 16-bit sample is stored as two 8-bit bytes, so the dtype I was passing when reading the bytes back was wrong, and that is why I had the audio speed issue. I found the correct dtype here. So, in conv_sr, I pass in a numpy array, convert it to a byte string, pass that to audioop.ratecv to convert the sample rate, convert the result back to a numpy array (reading each pair of bytes as one 16-bit sample), and finally cast it to the requested dtype for scipy.io.wavfile.write.

    import audioop
    import numpy

    def widthFinder(dType):
        # sample width in bytes, e.g. 2 for numpy.int16;
        # itemsize works for dtype objects, scalar types and names like 'int16'
        return numpy.dtype(dType).itemsize
    
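    def conv_sr(data, srold, fixSR, dType, chan = 1):
        state = None
        width = widthFinder(dType)
        # audioop only accepts these sample widths; fall back to 16 bit
        if width not in (1, 2, 4):
            width = 2
        # raw sample bytes, not a printable representation of the array
        fragments = data.tobytes()
        fragments_new, state = audioop.ratecv(fragments, width, chan, srold, fixSR, state)
        # int16 samples whose two bytes are also reachable as fields 'x' and 'y'
        fragments_dtype = numpy.dtype((numpy.int16, {'x':(numpy.int8,0), 'y':(numpy.int8,1)}))
        # frombuffer replaces the deprecated binary mode of numpy.fromstring
        data_to_return = numpy.frombuffer(fragments_new, dtype=fragments_dtype)
        data_to_return = data_to_return.astype(dType)
        return data_to_return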
    

    If you find anything wrong, please feel free to correct me; I am still learning.
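To make the format issue concrete, here is a minimal sketch using only numpy. The sample values and variable names are made up for illustration. It shows that numpy.array2string produces printable text rather than sample data, that tobytes followed by numpy.frombuffer with the same dtype restores the samples exactly, and that reading the same bytes with a wider dtype halves the number of samples. A width mismatch like that is one plausible cause of the "half the data, twice the speed" symptom, though I cannot confirm it was the exact cause here.

```python
import numpy as np

# Hypothetical test signal: 8 int16 samples (values chosen only for illustration).
samples = np.array([0, 1000, -1000, 32767, -32768, 5, -5, 42], dtype=np.int16)

# array2string returns a printable representation, not the raw sample bytes.
text = np.array2string(samples)

# tobytes returns the raw buffer: 2 bytes per int16 sample.
raw = samples.tobytes()

# Reading the bytes back with the same dtype restores the samples exactly.
restored = np.frombuffer(raw, dtype=np.int16)

# Reading them with a wider dtype merges pairs of samples, halving the count.
misread = np.frombuffer(raw, dtype=np.int32)

print(len(raw), restored.size, misread.size)
```

So the dtype used when reading the bytes back has to match the sample width given to audioop.ratecv; a plain numpy.int16 is enough for 16-bit audio.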