python audio pitch pitch-shifting

Algorithm and package to modify the pitch of the sound for certain durations repeatedly


I want to create a new audio file from an existing one, modifying the pitch over different time ranges of the file. For example, if the file is 36 seconds long, I want to change the pitch by some value for the first 2 seconds, then by another value from the 6th to the 9th second, and so on.

Basically, I am trying to modify the audio file based on a text message that the user provides. For example, if the user inputs "kill bill", each character in the message (k, i, l, b, ...) is looked up in a table that stores a duration for each of the 26 letters a, b, c, d, and so on. I then want to modify the file over those durations. The problem is that I don't have much hands-on experience with audio; I even tried doing the same in Java but could not get it to work.

Is there some other parameter of an audio file that could be changed without making the change very noticeable?

These are the values I am referring to. The code is in Java, but ignore that; I will port it to Python later. Values are in milliseconds.

public static void convertMsgToAudio(String msg){

        int len = msg.length();
        duration = new double[len];
        msg = msg.toUpperCase();
        System.out.println("Msg 2 : " + msg);

        int i;
        //char ch;
        for(i=0;i<msg.length();i++){

            if(msg.charAt(i) == 'A'){
                duration[i] = 50000;
            }
            else if (msg.charAt(i) == 'B'){
                duration[i] = 100000; // value in milliseconds 
            }
            else if (msg.charAt(i) == 'C'){
                duration[i] = 150000;
            }
            else if (msg.charAt(i) == 'D'){
                duration[i] = 200000;               
            }
            else if (msg.charAt(i) == 'E'){
                duration[i] = 250000;
            }
            else if (msg.charAt(i) == 'F'){
                duration[i] = 300000;
            }
            else if (msg.charAt(i) == 'G'){
                duration[i] = 350000;
            }
            else if (msg.charAt(i) == 'H'){
                duration[i] = 400000;
            }
            else if (msg.charAt(i) == 'I'){
                duration[i] = 450000;
            }
            else if (msg.charAt(i) == 'J'){
                duration[i] = 500000;
            }
            else if (msg.charAt(i) == 'K'){
                duration[i] = 550000;
            }
            else if (msg.charAt(i) == 'L'){
                duration[i] = 600000;
            }
            else if (msg.charAt(i) == 'M'){
                duration[i] = 650000;
            }
            else if (msg.charAt(i) == 'N'){
                duration[i] = 700000;
            }
            else if (msg.charAt(i) == 'O'){
                duration[i] = 750000;
            }
            else if (msg.charAt(i) == 'P'){
                duration[i] = 800000;
            }
            else if (msg.charAt(i) == 'Q'){
                duration[i] = 850000;
            }
            else if (msg.charAt(i) == 'R'){
                duration[i] = 900000;
            }
            else if (msg.charAt(i) == 'S'){
                duration[i] = 950000;
            }
            else if (msg.charAt(i) == 'T'){
                duration[i] = 1000000;
            }
            else if (msg.charAt(i) == 'U'){
                duration[i] = 1100000;
            }
            else if (msg.charAt(i) == 'V'){
                duration[i] = 1200000;
            }
            else if (msg.charAt(i) == 'W'){
                duration[i] = 1300000;
            }
            else if (msg.charAt(i) == 'X'){
                duration[i] = 1400000;
            }
            else if (msg.charAt(i) == 'Y'){
                duration[i] = 1500000;
            }
            else if (msg.charAt(i) == 'Z'){
                duration[i] = 1600000;
            }

        }

    }

Now I am trying to do the same in Python. I am new to audio processing, and this is the first time I have run into problems with it.
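
For reference, the letter table above could be written compactly in Python. This is only a sketch: the names `DURATIONS_MS` and `message_to_durations` are hypothetical, and characters outside A-Z are assumed to map to 0.0, which is what the Java `double` array holds by default for entries that are never assigned.

```python
import string


def _build_table():
    """Build the letter -> duration table used in the Java snippet."""
    table = {}
    for i, ch in enumerate(string.ascii_uppercase):
        if i < 20:
            # A..T: 50,000 up to 1,000,000 in steps of 50,000
            table[ch] = 50_000.0 * (i + 1)
        else:
            # U..Z: 1,100,000 up to 1,600,000 in steps of 100,000
            table[ch] = 1_000_000.0 + 100_000.0 * (i - 19)
    return table


# Hypothetical name; values copied from the Java if/else chain
DURATIONS_MS = _build_table()


def message_to_durations(msg):
    """Return one duration per character; non-letters get 0.0 (assumption)."""
    return [DURATIONS_MS.get(ch, 0.0) for ch in msg.upper()]


if __name__ == "__main__":
    print(message_to_durations("kill bill"))
```

A dictionary lookup keeps the table in one place, so changing a value does not require touching a long conditional chain.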


Solution

  • A simple way is to work on raw PCM data directly. In this format the audio is just a sequence of values from -32768 to 32767, stored as 2 bytes per sample (assuming 16-bit signed, mono) and sampled at regular intervals (e.g. 44100 Hz).

    To alter the pitch you can just "read" this data faster or slower, e.g. at 45000 Hz or 43000 Hz, which is easily done with a resampling procedure. For example:

     import struct

     # Raw 16-bit signed mono PCM in native byte order, no header
     with open("pcm.raw", "rb") as f:
         data = f.read()
     n = len(data) // 2  # number of whole 2-byte samples
     parsed = struct.unpack("%ih" % n, data[:2 * n])
     # Here parsed is a tuple of numbers
    
     pos = 0.0     # position in the source file
     speed = 1.0   # current read speed = original sampling speed
     result = []
    
     while pos < len(parsed) - 1:
         # Compute a new sample (linear interpolation)
         ip = int(pos)
         v = int(parsed[ip] + (pos - ip) * (parsed[ip + 1] - parsed[ip]))
         result.append(v)
    
         pos += speed     # Next position
         speed += 0.0001  # raise the pitch a little on every output sample
    
     # Write the result to disk (note the * to unpack the list for struct.pack)
     with open("out.raw", "wb") as f:
         f.write(struct.pack("%ih" % len(result), *result))
    

    This is a very simple approach to the problem. Note, however, that raising the pitch this way also shortens the audio (and lowering it lengthens it); avoiding that requires more sophisticated math than plain interpolation.

    I used this approach, for example, to raise a song by one tone over its length (I wanted to see whether it was noticeable... it isn't).
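
To tie the resampling idea back to the question's goal of changing the pitch only over certain time ranges, here is a minimal sketch that applies a different read speed per segment. The function names `resample_segments` and `semitones_to_speed` and the `(start_sec, end_sec, speed)` segment format are assumptions made for illustration, not part of any library. It uses the same linear interpolation, so each shifted segment still changes in length.

```python
import math


def semitones_to_speed(semitones):
    """Read speed for a pitch shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def resample_segments(samples, rate, segments):
    """Linear-interpolation resampler with a per-segment read speed.

    samples  : sequence of ints (16-bit mono PCM assumed)
    rate     : sampling rate in Hz
    segments : list of (start_sec, end_sec, speed); times refer to the
               source audio, speed > 1 raises the pitch, < 1 lowers it
    """
    result = []
    pos = 0.0
    last = len(samples) - 1
    while pos < last:
        ip = int(pos)
        frac = pos - ip
        # Interpolate between the two neighbouring source samples
        result.append(int(round(samples[ip] + frac * (samples[ip + 1] - samples[ip]))))

        # Pick the speed of the segment covering the current source time
        t = pos / rate
        speed = 1.0  # outside every segment: unchanged pitch
        for start, end, seg_speed in segments:
            if start <= t < end:
                speed = seg_speed
                break
        pos += speed
    return result


if __name__ == "__main__":
    rate = 8000
    # One second of a 440 Hz test tone (synthetic data, for illustration only)
    tone = [int(10000 * math.sin(2 * math.pi * 440 * n / rate)) for n in range(rate)]
    # Raise the pitch by 2 semitones between 0.2 s and 0.5 s
    shifted = resample_segments(tone, rate, [(0.2, 0.5, semitones_to_speed(2))])
    print(len(tone), len(shifted))
```

The letter durations from the question could be turned into such segments, and the resulting list can be written out with `struct.pack` exactly as in the answer's snippet.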