[SOLVED] How to convert byte array to audio file?

I have written a program that captures SIP packets from the network in real time. I want to use the SDP information embedded in those packets to capture the audio conversation between two VoIP softphones.

Once I have retrieved the binary data from the RTP packets, how should I go about converting it into a sound file?

C++ preferred.

Solution

Hi Adrian and welcome,

You are right: we cannot simply concatenate the RTP payloads into a file and then read that file as an audio file, say a ".wav".

The missing part you are looking for is a piece of code that reassembles, decodes and plays out the RTP packet flow as voice samples. For the sake of simplicity, consider the well-known G.711 (PCM) codec, because virtually all SIP phones support it. You need to implement a play-out buffer (logically an infinite buffer, but a ring buffer with wrap-around is fine).
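To make the play-out buffer idea concrete, here is a minimal sketch of a ring buffer addressed by RTP timestamp. The class name PlayoutBuffer, its methods and the jitterSamples parameter are hypothetical names chosen for illustration, and the policy of dropping late packets is an assumption, not the only possible design.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical play-out buffer: a ring of 16-bit samples addressed by RTP timestamp.
// Assumes capacitySamples > 0.
class PlayoutBuffer {
public:
    // capacitySamples: ring size; jitterSamples: how far behind the first
    // packet play-out starts, so slightly older packets still find a slot.
    PlayoutBuffer(std::size_t capacitySamples, std::uint32_t jitterSamples)
        : ring_(capacitySamples, 0), jitter_(jitterSamples) {}

    // Store decoded samples at the slot given by the packet's RTP timestamp.
    // Returns false if the packet is too late or too far ahead to fit.
    bool write(std::uint32_t timestamp, const std::int16_t* pcm, std::size_t count) {
        if (!started_) {
            readTs_ = timestamp - jitter_;
            started_ = true;
        }
        // Unsigned subtraction copes with 32-bit timestamp wrap-around;
        // a late packet shows up as a huge offset and is rejected.
        const std::uint32_t offset = timestamp - readTs_;
        if (offset > ring_.size() || count > ring_.size() - offset) return false;
        for (std::size_t i = 0; i < count; ++i)
            ring_[(readPos_ + offset + i) % ring_.size()] = pcm[i];
        return true;
    }

    // Pop the next `count` samples in timestamp order; gaps come out as silence.
    void read(std::int16_t* out, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            out[i] = ring_[readPos_];
            ring_[readPos_] = 0;  // clear the slot so a lost packet yields silence
            readPos_ = (readPos_ + 1) % ring_.size();
        }
        if (started_) readTs_ += static_cast<std::uint32_t>(count);
    }

private:
    std::vector<std::int16_t> ring_;
    std::uint32_t jitter_;
    std::uint32_t readTs_ = 0;   // RTP timestamp of the next sample to play
    std::size_t readPos_ = 0;    // ring index of that sample
    bool started_ = false;
};
```

With G.711 at 8,000 samples per second, a capacity of 1600 holds 200 ms of audio, and each 20 ms packet occupies 160 slots. Call write() for every packet as it arrives and read() at a steady pace to feed the file or the audio device.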

Each packet carries a small audio payload, typically 20 ms long. Each chunk of audio data is preceded by an RTP header, which indicates the type of encoding (this relates to the SDP information, and you already have a good understanding of that part).
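As an illustration of what the RTP header gives you, the sketch below pulls the fields needed for playback out of the fixed 12-byte header defined in RFC 3550. The RtpHeader struct and the parseRtp function are hypothetical names; the input is assumed to be the UDP payload of one packet.

```cpp
#include <cstddef>
#include <cstdint>

// Fields of the fixed RTP header (RFC 3550) that matter for playback.
struct RtpHeader {
    std::uint8_t  payloadType;   // static types: 0 = PCMU (mu-law), 8 = PCMA (A-law)
    bool          marker;
    std::uint16_t sequence;      // +1 per packet: detects loss and re-ordering
    std::uint32_t timestamp;     // in samples (8000 Hz clock for G.711)
    std::uint32_t ssrc;          // identifies the stream
    std::size_t   payloadOffset; // where the audio bytes start in the packet
    std::size_t   payloadSize;   // number of audio bytes
};

// Hypothetical helper: returns false if the buffer is not a plausible RTP v2 packet.
bool parseRtp(const std::uint8_t* data, std::size_t len, RtpHeader& out) {
    if (len < 12) return false;
    if ((data[0] >> 6) != 2) return false;              // version must be 2
    const bool padding = (data[0] & 0x20) != 0;
    const bool extension = (data[0] & 0x10) != 0;
    const std::size_t csrcCount = data[0] & 0x0F;

    out.marker = (data[1] & 0x80) != 0;
    out.payloadType = data[1] & 0x7F;
    // All multi-byte fields are big-endian (network byte order).
    out.sequence = static_cast<std::uint16_t>((data[2] << 8) | data[3]);
    out.timestamp = (std::uint32_t(data[4]) << 24) | (std::uint32_t(data[5]) << 16) |
                    (std::uint32_t(data[6]) << 8) | std::uint32_t(data[7]);
    out.ssrc = (std::uint32_t(data[8]) << 24) | (std::uint32_t(data[9]) << 16) |
               (std::uint32_t(data[10]) << 8) | std::uint32_t(data[11]);

    std::size_t offset = 12 + 4 * csrcCount;             // skip the CSRC list
    if (offset > len) return false;
    if (extension) {                                     // skip the header extension
        if (offset + 4 > len) return false;
        const std::size_t words = (std::size_t(data[offset + 2]) << 8) | data[offset + 3];
        offset += 4 + 4 * words;
        if (offset > len) return false;
    }
    std::size_t end = len;
    if (padding) {                                       // last byte holds the pad length
        const std::size_t padLen = data[len - 1];
        if (padLen == 0 || padLen > len - offset) return false;
        end -= padLen;
    }
    out.payloadOffset = offset;
    out.payloadSize = end - offset;
    return true;
}
```

For a 20 ms G.711 packet you would expect payloadSize to be 160 and the timestamp to advance by 160 from one packet to the next. Dynamic payload types (96 and above) have to be resolved through the SDP rtpmap attributes instead of a fixed table.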

For each packet:

Decode the 8-bit values into 16-bit samples at the right rate, usually 8,000 samples per second for G.711;
Compute the play-out point from the RTP header; it is the index into the play-out buffer array. Take jitter and re-ordering into account, based on the RTP timestamp;
Write the samples into a .wav file or play them to an audio device.

From a practical point of view, you can do this in several ways:

Collect all the UDP/RTP packets in a capture file and let Wireshark do the hard work;
Use an existing tool, such as playSIP, a command-line SIP session recorder;
Grab a library or write your own code for that purpose, but that is not an easy task: you have to think about handling packet loss, for instance.
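If you do write your own code, packet loss can be detected from the 16-bit RTP sequence number. The helper below is a small sketch with a hypothetical name, missingPackets; treating a backward jump of up to half the sequence space as an old packet is a common convention, assumed here rather than required.

```cpp
#include <cstdint>

// Number of packets lost between the last sequence number seen and the new one.
// Returns 0 for the expected next packet and -1 for a duplicate or an older
// (re-ordered) packet. 16-bit arithmetic handles the wrap from 65535 to 0.
int missingPackets(std::uint16_t lastSeq, std::uint16_t newSeq) {
    const std::uint16_t delta = static_cast<std::uint16_t>(newSeq - lastSeq);
    if (delta == 0 || delta >= 0x8000) return -1;
    return delta - 1;
}
```

For each missing 20 ms G.711 packet, the simplest concealment is to leave 160 samples of silence in the output; repeating the previous packet is another option.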