audio voip rtp sdp jain-sip

How to convert byte array to audio file?


I have written a program that captures SIP packets from the network in real time, and I want to use the SDP information embedded in those packets to capture the audio conversation between two VoIP softphones.

Once I have retrieved the binary data from the RTP stream, how should I go about converting it into a sound file?

C++ preferred.


Solution

  • Hi Adrian and welcome,

    You are right: we cannot simply concatenate the RTP payloads into a file and then read that file as an audio file, say a ".wav".

    The missing part you are looking for is a piece of code that reassembles, decodes and plays out the RTP packet flow as voice samples. For the sake of simplicity, consider the well-known G.711 (PCM) codec, because nearly every SIP phone supports it. You need to implement a play-out buffer (logically an infinite buffer, but a ring buffer with wrap-around is fine).
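
    A minimal sketch of such a play-out buffer, assuming decoded 16-bit mono samples and a fixed capacity greater than zero. The class name `PlayoutBuffer` and its methods are hypothetical and do not come from any library. Samples are placed by RTP timestamp, so the gap left by a lost packet simply stays silent.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical ring-buffer play-out buffer (illustration only).
class PlayoutBuffer {
public:
    // capacity is in samples and is assumed to be greater than zero.
    explicit PlayoutBuffer(std::size_t capacity) : ring_(capacity, 0) {}

    // Store decoded samples at the slot given by the RTP timestamp.
    // Returns false if the packet is too late or too far ahead to fit.
    bool write(std::uint32_t timestamp, const std::int16_t* samples,
               std::size_t count) {
        if (!started_) {           // first packet defines the time origin
            readTs_ = timestamp;
            started_ = true;
        }
        // Unsigned subtraction copes with 32-bit timestamp wrap-around;
        // the signed view tells "late" apart from "ahead".
        const std::int32_t delta = static_cast<std::int32_t>(timestamp - readTs_);
        if (delta < 0) return false;                        // already played
        if (static_cast<std::size_t>(delta) + count > ring_.size())
            return false;                                   // would overrun
        std::size_t pos = (readIdx_ + static_cast<std::size_t>(delta)) % ring_.size();
        for (std::size_t i = 0; i < count; ++i) {
            ring_[pos] = samples[i];
            pos = (pos + 1) % ring_.size();
        }
        return true;
    }

    // Pop the next count samples; slots never written come out as silence.
    void read(std::int16_t* out, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            out[i] = ring_[readIdx_];
            ring_[readIdx_] = 0;                            // reset to silence
            readIdx_ = (readIdx_ + 1) % ring_.size();
        }
        readTs_ += static_cast<std::uint32_t>(count);
    }

private:
    std::vector<std::int16_t> ring_;
    std::size_t   readIdx_ = 0;    // ring index of the next sample to read
    std::uint32_t readTs_  = 0;    // RTP timestamp of the next sample to read
    bool          started_ = false;
};
```

    In this sketch the first packet received fixes the time origin, so a packet that arrives out of order before it would be rejected as late. A real jitter buffer would probably delay the first read by a few packet durations to absorb that.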

    Each packet carries a small audio payload, typically 20 ms long. Each chunk of audio data is preceded by an RTP header, which indicates the type of encoding (this ties back to the SDP information, which you already understand well).
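
    As a rough illustration, the fixed 12-byte RTP header defined in RFC 3550 can be parsed as follows. The struct `RtpHeader` and the function `parseRtpHeader` are hypothetical names chosen for this sketch; the input is assumed to be the raw UDP payload of one packet.

```cpp
#include <cstddef>
#include <cstdint>

// Fields of the fixed RTP header (RFC 3550, section 5.1).
struct RtpHeader {
    std::uint8_t  version;         // must be 2
    bool          padding;
    bool          extension;
    std::uint8_t  csrcCount;
    bool          marker;
    std::uint8_t  payloadType;     // 0 = PCMU (mu-law), 8 = PCMA (A-law)
    std::uint16_t sequenceNumber;
    std::uint32_t timestamp;       // in samples: 8000 per second for G.711
    std::uint32_t ssrc;
    std::size_t   payloadOffset;   // where the audio bytes start
    std::size_t   payloadSize;     // number of audio bytes
};

// Read a big-endian 32-bit value.
static std::uint32_t readBe32(const std::uint8_t* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// Hypothetical helper: returns false if the data is not a valid RTP packet.
bool parseRtpHeader(const std::uint8_t* data, std::size_t size, RtpHeader& h) {
    if (size < 12) return false;
    h.version = static_cast<std::uint8_t>(data[0] >> 6);
    if (h.version != 2) return false;
    h.padding        = (data[0] & 0x20) != 0;
    h.extension      = (data[0] & 0x10) != 0;
    h.csrcCount      = static_cast<std::uint8_t>(data[0] & 0x0F);
    h.marker         = (data[1] & 0x80) != 0;
    h.payloadType    = static_cast<std::uint8_t>(data[1] & 0x7F);
    h.sequenceNumber = static_cast<std::uint16_t>((data[2] << 8) | data[3]);
    h.timestamp      = readBe32(data + 4);
    h.ssrc           = readBe32(data + 8);

    // Skip the optional CSRC list and header extension.
    std::size_t offset = 12 + 4 * static_cast<std::size_t>(h.csrcCount);
    if (h.extension) {
        if (size < offset + 4) return false;
        const std::size_t words =
            (static_cast<std::size_t>(data[offset + 2]) << 8) | data[offset + 3];
        offset += 4 + 4 * words;
    }
    if (offset > size) return false;

    // With the padding bit set, the last byte holds the padding length.
    std::size_t end = size;
    if (h.padding) {
        const std::size_t pad = data[size - 1];
        if (pad == 0 || pad > size - offset) return false;
        end -= pad;
    }
    h.payloadOffset = offset;
    h.payloadSize   = end - offset;
    return true;
}
```

    The payload type should be checked against what the SDP negotiated; the static values 0 and 8 shown in the comment are the usual G.711 assignments, but dynamic payload types are mapped in the SDP `a=rtpmap` lines.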

    For each packet:

    1. Decode the 8-bit values into 16-bit samples at the right rate, usually 8,000 samples per second for G.711;

    2. Compute the play-out point from the RTP header; this is the index into the play-out buffer array. Account for jitter and reordering using the RTP timestamp;

    3. Write the samples to a .wav file or play them to an audio device.

    From a practical point of view, you can do this in several ways: