I have written a program that captures SIP packets from the network in real time, and I want to use the SDP information embedded in those packets to capture the audio conversation between two VoIP softphones.
Once I have retrieved the binary data from the RTP packets, how should I go about converting it into a sound file?
C++ preferred.
Hi Adrian and welcome,
You are right: we cannot simply concatenate the RTP payloads one after another into a file and then read that file as audio, say as a .wav file.
The missing piece is code that reassembles the RTP packet flow, decodes it, and plays it out as voice samples. For the sake of simplicity, consider the well-known G.711 (PCM) codec, because SIP phones almost universally support it.
You need to implement a play-out buffer (logically an infinite buffer, but a ring buffer with wrap-around is fine).
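As a rough illustration of such a buffer, here is a minimal sketch in C++. The class name `PlayoutBuffer`, its methods, and the fixed `delay` parameter are my own assumptions for the example, not part of any library. It places each packet's samples by RTP timestamp, so packets that arrive slightly out of order still land in the right place, and gaps play back as silence.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical play-out buffer: a ring of 16-bit samples addressed by RTP timestamp.
class PlayoutBuffer {
public:
    // capacity and delay are in samples; delay must be smaller than capacity.
    // delay is the head start given to late or reordered packets (jitter margin).
    PlayoutBuffer(std::size_t capacity, std::uint32_t delay)
        : ring_(capacity, 0), delay_(delay), started_(false),
          nextTimestamp_(0), readIndex_(0) {}

    // Store decoded samples at the position given by the RTP timestamp.
    // Returns false if the packet is too late or too far ahead to fit.
    bool write(std::uint32_t timestamp, const std::int16_t* pcm, std::size_t count) {
        if (!started_) {
            // The first packet seen defines the time base, shifted by the delay.
            nextTimestamp_ = timestamp - delay_;
            started_ = true;
        }
        // Unsigned subtraction copes with 32-bit timestamp wrap-around;
        // a negative result means the play-out point has already passed.
        const std::int32_t delta =
            static_cast<std::int32_t>(timestamp - nextTimestamp_);
        if (delta < 0) return false;
        const std::size_t offset = static_cast<std::size_t>(delta);
        if (offset + count > ring_.size()) return false;
        for (std::size_t i = 0; i < count; ++i)
            ring_[(readIndex_ + offset + i) % ring_.size()] = pcm[i];
        return true;
    }

    // Pull the next samples in play-out order; missing audio comes out as zeros.
    void read(std::int16_t* out, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            out[i] = ring_[readIndex_];
            ring_[readIndex_] = 0;  // clear the slot so a lost packet is silent
            readIndex_ = (readIndex_ + 1) % ring_.size();
        }
        if (started_) nextTimestamp_ += static_cast<std::uint32_t>(count);
    }

private:
    std::vector<std::int16_t> ring_;
    std::uint32_t delay_;
    bool started_;
    std::uint32_t nextTimestamp_;  // RTP timestamp of the next sample to read
    std::size_t readIndex_;        // ring index of that sample
};
```

For 8 kHz audio, something like `PlayoutBuffer buf(8000, 480)` would give one second of capacity and 60 ms of jitter margin; those numbers are only a starting guess and would need tuning against real traffic.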
Each packet carries a small audio payload, typically 20 ms long. Each chunk of audio data is preceded by an RTP header, which indicates the type of encoding (this relates to the SDP information, and you already have a good understanding of that part).
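To make the header part concrete, here is a hedged sketch of parsing the fixed RTP header as laid out in RFC 3550. The struct `RtpHeader` and the function `parseRtp` are names I made up for this example. It assumes `data` points at the start of the RTP packet, that is, just after the UDP header.

```cpp
#include <cstddef>
#include <cstdint>

// Fields of the fixed RTP header (RFC 3550), converted to host byte order.
struct RtpHeader {
    std::uint8_t  version;         // must be 2
    bool          marker;
    std::uint8_t  payloadType;     // 0 = PCMU (G.711 mu-law), 8 = PCMA (G.711 A-law)
    std::uint16_t sequenceNumber;
    std::uint32_t timestamp;       // in sample units (8000 per second for G.711)
    std::uint32_t ssrc;
    std::size_t   payloadOffset;   // where the audio bytes start within the packet
    std::size_t   payloadSize;     // number of audio bytes
};

// Returns true and fills 'h' if 'data' looks like a valid RTP packet.
bool parseRtp(const std::uint8_t* data, std::size_t len, RtpHeader& h) {
    if (len < 12) return false;                  // fixed header is 12 bytes
    h.version = static_cast<std::uint8_t>(data[0] >> 6);
    if (h.version != 2) return false;
    const bool padding   = (data[0] & 0x20) != 0;
    const bool extension = (data[0] & 0x10) != 0;
    const std::size_t csrcCount = data[0] & 0x0F;

    h.marker         = (data[1] & 0x80) != 0;
    h.payloadType    = static_cast<std::uint8_t>(data[1] & 0x7F);
    h.sequenceNumber = static_cast<std::uint16_t>((data[2] << 8) | data[3]);
    h.timestamp = (std::uint32_t(data[4]) << 24) | (std::uint32_t(data[5]) << 16) |
                  (std::uint32_t(data[6]) << 8)  |  std::uint32_t(data[7]);
    h.ssrc      = (std::uint32_t(data[8]) << 24) | (std::uint32_t(data[9]) << 16) |
                  (std::uint32_t(data[10]) << 8) |  std::uint32_t(data[11]);

    std::size_t offset = 12 + 4 * csrcCount;     // skip the CSRC list
    if (extension) {                             // skip the optional header extension
        if (len < offset + 4) return false;
        const std::size_t words = (std::size_t(data[offset + 2]) << 8) | data[offset + 3];
        offset += 4 + 4 * words;
    }
    if (offset > len) return false;

    std::size_t end = len;
    if (padding) {                               // last byte holds the padding length
        if (offset == len) return false;
        const std::size_t pad = data[len - 1];
        if (pad == 0 || pad > len - offset) return false;
        end -= pad;
    }
    h.payloadOffset = offset;
    h.payloadSize   = end - offset;
    return true;
}
```

The payload type only tells you the codec directly for static types such as 0 and 8; for dynamic types (96 and above) you would have to look up the mapping in the `a=rtpmap` lines of the SDP.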
For each packet:
Decode the 8-bit values into 16-bit samples at the right rate, usually 8,000 samples per second for G.711;
Compute the play-out point from the RTP header: it is the index into the play-out buffer array. Take jitter and reordering into account, based on the RTP timestamp;
Write the samples into a .wav file or play them to an audio device.
From a practical point of view, you can do that in several ways:
Let Wireshark do the hard work;