Now i'm trying to understand the rtp/rtcp protocol(RFC3550). I knew that in common case,the audio and video steaming is separately. But if i want to steaming a stored media file(such as *.mp4) in the server, how does the server get those tracks from that media file?
RTP is all about carrying the real time data, how you break it up and put it into a RTP packet payload (Called "Packetizing") is up to the implementer, but let's look at a common use case of how you'd actually do this.
If you wanted to send your existing recorded MP4 file through an RTP stream you'd first break it into smaller chunks to be sent down the wire at regular intervals packed inside RTP packets.
Let's say you've got a 10 second MP4 file and you decide your packetization timer is 1 second, we'd split it into 10x 1 second long chunks of data we can put into our RTP payloads. (In practice you could use FFMPeg or something similar to split the MP4 into 1 second chunks)
Then we form our RTP header, we set the Payload Type to something custom, as there's no payload type for MP4 data assigned by IANA. We'd assign a starting sequence number, a Synchronization Source Identifier and a timestamp, and then we'd fill the payload with the first 1 second of data.
1 second after that we'd increment the sequence number by 1, add 1 second to the timestamp, add the next 1 second of data to the payload and send the next RTP header.
We'd then repeat this 8 more times until we've sent 10 RTP packets containing our 10x 1 second MP4 payloads.
If you actually wanted to go about implementing this I wrote this simple Python Library for creating RTP packets,
To learn more about RTP there's obviously RFC 3550, for a really in depth look at RTP there's a great book by Colin Perkins called "RTP: Audio and Video for the Internet" and I've written a bit about all the RTP headers and their meaning.
In practice if you want to get a pre-recorded MP4 file from point A to point B there's better protocols for it than RTP, RTP is focused on the real time transfer of media, as in live-streaming style, not transferring existing pre-recorded media files, FTP, HTTP or even some of the peer-to-peer protocols would be better suited at transferring this.