Tags: parsing, header, hevc

Decoding a proprietary HEVC/MP4 stream


One of those times where I am just out of ideas and hoping for a saint.

I am currently trying to decode and use a proprietary video stream from an IP camera. I feel like I am very close, but I just cannot find the last piece of the puzzle. The camera is set to 1 FPS, CBR, and an I-frame interval of 1 for maximum consistency.

Overview of what I currently do: buffer packets, look for the header of the camera's proprietary protocol (35 bytes), look for the next one, flush everything in between out to a file (for the sake of this post, this is called a "segment"), rinse, repeat.
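A minimal sketch of that buffering loop, for illustration only. The post does not say how the 35-byte header is recognised, so this assumes it begins with a fixed byte sequence; `magic` and `split_segments` are hypothetical names, and only the length of 35 comes from the post.

```python
def split_segments(buffer: bytes, magic: bytes, header_len: int = 35):
    """Split buffered stream data into payload segments.

    Assumption (not confirmed by the post): every protocol header starts
    with the same byte sequence `magic` and is `header_len` bytes long.
    Returns (segments, remainder), where remainder is the unconsumed tail
    that should be kept until more data arrives.
    """
    segments = []
    start = buffer.find(magic)
    if start == -1:
        return segments, buffer  # no header seen yet, keep everything
    while True:
        # Look for the next header only after the current one ends.
        nxt = buffer.find(magic, start + header_len)
        if nxt == -1:
            break  # the current segment is not complete yet
        segments.append(buffer[start + header_len:nxt])
        start = nxt
    return segments, buffer[start:]
```

Note that a plain search like this will mis-split if the assumed magic bytes ever occur inside the video payload, and a fixed `header_len` is exactly the kind of assumption that can cut into the payload if the real header length varies.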

If I set the stream to a very low quality, that is 352x288 with a very low bitrate, I can open and play back the resulting file in MPC absolutely fine (or convert it with FFmpeg and then play it back in VLC).

But here comes the issue: as I increase the video quality, the video starts to get corrupted past a certain point. One other thing happens at the same time: the maximum "segment" that is found is capped at 8183 bytes (quite a peculiar size, as it's very close to 2^13 = 8192). So I decided to look into what actually gets written whenever an 8176 section is encountered, and what I found seems very peculiar as well: many almost-matching bytes! (These bytes are only written for the first 8176 segment of a frame.)

Sample 1:

0000 0001 4001 0c01 ffff 0160 0000 0300 b000 0003 0000 0300 3cac 0900 0000 0142 0101 0160 0000 0300 b000 0003 0000 0300 3ca0 0b08 0485 8dae 4932 fcdc 0404 0402 0000 0001 4401 c0f2 f03c 9000 0000 014e 01e5 04cc cc00 0080 0000 0001 2601 af1b 686f 315f 8bcd 7007

Sample 2 (A couple of seconds later):

0000 0001 4001 0c01 ffff 0160 0000 0300 b000 0003 0000 0300 3cac 0900 0000 0142 0101 0160 0000 0300 b000 0003 0000 0300 3ca0 0b08 0485 8dae 4932 fcdc 0404 0402 0000 0001 4401 c0f2 f03c 9000 0000 014e 01e5 049b 9b00 0080 0000 0001 2601 af17 68c3 3d14 cf63 2cab

As you can see, everything up to the 8000 0000 0126 01af looks like some type of header. Edit: It seems this part contains the VPS / SPS / PPS of the frame that follows.
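That reading can be checked by looking at the byte after each start code (00 00 01, optionally preceded by an extra 00). In HEVC, the NAL unit type is the six bits following the first bit of the NAL header, i.e. `(byte >> 1) & 0x3F`. A small sketch run over Sample 1; the function name is made up for illustration:

```python
def nal_unit_types(data: bytes) -> list:
    """Return the HEVC NAL unit type found after each 00 00 01 start code."""
    types = []
    pos = data.find(b"\x00\x00\x01")
    while pos != -1 and pos + 3 < len(data):
        header_byte = data[pos + 3]          # first byte of the NAL header
        types.append((header_byte >> 1) & 0x3F)
        pos = data.find(b"\x00\x00\x01", pos + 3)
    return types


# Sample 1 from the post, copied verbatim.
sample1 = bytes.fromhex(
    "0000 0001 4001 0c01 ffff 0160 0000 0300 b000 0003 0000 0300 3cac 0900 "
    "0000 0142 0101 0160 0000 0300 b000 0003 0000 0300 3ca0 0b08 0485 8dae "
    "4932 fcdc 0404 0402 0000 0001 4401 c0f2 f03c 9000 0000 014e 01e5 04cc "
    "cc00 0080 0000 0001 2601 af1b 686f 315f 8bcd 7007"
)

print(nal_unit_types(sample1))  # [32, 33, 34, 39, 19]
```

Types 32, 33 and 34 are VPS, SPS and PPS, 39 is a prefix SEI message, and 19 is an IDR slice (IDR_W_RADL), so the "header" is the parameter sets plus an SEI, and the picture data starts at 0000 0001 2601.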

It almost seems like the demuxer has no idea that the current frame has more data than the initial 8176-byte segment. At a quality setting where one frame consists of one 8176-byte segment and one ~2000-byte segment, the upper two thirds of the video are fine and the corruption only starts towards the bottom. This "point of corruption" of course moves further up as I increase the bitrate of the stream.

Why don't you just use a proper camera?!

This camera is actually fine.

Just use its normal RTSP stream then?

Well, that is the reason I started doing this in the first place: it only supports RTSP over UDP, while this proprietary protocol runs over TCP. If there is packet loss (which there is), the RTSP stream starts to corrupt, which is of course what I am trying to avoid.

I hope there's somebody here who might be able to help me. If you need sample files or anything else, I'd be happy to provide them to anyone interested in helping.

Thanks!

Edit: It seems I might be onto something. I've just downloaded the trial version of Zond 265 (an HEVC analyzer), and when I open the resulting file in it, it reports two errors for every frame: "Unexpected remaining X bytes found" and "end_of_subset_one_bit shall be equal to 1". If I take that number of remaining bytes and remove the same number of bytes in front of them, both errors go away! (A new one appears though: "decode CTU #x: exception".) The image is obviously still corrupted, since information is now missing, but at least it is something to work from. I still don't really have an idea what the next step would be.


Solution

  • So I've managed to solve my issue; here's what I did. I found DVR software on some dodgy site that supports the very same protocol and managed to access the camera through it. I then recorded the stream both through it and through my own software, and bindiffing the two results is what made it finally click. It turns out that I was slicing off a few bytes too many when removing the header (pretty much slicing into the video stream data), but not always. Occasionally (and on the very first frame) the response header is 8 bytes longer than it is most of the time; this is indicated by the video stream starting with 00 00 01 FC. So by putting back the 8 bytes that I had always sliced off, and only cutting them out in that case, I get a non-corrupted video stream :)
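A rough sketch of that fix, under one reading of the description: the short header length is always removed, and 8 further bytes are removed only when what follows begins with 00 00 01 FC. Whether the marker sits exactly there, and whether it is part of those 8 bytes, is an assumption. `strip_header` and `base_header_len` are hypothetical names, and the post does not state the exact short header length.

```python
EXTENDED_MARKER = bytes.fromhex("000001FC")  # marker mentioned in the answer
EXTRA_HEADER_BYTES = 8                       # extra header length from the answer


def strip_header(segment: bytes, base_header_len: int) -> bytes:
    """Remove the protocol header from one segment.

    Assumption: the 8 extra header bytes directly follow the short header
    and begin with 00 00 01 FC. Everything else is treated as video data.
    """
    payload = segment[base_header_len:]
    if payload.startswith(EXTENDED_MARKER):
        # Long-header case: drop the 8 extra bytes as well.
        return payload[EXTRA_HEADER_BYTES:]
    # Normal case: these bytes already belong to the video stream.
    return payload
```

The point of the design is that the header length is decided per segment from the data itself, instead of always cutting the same fixed number of bytes.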