Tags: python, web-scraping, websocket, java-websocket, fmp4

Convert fragmented MP4 to MP4


I am trying to scrape video frames from trafficview.org and can't seem to figure out how to decode the data.

I wrote a few lines of code, based on tutorials for the websocket_client library, to connect to a live-streaming websocket and receive the messages directly.

I have monitored the incoming messages in the Network tab in Chrome and also dug into the output of the code below, and I am fairly certain the data is streaming in as fragmented MP4. Below are the first 100 or so bytes received:

b'\xfa\x00\x02\x86\xf1B\xc0\x1e\x00\x00\x00\x18ftypiso5\x00\x00\x02\x00iso6mp41\x00\x00\x02jmoov\x00\x00\x00lmvhd\x00\x00\x00\x00\xdb\x7f\xeb\xb2\xdb\x7f\xeb\xb2\x00\x00\x03\xe8\x00\x00\x00\x00\x00\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

Throughout this output, there are lots of moof and mdat pairs. Let's say I let this code run for 30 seconds: how can I convert the raw byte string into an MP4 file?

import json

from websocket import create_connection

url = 'wss://cctv.trafficview.org:8420/DDOT_CAPTOP_13.vod?progressive'

headers = json.dumps({
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'no-cache',
    'Connection': 'Upgrade',
    'Host': 'cctv.trafficview.org:8420',
    'Origin': 'https://trafficview.org',
    'Pragma': 'no-cache',
    'Sec-WebSocket-Extensions': 'permessage-deflate; client_max_window_bits',
    'Sec-WebSocket-Key': 'FzWbrsoHFsJWzvWGJ04ffw==',
    'Sec-WebSocket-Version': '13',
    'Upgrade': 'websocket',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36',
})

ws = create_connection(url, headers=headers)

# Then send a message through the tunnel
ws.send('ping')

# Receive 3000 messages from the tunnel and concatenate their raw bytes
flag = 3000
output = b''
while flag > 0:
    output += ws.recv()
    flag -= 1
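As a quick check on where the MP4 data starts, the box headers can be walked directly: each ISO base media box begins with a 4-byte big-endian size followed by a 4-byte type. The sketch below is illustrative only; `walk_boxes` is a hypothetical helper, and `first_message` is just the start of the byte string quoted above, cut off right after the moov header.

```python
import struct


def walk_boxes(data, offset=0):
    """Yield (box_type, size, offset) for each top-level box header in data.

    Hypothetical helper: it only reads headers, so it also works on a
    buffer that is truncated in the middle of a box.
    """
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from('>I4s', data, offset)
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from('>Q', data, offset + 8)
        elif size == 0:
            # A size of 0 means the box runs to the end of the data.
            size = len(data) - offset
        if size < 8:
            break  # not a plausible box header, stop walking
        yield box_type.decode('latin-1'), size, offset
        offset += size


# Start of the byte string quoted in the question, truncated after 'moov'.
first_message = (b'\xfa\x00\x02\x86\xf1B\xc0\x1e'
                 b'\x00\x00\x00\x18ftypiso5\x00\x00\x02\x00iso6mp41'
                 b'\x00\x00\x02jmoov')

# Skipping the first 8 bytes lines the parser up with the ftyp box.
boxes = list(walk_boxes(first_message[8:]))
for box_type, size, offset in boxes:
    print(box_type, size, offset)
```

On this sample, skipping 8 bytes gives an ftyp box of 24 bytes followed by a moov box of 618 bytes. Walking the untrimmed bytes instead produces a first "box" with a nonsensical size and type, which suggests the leading bytes are not part of the MP4 data.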

Update: I have adapted some code from Stack Overflow that is supposed to pipe in the fMP4 data and convert it to frames. To get there, I noticed that the first 8 bytes of the output from the websocket are not consistent with other MP4 files that I have inspected, so I first trim those 8 bytes. I also don't know how one of these files is supposed to end, so I trim everything from the last moof onward.

The code below lets ffmpeg read the MP4 header fine (its output is also below) but fails to decode any frames.

output = output[8:]

import re
moof_locs = [m.start() for m in re.finditer(b'moof', output)]

output = output[:moof_locs[-1]-1]

import subprocess as sp
import shlex

import numpy as np

width, height = 640, 480

# FFmpeg input PIPE: fragmented MP4 data as a stream of bytes.
# FFmpeg output PIPE: intended to be decoded video frames in BGR format.
process = sp.Popen(shlex.split('/usr/bin/ffmpeg -i pipe: -f hls -hls_segment_type fmp4 -c h264 -an -sn pipe:'), stdin=sp.PIPE, stdout=sp.PIPE, bufsize=10**8)
process.stdin.write(output)
process.stdin.close()
in_bytes = process.stdout.read(width * height * 3)
in_frame = (np.frombuffer(in_bytes, np.uint8).reshape([height, width, 3]))

Output from ffmpeg:

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x994600] Could not find codec parameters for stream 0 (Video: h264 (avc1 / 0x31637661), none, 640x480): unspecified pixel format
Consider increasing the value for the 'analyzeduration' and 'probesize' options
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'pipe:':
  Metadata:
    major_brand     : iso5
    minor_version   : 512
    compatible_brands: iso6mp41
    creation_time   : 2020-09-11T13:40:21.000000Z
  Duration: N/A, bitrate: N/A
    Stream #0:0(und): Video: h264 (avc1 / 0x31637661), none, 640x480, 1k tbr, 1k tbn, 2k tbc (default)
    Metadata:
      creation_time   : 2020-09-11T13:40:21.000000Z
      encoder         : EvoStream Media Server
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
Finishing stream 0:0 without any data written to it.
Nothing was written into output file 0 (pipe:), because at least one of its streams received no packets.
frame=    0 fps=0.0 q=0.0 Lsize=       0kB time=-577014:32:22.77 bitrate=  -0.0kbits/s speed=N/A    
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used)
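One thing worth noting about the attempt above: the comments describe raw BGR frames on stdout, but the command asks ffmpeg for HLS output re-encoded as H.264. A minimal sketch of a command that matches the comments is shown below. It assumes an `ffmpeg` binary on the PATH and a correctly framed MP4 byte string; `build_ffmpeg_cmd`, `split_frames` and `decode_frames` are hypothetical helper names, and this would not by itself fix input that ffmpeg cannot decode.

```python
import subprocess


def build_ffmpeg_cmd(ffmpeg='ffmpeg'):
    """Arguments that read MP4 from stdin and write raw BGR frames to stdout."""
    return [ffmpeg, '-loglevel', 'error',
            '-i', 'pipe:0',          # input: MP4 bytes on stdin
            '-an', '-sn',            # drop audio and subtitle streams
            '-f', 'rawvideo',        # output: unencoded frames
            '-pix_fmt', 'bgr24',     # 3 bytes per pixel, B-G-R order
            'pipe:1']


def split_frames(raw, width, height):
    """Cut a raw bgr24 byte string into whole frames, dropping any remainder."""
    frame_size = width * height * 3
    usable = len(raw) - len(raw) % frame_size
    return [raw[i:i + frame_size] for i in range(0, usable, frame_size)]


def decode_frames(mp4_bytes, width, height, ffmpeg='ffmpeg'):
    """Run ffmpeg once and return a list of raw frames (requires ffmpeg)."""
    proc = subprocess.run(build_ffmpeg_cmd(ffmpeg), input=mp4_bytes,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          check=True)
    return split_frames(proc.stdout, width, height)
```

Using `subprocess.run` with `input=` avoids the deadlock that can occur when a large input is written to stdin before anything is read from stdout. Each returned frame could then be wrapped with `np.frombuffer(frame, np.uint8).reshape([height, width, 3])` as in the code above.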

Update 2:

Upon inspecting the stream coming in from the websocket, I realized that every message starts with a particular integer code, which is defined in the JavaScript code from trafficview. The order of these codes is ALWAYS the same; they come in like the following:

Header MOOV (250)
    PBT Begin (249)
        Video Buffer (252)
        Header MOOF (251)
        Header MOOF (251)
        Header MOOF (251)
        Header MDAT (254)
    PBT End (255)

    PBT Begin (249)
    Continues Forever

Some of these tags are always the same, for example the 249 messages are always f900 0000 and the 255 messages are always ff00 0000.

I am guessing that 249 and 255 messages are not normally in a fragmented mp4 or hls stream, and so I think I need to use this tag information to build up the correct file format from scratch.
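To confirm the message order on a captured session, the first byte of each message can be mapped back to the names listed above. This is a rough sketch: the names and codes are taken from the listing, while `TAG_NAMES`, `tag_name` and `summarize` are hypothetical helpers, and `messages` is assumed to be a list of the individual `ws.recv()` results rather than one concatenated byte string.

```python
from collections import Counter

# Codes and names as listed above (first byte of every message).
TAG_NAMES = {
    250: 'Header MOOV',
    249: 'PBT Begin',
    252: 'Video Buffer',
    251: 'Header MOOF',
    254: 'Header MDAT',
    255: 'PBT End',
}


def tag_name(msg):
    """Return the name for a message's leading code, or a placeholder."""
    if not isinstance(msg, (bytes, bytearray)) or not msg:
        return 'not binary'
    return TAG_NAMES.get(msg[0], 'unknown (%d)' % msg[0])


def summarize(messages):
    """Count how many messages of each kind were received."""
    return Counter(tag_name(m) for m in messages)


# Synthetic example: one init message and one complete group.
sample = [b'\xfa' + bytes(7), b'\xf9\x00\x00\x00', b'\xfc\x00\x00\x00vid',
          b'\xfb\x00\x00\x00moof', b'\xfe\x00\x00\x00mdat',
          b'\xff\x00\x00\x00']
print([tag_name(m) for m in sample])
print(summarize(sample))
```

Keeping the messages in a list, instead of appending them to a single byte string, preserves the message boundaries that the codes depend on.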


Solution

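    import timeit

    from websocket import create_connection

    # url and headers are defined as in the question above
    ws = create_connection(url, headers=headers)
    # Then send a message through the tunnel
    ws.send('ping')

    # Collect messages for 90 seconds, keeping each message separate
    start = timeit.default_timer()
    flag = True
    output = []
    while flag:
        output.append(ws.recv())
        if timeit.default_timer() - start > 90:
            flag = False

    # First message (MOOV header): drop the 8 bytes of extra data
    result = output[0][8:]

    for msg in output[1:]:
        # 249 = PBT Begin: start a new group
        if msg[0] == 249:
            moofmdat = b''
            moof = b''
            continue

        # 252 = Video Buffer: drop the 4-byte player prefix
        if msg[0] == 252:
            vidbuf = msg[4:]

        # 251 = MOOF header: several may arrive per group
        if msg[0] == 251:
            moof += msg[4:]

        # 254 = MDAT header
        if msg[0] == 254:
            mdat = msg[4:]

        # 255 = PBT End: the group is complete, append it to the result
        if msg[0] == 255:
            moofmdat += moof
            moofmdat += mdat
            moofmdat += vidbuf
            result += moofmdat

    with open('test.mp4', 'wb') as file:
        file.write(result)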
    

    Figured it out. The MOOV header message has 8 bytes of unnecessary information that must be removed. Every other message (besides PBT Begin and PBT End) has 4 bytes of player-specific data at the start. I just needed to strip those bytes from each message and put the payloads in the correct order. Then save the raw bytes as an .mp4 file and voilà: video that plays in VLC.
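The same reassembly can be written as a function that takes the received messages and returns the MP4 bytes, which makes it possible to try the logic on synthetic data without a live connection. This is only a sketch that mirrors the ordering used in the answer (MOOF data, then MDAT, then the video buffer); `assemble_fmp4` is a hypothetical name, and the 8-byte and 4-byte prefix lengths are the ones reported above.

```python
def assemble_fmp4(messages):
    """Rebuild MP4 bytes from a sequence of tagged websocket messages.

    Assumes the first message is the MOOV header (code 250) and that each
    group runs from PBT Begin (249) to PBT End (255), as described above.
    Groups that never reach PBT End are dropped.
    """
    messages = iter(messages)
    out = bytearray(next(messages)[8:])  # MOOV message minus 8-byte prefix
    group = None
    for msg in messages:
        tag, payload = msg[0], msg[4:]   # 4-byte player prefix stripped
        if tag == 249:                   # PBT Begin: start a fresh group
            group = {'moof': b'', 'mdat': b'', 'vidbuf': b''}
        elif group is None:
            continue                     # ignore data outside a group
        elif tag == 251:                 # MOOF header, may repeat
            group['moof'] += payload
        elif tag == 254:                 # MDAT header
            group['mdat'] = payload
        elif tag == 252:                 # video buffer
            group['vidbuf'] = payload
        elif tag == 255:                 # PBT End: flush the group
            out += group['moof'] + group['mdat'] + group['vidbuf']
            group = None
    return bytes(out)


# Synthetic messages: one complete group and one that is cut off.
demo = [b'\xfa' + bytes(7) + b'INIT',
        b'\xf9\x00\x00\x00',
        b'\xfc\x00\x00\x00V',
        b'\xfb\x00\x00\x00A',
        b'\xfb\x00\x00\x00B',
        b'\xfe\x00\x00\x00D',
        b'\xff\x00\x00\x00',
        b'\xf9\x00\x00\x00',
        b'\xfb\x00\x00\x00X']
print(assemble_fmp4(demo))
```

With a real capture, the result would be written out as before, for example `open('test.mp4', 'wb').write(assemble_fmp4(output))`. Unlike the loop in the answer, this version resets the video buffer for every group, which only matters if a group arrives without a 252 message.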