python video mp4 ffprobe exiftool

FFprobe not reflecting MP4 dimension edits


I'm trying to edit MP4 width & height without scaling.

I'm doing that by editing tkhd & stsd boxes of the MP4 header.

Before editing:

Exif:
$ exiftool $f | egrep -i 'width|height'

Image Width                     : 100
Image Height                    : 100
Source Image Width              : 100
Source Image Height             : 100

FFprobe:
$ ffprobe -v quiet -show_streams $f | egrep 'width|height'

width=100
height=100
coded_width=100
coded_height=100

After editing the sizes above, my Python script prints the following output:

[ftyp] size:32
[mdat] size:196933
[moov] size:2057
- [mvhd] size:108
- [trak] size:1941
- - [tkhd] size:92
     Updated tkhd box: Width: 100 -> 300, Height: 100 -> 400
- - [mdia] size:1841
- - - [mdhd] size:32
- - - [hdlr] size:44
- - - [minf] size:1757
- - - - [vmhd] size:20
- - - - [dinf] size:36
- - - - - [dref] size:28
- - - - [stbl] size:1693
- - - - - [stsd] size:145
           Updated stsd box #1: Width: 100 -> 300, Height: 100 -> 400
- - - - - [stts] size:512
- - - - - [stss] size:56
- - - - - [stsc] size:28
- - - - - [stsz] size:924
- - - - - [stco] size:20

Then running ExifTool & FFprobe again:

$ exiftool $f | egrep -i 'width|height'

Image Width                     : 300
Image Height                    : 400
Source Image Width              : 300
Source Image Height             : 400

$ ffprobe -v quiet -show_streams $f | egrep 'width|height'

width=100
height=100
coded_width=100
coded_height=100

This is my Python code:

import sys, struct

def read_box(f):
    offset = f.tell()
    header = f.read(8)
    if len(header) < 8:
        return None, offset
    size, box_type = struct.unpack(">I4s", header)
    box_type = box_type.decode("ascii")
    if size == 1:
        size = struct.unpack(">Q", f.read(8))[0]
    elif size == 0:
        size = None
    return {"type": box_type, "size": size, "start_offset": offset}, offset

def edit_tkhd_box(f, box_start, new_width, new_height, depth):
    f.seek(box_start + 84, 0)  # Go to the width/height part in tkhd box
    try:
        old_width = struct.unpack('>I', f.read(4))[0] >> 16
        old_height = struct.unpack('>I', f.read(4))[0] >> 16
        f.seek(box_start + 84, 0)  # Go back to write
        f.write(struct.pack('>I', new_width << 16))
        f.write(struct.pack('>I', new_height << 16))
        print(f"{'  ' * depth} Updated tkhd box: Width: {old_width} -> {new_width}, Height: {old_height} -> {new_height}")
    except struct.error:
        print(f"  Error reading or writing width/height to tkhd box")

def edit_stsd_box(f, box_start, new_width, new_height, depth):
    f.seek(box_start + 12, 0)  # Skip to the entry count in stsd box
    try:
        entry_count = struct.unpack('>I', f.read(4))[0]
        for i in range(entry_count):
            entry_start = f.tell()
            f.seek(entry_start + 4, 0)  # Skip the entry size
            format_type = f.read(4).decode("ascii", "ignore")
            if format_type == "avc1":
                f.seek(entry_start + 32, 0)  # Adjust this based on format specifics
                try:
                    old_width = struct.unpack('>H', f.read(2))[0]
                    old_height = struct.unpack('>H', f.read(2))[0]
                    f.seek(entry_start + 32, 0)  # Go back to write
                    f.write(struct.pack('>H', new_width))
                    f.write(struct.pack('>H', new_height))
                    print(f"{'  ' * depth} Updated stsd box #{i + 1}: Width: {old_width} -> {new_width}, Height: {old_height} -> {new_height}")
                except struct.error:
                    print(f"  Error reading or writing dimensions to avc1 format in entry {i + 1}")
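            # Advance to the next entry using this entry's own size field,
            # whether or not it was an avc1 entry
            f.seek(entry_start, 0)
            entry_size = struct.unpack('>I', f.read(4))[0]
            f.seek(entry_start + entry_size, 0)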
    except struct.error:
        print(f"  Error reading or writing entries in stsd box")

def parse_and_edit_boxes(f, new_width, new_height, depth=0, parent_size=None):
    while True:
        current_pos = f.tell()
        if parent_size is not None and current_pos >= parent_size:
            break
        box, box_start = read_box(f)
        if not box:
            break
        box_type, box_size = box["type"], box["size"]
        print(f'{"- " * depth}[{box_type}] size:{box_size}')
        
        if box_type == "tkhd":
            edit_tkhd_box(f, box_start, new_width, new_height, depth)
        elif box_type == "stsd":
            edit_stsd_box(f, box_start, new_width, new_height, depth)
        
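        # Recursively parse children if it's a container box
        if box_type in ["moov", "trak", "mdia", "minf", "stbl", "dinf", "edts"]:
            # A size of 0 means the box runs to end of file, so it has no known end offset
            child_end = None if box_size is None else box_start + box_size
            parse_and_edit_boxes(f, new_width, new_height, depth + 1, child_end)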
        
        if box_size is None:
            f.seek(0, 2)  # Move to the end of file
        else:
            f.seek(box_start + box_size, 0)

if __name__ == '__main__':
    if len(sys.argv) != 4:
        print("Usage: python script.py <input_file> <new_width> <new_height>")
    else:
        with open(sys.argv[1], 'r+b') as f:
            parse_and_edit_boxes(f, int(sys.argv[2]), int(sys.argv[3]))

It seems related to FFmpeg's ff_h264_decode_seq_parameter_set.


Solution

  • FFprobe reads the dimensions at the stream level (e.g. H.264), but you are editing at the container level (e.g. MP4).

    You would need to edit the SPS (Sequence Parameter Set) bytes instead.
    Specifically, you'll be editing pic_width_in_mbs_minus1 and pic_height_in_map_units_minus1.

    Double-check the following using a hex editor. Try some manual editing first, then write code that produces the same edit.
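To find the bytes to inspect in the hex editor, something like the sketch below may help. It assumes the SPS lives in an avcC box (inside the avc1 sample entry) and simply searches for the first b"avcC" instead of walking the box tree, so it could be fooled if those four bytes also occur in mdat. The function name find_sps_in_avcc is made up for this example.

```python
import struct

def find_sps_in_avcc(data: bytes):
    """Return (offset, sps_bytes) of the first SPS in the first avcC box, or None.

    Hypothetical helper: uses a naive byte search, not a real box parser.
    """
    pos = data.find(b"avcC")
    if pos < 0:
        return None
    p = pos + 4                    # first byte of the avcC payload
    p += 4                         # configurationVersion, profile, compatibility, level
    p += 1                         # reserved bits + lengthSizeMinusOne (e.g. 0xFF)
    num_sps = data[p] & 0x1F       # low 5 bits hold the SPS count (0xE1 -> 1)
    p += 1
    if num_sps == 0:
        return None
    (sps_len,) = struct.unpack(">H", data[p:p + 2])  # 16-bit SPS length
    p += 2
    return p, data[p:p + sps_len]

# Synthetic avcC box built around the first 8 SPS bytes of the example in this answer
demo = (b"\x00\x00\x00\x18" + b"avcC" + bytes.fromhex("0142C00D")
        + bytes.fromhex("FFE10008") + bytes.fromhex("6742C00D9E218283"))
offset, sps = find_sps_in_avcc(demo)
print(offset, sps.hex(" "))
```

For a real file you would read the bytes with open(path, "rb").read() and compare the printed offset with what the hex editor shows.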

    You also need to research how Exp-Golomb codes work, because the values you need to edit are stored in that bit format.
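As a starting point for that research, here is a minimal sketch of the unsigned Exp-Golomb encoding, ue(v). The function name ue_encode is just for illustration. It returns a string of "0"/"1" characters so the result is easy to compare with what you see in a hex editor.

```python
def ue_encode(value: int) -> str:
    """Encode a non-negative integer as an unsigned Exp-Golomb bit string."""
    if value < 0:
        raise ValueError("ue(v) only encodes non-negative integers")
    code = bin(value + 1)[2:]              # binary digits of value + 1
    return "0" * (len(code) - 1) + code    # prefix with (length - 1) zero bits

for v in (0, 1, 2, 6, 19):
    print(v, ue_encode(v))
```

Note that different values can need a different number of bits, so writing a new value may shift every bit that follows it in the SPS.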

    The structure of SPS is shown in the images further below.

    For example, if your bytes look like FF E1 00 19 67 42 C0 0D 9E 21 82 83, then the SPS itself starts at 67 (the NAL unit header, where type 7 means SPS). The bytes before it belong to the avcC box (00 19 is the SPS length, 25 bytes).

    You can see (from the images below) that Profile IDC uses 8 bits, so it fills one whole byte: 42, which is 66 in decimal.

    Next is C0, which holds four 1-bit flags followed by four reserved zero bits. Together they make 8 bits, so they fill the next byte as C0 (binary 1100 0000).

    constraint_set0_flag = 1
    constraint_set1_flag = 1
    constraint_set2_flag = 0
    constraint_set3_flag = 0
    reserved_zero_4bits  = 0 0 0 0
    

    Next is 0D, which is the Level IDC (13 in decimal, i.e. level 1.3).
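The fixed-length fields read so far can be checked with a few lines of Python. This is only a sketch over the four example bytes 67 42 C0 0D; the variable names follow the field names used above.

```python
sps = bytes.fromhex("6742C00D")   # NAL header + first three SPS bytes from the example

nal_unit_type = sps[0] & 0x1F     # low 5 bits of the NAL header; 7 means SPS
profile_idc = sps[1]              # 0x42 -> 66

flags = sps[2]                    # 0xC0 -> 1100 0000
constraint_set0_flag = (flags >> 7) & 1
constraint_set1_flag = (flags >> 6) & 1
constraint_set2_flag = (flags >> 5) & 1
constraint_set3_flag = (flags >> 4) & 1
reserved_zero_4bits = flags & 0x0F

level_idc = sps[3]                # 0x0D -> 13

print(nal_unit_type, profile_idc, level_idc)
print(constraint_set0_flag, constraint_set1_flag,
      constraint_set2_flag, constraint_set3_flag, reserved_zero_4bits)
```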

    Next is 9E, which is bits 1001 1110. From here on the fields use the ue(v) format (unsigned Exp-Golomb): count the 0 bits until the first 1 bit is reached. If the very first bit is a 1, no zeroes were counted and the value is 0.

    seq_parameter_set_id = 0 (the first bit is a 1, so we counted zero 0-bits before reaching it)
    

    Here the IF statement can be skipped since our Profile IDC is 66 (not 100 or one of the other profiles listed in that condition).

    There are still 7 other bits left in that byte 0x9E: ...001 1110

    log2_max_frame_num_minus4 = 6
    

    Because we stop at the next 1, the count of preceding zeroes tells us how many more bits to read after it. So here 001 11 is read as 00 {1} 11, where {1} is the stop-counting signal. There are two zeroes before the 1, so we read two bits after it (11 = 3). The decoded value is 2^2 - 1 + 3 = 6, which is the same as reading 00111 as a binary number (7) and subtracting 1.
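The ue(v) reading rule can be sketched in Python as follows. To keep it short, this version works on a string of "0"/"1" characters rather than on raw bytes, and the name read_ue is only illustrative.

```python
def read_ue(bits: str, pos: int):
    """Decode one ue(v) value starting at bits[pos]; return (value, next_pos)."""
    zeros = 0
    while bits[pos] == "0":        # count leading zero bits
        zeros += 1
        pos += 1
    pos += 1                       # skip the 1 bit that stops the count
    suffix = bits[pos:pos + zeros] # read as many bits as zeroes were counted
    pos += zeros
    value = (1 << zeros) - 1 + (int(suffix, 2) if suffix else 0)
    return value, pos

bits = format(0x9E, "08b")         # "10011110"
first, pos = read_ue(bits, 0)      # seq_parameter_set_id
second, pos = read_ue(bits, pos)   # the next ue(v) field
print(first, second, pos)
```

After both reads, two bits of 0x9E are still unread; they belong to the fields that follow.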

    Hopefully it's enough to get you and other readers started. Keep decoding fields this way until you reach pic_width_in_mbs_minus1.
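Putting the steps together, a rough sketch of walking a Baseline-profile SPS up to pic_width_in_mbs_minus1 could look like this. It assumes the field order of the H.264 SPS syntax for profile 66, only handles pic_order_cnt_type 0 or 2, and ignores emulation prevention bytes (00 00 03), so treat it as a starting point rather than a full parser. BitReader and locate_pic_width are names invented for this example.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object."""
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0                       # current position, in bits

    def u(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def ue(self) -> int:
        zeros = 0
        while self.u(1) == 0:              # count zero bits up to the stop bit
            zeros += 1
        return (1 << zeros) - 1 + self.u(zeros)

def locate_pic_width(sps: bytes):
    """Return (pic_width_in_mbs_minus1, bit_offset, bit_length)."""
    r = BitReader(sps)
    r.u(8)                                 # NAL unit header (0x67)
    profile_idc = r.u(8)
    r.u(8)                                 # constraint flags + reserved bits
    r.u(8)                                 # level_idc
    r.ue()                                 # seq_parameter_set_id
    if profile_idc in (100, 110, 122, 244, 44, 83, 86, 118, 128, 138, 139, 134, 135):
        raise NotImplementedError("extra high-profile fields not handled in this sketch")
    r.ue()                                 # log2_max_frame_num_minus4
    poc_type = r.ue()                      # pic_order_cnt_type
    if poc_type == 0:
        r.ue()                             # log2_max_pic_order_cnt_lsb_minus4
    elif poc_type == 1:
        raise NotImplementedError("pic_order_cnt_type 1 not handled in this sketch")
    r.ue()                                 # max_num_ref_frames
    r.u(1)                                 # gaps_in_frame_num_value_allowed_flag
    start = r.pos
    width_mbs_minus1 = r.ue()
    return width_mbs_minus1, start, r.pos - start

sps = bytes.fromhex("6742C00D9E218283")    # first SPS bytes from the example above
value, bit_offset, bit_length = locate_pic_width(sps)
print(value, bit_offset, bit_length, (value + 1) * 16)
```

The last printed number is the coded width in pixels before any cropping, since each macroblock is 16 pixels wide. The bit offset and length tell you which bits you would have to rewrite.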

    The images of the SPS data structure:

    [SPS image #1] [SPS image #2]