I'm trying to edit MP4 width & height without scaling. I'm doing that by editing the tkhd & stsd boxes of the MP4 header. exiftool will show the new width & height but ffprobe will not.
Before editing:
Exif:
$ exiftool $f | egrep -i 'width|height'
Image Width : 100
Image Height : 100
Source Image Width : 100
Source Image Height : 100
FFprobe:
$ ffprobe -v quiet -show_streams $f | egrep 'width|height'
width=100
height=100
coded_width=100
coded_height=100
After editing the sizes above, my Python script prints the following output:
[ftyp] size:32
[mdat] size:196933
[moov] size:2057
- [mvhd] size:108
- [trak] size:1941
- - [tkhd] size:92
Updated tkhd box: Width: 100 -> 300, Height: 100 -> 400
- - [mdia] size:1841
- - - [mdhd] size:32
- - - [hdlr] size:44
- - - [minf] size:1757
- - - - [vmhd] size:20
- - - - [dinf] size:36
- - - - - [dref] size:28
- - - - [stbl] size:1693
- - - - - [stsd] size:145
Updated stsd box #1: Width: 100 -> 300, Height: 100 -> 400
- - - - - [stts] size:512
- - - - - [stss] size:56
- - - - - [stsc] size:28
- - - - - [stsz] size:924
- - - - - [stco] size:20
Then running ExifTool & FFprobe again:
$ exiftool $f | egrep -i 'width|height'
Image Width : 300
Image Height : 400
Source Image Width : 300
Source Image Height : 400
$ ffprobe -v quiet -show_streams $f | egrep 'width|height'
width=100
height=100
coded_width=100
coded_height=100
This is my Python code:
import sys, struct

def read_box(f):
    offset = f.tell()
    header = f.read(8)
    if len(header) < 8:
        return None, offset
    size, box_type = struct.unpack(">I4s", header)
    box_type = box_type.decode("ascii")
    if size == 1:
        size = struct.unpack(">Q", f.read(8))[0]
    elif size == 0:
        size = None
    return {"type": box_type, "size": size, "start_offset": offset}, offset

def edit_tkhd_box(f, box_start, new_width, new_height, depth):
    f.seek(box_start + 84, 0)  # Go to the width/height part in tkhd box
    try:
        old_width = struct.unpack('>I', f.read(4))[0] >> 16
        old_height = struct.unpack('>I', f.read(4))[0] >> 16
        f.seek(box_start + 84, 0)  # Go back to write
        f.write(struct.pack('>I', new_width << 16))
        f.write(struct.pack('>I', new_height << 16))
        print(f"{' ' * depth} Updated tkhd box: Width: {old_width} -> {new_width}, Height: {old_height} -> {new_height}")
    except struct.error:
        print(f" Error reading or writing width/height to tkhd box")

def edit_stsd_box(f, box_start, new_width, new_height, depth):
    f.seek(box_start + 12, 0)  # Skip to the entry count in stsd box
    try:
        entry_count = struct.unpack('>I', f.read(4))[0]
        for i in range(entry_count):
            entry_start = f.tell()
            f.seek(entry_start + 4, 0)  # Skip the entry size
            format_type = f.read(4).decode("ascii", "ignore")
            if format_type == "avc1":
                f.seek(entry_start + 32, 0)  # Adjust this based on format specifics
                try:
                    old_width = struct.unpack('>H', f.read(2))[0]
                    old_height = struct.unpack('>H', f.read(2))[0]
                    f.seek(entry_start + 32, 0)  # Go back to write
                    f.write(struct.pack('>H', new_width))
                    f.write(struct.pack('>H', new_height))
                    print(f"{' ' * depth} Updated stsd box #{i + 1}: Width: {old_width} -> {new_width}, Height: {old_height} -> {new_height}")
                except struct.error:
                    print(f" Error reading or writing dimensions to avc1 format in entry {i + 1}")
            else:
                f.seek(entry_start + 8, 0)  # Skip to the next entry
    except struct.error:
        print(f" Error reading or writing entries in stsd box")

def parse_and_edit_boxes(f, new_width, new_height, depth=0, parent_size=None):
    while True:
        current_pos = f.tell()
        if parent_size is not None and current_pos >= parent_size:
            break
        box, box_start = read_box(f)
        if not box:
            break
        box_type, box_size = box["type"], box["size"]
        print(f'{"- " * depth}[{box_type}] size:{box_size}')
        if box_type == "tkhd":
            edit_tkhd_box(f, box_start, new_width, new_height, depth)
        elif box_type == "stsd":
            edit_stsd_box(f, box_start, new_width, new_height, depth)
        # Recursively parse children if it's a container box
        if box_type in ["moov", "trak", "mdia", "minf", "stbl", "dinf", "edts"]:
            parse_and_edit_boxes(f, new_width, new_height, depth + 1, box_start + box_size)
        if box_size is None:
            f.seek(0, 2)  # Move to the end of file
        else:
            f.seek(box_start + box_size, 0)

if __name__ == '__main__':
    if len(sys.argv) != 4:
        print("Usage: python script.py <input_file> <new_width> <new_height>")
    else:
        with open(sys.argv[1], 'r+b') as f:
            parse_and_edit_boxes(f, int(sys.argv[2]), int(sys.argv[3]))
It seems related to ff_h264_decode_seq_parameter_set
FFprobe analyzes at the stream level (e.g. H.264), but you are editing at the container level (e.g. MP4).
You would need to edit the SPS (Sequence Parameter Set) bytes.
Specifically you'll be editing pic_width_in_mbs_minus1 and pic_height_in_map_units_minus1. For progressive video, both store a count of 16-pixel units minus one, not a pixel size.
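To show how those two fields relate to the pixel size that ffprobe reports, here is a minimal sketch. It assumes 4:2:0 chroma (chroma_format_idc == 1), where the frame cropping offsets are counted in units of 2 pixels horizontally. The function name `display_size` and the sample values are hypothetical; they are not read from the file in the question.

```python
def display_size(pic_width_in_mbs_minus1, pic_height_in_map_units_minus1,
                 frame_mbs_only_flag=1,
                 crop_left=0, crop_right=0, crop_top=0, crop_bottom=0):
    """Return (width, height) in pixels, assuming 4:2:0 chroma."""
    # Crop units for chroma_format_idc == 1 (assumption stated above).
    crop_unit_x = 2
    crop_unit_y = 2 * (2 - frame_mbs_only_flag)
    # Coded size is always a multiple of 16 pixels (one macroblock).
    coded_w = 16 * (pic_width_in_mbs_minus1 + 1)
    coded_h = 16 * (2 - frame_mbs_only_flag) * (pic_height_in_map_units_minus1 + 1)
    # The frame_crop_* offsets trim the coded size down to the displayed size.
    width = coded_w - crop_unit_x * (crop_left + crop_right)
    height = coded_h - crop_unit_y * (crop_top + crop_bottom)
    return width, height

# Illustrative values: 7 x 7 macroblocks (112 x 112) cropped by 12 pixels each way.
print(display_size(6, 6, crop_right=6, crop_bottom=6))  # (100, 100)
```

This is why a size such as 100 x 100, which is not a multiple of 16, cannot be expressed by the two macroblock fields alone: the cropping fields later in the SPS would need to change as well.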
Double-check the following using a hex editor. Try some manual editing first, then write code to achieve the same result.
You also need to research how Golomb and Exp-Golomb codes work, because the values you need to edit are stored in that variable-length bit format.
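As a starting point for that research, here is a small sketch of unsigned Exp-Golomb, the ue(v) format, working on strings of '0' and '1' characters so that the bits are easy to inspect. The helper names `read_ue` and `write_ue` are made up for this example.

```python
def read_ue(bits, pos=0):
    """Decode one ue(v) value from a '0'/'1' string; return (value, next_pos)."""
    zeros = 0
    while bits[pos] == "0":      # count leading zeros up to the stop bit
        zeros += 1
        pos += 1
    pos += 1                     # skip the '1' stop bit
    suffix = bits[pos:pos + zeros]
    value = (1 << zeros) - 1 + (int(suffix, 2) if zeros else 0)
    return value, pos + zeros

def write_ue(value):
    """Encode a non-negative integer as a ue(v) bit string."""
    code = bin(value + 1)[2:]    # binary digits of value + 1
    return "0" * (len(code) - 1) + code

print(read_ue("1"))      # (0, 1)
print(read_ue("00111"))  # (6, 5)
print(write_ue(6))       # 00111
```

Note that the code length grows with the value, so writing a different number can change how many bits the field occupies and shift every bit that follows it.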
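You can find the SPS bytes in the avcC box, which is inside the avc1 sample entry of the MP4's stsd box.
The avcC tag has the following values (hex digits): 61 76 63 43.
Keep going forward byte by byte until you hit an FF (or 255) which is followed by E1 (or 225). In E1 the low five bits give the number of SPS entries (here 1); the FF before it corresponds to the usual 4-byte NAL length size.
Now begins the SPS: two bytes for the length, then the SPS bytes themselves (starting with byte 67, which means "SPS data", i.e. NAL unit type 7).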
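The search described above can be sketched in Python as follows. Rather than scanning for FF E1, this version uses the fixed layout of the avcC payload (version, profile, compatibility, level, length-size byte, SPS-count byte). The function name `extract_sps` is hypothetical, and searching the raw bytes for "avcC" is a shortcut that could in principle match inside mdat; walking the box tree as the question's script does would be more robust.

```python
import struct

def extract_sps(data: bytes):
    """Return (file_offset, sps_bytes) of the first SPS in the first avcC found, or None."""
    idx = data.find(b"avcC")
    if idx < 0:
        return None
    p = idx + 4                       # first payload byte (configurationVersion)
    # p+1..p+3: profile, compatibility, level; p+4: 0xFC | lengthSizeMinusOne
    num_sps = data[p + 5] & 0x1F      # low 5 bits of the E1-style byte
    if num_sps == 0:
        return None
    (sps_len,) = struct.unpack(">H", data[p + 6:p + 8])
    start = p + 8
    return start, data[start:start + sps_len]

# Synthetic test data (not a real file): box size, tag, config bytes, one 4-byte "SPS".
sample = bytes.fromhex("00000014") + b"avcC" + bytes.fromhex("0142C00DFFE100046742C00D")
print(extract_sps(sample))
```

The returned offset is where byte 67 sits in the file, which is the position you would seek to in a hex editor.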
Read this blog entry (Chinese) for more info.
Note: if you use the Chrome browser, you can get automatic page translation from Chinese into English.
The structure of SPS is shown in the images further below.
For example if your bytes look like: FF E1 00 19 67 42 C0 0D 9E 21 82 83 then...
FF E1 is where the SPS packet begins.
00 19 is the SPS length in bytes (hex 0x0019 is equal to decimal 25).
0x67 signals that the actual SPS data begins here...
0x42 is the Profile IDC, here decimal 66. You can see (from the image below) that Profile IDC uses 8 bits, and since an array slot holds 8 bits, this value fills the entire slot.
Next is C0, which is four 1-bit flags and four reserved zero bits. The total is 8 bits, so this fills the next array slot as C0 (where the C0 bits look like this: 1100 0000).
constraint_set0_flag = 1
constraint_set1_flag = 1
constraint_set2_flag = 0
constraint_set3_flag = 0
reserved_zero_4bits = 0 0 0 0
Next is 0D, which is the Level IDC.
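The fixed-width part of the SPS covered so far (the four bytes 67 42 C0 0D) can be unpacked with plain shifts and masks. This is a minimal sketch that follows the field layout described in this answer (four constraint flags plus four reserved bits); the function name `parse_sps_header` is an assumption made for the example.

```python
def parse_sps_header(sps: bytes) -> dict:
    """Decode the NAL header byte and the three fixed-width SPS bytes after it."""
    nal, profile_idc, flags, level_idc = sps[0], sps[1], sps[2], sps[3]
    return {
        "nal_unit_type": nal & 0x1F,              # 7 means SPS
        "profile_idc": profile_idc,
        "constraint_set0_flag": (flags >> 7) & 1,
        "constraint_set1_flag": (flags >> 6) & 1,
        "constraint_set2_flag": (flags >> 5) & 1,
        "constraint_set3_flag": (flags >> 4) & 1,
        "reserved_zero_4bits": flags & 0x0F,
        "level_idc": level_idc,
    }

header = parse_sps_header(bytes.fromhex("6742C00D"))
for name, value in header.items():
    print(name, "=", value)
```

Everything after these four bytes is variable-length, which is where the ue(v) decoding comes in.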
Next is 9E, which is bits 1001 1110. In the ue(v) format, if the first bit is a 1 then the answer == 0. (In general we stop at the first 1 bit; the number of 0 bits counted before reaching it tells us how many more bits to read after it.)
seq_parameter_set_id = 0 (since the first bit is a 1, we counted zero 0-bits to reach it)
Here the IF statement can be skipped since our Profile IDC is 66 (not 100 or more).
There are still 7 other bits left in that byte 0x9E: ...001 1110
log2_max_frame_num_minus4 = 6
Because we stop at the next 1, we use the count of preceding zeroes as the bit-length of the data that follows. So here 001 11 is to be read as: 00 {1} 11 where that {1} is the stop-counting signal. There are two zeroes before the 1, so we read two bits after it: 11, which is 3. The decoded value is 2^2 - 1 + 3 = 6 (that is, 2^zeroes - 1 plus the bits read).
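Continuing the same walk in code, the sketch below reads the example bytes field by field until it reaches pic_width_in_mbs_minus1. It is only a sketch under stated assumptions: a non-High profile such as 66, pic_order_cnt_type of 0 or 2, and no emulation-prevention bytes (00 00 03) in the part being read. The names `BitReader` and `parse_sps_prefix` are invented for this example.

```python
class BitReader:
    """Reads bits from a byte string, most significant bit first."""
    def __init__(self, data: bytes):
        self.bits = "".join(f"{b:08b}" for b in data)
        self.pos = 0

    def u(self, n):
        """Read n bits as an unsigned integer."""
        value = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return value

    def ue(self):
        """Read one unsigned Exp-Golomb value."""
        zeros = 0
        while self.bits[self.pos] == "0":
            zeros += 1
            self.pos += 1
        self.pos += 1  # skip the stop bit
        return (1 << zeros) - 1 + (self.u(zeros) if zeros else 0)

def parse_sps_prefix(sps: bytes) -> dict:
    r = BitReader(sps)
    r.u(8)                                  # NAL header (0x67)
    out = {"profile_idc": r.u(8)}
    r.u(8)                                  # constraint flags + reserved bits
    out["level_idc"] = r.u(8)
    out["seq_parameter_set_id"] = r.ue()
    if out["profile_idc"] in (100, 110, 122, 244, 44, 83, 86, 118, 128):
        raise NotImplementedError("High-profile fields are not handled in this sketch")
    out["log2_max_frame_num_minus4"] = r.ue()
    out["pic_order_cnt_type"] = r.ue()
    if out["pic_order_cnt_type"] == 0:
        out["log2_max_pic_order_cnt_lsb_minus4"] = r.ue()
    elif out["pic_order_cnt_type"] == 1:
        raise NotImplementedError("pic_order_cnt_type 1 is not handled in this sketch")
    out["max_num_ref_frames"] = r.ue()
    out["gaps_in_frame_num_value_allowed_flag"] = r.u(1)
    out["width_bit_offset"] = r.pos         # bit position where the width code starts
    out["pic_width_in_mbs_minus1"] = r.ue()
    return out

print(parse_sps_prefix(bytes.fromhex("6742C00D9E218283")))
```

For the example bytes this yields pic_width_in_mbs_minus1 = 19, i.e. a coded width of (19 + 1) * 16 = 320 pixels, with the width code starting 50 bits into the SPS. Those eight bytes end before pic_height_in_map_units_minus1 is complete, so reading further needs the rest of the 25-byte SPS.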
Hopefully it's enough to get you and other readers started. You must reach pic_width_in_mbs_minus1.
The images of data structure of SPS: