I add subtitles to a video with ffmpeg in two different ways. The first way uses the drawtext filter, and this way everything works perfectly. Here is the command:
ffmpeg -i ./input.mp4 -vf "drawtext=text='reise':fontfile=../fonts/Audiowide-Regular.ttf:fontsize=55:fontcolor=white:x=0:y=683" -codec:a copy ./output.mp4
The second way uses an ASS subtitle file. This way I get smaller letters and a wrong y position for the text. Below is the content of the ASS subtitle file:
[Script Info]
Title: Advanced Highlighted Subtitle Example
ScriptType: v4.00+
WrapStyle: 0
PlayResX: 1048
PlayResY: 750
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Audiowide Regular,55,&HFFFFFF,&H00FFFFFF,&H00000000,&H00000000,1,0,0,0,100,100,0,0,0,0,0,2,10,10,10,1
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,00:00:0.00,00:00:2.38,Default,,0,0,0,,{\pos(0,683)\an4}reise
And the command for the second approach:
ffmpeg -i ./input.mp4 -vf "ass=../subtitles.ass:fontsdir=../fonts/Audiowide-Regular.ttf" ../output.mp4
So both commands get the same video, the same font and the same text. The problem is that with the ASS file the text is much smaller and misplaced.
The numbers on the axes indicate sizes in pixels. As you can see in the second image, the text is much smaller and has the wrong y coordinate. It looks as if it is scaled incorrectly. What is wrong with my ASS file configuration?
I have tried the solution from universalmediaserver (https://www.universalmediaserver.com/forum/viewtopic.php?t=5907) to remove PlayResX/Y, but it doesn't work. I also tried measuring the text width in several other ways (HTML rendered in a browser, canvas, ...), so I'm pretty sure that drawtext does give the correctly rendered width. The problem is related to the ASS subtitle file. Also, if I use popular fonts like Arial, the deviation is much smaller.
FFmpeg's drawtext filter interprets the font size differently than ASS renderers (like Libass). The drawtext filter uses the font's nominal size (in pixels), scaling it according to the units per EM. In contrast, ASS renderers use the font's real dimensions for scaling, which they determine by adding the ascender field to the negated descender field from the font's tables (such as OS/2 and hhea).
So to match the size between FFmpeg's drawtext and ASS, we need to find a way to calculate the font's real dimension size (used by ASS) from its nominal size (used by drawtext). Let's first calculate the base size of the font, which is then used for scaling it.
For the nominal size, we need to read unitsPerEm from the Font Header Table (head); for the Audiowide font it is 2048.
For the real dimension size, we need the values of the ascender and descender fields, which can be found in the hhea table; for the Audiowide font the ascender is 2027 and the descender is -584.
So then:
Nominal size = unitsPerEm = 2048
Real dimension size = ascender - descender = 2027 - (-584) = 2611
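To make the effect of the two base sizes concrete, here is a minimal Python sketch of how the same font size maps to pixels under each interpretation. It is only an illustration under the model described in this answer: the helper name `rendered_px` and the 1000-unit feature height are hypothetical, and the Audiowide metrics are the ones quoted above.

```python
def rendered_px(font_size, feature_units, base_units):
    """Pixel size of a feature that spans feature_units font design units,
    when base_units design units are mapped onto font_size pixels."""
    return font_size * feature_units / base_units

UNITS_PER_EM = 2048   # nominal base (drawtext)
ASCENDER = 2027
DESCENDER = -584
REAL_UNITS = ASCENDER - DESCENDER  # real dimension base (ASS), 2611

FEATURE_UNITS = 1000  # hypothetical glyph feature height in design units

drawtext_px = rendered_px(55, FEATURE_UNITS, UNITS_PER_EM)
ass_px = rendered_px(55, FEATURE_UNITS, REAL_UNITS)

print(f"drawtext: {drawtext_px:.2f} px, ASS: {ass_px:.2f} px")
print(f"ASS / drawtext ratio: {ass_px / drawtext_px:.3f}")
```

With the same font size, the ASS result is smaller by the ratio unitsPerEm / (ascender - descender), which is consistent with the smaller text seen in the question.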
So the real dimension size is larger than the nominal size by a scale factor:
Scale = Real dimension size / Nominal size = 2611 / 2048 ≈ 1.275
So we need to multiply the original font size (55) by the scale factor: 55 * (2611 / 2048) ≈ 70.12
Secondly, note that drawtext positions text by its top-left corner, whereas the \an4 alignment tag that you used in ASS corresponds to left-middle alignment. To match the positions, you should use \an7 (left-top alignment) in ASS.
Thirdly, drawtext aligns text to the highest glyph (for historic reasons) instead of baseline plus ascent (how it's typically done), but you can change this by setting y_align=font in the drawtext filter.
So here is the corrected ASS v4+ script file:
[Script Info]
Title: Advanced Highlighted Subtitle Example
ScriptType: v4.00+
WrapStyle: 0
PlayResX: 1048
PlayResY: 750
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Audiowide,70.12,&H00FFFFFF,&H00FFFFFF,&H00000000,&H00000000,-1,0,0,0,100,100,0,0,1,0,0,2,10,10,10,1
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,00:00:0.00,00:00:2.38,Default,,0,0,0,,{\pos(0,683)\an7}reise
And the corrected FFmpeg command:
ffmpeg -i ./input.mp4 -vf "drawtext=text='reise':fontfile=../fonts/Audiowide-Regular.ttf:fontsize=55:fontcolor=white:x=0:y=683:y_align=font" -codec:a copy ./output.mp4
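As a rough illustration of the alignment point, the following Python sketch shows which corner of the text box ends up where for a given \an value and \pos. It is not taken from any renderer: the function name `top_left_from_pos` and the 200x70 px box are hypothetical, and a real renderer derives the box from the font metrics.

```python
def top_left_from_pos(an, x, y, width, height):
    """Top-left corner of a text box whose alignment anchor is placed at (x, y).

    an is the numpad-style ASS alignment: 1-3 bottom row, 4-6 middle row,
    7-9 top row; within each row the order is left, center, right.
    """
    if an not in range(1, 10):
        raise ValueError("alignment must be between 1 and 9")
    col = (an - 1) % 3    # 0 = left, 1 = center, 2 = right
    row = (an - 1) // 3   # 0 = bottom, 1 = middle, 2 = top
    left = x - width * col / 2
    top = y - height * (2 - row) / 2
    return left, top

# Hypothetical 200x70 px text box positioned with \pos(0,683):
print(top_left_from_pos(4, 0, 683, 200, 70))  # \an4, left-middle anchor
print(top_left_from_pos(7, 0, 683, 200, 70))  # \an7, left-top anchor
```

With \an4 the box starts half its height above y=683, while with \an7 its top edge sits exactly at y=683, which is the behaviour that lines up with drawtext's x and y.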
And below is an example of how to read the metric values from a font using FreeType (which both drawtext and the ASS renderer Libass use under the hood for rendering fonts) in Python:
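import freetype
# Load the font file (replace with the path to your font)
face = freetype.Face('path/to/your/fontfile.ttf')
# All three values are expressed in font design units
units_per_em = face.units_per_EM
ascender = face.ascender
descender = face.descender  # negative for most fonts
print(f"Units per EM: {units_per_em}")
print(f"Ascender: {ascender}")
print(f"Descender: {descender}")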
The result for the Audiowide font should be:
Units per EM: 2048
Ascender: 2027
Descender: -584
And you can install the needed library using:
python -m pip install freetype-py
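Combining those metrics with the scale formula from this answer, the drawtext-to-ASS size conversion could be wrapped in a small helper. This is a sketch, not part of FFmpeg or Libass: the name `ass_font_size` is hypothetical, and a `SimpleNamespace` stands in for a `freetype.Face` so the snippet runs without a font file. It only relies on the three attributes read in the example above.

```python
from types import SimpleNamespace

def ass_font_size(face, drawtext_size):
    """Convert a drawtext fontsize to the ASS Fontsize that should render
    at a matching scale, using ascender - descender over units per EM."""
    real_units = face.ascender - face.descender
    return drawtext_size * real_units / face.units_per_EM

# Stand-in for the Audiowide metrics printed above. With freetype-py installed:
#   face = freetype.Face('path/to/your/fontfile.ttf')
audiowide = SimpleNamespace(units_per_EM=2048, ascender=2027, descender=-584)

print(round(ass_font_size(audiowide, 55), 2))
```

Passing the face object rather than three separate numbers keeps the call the same whether the metrics come from FreeType or from hand-entered values.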