I want to grab the subtitle text from a movie screenshot. An example:
It should grab:
Hey, why don't we all just relax, huh?
This is not a subtitle file, just a screenshot. Since the text is a subtitle, we know the font, size, etc., if that makes it easier to grab.
I know most of you will suggest a PHP OCR library, but since the background is always different, it looks like that won't work.
The background being different shouldn't be a problem. You can just use an image library to remove anything that isn't the text colour.
Here's a quick example that gives a decent idea of what I mean. It replaces any colour darker than #f5f5f5 with #000000:
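<?php
// Load the screenshot (assumes a JPEG named img.jpg).
$im = imagecreatefromjpeg("img.jpg");
// Allocate black once instead of on every pixel.
$black = imagecolorallocate($im, 0, 0, 0);
for ($x = imagesx($im); $x--;)
{
    for ($y = imagesy($im); $y--;)
    {
        $rgb = imagecolorat($im, $x, $y);
        // Blacken the pixel unless all three channels are above 245 (#f5),
        // so that only near-white text pixels survive.
        if ((($rgb >> 16) & 0xFF) <= 245
            || (($rgb >> 8) & 0xFF) <= 245
            || ($rgb & 0xFF) <= 245)
        {
            imagesetpixel($im, $x, $y, $black);
        }
    }
}
// Send the filtered image to the browser.
header("Content-Type: image/jpeg");
imagejpeg($im);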
Here's how the result looks:
You can probably chop most of the top part off since you know the subtitles will be at the bottom. Then just run it through an OCR library.
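As a rough sketch of the cropping step, GD's imagecrop() can keep just the bottom band of the frame. The helper name cropSubtitleBand and the default of 25% are my own assumptions for illustration; tune the fraction to wherever the subtitles sit in your screenshots.

```php
<?php
// Hypothetical helper: keep only the bottom part of the frame,
// where the subtitles are assumed to be.
function cropSubtitleBand($im, float $fraction = 0.25)
{
    $width  = imagesx($im);
    $height = imagesy($im);

    // Height of the band to keep, at least one pixel.
    $bandHeight = max(1, (int) round($height * $fraction));

    // imagecrop() returns a new image, or false on failure.
    return imagecrop($im, [
        'x'      => 0,
        'y'      => $height - $bandHeight,
        'width'  => $width,
        'height' => $bandHeight,
    ]);
}
```

Running the colour filter on the cropped band instead of the whole frame also means far fewer pixels to loop over.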
It's probably better to use an external OCR library or command line tool and call it from PHP. For external tools, there are Tesseract and OCRopus (I believe OCRopus is sponsored by Google too).
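Calling a command line tool from PHP could look something like the sketch below. It assumes the tesseract binary is installed and on the PATH, that the filtered image has been saved to a file first, and that the installed version accepts "stdout" as the output name and the --psm flag (older 3.x releases spell it -psm). The function names are made up for this example.

```php
<?php
// Hypothetical helper: build the shell command for the tesseract CLI.
// "stdout" tells tesseract to print the text instead of writing a file;
// --psm 6 treats the image as a single uniform block of text.
function buildTesseractCommand(string $imagePath, string $lang = 'eng'): string
{
    return sprintf(
        'tesseract %s stdout -l %s --psm 6',
        escapeshellarg($imagePath),
        escapeshellarg($lang)
    );
}

// Hypothetical helper: run tesseract and return the recognised text,
// or an empty string if the command produced no output.
function ocrSubtitle(string $imagePath, string $lang = 'eng'): string
{
    $output = shell_exec(buildTesseractCommand($imagePath, $lang));
    return is_string($output) ? trim($output) : '';
}
```

Usage would then be something like echo ocrSubtitle('band.png'); after saving the filtered image to band.png. Saving as PNG rather than JPEG avoids adding compression artifacts around the text.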