text ffmpeg subtitle webvtt dvb

How can I convert DVB subtitles to a text format with FFmpeg in a live stream, or how can I optimize the DVB subtitle burning process?


I am working on an HLS transcoder that accepts any input format, and I need to encode multiple "dvbsub" subtitle streams at the same time so that a client reading the m3u8 HLS playlist can select among them.

The main problem is that burning each dvbsub into a live video stream in this way:

 -filter_complex "[0:v][0:s:0]overlay[v0];[0:v][0:s:1]overlay[v1];[0:v][0:s:2]overlay[v2];......"

is very CPU-intensive (I have 8 or more dvbsub streams in the same input).
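
For illustration, the filter graph above has one overlay chain per subtitle stream, so the work grows with the number of streams. A minimal Python sketch that generates the same graph for any count follows; the function name `build_overlay_graph` is hypothetical, and the `-map` arguments are an assumption about how the labelled outputs would be selected.

```python
def build_overlay_graph(num_subs):
    """Return the -filter_complex string and matching -map arguments.

    One overlay chain is produced per subtitle stream, mirroring the
    pattern [0:v][0:s:N]overlay[vN] used in the command above.
    """
    chains = [f"[0:v][0:s:{i}]overlay[v{i}]" for i in range(num_subs)]
    maps = []
    for i in range(num_subs):
        # Assumption: each labelled output [vN] is mapped to its own output.
        maps += ["-map", f"[v{i}]"]
    return ";".join(chains), maps


graph, maps = build_overlay_graph(3)
print(graph)
print(maps)
```

Each chain decodes, composites and then re-encodes a full video rendition, which is why eight subtitle streams mean eight overlays and eight video encodes.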

Does anyone know how to convert each dvbsub stream into a text format (WebVTT, for example), or whether there is a way to optimize the process? (I tried running the burning process on an NVIDIA GPU, but it brought no improvement.)

I have read that OCR programs can do this, but after days of research I still don't know how.

Thanks in advance.

EDIT: The input is a live UDP signal, so the conversion has to happen on the fly.


Solution

  • With ccextractor (https://github.com/CCExtractor/ccextractor) you can extract dvbsub and dvb_teletext subtitles.

    dvbsub subtitles are bitmaps, so to extract them as text you will need to compile ccextractor with OCR support.

    Install dependencies:

    $ sudo apt-get update
    $ sudo apt-get install tesseract-ocr-dev
    $ sudo apt-get install tesseract-ocr-*
    $ sudo apt-get install -y gcc
    $ sudo apt-get install -y libcurl4-gnutls-dev
    $ sudo apt-get install -y libleptonica-dev
    

    In the ccextractor source directory:

    $ mkdir build && cd build
    $ cmake -DWITH_OCR=ON ../src/ 
    $ make -j4
    

    Stream your content over UDP (-map 0:18 selects only the dvbsub stream from the multiplex):

    $ ffmpeg -re -i mux562.ts -map 0:18 -c:s dvbsub -f mpegts udp://239.0.0.1:5000
    

    Read the UDP stream live and write SRT output:

    $ ccextractor -s -codec dvbsub -in=ts -udp 239.0.0.1:5000 -o output.srt
    

    You can also write the SRT output to a FIFO or to stdout; please refer to the ccextractor help.
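
Since the question asks for WebVTT rather than SRT, the SRT text can be converted line by line as it arrives. The two formats differ mainly in the `WEBVTT` header and in the decimal separator of the timestamps. Below is a minimal Python sketch of that conversion; the function name `srt_to_vtt_lines` is hypothetical, and the code assumes well-formed SRT input. It does not segment the output or add the timestamp mapping that an HLS packager would need.

```python
import re

# SRT timing lines use "HH:MM:SS,mmm"; WebVTT uses "HH:MM:SS.mmm".
TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt_lines(lines):
    """Yield WebVTT lines for an iterable of SRT lines.

    Cue numbers are passed through unchanged, since WebVTT accepts
    them as cue identifiers.
    """
    yield "WEBVTT"
    yield ""
    for raw in lines:
        # Drop line endings and a possible UTF-8 byte order mark.
        line = raw.rstrip("\r\n").lstrip("\ufeff")
        match = TIMING.match(line)
        if match:
            yield f"{match[1]}.{match[2]} --> {match[3]}.{match[4]}"
        else:
            yield line


# Small in-memory sample standing in for the live SRT stream.
sample = ["1\n", "00:00:01,000 --> 00:00:03,500\n", "Hello\n", "\n"]
print("\n".join(srt_to_vtt_lines(sample)))
```

For live use, `sys.stdin` (or an opened FIFO) could be passed instead of `sample`, assuming ccextractor is writing its SRT output there.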