ffmpeg video-streaming http-live-streaming caption video-subtitles

How can I extract "GROUP-ID", "LANGUAGE" and "INSTREAM-ID" from a .ts segment using ffmpeg?


I am trying to get the captions from a segment of a live feed. I am running the command:

ffmpeg -i seg-1077853030-v1-a1.ts

Output

`Input #0, mpegts, from 'seg-109853030-v1-a1.ts':
  Duration: 00:00:06.01, start: 57867.901133, bitrate: 2649 kb/s
  Program 1
    Stream #0:0[0x100]: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(tv, bt709, progressive), 1280x720 [SAR 1:1 DAR 16:9], Closed Captions, 29.97 fps, 29.97 tbr, 90k tbn, 59.94 tbc
    Stream #0:1[0x101]: Audio: aac (LC) ([15][0][0][0] / 0x000F), 48000 Hz, stereo, fltp, 98 kb/s
    Stream #0:2[0x102]: Data: timed_id3 (ID3  / 0x20334449)`

My question is: what command should I run to print out the caption file with the track metadata, including label and language?


Solution

  • If your MPEG-TS file is an HLS segment, just parse the HLS master playlist to retrieve the values. If your input is captured from a live broadcast, read on.
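For the playlist case, a minimal Python sketch of reading the three attributes from the `#EXT-X-MEDIA` tags of a master playlist might look like the following. The function name and the sample playlist are made up for illustration, and the attribute parsing is deliberately simple rather than a complete RFC 8216 parser.

```python
import re

# Matches NAME=value pairs where the value is either a quoted string
# (which may contain commas) or an unquoted token.
ATTRIBUTE_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')
MEDIA_TAG = "#EXT-X-MEDIA:"


def parse_closed_caption_renditions(playlist_text):
    """Return GROUP-ID, LANGUAGE, NAME and INSTREAM-ID of each
    TYPE=CLOSED-CAPTIONS rendition found in a master playlist."""
    renditions = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if not line.startswith(MEDIA_TAG):
            continue
        attrs = {
            key: value.strip('"')
            for key, value in ATTRIBUTE_RE.findall(line[len(MEDIA_TAG):])
        }
        if attrs.get("TYPE") == "CLOSED-CAPTIONS":
            renditions.append({
                key: attrs.get(key)
                for key in ("GROUP-ID", "LANGUAGE", "NAME", "INSTREAM-ID")
            })
    return renditions


# Hypothetical master playlist, for illustration only.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-MEDIA:TYPE=CLOSED-CAPTIONS,GROUP-ID="cc",LANGUAGE="en",NAME="English",INSTREAM-ID="CC1"
#EXT-X-STREAM-INF:BANDWIDTH=2649000,CLOSED-CAPTIONS="cc"
v1/prog_index.m3u8
"""

for rendition in parse_closed_caption_renditions(SAMPLE_PLAYLIST):
    print(rendition)
```

Attributes that a rendition does not declare come back as `None`, since LANGUAGE and NAME may be absent in a given playlist.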

    1. GROUP-ID

    This value is not carried in the transport stream. It is an arbitrary string that you set in the HLS playlist to indicate the rendition's group.

    See: https://www.rfc-editor.org/rfc/rfc8216#section-4.3.4.1.1

    2. LANGUAGE

    This is where things get a bit more complicated.


    CEA-608 captions do not include the language code.

    For CEA-708 and 608 over 708, the language is signalled in the Caption Service Descriptor defined by the ATSC Program and System Information Protocol (PSIP), which should be present in the PMT and/or the EIT.

    [Figure: Caption Service Descriptor syntax table, in two parts]

    3. INSTREAM-ID

    For CEA-608 this is one of CC1 or CC2 (field 1) or CC3 or CC4 (field 2), where CC1 and CC2 carry normal and easy-reader captions for the primary language and CC3 and CC4 carry them for the secondary language. For CEA-708 services it takes the form SERVICEn, where n is the service number (1 to 63).

    These should be advertised in the CSD (see above), if present.

    I don't think FFmpeg extracts these by default, so you'll either need to extend it or write an MPEG-TS parser to retrieve the information. There are a few libraries for parsing MPEG-TS and for dealing with captions (e.g. libcaption by fellow Stack Overflow user @szatmary).
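If you go the parser route, the descriptor itself is small. The sketch below decodes the raw bytes of a Caption Service Descriptor (descriptor tag 0x86), assuming the six-byte-per-service layout from ATSC A/65 and assuming you have already located the descriptor in the PMT or EIT with your own MPEG-TS parser. The class and function names are illustrative, and the sample bytes are hand-built, not taken from a real stream.

```python
from dataclasses import dataclass
from typing import List, Optional

CAPTION_SERVICE_DESCRIPTOR_TAG = 0x86  # descriptor_tag used by ATSC A/65
SERVICE_ENTRY_SIZE = 6                 # bytes per service entry (assumed layout)


@dataclass
class CaptionService:
    language: str                  # ISO 639-2 code, e.g. "eng"
    digital_cc: bool               # True: CEA-708 service, False: line 21 (CEA-608)
    service_number: Optional[int]  # caption_service_number when digital_cc is set
    line21_field: Optional[int]    # 0 = field 1, 1 = field 2 when line 21
    easy_reader: bool
    wide_aspect_ratio: bool

    @property
    def instream_id(self) -> Optional[str]:
        """HLS INSTREAM-ID for CEA-708 services, None otherwise."""
        if self.digital_cc:
            return f"SERVICE{self.service_number}"
        # For line 21 the descriptor names only the field, not the channel,
        # so CC1 vs CC2 (or CC3 vs CC4) cannot be derived from it alone.
        return None


def parse_caption_service_descriptor(data: bytes) -> List[CaptionService]:
    """Decode one descriptor, starting at its tag byte."""
    if len(data) < 3 or data[0] != CAPTION_SERVICE_DESCRIPTOR_TAG:
        raise ValueError("not a caption service descriptor")
    body = data[2:2 + data[1]]
    if len(body) < data[1]:
        raise ValueError("descriptor is truncated")
    count = body[0] & 0x1F  # low 5 bits: number_of_services
    services = []
    for index in range(count):
        start = 1 + index * SERVICE_ENTRY_SIZE
        entry = body[start:start + SERVICE_ENTRY_SIZE]
        if len(entry) < SERVICE_ENTRY_SIZE:
            raise ValueError("service entry is truncated")
        digital = bool(entry[3] & 0x80)
        services.append(CaptionService(
            language=entry[0:3].decode("ascii", errors="replace"),
            digital_cc=digital,
            service_number=(entry[3] & 0x3F) if digital else None,
            line21_field=None if digital else (entry[3] & 0x01),
            easy_reader=bool(entry[4] & 0x80),
            wide_aspect_ratio=bool(entry[4] & 0x40),
        ))
    return services


# Hand-built example: one English CEA-708 service with service number 1.
SAMPLE = bytes([0x86, 0x07, 0xE1]) + b"eng" + bytes([0xC1, 0x3F, 0xFF])
for service in parse_caption_service_descriptor(SAMPLE):
    print(service.language, service.instream_id)
```

Check the bit layout against the current revision of A/65 before relying on it, since some fields (such as the line 21 field flag) have changed status between revisions.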

    If you just want to extract the captions, use FFmpeg or CCExtractor.
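As a rough sketch of that extraction step, the Python helpers below only build the command lines, so you can inspect them before running anything. The FFmpeg variant relies on the lavfi `movie` source with its `subcc` output to expose the embedded captions as a subtitle stream, and the CCExtractor variant uses its `-o` output option. The helper names and file names are placeholders, and paths containing characters that are special to the filter syntax would need escaping that is not handled here.

```python
import subprocess
from typing import List


def ffmpeg_caption_command(ts_path: str, srt_path: str) -> List[str]:
    """Build an FFmpeg command that writes embedded closed captions to SRT.

    The movie source decodes the video and exposes its closed captions
    as an extra subtitle output (subcc), which is then mapped to the file.
    """
    source = f"movie={ts_path}[out0+subcc]"
    return ["ffmpeg", "-f", "lavfi", "-i", source, "-map", "s", srt_path]


def ccextractor_caption_command(ts_path: str, srt_path: str) -> List[str]:
    """Build the equivalent CCExtractor command."""
    return ["ccextractor", ts_path, "-o", srt_path]


def run(command: List[str]) -> None:
    """Run a command built above; raises CalledProcessError on failure."""
    subprocess.run(command, check=True)


# Placeholder file names, for illustration only.
print(" ".join(ffmpeg_caption_command("segment.ts", "captions.srt")))
print(" ".join(ccextractor_caption_command("segment.ts", "captions.srt")))
```

Either command yields the caption text only. The language and service metadata discussed above is not part of that output.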

    If you want to do it manually, you could use a tool like DVBInspector to inspect the PSI contents:

    [Screenshot: DVBInspector showing the CSD]