I am trying to get the captions from a segment of a live feed. I am running the command:
```
ffmpeg -i seg-1077853030-v1-a1.ts
```
Output:
```
Input #0, mpegts, from 'seg-109853030-v1-a1.ts':
  Duration: 00:00:06.01, start: 57867.901133, bitrate: 2649 kb/s
  Program 1
    Stream #0:0[0x100]: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(tv, bt709, progressive), 1280x720 [SAR 1:1 DAR 16:9], Closed Captions, 29.97 fps, 29.97 tbr, 90k tbn, 59.94 tbc
    Stream #0:1[0x101]: Audio: aac (LC) ([15][0][0][0] / 0x000F), 48000 Hz, stereo, fltp, 98 kb/s
    Stream #0:2[0x102]: Data: timed_id3 (ID3 / 0x20334449)
```
My question is: what command should I run to print out the captions along with the track metadata, including the label and language?
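If your MPEG-TS file is an HLS segment, just parse the HLS master playlist to retrieve these values: closed captions are declared there with `#EXT-X-MEDIA` tags whose `TYPE` is `CLOSED-CAPTIONS`, and the label is the `NAME` attribute. If your input is captured from a live broadcast, read on.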
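For the HLS case, a minimal Python sketch of pulling the caption renditions out of a master playlist could look like this. It assumes you already have the playlist text, and it uses a simplified attribute regex rather than a full RFC 8216 parser; the function names and the playlist in the example are made up for illustration.

```python
import re

# Matches ATTRIBUTE=value pairs in an HLS tag; values are either quoted
# strings or unquoted tokens (enumerated strings, integers, resolutions...).
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^",]*)')


def parse_attributes(attr_list: str) -> dict:
    """Split an HLS attribute list into a dict, stripping quotes from quoted values."""
    attrs = {}
    for key, value in ATTR_RE.findall(attr_list):
        attrs[key] = value[1:-1] if value.startswith('"') else value
    return attrs


def closed_caption_renditions(playlist_text: str) -> list:
    """Return the attributes of every EXT-X-MEDIA tag with TYPE=CLOSED-CAPTIONS."""
    renditions = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if not line.startswith("#EXT-X-MEDIA:"):
            continue
        attrs = parse_attributes(line[len("#EXT-X-MEDIA:"):])
        if attrs.get("TYPE") == "CLOSED-CAPTIONS":
            renditions.append(attrs)
    return renditions


if __name__ == "__main__":
    # Hypothetical master playlist, for illustration only.
    master = """#EXTM3U
#EXT-X-MEDIA:TYPE=CLOSED-CAPTIONS,GROUP-ID="cc",NAME="English",LANGUAGE="en",INSTREAM-ID="CC1"
#EXT-X-STREAM-INF:BANDWIDTH=2649000,RESOLUTION=1280x720,CLOSED-CAPTIONS="cc"
seg-v1-a1.m3u8
"""
    for r in closed_caption_renditions(master):
        # NAME is the human-readable label of the rendition.
        print(r.get("GROUP-ID"), r.get("NAME"), r.get("LANGUAGE"), r.get("INSTREAM-ID"))
```

Only the caption renditions are returned; audio and subtitle `EXT-X-MEDIA` entries are skipped by the `TYPE` check.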
GROUP-ID
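It's up to you to set this value in the HLS playlist to indicate the rendition's group; each variant stream then refers to it through the `CLOSED-CAPTIONS` attribute of its `#EXT-X-STREAM-INF` tag.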
See: https://www.rfc-editor.org/rfc/rfc8216#section-4.3.4.1.1
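As a sketch of how the pieces link together (Python; the helper names, the group name `cc`, the label and the bandwidth are all hypothetical values), the `GROUP-ID` you choose for the caption rendition is exactly what the variant's `CLOSED-CAPTIONS` attribute has to reference:

```python
def cc_media_tag(group_id: str, name: str, language: str, instream_id: str,
                 default: bool = False) -> str:
    """Format an EXT-X-MEDIA tag for an in-band closed-caption rendition."""
    attrs = [
        "TYPE=CLOSED-CAPTIONS",
        f'GROUP-ID="{group_id}"',
        f'NAME="{name}"',            # the human-readable label
        f'LANGUAGE="{language}"',
        f'INSTREAM-ID="{instream_id}"',
        f"DEFAULT={'YES' if default else 'NO'}",
    ]
    return "#EXT-X-MEDIA:" + ",".join(attrs)


def variant_tag(bandwidth: int, cc_group_id: str) -> str:
    """Format an EXT-X-STREAM-INF tag whose CLOSED-CAPTIONS points at a GROUP-ID."""
    return f'#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},CLOSED-CAPTIONS="{cc_group_id}"'


if __name__ == "__main__":
    group = "cc"  # hypothetical; any quoted string works as long as both tags agree
    print(cc_media_tag(group, "English", "en", "CC1", default=True))
    print(variant_tag(2649000, group))
    print("seg-v1-a1.m3u8")  # hypothetical media playlist URI
```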
LANGUAGE
This is where things get a bit more complicated.
CEA-608 captions do not include the language code.
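For CEA-708, and 608 carried over 708, the language is signalled in the Caption Service Descriptor (CSD), defined in the ATSC Program and System Information Protocol (PSIP) standard (A/65), which should be present in the PMT and/or the EIT. For each caption service, the CSD carries a three-character language code along with these fields: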
- `cc_type` (called `digital_cc` in ATSC A/65): `0` for 608, `1` for 708
- `line21_field`: when `cc_type` is `0`, `0` for field 1 (which includes channels CC1 and CC2) and `1` for field 2 (which includes channels CC3 and CC4)
- `caption_service_number`: when `cc_type` is `1`, the CEA-708 service number
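The fields above belong to the ATSC caption service descriptor (descriptor tag `0x86`). A minimal Python sketch of decoding one, assuming you already have its raw bytes and following the A/65 bit layout (6 bytes per service: 3-byte language, one flags byte, then `easy_reader`, `wide_aspect_ratio` and reserved bits). The class and function names are hypothetical and it keeps the `cc_type` name used above.

```python
from dataclasses import dataclass
from typing import List, Optional

CAPTION_SERVICE_DESCRIPTOR_TAG = 0x86


@dataclass
class CaptionService:
    language: str                  # 3-character ISO 639 code, e.g. "eng"
    cc_type: int                   # "digital_cc" in ATSC A/65: 0 = CEA-608, 1 = CEA-708
    line21_field: Optional[int]    # set only when cc_type == 0 (0 = field 1, 1 = field 2)
    service_number: Optional[int]  # set only when cc_type == 1 (caption_service_number)
    easy_reader: bool
    wide_aspect_ratio: bool


def parse_caption_service_descriptor(desc: bytes) -> List[CaptionService]:
    """Decode a caption_service_descriptor, including its 2-byte tag/length header."""
    if len(desc) < 3 or desc[0] != CAPTION_SERVICE_DESCRIPTOR_TAG or desc[1] < 1:
        raise ValueError("not a caption service descriptor")
    body = desc[2:2 + desc[1]]
    number_of_services = body[0] & 0x1F  # low 5 bits; top 3 bits are reserved
    services = []
    for i in range(number_of_services):
        entry = body[1 + 6 * i:7 + 6 * i]  # each service entry is 6 bytes
        if len(entry) < 6:
            raise ValueError("truncated caption service entry")
        flags = entry[3]
        cc_type = flags >> 7
        services.append(CaptionService(
            language=entry[0:3].decode("ascii", errors="replace"),
            cc_type=cc_type,
            line21_field=None if cc_type else flags & 0x01,
            service_number=flags & 0x3F if cc_type else None,
            easy_reader=bool(entry[4] & 0x80),
            wide_aspect_ratio=bool(entry[4] & 0x40),
        ))
    return services


if __name__ == "__main__":
    # Hypothetical descriptor: English on 608 field 1, Spanish on 708 service 2.
    sample = (bytes([0x86, 13, 0xE2]) + b"eng" + bytes([0x7E, 0x3F, 0xFF])
              + b"spa" + bytes([0xC2, 0x7F, 0xFF]))
    for svc in parse_caption_service_descriptor(sample):
        print(svc)
```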
INSTREAM-ID
For CEA-608 this can be `CC1` or `CC2` (field 1), or `CC3` or `CC4` (field 2); CC1 and CC2 carry normal and easy-reader captions for the primary language, and CC3 and CC4 for the secondary language. For CEA-708 services it takes the form `SERVICEn`, where n is between 1 and 63.
These should be advertised in the caption service descriptor (CSD) in the PMT/EIT, if present.
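Connecting the CSD fields to `INSTREAM-ID`, a small hypothetical helper could map one CSD entry to the candidate values. This is a sketch under the layout described above: for 608 the descriptor only identifies the field, so it cannot by itself decide between CC1 and CC2 (or CC3 and CC4).

```python
from typing import List, Optional


def instream_id_candidates(cc_type: int, line21_field: Optional[int] = None,
                           service_number: Optional[int] = None) -> List[str]:
    """Return the INSTREAM-ID value(s) an HLS packager could use for one CSD entry."""
    if cc_type == 1:  # CEA-708 service
        if service_number is None or not 1 <= service_number <= 63:
            raise ValueError("CEA-708 service numbers run from 1 to 63")
        return [f"SERVICE{service_number}"]
    # CEA-608: the CSD only says which field, not which channel within it.
    if line21_field == 0:
        return ["CC1", "CC2"]
    if line21_field == 1:
        return ["CC3", "CC4"]
    raise ValueError("CEA-608 entries need line21_field 0 or 1")


if __name__ == "__main__":
    print(instream_id_candidates(0, line21_field=0))    # 608, field 1
    print(instream_id_candidates(1, service_number=2))  # 708 service 2
```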
I don't think FFmpeg extracts these by default, so you'll either need to extend it or write an MPEG-TS parser to retrieve the information. There are a few libraries for parsing MPEG-TS and for dealing with captions (e.g. libcaption by fellow Stack Overflow user @szatmary).
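If you go the write-your-own-parser route, the core of it is fairly small. Below is a rough Python sketch, not a robust demuxer: it reads the PAT to find the PMT PID(s), then walks each PMT's elementary-stream loop for descriptors with tag `0x86` and returns their raw bytes. It assumes PAT/PMT sections fit in a single TS packet, ignores CRCs and version numbers, and does not look at the EIT; all function names are hypothetical.

```python
import sys
from typing import Dict, Iterator, List, Tuple

TS_PACKET_SIZE = 188
CAPTION_SERVICE_DESCRIPTOR_TAG = 0x86


def ts_payloads(data: bytes) -> Iterator[Tuple[int, bool, bytes]]:
    """Yield (pid, payload_unit_start_indicator, payload) for each TS packet."""
    for off in range(0, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = data[off:off + TS_PACKET_SIZE]
        if pkt[0] != 0x47:
            continue  # not in sync; a real parser would resynchronise here
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        pusi = bool(pkt[1] & 0x40)
        afc = (pkt[3] >> 4) & 0x03  # adaptation_field_control
        if not afc & 0x01:
            continue  # adaptation field only, no payload
        start = 4 + (1 + pkt[4] if afc & 0x02 else 0)
        yield pid, pusi, pkt[start:]


def first_section(payload: bytes) -> bytes:
    """Return the PSI section starting in this payload (single-packet sections only)."""
    if not payload:
        raise ValueError("empty payload")
    sec = payload[1 + payload[0]:]  # skip the pointer_field
    if len(sec) < 3:
        raise ValueError("truncated section header")
    section_length = ((sec[1] & 0x0F) << 8) | sec[2]
    if len(sec) < 3 + section_length:
        raise ValueError("section spans several packets (not handled in this sketch)")
    return sec[:3 + section_length]


def pmt_pids(pat: bytes) -> List[int]:
    """List the PMT PIDs announced in a PAT section."""
    pids = []
    for i in range(8, len(pat) - 4, 4):  # 8-byte header ... 4-byte CRC_32
        program_number = (pat[i] << 8) | pat[i + 1]
        if program_number != 0:  # program 0 points at the NIT, not a PMT
            pids.append(((pat[i + 2] & 0x1F) << 8) | pat[i + 3])
    return pids


def caption_descriptors(pmt: bytes) -> Dict[int, List[bytes]]:
    """Map elementary PID -> raw caption_service_descriptor bytes in a PMT section."""
    if len(pmt) < 16:
        raise ValueError("PMT section too short")
    found: Dict[int, List[bytes]] = {}
    pos = 12 + (((pmt[10] & 0x0F) << 8) | pmt[11])  # skip program_info descriptors
    end = len(pmt) - 4  # stop before CRC_32
    while pos + 5 <= end:
        es_pid = ((pmt[pos + 1] & 0x1F) << 8) | pmt[pos + 2]
        es_info_end = pos + 5 + (((pmt[pos + 3] & 0x0F) << 8) | pmt[pos + 4])
        d = pos + 5
        while d + 2 <= min(es_info_end, end):
            tag, length = pmt[d], pmt[d + 1]
            if tag == CAPTION_SERVICE_DESCRIPTOR_TAG:
                found.setdefault(es_pid, []).append(pmt[d:d + 2 + length])
            d += 2 + length
        pos = es_info_end
    return found


def find_caption_descriptors(data: bytes) -> Dict[int, List[bytes]]:
    """Return the CSDs from the first PMT that carries any (empty dict if none)."""
    pmt_pid_set = set()
    for pid, pusi, payload in ts_payloads(data):
        if not pusi:
            continue  # only look at packets where a section starts
        try:
            if pid == 0x0000 and not pmt_pid_set:
                pmt_pid_set.update(pmt_pids(first_section(payload)))
            elif pid in pmt_pid_set:
                found = caption_descriptors(first_section(payload))
                if found:
                    return found
        except ValueError:
            continue  # a section this simplified parser cannot handle
    return {}


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        for es_pid, descriptors in find_caption_descriptors(f.read()).items():
            for desc in descriptors:
                print(f"PID 0x{es_pid:04X}: {desc.hex()}")
```

Run against the segment (for example `python find_csd.py seg-1077853030-v1-a1.ts`, script name hypothetical), an empty result would suggest the broadcaster did not include a CSD in the PMT, in which case the EIT would be the next place to look.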
If you just want to extract the captions, use FFmpeg or ccextractor.
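For plain extraction with FFmpeg, one commonly used approach is the lavfi `movie` source with a `+subcc` suffix on the output label, which exposes the embedded captions as an extra subtitle stream that can be written out as SRT. The Python wrapper below only builds and runs that command; it assumes an FFmpeg build with libavfilter and that the caption stream ends up as stream `0:1`, and the helper name is hypothetical. This gives you the caption text only, not the language or label.

```python
import shlex
import subprocess
import sys
from typing import List


def ffmpeg_cc_to_srt_args(input_ts: str, output_srt: str) -> List[str]:
    """Build an ffmpeg command that dumps embedded 608/708 captions to SRT."""
    # The lavfi "movie" source reads the file; the "+subcc" suffix on the
    # output label adds a second stream carrying the closed captions, which
    # "-map 0:1" selects. File names containing ':', ',', '[' or ']' would
    # need filtergraph escaping, which this sketch does not do.
    graph = f"movie={input_ts}[out0+subcc]"
    return ["ffmpeg", "-f", "lavfi", "-i", graph, "-map", "0:1", output_srt]


if __name__ == "__main__":
    args = ffmpeg_cc_to_srt_args(sys.argv[1], sys.argv[2])
    print(shlex.join(args))  # show the exact command being run
    subprocess.run(args, check=True)
```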
If you want to inspect it manually, you could use software such as DVBInspector to view the PSI contents.