I was looking at this previously posted question and answer, "Find and return blocks of lines containing a string", where user @potong provided an elegant solution in a fairly simple command.
I can do some of what I'm trying to achieve on the output of a mediainfo command, run against the media data file to produce a text listing of the file's streams:
General
Unique ID :
(0xDAC55CA81AA8F777EB9DE67AC6)
Complete name : Some Media File.mkv
Format : Matroska
Format version : Version 4
File size : 1.44 GiB
Duration : 46 min 22 s
Overall bit rate : 4 761 kb/s
Frame rate : 23.976 FPS
Encoded date : 2023-08-08 06:39:11 UTC
Writing application : mkvmerge v76.0 ('Celebration') 64-bit
Writing library : libebml v1.4.4 + libmatroska v1.7.1
Video
ID : 1
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main 10@L4@Main
Codec ID : V_MPEGH/ISO/HEVC
Duration : 46 min 22 s
Bit rate : 4 501 kb/s
Width : 1 920 pixels
Height : 1 080 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 23.976 (24000/1001) FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 10 bits
Bits/(Pixel*Frame) : 0.091
Stream size : 1.36 GiB (95%)
Writing library : x265 3.5+96-d844ab494:[Windows][GCC 12.2.0][64 bit] 10bit
Encoding settings : cpuid=1111039 / frame-threads=3 / numa-pools=8 / wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=1920x1080 / interlace=0 / total-frames=0 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=4 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-eob / no-eos / no-hrd / info / hash=0 / temporal-layers=0 / open-gop / min-keyint=23 / keyint=250 / gop-lookahead=0 / bframes=8 / b-adapt=2 / b-pyramid / bframe-bias=0 / rc-lookahead=25 / lookahead-slices=4 / scenecut=40 / no-hist-scenecut / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=2 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=3 / limit-refs=3 / limit-modes / me=3 / subme=3 / merange=57 / temporal-mvp / no-frame-dup / no-hme / weightp / no-weightb / no-analyze-src-pics / deblock=0:0 / no-sao / no-sao-non-deblock / rd=4 / selective-sao=0 / no-early-skip / rskip / no-fast-intra / no-tskip-fast / no-cu-lossless / no-b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=1.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=abr / bitrate=4500 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=2 / cplxblur=20.0 / qblur=0.5 / ipratio=1.40 / pbratio=1.30 / aq-mode=3 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=1 / overscan=0 / videoformat=5 / range=0 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / cll=0,0 / min-luma=0 / max-luma=1023 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / aq-motion / no-hdr10 / no-hdr10-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=0 / 
analysis-save-reuse-level=0 / analysis-load-reuse-level=0 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=1 / refine-ctu-distortion=0 / no-limit-sao / ctu-info=0 / no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / no-svt / no-field / qp-adaptation-range=1.00 / scenecut-aware-qp=0conformance-window-offsets / right=0 / bottom=0 / decoder-max-rate=0 / no-vbv-live-multi-pass / no-mcstf / no-sbrc
Default : Yes
Forced : Yes
Color range : Limited
Audio
ID : 2
Format : E-AC-3
Format/Info : Enhanced AC-3
Commercial name : Dolby Digital Plus
Codec ID : A_EAC3
Duration : 46 min 22 s
Bit rate mode : Constant
Bit rate : 256 kb/s
Channel(s) : 6 channels
Channel layout : L R C LFE Ls Rs
Sampling rate : 48.0 kHz
Frame rate : 31.250 FPS (1536 SPF)
Compression mode : Lossy
Stream size : 79.3 MiB (5%)
Title : English
Language : English
Service kind : Complete Main
Default : Yes
Forced : No
Dialog Normalization : -27 dB
compr : -0.28 dB
mixlevel : 105 dB
roomtyp : Small
ltrtcmixlev : 3.0 dB
ltrtsurmixlev : -3.0 dB
lorocmixlev : 3.0 dB
lorosurmixlev : -3.0 dB
dialnorm_Average : -27 dB
dialnorm_Minimum : -27 dB
dialnorm_Maximum : -27 dB
Text #1
ID : 3
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 22 s
Bit rate : 77 b/s
Frame rate : 0.255 FPS
Count of elements : 649
Stream size : 24.1 KiB (0%)
Language : French
Default : No
Forced : No
Text #2
ID : 4
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 22 s
Bit rate : 76 b/s
Frame rate : 0.259 FPS
Count of elements : 659
Stream size : 23.8 KiB (0%)
Language : German
Default : No
Forced : No
Text #3
ID : 5
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 26 min 14 s
Bit rate : 0 b/s
Frame rate : 0.003 FPS
Count of elements : 5
Stream size : 91.0 Bytes (0%)
Language : Italian
Default : No
Forced : No
Text #4
ID : 6
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 22 s
Bit rate : 75 b/s
Frame rate : 0.260 FPS
Count of elements : 661
Stream size : 23.3 KiB (0%)
Language : Italian
Default : No
Forced : No
Text #5
ID : 7
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 21 s
Bit rate : 56 b/s
Frame rate : 0.253 FPS
Count of elements : 643
Stream size : 17.5 KiB (0%)
Language : Japanese
Default : No
Forced : No
Text #6
ID : 8
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 22 s
Bit rate : 84 b/s
Frame rate : 0.260 FPS
Count of elements : 660
Stream size : 26.2 KiB (0%)
Language : Korean
Default : No
Forced : No
Text #7
ID : 9
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 22 s
Bit rate : 71 b/s
Frame rate : 0.258 FPS
Count of elements : 656
Stream size : 22.0 KiB (0%)
Language : Norwegian
Default : No
Forced : No
Text #8
ID : 10
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 30 min 43 s
Bit rate : 0 b/s
Frame rate : 0.003 FPS
Count of elements : 6
Stream size : 75.0 Bytes (0%)
Language : Polish
Default : No
Forced : No
Text #9
ID : 11
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 22 s
Bit rate : 70 b/s
Frame rate : 0.261 FPS
Count of elements : 663
Stream size : 21.9 KiB (0%)
Language : Polish
Default : No
Forced : No
Text #10
ID : 12
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 22 s
Bit rate : 74 b/s
Frame rate : 0.261 FPS
Count of elements : 663
Stream size : 23.1 KiB (0%)
Language : Portuguese
Default : No
Forced : No
Text #11
ID : 13
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 22 s
Bit rate : 79 b/s
Frame rate : 0.260 FPS
Count of elements : 660
Stream size : 24.7 KiB (0%)
Language : Portuguese
Default : No
Forced : No
Text #12
ID : 14
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 22 s
Bit rate : 77 b/s
Frame rate : 0.260 FPS
Count of elements : 662
Stream size : 24.2 KiB (0%)
Language : Spanish
Default : No
Forced : No
Text #13
ID : 15
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 30 min 41 s
Bit rate : 0 b/s
Frame rate : 0.003 FPS
Count of elements : 6
Stream size : 77.0 Bytes (0%)
Language : Spanish
Default : No
Forced : No
Text #14
ID : 16
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 22 s
Bit rate : 69 b/s
Frame rate : 0.254 FPS
Count of elements : 645
Stream size : 21.6 KiB (0%)
Language : Spanish
Default : No
Forced : No
Text #15
ID : 17
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 22 s
Bit rate : 74 b/s
Frame rate : 0.258 FPS
Count of elements : 657
Stream size : 23.1 KiB (0%)
Language : Swedish
Default : No
Forced : No
Text #16
ID : 18
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 21 s
Bit rate : 84 b/s
Frame rate : 0.311 FPS
Count of elements : 791
Stream size : 26.3 KiB (0%)
Language : English
Default : No
Forced : No
Text #17
ID : 19
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 22 s
Bit rate : 75 b/s
Frame rate : 0.258 FPS
Count of elements : 655
Stream size : 23.4 KiB (0%)
Language : English
Default : No
Forced : No
Text #18
ID : 20
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 43 min 9 s
Bit rate : 69 b/s
Frame rate : 0.285 FPS
Count of elements : 739
Stream size : 21.9 KiB (0%)
Language : Chinese
Default : No
Forced : No
Text #19
ID : 21
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 22 s
Bit rate : 62 b/s
Frame rate : 0.261 FPS
Count of elements : 663
Stream size : 19.4 KiB (0%)
Language : Chinese
Default : No
Forced : No
Text #20
ID : 22
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 22 s
Bit rate : 73 b/s
Frame rate : 0.259 FPS
Count of elements : 659
Stream size : 22.9 KiB (0%)
Language : Danish
Default : No
Forced : No
Text #21
ID : 23
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 22 s
Bit rate : 64 b/s
Frame rate : 0.249 FPS
Count of elements : 634
Stream size : 19.9 KiB (0%)
Language : Dutch
Default : No
Forced : No
Text #22
ID : 24
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 22 s
Bit rate : 74 b/s
Frame rate : 0.259 FPS
Count of elements : 658
Stream size : 23.2 KiB (0%)
Language : Finnish
Default : No
Forced : No
Text #23
ID : 25
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 26 min 14 s
Bit rate : 0 b/s
Frame rate : 0.003 FPS
Count of elements : 5
Stream size : 70.0 Bytes (0%)
Language : French
Default : No
Forced : No
So, using this command:
sed -n '/^Text #/!{H;$!d};x;/English/p' 'Some Media File.mkv.MDINFO'
I get this output:
General
Unique ID :
(0xDAC55CA81AA8F777EB9DE67AC6)
Complete name : Some Media File.mkv
Format : Matroska
Format version : Version 4
File size : 1.44 GiB
Duration : 46 min 22 s
Overall bit rate : 4 761 kb/s
Frame rate : 23.976 FPS
Encoded date : 2023-08-08 06:39:11 UTC
Writing application : mkvmerge v76.0 ('Celebration') 64-bit
Writing library : libebml v1.4.4 + libmatroska v1.7.1
Video
ID : 1
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main 10@L4@Main
Codec ID : V_MPEGH/ISO/HEVC
Duration : 46 min 22 s
Bit rate : 4 501 kb/s
Width : 1 920 pixels
Height : 1 080 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 23.976 (24000/1001) FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 10 bits
Bits/(Pixel*Frame) : 0.091
Stream size : 1.36 GiB (95%)
Writing library : x265 3.5+96-d844ab494:[Windows][GCC 12.2.0][64 bit] 10bit
Encoding settings : cpuid=1111039 / frame-threads=3 / numa-pools=8 / wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=1920x1080 / interlace=0 / total-frames=0 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=4 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-eob / no-eos / no-hrd / info / hash=0 / temporal-layers=0 / open-gop / min-keyint=23 / keyint=250 / gop-lookahead=0 / bframes=8 / b-adapt=2 / b-pyramid / bframe-bias=0 / rc-lookahead=25 / lookahead-slices=4 / scenecut=40 / no-hist-scenecut / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=2 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=3 / limit-refs=3 / limit-modes / me=3 / subme=3 / merange=57 / temporal-mvp / no-frame-dup / no-hme / weightp / no-weightb / no-analyze-src-pics / deblock=0:0 / no-sao / no-sao-non-deblock / rd=4 / selective-sao=0 / no-early-skip / rskip / no-fast-intra / no-tskip-fast / no-cu-lossless / no-b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=1.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=abr / bitrate=4500 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=2 / cplxblur=20.0 / qblur=0.5 / ipratio=1.40 / pbratio=1.30 / aq-mode=3 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=1 / overscan=0 / videoformat=5 / range=0 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / cll=0,0 / min-luma=0 / max-luma=1023 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / aq-motion / no-hdr10 / no-hdr10-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=0 / 
analysis-save-reuse-level=0 / analysis-load-reuse-level=0 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=1 / refine-ctu-distortion=0 / no-limit-sao / ctu-info=0 / no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / no-svt / no-field / qp-adaptation-range=1.00 / scenecut-aware-qp=0conformance-window-offsets / right=0 / bottom=0 / decoder-max-rate=0 / no-vbv-live-multi-pass / no-mcstf / no-sbrc
Default : Yes
Forced : Yes
Color range : Limited
Audio
ID : 2
Format : E-AC-3
Format/Info : Enhanced AC-3
Commercial name : Dolby Digital Plus
Codec ID : A_EAC3
Duration : 46 min 22 s
Bit rate mode : Constant
Bit rate : 256 kb/s
Channel(s) : 6 channels
Channel layout : L R C LFE Ls Rs
Sampling rate : 48.0 kHz
Frame rate : 31.250 FPS (1536 SPF)
Compression mode : Lossy
Stream size : 79.3 MiB (5%)
Title : English
Language : English
Service kind : Complete Main
Default : Yes
Forced : No
Dialog Normalization : -27 dB
compr : -0.28 dB
mixlevel : 105 dB
roomtyp : Small
ltrtcmixlev : 3.0 dB
ltrtsurmixlev : -3.0 dB
lorocmixlev : 3.0 dB
lorosurmixlev : -3.0 dB
dialnorm_Average : -27 dB
dialnorm_Minimum : -27 dB
dialnorm_Maximum : -27 dB
Text #16
ID : 18
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 21 s
Bit rate : 84 b/s
Frame rate : 0.311 FPS
Count of elements : 791
Stream size : 26.3 KiB (0%)
Language : English
Default : No
Forced : No
Text #17
ID : 19
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Duration : 46 min 22 s
Bit rate : 75 b/s
Frame rate : 0.258 FPS
Count of elements : 655
Stream size : 23.4 KiB (0%)
Language : English
Default : No
Forced : No
Not being a sed expert, I'm trying to understand why the General, Video and Audio blocks are printed at all. I thought the /^Text #/ in the sed command would exclude any data block not beginning with 'Text #'.
The output for just the Text blocks whose language is English is perfect, because what I want to do is process about fifty of these files (and possibly many more later), where the English subtitle text is the only stream I'm interested in; however, which stream holds the English text is not consistent from file to file.
Thus, the purpose of my script is to identify the stream number of the English text stream specifically, and then use that information in a follow-up script with MKVToolNix to strip all the non-English Text streams, leaving only the English ones in place. I'll probably also dump the English streams to a .srt file to keep separately as a backup.
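A minimal sketch of that first step, assuming GNU sed and the "Key : Value" layout shown above; the sample file, the temporary-file handling and the variable names are illustrative only. It collects blocks the same way as the command above, drops any block without a "Text #" line, keeps only blocks with a "Language : English" line, and prints the number from each remaining block's "ID :" line. If a file has only one subtitle stream, mediainfo may label it plain "Text" with no "#", which this pattern would not match.

```shell
# Hypothetical sample in the same "Key : Value" layout as the mediainfo output above
sample=$(mktemp)
cat > "$sample" <<'EOF'
General
Complete name : Some Media File.mkv
Audio
ID : 2
Title : English
Language : English
Text #1
ID : 3
Language : French
Text #2
ID : 4
Language : English
EOF

# /^Text #/!{H;$!d};x       collect each block in the hold space, as in the command above
# /^Text #/M!d              drop any block without a "Text #" line (the General/Video/Audio preamble)
# /\nLanguage : English/!d  drop Text blocks in other languages
# s/.../\1/p                print only the number from the block's "ID :" line
#                           (in GNU sed, . also matches an embedded newline)
ids=$(sed -n '/^Text #/!{H;$!d};x;/^Text #/M!d;/\nLanguage : English/!d;s/.*\nID : \([0-9]*\).*/\1/p' "$sample")
printf '%s\n' "$ids"    # prints 4 for this sample
rm -f "$sample"
```

Run against the full output above, this should print 18 and 19. Before passing such numbers to mkvmerge's --subtitle-tracks option, compare them with what mkvmerge --identify reports, since the two tools may number tracks differently.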
This might work for you (GNU sed):
sed -n '/^Text #/!{H;$!d};x;/^General$/Md;/English/p' file
The instruction /^General$/Md will throw away the preamble, i.e. any text before the first Text # line.
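A small before/after sketch on a hypothetical cut-down sample (GNU sed assumed, since the M flag is a GNU extension) shows the effect: the original command prints the preamble block as well, while the version with /^General$/Md prints only the English Text block.

```shell
# Small hypothetical sample mimicking the mediainfo layout
sample=$(mktemp)
cat > "$sample" <<'EOF'
General
Complete name : Some Media File.mkv
Audio
ID : 2
Title : English
Language : English
Text #1
ID : 3
Language : French
Text #2
ID : 4
Language : English
EOF

# Original command: the first block collected (General/Audio) contains "English", so it is printed
before=$(sed -n '/^Text #/!{H;$!d};x;/English/p' "$sample")

# With the M flag, ^ and $ also match at embedded newlines, so the block containing
# a line that is exactly "General" is deleted before the /English/ test
after=$(sed -n '/^Text #/!{H;$!d};x;/^General$/Md;/English/p' "$sample")

printf '%s\n' '--- before ---' "$before" '--- after ---' "$after"
rm -f "$sample"
```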
Or you may prefer:
sed -n '/^Text #/!{H;$!d};x;/^Text #/M!d;/English/p' file
or:
sed -n '/^Text #/!{H;$!d};x;/^\nGeneral\n/d;/English/p' file
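A quick way to check that the three variants behave the same is to run them side by side on a sample; this sketch assumes GNU sed and uses a hypothetical cut-down input.

```shell
# Small hypothetical sample mimicking the mediainfo layout
sample=$(mktemp)
cat > "$sample" <<'EOF'
General
Complete name : Some Media File.mkv
Audio
ID : 2
Title : English
Language : English
Text #1
ID : 3
Language : French
Text #2
ID : 4
Language : English
EOF

v1=$(sed -n '/^Text #/!{H;$!d};x;/^General$/Md;/English/p' "$sample")    # drop the block with a "General" line
v2=$(sed -n '/^Text #/!{H;$!d};x;/^Text #/M!d;/English/p' "$sample")     # keep only blocks with a "Text #" line
v3=$(sed -n '/^Text #/!{H;$!d};x;/^\nGeneral\n/d;/English/p' "$sample")  # drop the block starting "\nGeneral\n"

if [ "$v1" = "$v2" ] && [ "$v2" = "$v3" ]; then
  echo "all three variants agree:"
  printf '%s\n' "$v1"
else
  echo "variants differ" >&2
fi
rm -f "$sample"
```

The first two depend on GNU sed's M flag. The third instead relies on the fact that the very first H appends to an empty hold space, so only the preamble block begins with a newline.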
N.B. The preamble was only printed in the OP's solution because it included the text English.
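To see the blocks the OP's command actually tests, this sketch (GNU sed assumed, hypothetical sample) replaces the /English/ test with a substitution that prints every collected block on one line, showing embedded newlines as \n. The first block is the whole preamble, General and Audio included, and it also begins with a newline.

```shell
# Small hypothetical sample mimicking the mediainfo layout
sample=$(mktemp)
cat > "$sample" <<'EOF'
General
Complete name : Some Media File.mkv
Audio
ID : 2
Title : English
Language : English
Text #1
ID : 3
Language : French
Text #2
ID : 4
Language : English
EOF

# Same block collection as the OP's command, but print every block,
# with each embedded newline rewritten as a literal backslash-n
chunks=$(sed -n '/^Text #/!{H;$!d};x;s/\n/\\n/g;p' "$sample")
printf '%s\n' "$chunks"
rm -f "$sample"

# Output:
# \nGeneral\nComplete name : Some Media File.mkv\nAudio\nID : 2\nTitle : English\nLanguage : English
# Text #1\nID : 3\nLanguage : French
# Text #2\nID : 4\nLanguage : English
```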
As a result of a mistake (pointed out by Ed Morton), I thought of shortening the second solution by removing the 2nd regex entirely, expecting sed to fall back on its default of reusing the 1st regex. However, this threw the error "cannot specify modifiers on empty regexp". On the off chance that the flag was tied to the original regex, I moved it to the first regex and the error went away. Since a freshly read line in the pattern space cannot span multiple lines, putting the M flag on the first regex in no way affects the solution.
This may be a feature of GNU sed rather than an intended effect, so perhaps it should not be used in production.
sed -n '/^Text #/M!{H;$!d};x;//!d;/English/p' file
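A sketch, assuming GNU sed and a hypothetical sample, that compares the shortened command with the second solution and shows the rejected //M form. For what it's worth, the GNU sed manual describes both behaviours: the empty regex repeats the last regex used, and modifiers cannot be combined with it because they are applied when a regex is compiled.

```shell
# Small hypothetical sample mimicking the mediainfo layout
sample=$(mktemp)
cat > "$sample" <<'EOF'
General
Complete name : Some Media File.mkv
Audio
ID : 2
Title : English
Language : English
Text #1
ID : 3
Language : French
Text #2
ID : 4
Language : English
EOF

# Reference: the second solution, with the M flag on the second regex
ref=$(sed -n '/^Text #/!{H;$!d};x;/^Text #/M!d;/English/p' "$sample")

# Shortened form: // repeats the last regex used, which here is /^Text #/M
# (just tried against the current input line), M flag included
short=$(sed -n '/^Text #/M!{H;$!d};x;//!d;/English/p' "$sample")

# Putting the flag on the empty regex is rejected when sed compiles the script
if sed -n '/^Text #/!{H;$!d};x;//M!d;/English/p' "$sample" 2>/dev/null; then
  echo "//M accepted"
else
  echo "//M rejected"
fi

if [ "$ref" = "$short" ]; then echo "shortened form gives the same output"; fi
rm -f "$sample"
```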