I'm developing in Java and I have a long string of text which contains all the information I need about a particular DVD (it's the scan output from HandBrakeCLI). I need to build a regex which will capture each bit of information I need but ignore some special cases.
My program consists of DVD, Title, Chapter, and Language objects. It's structured like so:
- DVD has Title(s)
- Title has Chapter(s) and Language(s)
I need to regex out the following info from the output:
- Title Number
- Language Name and Audio Track Number
- Chapter Number
One special case that's giving me particular trouble is that some titles start scanning, but then the output says they're being ignored because they're too short, and the scan moves on to the next title. I don't know how to write a regex that would ignore any matches which contain that line. I've been having a very hard time figuring it out!
scan: scanning title (\d+)?.{0,500}(ignoring title)
This will capture all the titles which need to be ignored, but I think I need one long regex which will capture all the information I need and ignore the special cases. If I could somehow get it to bind each valid title scan in one group, that'd be great! Thanks a lot for the help!
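For reference, here is a minimal sketch of how that pattern could be applied in Java, assuming the whole scan output is held in one String. The class and method names (IgnoredTitles, ignoredTitles) are made up for illustration. Note that `.` does not match line breaks in java.util.regex by default, so the pattern needs Pattern.DOTALL to span several lines of output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IgnoredTitles {
    // The pattern from above, unchanged. DOTALL lets '.' match line breaks,
    // which it does not do by default in java.util.regex.
    private static final Pattern IGNORED = Pattern.compile(
            "scan: scanning title (\\d+)?.{0,500}(ignoring title)", Pattern.DOTALL);

    /** Returns the numbers of the titles the pattern flags as ignored. */
    public static List<Integer> ignoredTitles(String scanOutput) {
        List<Integer> result = new ArrayList<>();
        Matcher m = IGNORED.matcher(scanOutput);
        while (m.find()) {
            // Group 1 is optional in the pattern, so it may be null.
            if (m.group(1) != null) {
                result.add(Integer.parseInt(m.group(1)));
            }
        }
        return result;
    }
}
```

Because `.{0,500}` is allowed to run past the start of the next title, a short valid title followed closely by an ignored one could be flagged by mistake, which is part of why this pattern alone isn't enough.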
Here is a sample of the output:
[11:25:53] scan: DVD has 9 title(s)
[11:25:53] scan: scanning title 1
[11:25:53] scan: opening IFO for VTS 1
[11:25:53] scan: duration is 00:00:00 (76 ms)
[11:25:53] scan: ignoring title (too short)
[11:25:53] scan: scanning title 2
[11:25:53] scan: opening IFO for VTS 2
[11:25:53] scan: duration is 01:59:27 (7167153 ms)
[11:25:53] pgc_id: 1, pgn: 1: pgc: 0x1bad980
[11:25:53] scan: vts=2, ttn=1, cells=0->17, blocks=4->3, 1906832 blocks
[11:25:53] scan: checking audio 1
[11:25:53] scan: id=80bd, lang=English (AC3), 3cc=eng ext=0
[11:25:53] scan: checking audio 2
[11:25:53] scan: id=81bd, lang=Deutsch (AC3), 3cc=deu ext=0
[11:25:53] scan: checking audio 3
[11:25:53] scan: id=82bd, lang=English (AC3), 3cc=eng ext=0
[11:25:53] scan: checking audio 4
[11:25:53] scan: id=83bd, lang=Espanol (AC3), 3cc=spa ext=0
[11:25:53] scan: checking audio 5
[11:25:53] scan: id=84bd, lang=Francais (AC3), 3cc=fra ext=0
[11:25:53] scan: checking audio 6
[11:25:53] scan: id=85bd, lang=Italiano (AC3), 3cc=ita ext=0
[11:25:53] scan: checking audio 7
[11:25:53] scan: id=86bd, lang=Portugues (AC3), 3cc=por ext=0
[11:25:53] scan: checking audio 8
[11:25:53] scan: id=87bd, lang=Samoan (AC3), 3cc=smo ext=0
[11:25:53] scan: checking subtitle 1
[11:25:53] scan: id=20bd, lang=English, 3cc=eng
[11:25:53] scan: title 2 has 18 chapters
[11:25:53] scan: chap 1 c=0->0, b=4->51422 (51419), 127306 ms
[11:25:53] scan: chap 2 c=1->1, b=51423->79617 (28195), 100277 ms
[11:25:53] scan: chap 3 c=2->2, b=79618->170050 (90433), 233291 ms
[11:25:53] scan: chap 4 c=3->3, b=170051->192087 (22037), 85367 ms
[11:25:53] scan: chap 5 c=4->4, b=192088->327371 (135284), 568451 ms
[11:25:53] scan: chap 6 c=5->5, b=327372->431726 (104355), 283191 ms
[11:25:53] scan: chap 7 c=6->6, b=431727->441166 (9440), 40203 ms
[11:25:53] scan: chap 8 c=7->7, b=441167->675145 (233979), 977815 ms
[11:25:53] scan: chap 9 c=8->8, b=675146->870812 (195667), 778680 ms
[11:25:53] scan: chap 10 c=9->9, b=870813->959026 (88214), 218223 ms
[11:25:53] scan: chap 11 c=10->10, b=959027->1134726 (175700), 748540 ms
[11:25:53] scan: chap 12 c=11->11, b=1134727->1375583 (240857), 1013772 ms
[11:25:53] scan: chap 13 c=12->12, b=1375584->1452670 (77087), 204138 ms
[11:25:53] scan: chap 14 c=13->13, b=1452671->1461940 (9270), 41303 ms
[11:25:53] scan: chap 15 c=14->14, b=1461941->1698075 (236135), 1069800 ms
[11:25:53] scan: chap 16 c=15->15, b=1698076->1826069 (127994), 367324 ms
[11:25:53] scan: chap 17 c=16->16, b=1826070->1906831 (80762), 309385 ms
[11:25:53] scan: chap 18 c=17->17, b=0->3 (4), 76 ms
[11:25:53] scan: aspect = 0
[11:25:53] scan: scanning title 3
Title 3 is similar to Title 2, and Title 5 is similar to Title 1
Alright, so this isn't exactly the right solution, but I found a workaround which worked great. Instead of one long regex, I decided to build a separate string for each individual Title. It's really simple. Whenever the program reaches a line that matches: scan: scanning title (\d+)
it creates a new Title object and starts a new StringBuilder for it. (I originally wrote (\d+?) there, but a lazy quantifier at the end of a pattern only grabs the first digit when used with find(), so (\d+) is the safer form.) Then I regexed each individual string. If it matched ignore, I would just skip that title, thus handling the special case. From there, it was easy to regex out the information I needed per title.
The program works now, and I'm happy to say that when you add up all the time I spent on it and compare it to the time I would have spent queuing the titles up individually using the GUI, I saved about... er... 5 hours... So it wasn't a huge savings, but it was a million times better than clicking a million times. Haha, anyway, thanks to anyone who helped on this project (including other posts about the project).
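In case it helps anyone, here is a rough sketch of that per-title approach. It is not my exact code: the ScanParser and TitleInfo names and the individual patterns are assumptions written against the sample output above, so they may need adjusting for other discs or HandBrakeCLI versions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScanParser {
    private static final Pattern TITLE = Pattern.compile("scan: scanning title (\\d+)");
    private static final Pattern IGNORE = Pattern.compile("scan: ignoring title");
    private static final Pattern AUDIO = Pattern.compile("scan: checking audio (\\d+)");
    private static final Pattern LANG = Pattern.compile("lang=(.+?) \\(");
    private static final Pattern CHAPTER = Pattern.compile("scan: chap (\\d+) ");

    /** Hypothetical holder for what gets extracted per title. */
    public static class TitleInfo {
        public final int number;
        public final Map<Integer, String> audioTracks = new LinkedHashMap<>();
        public final List<Integer> chapters = new ArrayList<>();
        TitleInfo(int number) { this.number = number; }
    }

    public static List<TitleInfo> parse(String scanOutput) {
        // Step 1: start a new StringBuilder at every "scanning title N" line.
        Map<Integer, StringBuilder> blocks = new LinkedHashMap<>();
        StringBuilder current = null;
        for (String line : scanOutput.split("\\R")) {
            Matcher t = TITLE.matcher(line);
            if (t.find()) {
                current = new StringBuilder();
                blocks.put(Integer.parseInt(t.group(1)), current);
            }
            if (current != null) {
                current.append(line).append('\n');
            }
        }
        // Step 2: skip ignored titles, then regex each remaining block.
        List<TitleInfo> titles = new ArrayList<>();
        for (Map.Entry<Integer, StringBuilder> e : blocks.entrySet()) {
            String block = e.getValue().toString();
            if (IGNORE.matcher(block).find()) {
                continue; // the "too short" special case
            }
            TitleInfo info = new TitleInfo(e.getKey());
            int pendingTrack = -1; // audio track waiting for its lang= line
            for (String line : block.split("\n")) {
                Matcher a = AUDIO.matcher(line);
                Matcher l = LANG.matcher(line);
                Matcher c = CHAPTER.matcher(line);
                if (a.find()) {
                    pendingTrack = Integer.parseInt(a.group(1));
                } else if (pendingTrack > 0 && l.find()) {
                    info.audioTracks.put(pendingTrack, l.group(1));
                    pendingTrack = -1;
                } else if (c.find()) {
                    info.chapters.add(Integer.parseInt(c.group(1)));
                }
            }
            titles.add(info);
        }
        return titles;
    }
}
```

Calling ScanParser.parse(output) on the full scan text gives one TitleInfo per title that wasn't ignored, which can then be mapped onto the real Title, Chapter, and Language objects.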