First, I don't know if this is actually possible but what I want to do is repeat a regex pattern. The pattern I'm using is:
sed 's/[^-\t]*\t[^-\t]*\t\([^-\t]*\).*/\1/' films.txt
An input of
250. 7.9 Shutter Island (2010) 110,675
Will return:
Shutter Island (2010)
I'm matching all none tabs, (250.) then tab, then all none tabs (7.9) then tab. Next I backrefrence the film title then matching all remaining chars (110,675).
It works fine, but im learning regex and this looks ugly, the regex [^-\t]*\t is repeated just after itself, is there anyway to repeat this like you can a character like a{2,2}?
I've tried ([^-\t]*\t){2,2}
(and variations) but I'm guessing that is trying to match [^-\t]*\t\t?
Also if there is any way to make my above code shorter and cleaner any help would be greatly appreciated.
I think you might be going about this the wrong way. If you're simply wanting to extract the name of the film, and it's release year, then you could try this regex:
(?:\t)[\w ()]+(?:\t)
As seen in place here:
Note that it matches a tab character at the beginning and end of the actual desired string, but doesn't include them in the matching group.