First of all, happy Independence Day to those who celebrate it!
I'm analyzing Ab Initio graphs. For that, I need to obtain the name of each component, i.e. the name the developer used to describe its functionality, which I can extract from lines like the following:
name ='}}@0|@207000|80000|227000|100000|152000|126000|11654|RFMT: Generate Labels Header|Ab Initio Software|Built-in|1|100|0||6||32769|1|{1|0|}}}'
I tried to use a regex to extract the name of the component, which here is RFMT: Generate Labels Header.
Here is the problem:
My delimiter is |Ab Initio Software, which means I need to apply the regex from right to left. Is there any way to accomplish that in Python?
The most efficient solution I have come up with is to reverse everything:
import re
name = name[::-1]  # reverse the line so the marker is read first
name = re.search(r'erawtfoS oitinI bA\|(.*?)\|', name, re.IGNORECASE).group(1)
name = name[::-1]  # restore the original character order
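For reference, here is the same reversal approach as a self-contained function. It is a minimal sketch assuming one component record per line; the function and constant names are hypothetical. Compiling the reversed pattern once avoids re-parsing it on every call, although Python's re module also caches recently used patterns, so the gain is likely modest.

```python
import re

# Pattern written against the *reversed* line: the reversed marker, then the
# shortest run of characters up to the next '|' (which is the preceding '|'
# in the original line).
REVERSED_PATTERN = re.compile(r'erawtfoS oitinI bA\|(.*?)\|', re.IGNORECASE)

def component_name_reversed(line):
    """Return the field just before '|Ab Initio Software', or None if absent."""
    match = REVERSED_PATTERN.search(line[::-1])
    if match is None:
        return None
    return match.group(1)[::-1]  # undo the reversal of the captured field

line = '}}@0|@207000|80000|227000|100000|152000|126000|11654|RFMT: Generate Labels Header|Ab Initio Software|Built-in|1|100|0||6||32769|1|{1|0|}}}'
print(component_name_reversed(line))  # RFMT: Generate Labels Header
```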
I'd like to make this more efficient, because it's going to run on hundreds of graphs and most of those files are quite large.
You could just match non-| characters and use lookarounds to make sure it's the field immediately before |Ab Initio Software:
re.search(r'(?<=[|])[^|]*(?=[|]Ab Initio Software)', name, re.IGNORECASE).group()
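Since the question mentions hundreds of large graph files, one option is to compile the lookaround pattern once and let finditer scan a whole file in a single pass instead of calling search line by line. This is a sketch under assumptions: every component record carries its name in the field immediately before |Ab Initio Software, the files can be read as UTF-8 text (the encoding is a guess), the helper names are hypothetical, and the second record in the sample is invented purely for illustration. The pattern uses [^|\n]* rather than [^|]* so that a match cannot straddle two records.

```python
import re

# Lookbehind: preceded by '|'; body: a run of characters that are neither '|'
# nor newline; lookahead: followed by '|Ab Initio Software'. Compiled once.
NAME_PATTERN = re.compile(r'(?<=[|])[^|\n]*(?=[|]Ab Initio Software)', re.IGNORECASE)

def component_names(text):
    """Return every component name found in text, in order of appearance."""
    return [m.group() for m in NAME_PATTERN.finditer(text)]

def component_names_in_file(path):
    # Hypothetical helper: reads the whole graph file at once so the compiled
    # pattern scans it in one pass. The encoding is an assumption.
    with open(path, encoding='utf-8', errors='replace') as f:
        return component_names(f.read())

sample = (
    '}}@0|@207000|80000|227000|100000|152000|126000|11654|RFMT: Generate Labels Header|Ab Initio Software|Built-in|1|100|0||6||32769|1|{1|0|}}}\n'
    # Invented second record, only to show that several names are collected.
    '}}@0|@1|2|3|SORT: Order By Key|Ab Initio Software|Built-in|1|}}}\n'
)
print(component_names(sample))
# ['RFMT: Generate Labels Header', 'SORT: Order By Key']
```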
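Even without the lookarounds, if you just change (.*?) to the more explicit [^|]*, you'd get the right result without reversing anything. The lazy .*? can still cross | characters, so a left-to-right search that starts at the first | in the line would capture everything from there up to the marker, whereas [^|]* stays inside a single field. The lookaround solution might be more efficient, though. Anyway, here it is: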
re.search(r'[|]([^|]*)[|]Ab Initio Software', name, re.IGNORECASE).group(1)
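If you don't need case-insensitivity, plain string methods can also work "from the right" without any regex: str.partition cuts the line at the marker, and str.rsplit with maxsplit=1 splits only at the last | before it. Below is a hedged sketch comparing that with the capturing-group regex above. The function names are hypothetical, the partition version is case-sensitive, and unlike the regex it returns the whole prefix if no | precedes the name. The timing loop is only a rough single-line check, so measure on your real graph files before choosing.

```python
import re
import timeit

MARKER = '|Ab Initio Software'
GROUP_PATTERN = re.compile(r'[|]([^|]*)[|]Ab Initio Software', re.IGNORECASE)

def name_by_regex(line):
    """Capturing-group regex from the answer, compiled once."""
    m = GROUP_PATTERN.search(line)
    return m.group(1) if m else None

def name_by_partition(line):
    """Regex-free: cut at the marker, then keep what follows the last '|'."""
    before, sep, _ = line.partition(MARKER)
    if not sep:
        return None  # marker not present (case-sensitive check)
    return before.rsplit('|', 1)[-1]

line = '}}@0|@207000|80000|227000|100000|152000|126000|11654|RFMT: Generate Labels Header|Ab Initio Software|Built-in|1|100|0||6||32769|1|{1|0|}}}'
assert name_by_regex(line) == name_by_partition(line) == 'RFMT: Generate Labels Header'

# Rough timing on this one line; numbers vary by machine and input data.
for fn in (name_by_regex, name_by_partition):
    print(fn.__name__, timeit.timeit(lambda: fn(line), number=10_000))
```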