python regex ab-initio

Can anyone offer a better solution? Right-to-left regex using Python


First of all, happy Independence Day to those it applies to!

I'm analyzing Ab Initio graphs. For that, I need to obtain the name of each component, the one the developer used to describe its functionality, which I can extract from the following line.

name ='}}@0|@207000|80000|227000|100000|152000|126000|11654|RFMT: Generate Labels Header|Ab Initio Software|Built-in|1|100|0||6||32769|1|{1|0|}}}'

I tried to use a regex to extract the name of the component, which is: RFMT: Generate Labels Header.

Here is the problem:

My delimiter is |Ab Initio Software, which means I need to apply the regex from right to left. Is there any way to accomplish that using Python?

The most efficient solution I have come up with is to reverse everything.

import re
# Reverse the line, match the reversed delimiter, then reverse the result back.
name = name[::-1]
name = re.search(r'erawtfoS oitinI bA\|(.*?)\|', name, re.IGNORECASE).group(1)
name = name[::-1]

All I want is to make it more efficient, because it is going to be used on hundreds of graphs and most of those files are quite large.


Solution

  • You could just match non-| characters and use lookarounds to make sure it's the element before Ab Initio...:

    re.search(r'(?<=[|])[^|]*(?=[|]Ab Initio Software)', name, re.IGNORECASE).group()
    

    Even without the lookarounds, changing (.*?) to the more explicit [^|]* gives the right result, because the match can then no longer run across a | delimiter. The lookaround version above might still be more efficient. Anyway, here is the simpler one:

    re.search(r'[|]([^|]*)[|]Ab Initio Software', name, re.IGNORECASE).group(1)
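Since this is going to run over hundreds of large files, it may also be worth compiling the pattern once and reusing it, or skipping regex entirely and using plain string methods. Here is a rough sketch of both, not a benchmarked recommendation. The names COMPONENT_RE, MARKER, component_name_regex and component_name_find are made up for illustration, and the string-method version is a different technique from the regex above: it is case-sensitive, whereas the regex uses re.IGNORECASE.

```python
import re

# Sample line taken from the question.
line = '}}@0|@207000|80000|227000|100000|152000|126000|11654|RFMT: Generate Labels Header|Ab Initio Software|Built-in|1|100|0||6||32769|1|{1|0|}}}'

# Compiled once at import time and reused for every line (hypothetical name).
COMPONENT_RE = re.compile(r'[|]([^|]*)[|]Ab Initio Software', re.IGNORECASE)


def component_name_regex(text):
    """Return the field just before '|Ab Initio Software', or None if absent."""
    match = COMPONENT_RE.search(text)
    return match.group(1) if match else None


# Delimiter used by the non-regex variant (case-sensitive, unlike the regex).
MARKER = '|Ab Initio Software'


def component_name_find(text):
    """Same idea with str.find/str.rfind instead of a regex."""
    end = text.find(MARKER)
    if end == -1:
        return None
    # Search backwards from the marker for the previous '|'.
    # If there is none, rfind returns -1 and the slice starts at index 0.
    start = text.rfind('|', 0, end)
    return text[start + 1:end]


print(component_name_regex(line))
print(component_name_find(line))
```

Both functions return None when the delimiter is missing, so a line without a component name does not raise an AttributeError the way a bare .group() call would. Which variant is faster on your files is something you would have to measure, for example with timeit.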