Python re, extract path from text


I receive text like:

`D:\Programming\sit\bin\MyLab.json`

The text may or may not be wrapped in quotes, and the quotes can be of different kinds. If quotes are present, they appear only at the very beginning and the very end of the text, around the path. The text always contains an absolute Windows path to a file, but that file may not exist. I am struggling to write an algorithm that extracts the path.

I have tried a regex like:

re.findall(r'[a-zA-Z]:\\((?:[a-zA-Z0-9() ]*\\)*).*', a)

but I receive:

['Programming\\sit\\bin\\']

But I expect to get the full path:

D:\Programming\sit\bin\MyLab.json
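A likely explanation for the output above: when a pattern contains capturing groups, `re.findall` returns the text of those groups rather than the whole match. Here the only group is `((?:[a-zA-Z0-9() ]*\\)*)`, the run of folder names, so the drive letter and the file name are dropped. The sketch below reproduces that and then captures the whole path instead. It assumes the wrapping quotes, if any, are `"`, `'` or a backtick; `PATH_RE` and `extract_windows_path` are hypothetical names.

```python
import re
from typing import Optional

# The pattern from the question: it has one capturing group, so re.findall
# returns only what that group matched, not the whole match.
ORIGINAL = r'[a-zA-Z]:\\((?:[a-zA-Z0-9() ]*\\)*).*'

# Sketch of a fix (assumption: quotes, if present, are one of " ' `):
# optional leading quote, the whole path as the only group, optional trailing
# quote. The lazy .*? lets a trailing quote stay outside the group.
PATH_RE = re.compile(r"""["'`]?([a-zA-Z]:\\.*?)["'`]?""")


def extract_windows_path(text: str) -> Optional[str]:
    """Return the absolute Windows path in text, or None if there is none."""
    match = PATH_RE.fullmatch(text.strip())
    return match.group(1) if match else None


sample = r"D:\Programming\sit\bin\MyLab.json"
print(re.findall(ORIGINAL, sample))              # ['Programming\\sit\\bin\\']
print(re.search(ORIGINAL, sample).group(0))      # D:\Programming\sit\bin\MyLab.json
print(extract_windows_path('"' + sample + '"'))  # D:\Programming\sit\bin\MyLab.json
```

`fullmatch` requires the pattern to cover the entire (stripped) text, which matches the constraint that quotes can only appear at the very beginning and end.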

Solution

  • This is just an idea, but if you're sure that the text will always be an absolute Windows path and that the quotes (if present) always come as a matching pair, then maybe a regex isn't necessary. Instead, check whether the first character is a letter (the drive letter), and if it isn't, strip the first and last characters. Something like:

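    import string


    def normalize_windows_path(a: str) -> str:
        # An absolute Windows path starts with a drive letter (e.g. "D:\..."),
        # so unquoted text is returned unchanged.
        if a and a[0] in string.ascii_letters:
            return a
        # Otherwise assume one matching pair of quotes wraps the path and
        # drop the first and last characters.
        return a[1:-1]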
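A variant of the same idea, sketched under the assumption that the only wrapping characters are `"`, `'` or a backtick: strip them with `str.strip` and let `pathlib.PureWindowsPath` confirm that what remains is an absolute path. `PureWindowsPath` only parses the string and never touches the file system, so this also works when the file does not exist, and it runs on any OS. `QUOTES` and `extract_path` are hypothetical names.

```python
from pathlib import PureWindowsPath
from typing import Optional

# Assumed set of wrapping quote characters; extend it if other kinds occur.
QUOTES = "\"'`"


def extract_path(text: str) -> Optional[str]:
    # Drop surrounding whitespace, then any quote characters at either end.
    candidate = text.strip().strip(QUOTES)
    # PureWindowsPath parses the string without accessing the disk;
    # is_absolute() is True only when there is both a drive and a root.
    if PureWindowsPath(candidate).is_absolute():
        return candidate
    return None


print(extract_path('"D:\\Programming\\sit\\bin\\MyLab.json"'))
```

One trade-off: `str.strip` removes every quote character at each end, so a path that itself ends in an apostrophe would lose it.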