Tags: python, regex, python-regex

Regex - negative lookbehind for any character excluding pure whitespace


I'm trying to write a regex pattern that matches a -- comment only when nothing but whitespace precedes it on the same line. The match should fail if any non-whitespace character comes before it. For example:

--hello (match)
--goodbye (match)
ROW_NUMBER() OVER (ORDER BY DATE) --date (fail)
  --comment with some indentation (match)
    --another comment with some indentation (match)

The closest I've got is the pattern (?<!.)--.*\n, which gives me this result:

--hello (match)
--goodbye (match)
ROW_NUMBER() OVER (ORDER BY DATE) --date (fail)
  --comment with some indentation (fail)
    --another comment with some indentation (fail)

I've also tried (?<!\s)--.*\n and (?<=\S)--.*\n, but both return no matches at all.

EDIT: a regexr.com demo that illustrates the issue more clearly: regexr.com/6j0mt


Solution

  • With the PyPI regex module, which supports variable-width lookbehinds, you can use

    import regex
    
    text = r"""--hello
    --goodbye
    ROW_NUMBER() OVER (ORDER BY DATE) --date
      --comment with some indentation
        --another comment with some indentation"""
    
    print(regex.findall(r'(?<=^[^\S\r\n]*)--.*', text, regex.M))
    # => ['--hello', '--goodbye', '--comment with some indentation', '--another comment with some indentation']
    

    See this Python demo online.
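
To see why the lookbehind form needs the third-party module, the sketch below tries to compile the same pattern with the built-in re module. It assumes a CPython version whose re engine only accepts fixed-width lookbehinds (true of the releases I'm aware of). The names pattern and compiled_ok are just illustrative.

```python
import re

# Same lookbehind as in the `regex` example above. Its body, ^[^\S\r\n]*,
# can match zero or more characters, so it has no fixed width.
pattern = r'(?<=^[^\S\r\n]*)--.*'

try:
    re.compile(pattern, re.M)
    compiled_ok = True
except re.error as exc:
    # The built-in engine rejects the pattern at compile time.
    compiled_ok = False
    print("re rejected the pattern:", exc)

print(compiled_ok)
```

This is also why (?<!.) and (?<!\s) are accepted by re: each looks back over exactly one character.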

    Or, with the built-in Python re module, consume the leading whitespace and capture the comment in a group:

    import re
     
    text = r"""--hello
    --goodbye
    ROW_NUMBER() OVER (ORDER BY DATE) --date
      --comment with some indentation
        --another comment with some indentation"""
     
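    # The indentation is matched but not captured; findall returns group 1 only.
    print(re.findall(r'^[^\S\r\n]*(--.*)', text, re.M))
    # => ['--hello', '--goodbye', '--comment with some indentation', '--another comment with some indentation']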
    

    See this Python demo.
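
One practical difference from a lookbehind: with the re pattern the indentation is part of the overall match, and only group 1 holds the comment itself. A small sketch on a shortened sample (the sample string and the name results are made up for illustration):

```python
import re

# Shortened, hypothetical sample: a plain comment, a trailing comment
# after code, and an indented comment.
text = "--hello\nROW_NUMBER() OVER (ORDER BY DATE) --date\n  --indented comment"

# group(0) is the whole match (indentation included); group(1) is the comment.
results = [
    (m.group(0), m.group(1))
    for m in re.finditer(r'^[^\S\r\n]*(--.*)', text, re.M)
]

for whole, comment in results:
    print(repr(whole), repr(comment))
```

So if you need the position of the comment rather than of the line start, m.start(1) is probably what you want instead of m.start().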

    Pattern details