regex openrefine

OpenRefine: Inverting a regular expression


I've seen some questions about inverting regular expressions, but I couldn't get any of the solutions to work in OpenRefine.

For instance, suppose I have a zip code field where most of the entries have the form

^\d{5}-\d{4}$

I want to filter out all of those entries, using a GREL regex, to see what is left over. How do I write a regular expression in OpenRefine that matches every string that does not have the above form?


Solution

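  • You can use a negative lookahead to match strings that do not contain a certain substring. In your case, the pattern below matches every string that does not contain 5 digits followed by a dash followed by 4 digits. Note that it rejects a string that contains that sequence anywhere, not only a string that consists of it entirely.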

    ^((?!\d{5}-\d{4}).)*$

    Another workaround for matching everything without a certain substring is to replace that substring with "" and then look at the entries that still have text left.
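
To see how the two approaches behave, here is a minimal sketch in Python rather than GREL, so it only illustrates the regex logic and not OpenRefine's own functions. The function names and sample values are hypothetical, and it assumes the lookahead behaves the same way in OpenRefine's regex engine as in Python's `re` module.

```python
import re

# The ZIP+4 shape from the question, unanchored so it can be found as a substring.
ZIP4 = re.compile(r"\d{5}-\d{4}")

# The negative-lookahead pattern from the answer: at every position, assert
# that a ZIP+4 sequence does NOT start there, then consume one character.
NOT_ZIP4 = re.compile(r"^((?!\d{5}-\d{4}).)*$")


def leftovers_lookahead(values):
    """Keep values that contain no ZIP+4 sequence anywhere."""
    return [v for v in values if NOT_ZIP4.match(v)]


def leftovers_replace(values):
    """Strip every ZIP+4 sequence and keep values that still have text left."""
    return [v for v in values if ZIP4.sub("", v) != ""]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    samples = ["12345-6789", "12345", "1234-56789", "N/A"]
    print(leftovers_lookahead(samples))
    print(leftovers_replace(samples))
```

The two approaches agree on these samples but are not equivalent in general: a value such as "x12345-6789" is dropped by the lookahead version (it contains the sequence) and kept by the replace version (the "x" is left over), while an empty string is kept by the lookahead version and dropped by the replace version.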