regex data-cleaning openrefine grel

OpenRefine: How to delete content of cells if it matches specific string pattern?


So I'm doing my first project in OpenRefine and I don't fully get GREL yet, or how to transform my project with it...

Problem is: I have a column in which I want to delete ONLY the contents of the cells that match a specific pattern and ONLY if they contain nothing else.

My column looks like this:

LF 2(1927)40
 
LF 2(1927)42
 
"Wirtin" LF 2(1927)44
 
Lottchen LF 3(1928)3
 
LF 3(1928)7
 
"Mit schönem Gruß Arthur Powalla aus Hamburg" LF 3(1928)9
 
LF 3(1928)14 DF 3(1927)1

I want to delete the content of all the cells that contain ONLY a string pattern like LF 2(1927)42. If they contain more than that, I want to keep everything and not delete that pattern.

In the Transform... menu of that column I tried to use the following GREL command:

if(value==(LF \d\(\d*\)\d+),null,value)

I used the regular expression included in this before for a different operation, and it did what it was supposed to do. So I assume the mistake lies elsewhere. The error it sends is this:

Parsing error at offset 14: Missing )

Thanks so much for helping me on this!!


Solution

  • GREL does not support pattern matching in assignment or comparison expressions, so value==(...) cannot be used to test a cell against a regular expression.

    The expression you are looking for is

    if(isNull(value.match(/LF \d\(\d*\)\d+/)), value, null)
    

    Note that this calls the match function on the content of the cell (value) and applies the regular expression, which is enclosed in slashes (/.../). match only succeeds if the entire cell content fits the pattern, which is why cells containing additional text are left untouched.

    isNull then distinguishes between null (the pattern does not match) and an empty array (the pattern matches, but it defines no capture groups).

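    Alternatively, you could use a text filter with regular expressions enabled and then apply a transformation to the filtered rows only. Because the text filter also finds partial matches, anchor the pattern (^LF \d\(\d*\)\d+$) so that only cells containing nothing else are selected. This is more intuitive if you are not yet familiar with GREL.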
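
To check the pattern against a few sample values before running the transform, the same logic can be sketched outside OpenRefine. This is only an illustration in plain Python, not part of the GREL answer: the function name clear_if_reference_only and the constant REFERENCE_ONLY are made up for this example, and re.fullmatch is assumed to stand in for GREL's whole-cell match().

```python
import re

# Same pattern as in the GREL expression. fullmatch mirrors GREL's match(),
# which only succeeds when the whole cell content fits the pattern.
REFERENCE_ONLY = re.compile(r"LF \d\(\d*\)\d+")


def clear_if_reference_only(value):
    """Return None for cells holding only a reference, else the value unchanged.

    Hypothetical helper for illustration; it is not an OpenRefine function.
    """
    if value is not None and REFERENCE_ONLY.fullmatch(value):
        return None
    return value


# Sample values taken from the question.
cells = [
    "LF 2(1927)40",
    '"Wirtin" LF 2(1927)44',
    "Lottchen LF 3(1928)3",
    "LF 3(1928)14 DF 3(1927)1",
]

cleaned = [clear_if_reference_only(c) for c in cells]
for before, after in zip(cells, cleaned):
    print(repr(before), "->", repr(after))
```

Only the first sample is cleared; the other three contain extra text and are kept as they are. A cell with stray whitespace around the reference would not match either, so trimming the column first may be worth considering.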