regexgoogle-refine

How do i integrate an If function within the forEach function with GREL?


i am working with Google Refine right now. My goal is to split a single, existing column into two parts. I am using the built in "add column based on...“-function. The column contains a street name and the coressponding house number. For example, I've got this:

Stackoverflow Street 89

Every row is structured in the same way, which means that a streetname can be long or short, can contain one or more words, but it always ends with a (house) number and its always separated by a blank space.

As I am working with Google Refine, i was about to do this by using GREL statements (Google Refine Expression Language). The idea is like that: Split every value divided by a blank space into pieces. Save each split value in a variable. Then check if the specific values contains a numeric value (which is the house number). If it is a numeric value, take that value and forward it to the output. I already got something like this.

forEach(value.split(" "), v, v).join(";")

Here is the question: How do i integrate an IF function within the forEach function? If i would know how to do it, i would probably be able to not only split the values, but also check if its numeric. To check if it is numeric, one can use isNumeric(value).

I appreciate every hint. Thanks, Jörg.


Solution

  • If all the constraints that you describe hold true (it's always the last piece and always exists), I'd just use value.split('')[-1]

    In answer to the direct question that you asked, in case someone else finds this in search, your expression so far is equivalent to:

    value.split(" ").join(";")
    

    ie the forEach isn't doing anything. To integrate the if(), use it as part of the expression which forms the 3rd parameter to forEach

    forEach(value.split(" "), v, if(isNumeric(v),null,v)).join(";")
    

    I'm not sure what output format you're trying to achieve, but this will give you a semi-colon separated list of all non-numeric words in your cell value.