java regex google-refine

Parsing out part of string in Google Refine - error message


I am cleaning a dataset using Google Refine. I have one column with dates in the mm/dd/yyyy format. I want to create a new column in which mm/dd/yyyy is replaced by yyyy only.

I have tried

value.replace(/.+(\d\d\d\d)\*/, /$1/)

and what showed up was

Error: replace expects 3 strings, or 1 string, 1 regex, and 1 string

Why does this error show up? Thank you for helping a beginner!


Solution

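  • The error shows up because the last argument, /$1/, is written as a regex literal, while replace expects the replacement to be a string such as "$1". Note too that \* matches a literal asterisk, so that pattern would not match your dates even with a string replacement. If the values are just dates in that regular format, the easy solution is: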

    value.split('/')[2]
    

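    If you need to handle a date in the middle of a longer string, or just have a warped desire to mess with regex :-), then you can use the following, which replaces the matched date with its year and leaves any surrounding text in place: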

    value.replace(/([1-9]|0[1-9]|1[012])\D([1-9]|0[1-9]|[12][0-9]|3[01])\D(19[0-9][0-9]|20[0-9][0-9])/, "$3")
    

    BTW, there are lots of canned regexes on the web that you can search for rather than writing your own, which is what I did here. I wouldn't have made it anywhere near that complex/specific myself. Adjust it to your needs, depending on how strict or liberal you want the matching to be.
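
If you want to experiment with both approaches outside Refine, here is a rough Java sketch. It assumes that Refine's regex flavor is Java's, which I believe is the case since Refine is written in Java. The class and method names are made up for illustration, and the pattern is written month-first to fit the mm/dd/yyyy values in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only; not part of Refine.
class YearFromDate {
    // mm/dd/yyyy with any single non-digit separator; group 3 is the year.
    private static final Pattern DATE = Pattern.compile(
        "([1-9]|0[1-9]|1[012])\\D([1-9]|0[1-9]|[12][0-9]|3[01])\\D(19[0-9][0-9]|20[0-9][0-9])");

    // Rough equivalent of value.split('/')[2]: assumes the whole value is one date.
    static String yearBySplit(String value) {
        return value.split("/")[2];
    }

    // Rough equivalent of value.replace(/.../, "$3"): swaps the date for its year
    // and keeps the surrounding text.
    static String yearByReplace(String value) {
        return DATE.matcher(value).replaceAll("$3");
    }

    // Returns only the year of the first date found, or null if there is none.
    static String yearByFind(String value) {
        Matcher m = DATE.matcher(value);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        System.out.println(yearBySplit("03/25/2011"));                    // 2011
        System.out.println(yearByReplace("seen on 12/25/2010 at noon"));  // seen on 2010 at noon
        System.out.println(yearByFind("seen on 12/25/2010 at noon"));     // 2010
    }
}
```

The replace variant returns the input unchanged when nothing matches, so rows without a recognizable date stay as they were rather than becoming empty.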