openrefinegrel

Parsing multiple values with Google Refine


I've a CSV column with content like this (just an example):

[{"qual"=>"05-Admmin "name"=>"CLARK C COHO"}, {"qual"=>"20-Soc Con", "name"=>"ALPHA S A"}, {"qual"=>"20-Soc Con", "name"=>"JACK SA"}

I would like to extract automatically the values from "name" field and separate it by comma, resulting in something like this: CLARCK C COHO, ALPHA S A, JACK SA and so on.

I know that I can get a specific value with this code:

value.parseJson()[0].name

I've been reading the documentation but i'm not figuring out how to loop this between all fields.

Any tips?

EDIT: Here is another example of the column. The content really look like this:

[{"qual"=>"49-SocAdm", "name"=>"ALVARO R L"}, {"qual"=>"49-SocAdm", "name"=>"GABRIEL G L"}]

Solution

  • The data in your CSV is not in JSON format. I do not know what it is. A kind of key-value format, but I do not know which one. In addition, it sometimes lacks a comma or a bracket. We could try to transform it into a valid JSOn, but it will be easier to extract information using regular expressions. Here is an example with Python / Jython.

    import re
    
    pattern = re.compile(r'"name"=>"(.+?)"', re.M)
    
    return ", ".join(pattern.findall(value))
    

    screenshot