I have a csv that contains plain text entries in cells as well as JSON arrays. I'm new to OpenRefine and GREL and having troubles finding an expression to use to clean this csv. I want to get just the value of the "name" key in the JSON arrays.
Example cells:
[{'name': 'Pixar Animation Studios', 'id': 3}]
[{'name': 'TriStar Pictures', 'id': 559}, {'name': 'Teitler Film', 'id': 2550}, {'name': 'Interscope Communications', 'id': 10201}]
[{'name': 'Twentieth Century Fox Film Corporation', 'id': 306}]
[{'iso_3166_1': 'US', 'name': 'United States of America'}]
Expected return values:
Pixar Animation Studios
TriStar Pictures, Teitler Film, Interscope Communications
Twentieth Century Fox Film Corporation
United States of America
If your data looks like this...
...Tom Morris' formula won't work. It seems that Open refine doesn't like single quotes in a Json. Also, since you sometimes have several "names", you'll have to retrieve them all with a forEach()
loop.
The formula is as follow:
forEach(value.replace("'", '"').parseJson(), v, v.name).join(',')
It means: replace the ' by ", parse the json and then, for each element in the array, put it in a variable v and get its value "name". Finally, join the resulting array with a comma.
Final result: