I have a second TSV file (Project2) with several columns, of which I want to import only a few selected ones into my existing project in OpenRefine. Some previous releases of OpenRefine had an "import from another project" option, but in recent releases this has apparently been removed. I need a GREL or Python script that can do this for as many columns as I want.
Project1
Column A | Column B |
---|---|
id_1 | value1 |
id_2 | value2 |
Project2
Column C | Column D | Column E | Column F |
---|---|---|---|
id_1 | value_x | value_z | value_u |
id_2 | value_y | value_v | value_t |
Expected merged project
Column A | Column B | Column E | Column F |
---|---|---|---|
id_1 | value1 | value_z | value_u |
id_2 | value2 | value_v | value_t |
I found a solution here but it works for importing only one column.
There are currently several ways of merging data from different projects in OpenRefine.
The OpenRefine function for accessing data in other projects is called `cross`.
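So if you only want to "import" a small number of columns from another project, you can use it like this: add a new column based on Column A, use as name `Column E` and as GREL expression `cell.cross("Project2", "Column C").cells["Column E"][0].value`.

If you want to import a bigger number of columns, you somehow have to make the names of these columns known to OpenRefine. Add a new column, use as name `Merge Columns` and as GREL expression `"Column E||Column F"`. Split the multi-valued cells of that column on the separator `||`, so that each row holds one column name. Then add another column with this GREL expression, which reads the key from the first row of the record and the column name from the current row:

    row.record.cells["Column A"][0].value.cross("Project2", "Column C").cells[row.cells["Merge Columns"].value][0].value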
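For readers who want to see the lookup logic spelled out, here is a minimal plain-Python sketch of what the `cross` recipe amounts to: index Project2 by its key column, then copy the wanted columns from the first matching row. It runs outside OpenRefine on the sample rows from the question. The function name `import_columns` and the in-memory lists of dicts are assumptions made for illustration and are not part of OpenRefine.

```python
def import_columns(target_rows, source_rows, target_key, source_key, wanted):
    """Copy the `wanted` columns from source_rows into target_rows by key."""
    # Index the source by key, keeping only the first row per key.
    # This mirrors taking element [0] of a cross() result.
    index = {}
    for row in source_rows:
        index.setdefault(row[source_key], row)

    merged = []
    for row in target_rows:
        match = index.get(row[target_key])
        new_row = dict(row)
        for col in wanted:
            # A key without a match yields None, comparable to a blank cell.
            new_row[col] = match[col] if match is not None else None
        merged.append(new_row)
    return merged


# Sample data taken from the question.
project1 = [
    {"Column A": "id_1", "Column B": "value1"},
    {"Column A": "id_2", "Column B": "value2"},
]
project2 = [
    {"Column C": "id_1", "Column D": "value_x", "Column E": "value_z", "Column F": "value_u"},
    {"Column C": "id_2", "Column D": "value_y", "Column E": "value_v", "Column F": "value_t"},
]

merged = import_columns(project1, project2, "Column A", "Column C", ["Column E", "Column F"])
for row in merged:
    print(row)
```

Passing the list of wanted columns as a parameter is what makes this scale to any number of columns, which is the same role the `Merge Columns` helper column plays in the GREL version.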
Starting with OpenRefine 3.8, you can access the column names of another project (see GitHub issues 5903 and 5633). With this we can import all the columns of the other project and then use facets and filters to remove the data we do not want to import.
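The column names and values are collected in two helper columns and then columnized by key/value columns: for "Key column" use the value `Keys` and for "Value column" use the value `Values`.

GREL expression applied to the `Keys` column (appends the column names of the matching Project2 row):

    with("||", sep,
      with(row.cells["Column A"], id,
        if(isBlank(id), value,
          value + sep + with(id.cross("Project2", "Column C")[0], mergeRow,
            mergeRow.columnNames
          ).join(sep)
        )
      )
    )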

GREL expression applied to the `Values` column (appends the cell values of the matching Project2 row):

    with("||", sep,
      with(row.cells["Column A"], id,
        if(isBlank(id), value,
          value + sep + with(id.cross("Project2", "Column C")[0], mergeRow,
            forEach(mergeRow.columnNames, c,
              mergeRow.cells[c].value
            )
          ).join(sep)
        )
      )
    )

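To make the data flow of the key/value approach easier to follow, the sketch below mimics it in plain Python: the column names and the cell values of a matching row are each joined with the separator, split again, and paired up. This only illustrates the idea on the sample data from the question. The helper name `columnize` is hypothetical, and nothing here is OpenRefine API.

```python
SEP = "||"


def columnize(keys_cell, values_cell, sep=SEP):
    """Pair a joined 'Keys' cell with a joined 'Values' cell."""
    keys = keys_cell.split(sep)
    values = values_cell.split(sep)
    if len(keys) != len(values):
        raise ValueError("Keys and Values must have the same number of parts")
    return dict(zip(keys, values))


# One matching row of Project2, taken from the question.
source_row = {
    "Column C": "id_1",
    "Column D": "value_x",
    "Column E": "value_z",
    "Column F": "value_u",
}

# Join names and values with the separator, as the two expressions do.
keys_cell = SEP.join(source_row.keys())
values_cell = SEP.join(source_row.values())

# Split and pair them again, then keep only the columns we want.
all_columns = columnize(keys_cell, values_cell)
wanted = {name: all_columns[name] for name in ("Column E", "Column F")}
print(wanted)
```

The final filtering step stands in for the facets and filters mentioned above: everything is imported first, and unwanted columns are dropped afterwards.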
More recipes on how to combine datasets can be found in the OpenRefine Wiki.