openrefine

Split multi-valued cells paired with other multi-valued cells in other rows


I'm working on some data in openrefine where many cells have multiple values that pair to a single value each in another multi-value cell in a different row. I want to turn that into one row with each paired data so turn something like this

Name Account Number
John; Jennifer; JT 0239; 2191; 1788
JT 1788
Janet; John 0591; 0239

into this:

Name Account Number
John 0239
Jennifer 2191
JT 1788
JT 1788
Janet 0591
John 0239

Is this possible to do? If so any suggestions on how to do so?

I tried this but it didn't get me there.

EDIT: Here's what happens when I split each row using Edit Multi-Valued Cells and the semicolon separator. I'm unsure how to get the names next to the account numbers as fill down wouldn't work.

Name Account Number
John 0239
2191
1788
Jennifer
JT
JT 1788
Janet 0591
0239
John

Solution

  • The concept you are searching for is cell splitting for multi-valued cells. For details see the documentation about cell editing.

    If you do not have a record structure yet I suggest you apply the cell splitting command from right to left.

    1. Split the cells in the column "Account Number" on the separator ";" via "Edit cells" => "Split multi-valued cells".
    2. Split the cells in the column "Name" on the separator ";" via "Edit cells" => "Split multi-valued cells".
    3. Remove leading and trailing white space in both columns via "Edit cells" => "Common transforms" => "Trim leading and trailing white space" (see documentation about cell editing for details).

    You can forgo step 3 by activating the regular expression checkbox in the cell splitting dialog and use a regular expression like "\s*;\s*" as separator.