I have a bunch of product data to clean prior to entry into a database that looks like this:
COL A | COL B | COL C... "N" |
---|---|---|
Option 1 | A, B, C, D | Option 1 attribute |
Option 2 | C, D, F | Option 2 attribute |
Option 3 | D, J, Z | Option 3 attribute |
And I'd like for it to look like this with a unique row for every unique product option:
COL A | COL B | COL C... "N" |
---|---|---|
Option 1 | A | Option 1 attribute |
Option 1 | B | Option 1 attribute |
Option 1 | C | Option 1 attribute |
Option 1 | D | Option 1 attribute |
Option 2 | C | Option 2 attribute |
Option 2 | D | Option 2 attribute |
Option 2 | F | Option 2 attribute |
Option 3 | D | Option 3 attribute |
Option 3 | J | Option 3 attribute |
Option 3 | Z | Option 3 attribute |
I understand how I could do this with a python script, but I am already using OpenRefine, and I am hoping not to involve a whole new process to my data flow.
Is there an easy way to do this in OpenRefine? I am having a hard time finding a method or extensions for something like this.
Thanks!
EDIT
@magdmartin How can you fill down blank cells using delineated values from the first cell?
COL A | COL B | COL C... "N" |
---|---|---|
Option 1 | A,B,C,D | Option 1 attribute |
Option 1 | Option 1 attribute | |
Option 1 | Option 1 attribute | |
Option 1 | Option 1 attribute | |
Option 2 | C,D,F | Option 2 attribute |
Option 2 | Option 2 attribute | |
Option 2 | Option 2 attribute | |
Option 3 | D,J,Z | Option 3 attribute |
Option 3 | Option 3 attribute | |
Option 3 | Option 3 attribute |
Turned into
COL A | COL B | COL C... "N" |
---|---|---|
Option 1 | A | Option 1 attribute |
Option 1 | B | Option 1 attribute |
Option 1 | C | Option 1 attribute |
Option 1 | D | Option 1 attribute |
Option 2 | C | Option 2 attribute |
Option 2 | D | Option 2 attribute |
Option 2 | F | Option 2 attribute |
Option 3 | D | Option 3 attribute |
Option 3 | J | Option 3 attribute |
Option 3 | Z | Option 3 attribute |
Thanks!
I recorded a video here walking through each options describe below here: https://youtu.be/3194zXoJtqI
For this project, you will need to use two OpenRefine functions
COL B
to create one new line for each value separated by commaIf you have a lot of columns you can use the All > Transform
to speed up the process with the following expression row.record.cells[columnName].value[0]
. The trick here is to fill down Col A
last so we can keep the record mode when filling down other column (see screenshot below)