Student | Fruits |
---|---|
Foo | Apple,Banana |
Bar | Orange |
Baz | Lemon,Orange |
How do I quantify/analyze the column Fruits with multiple values in a single cell separated by comma?
I created multiple helper columns, then split the values like so:
Student | Fruits | fruit1 | fruit2 |
---|---|---|---|
Foo | Apple,Banana | Apple | Banana |
Bar | Orange | Orange | |
Baz | Lemon,Orange | Lemon | Orange |
This leads to both horizontally and vertically entered data. So if I need to, say, find out the fruits for 'Foo' in a conventional way, I'd have to restructure this so that each fruit is in its own row to make it spreadsheet readable. It also doesn't translate well if I need to create pivot tables.
Is there a better approach for this kind of task?
The data should preferably be normalized to the first-normal-form:
Student | Fruits |
---|---|
Foo | Apple |
Foo | Banana |
Bar | Orange |
Baz | Lemon |
Baz | Orange |
You can try formulas like the one below to make the transformation.
=QUERY(
WRAPROWS(
FLATTEN(
MAP(
A1:A4,
B1:B4,
LAMBDA(a,b,
TOROW(
REDUCE(,SPLIT(b,","),
LAMBDA(ac,cur,
VSTACK(ac,{a,cur}))))))),2),
"where Col1<>''")