google-sheetsgoogle-analyticsdata-analysispandas-explodefirst-normal-form

How do you deal with multiple values in a single cell separated by comma?


Student Fruits
Foo Apple,Banana
Bar Orange
Baz Lemon,Orange

How do I quantify/analyze the column Fruits with multiple values in a single cell separated by comma?

I created multiple helper columns, then split the values like so:

Student Fruits fruit1 fruit2
Foo Apple,Banana Apple Banana
Bar Orange Orange
Baz Lemon,Orange Lemon Orange

This leads to both horizontally and vertically entered data. So if I need to, say, find out the fruits for 'Foo' in a conventional way, I'd have to restructure this so that each fruit is in its own row to make it spreadsheet readable. It also doesn't translate well if I need to create pivot tables.

Is there a better approach for this kind of task?


Solution

  • The data should preferably be normalized to the first-normal-form:

    Student Fruits
    Foo Apple
    Foo Banana
    Bar Orange
    Baz Lemon
    Baz Orange

    You can try formulas like the one below to make the transformation.

    =QUERY(
      WRAPROWS(
       FLATTEN(
        MAP(
         A1:A4,
         B1:B4,
         LAMBDA(a,b,
          TOROW(
           REDUCE(,SPLIT(b,","),
            LAMBDA(ac,cur,
              VSTACK(ac,{a,cur}))))))),2),
     "where Col1<>''")