How to explode a pandas dataframe that has nulls in some rows, but populated in others

So I have many dataframes coming in that need to be exploded. they look something like this:

df = pd.DataFrame({'A': [1, [11,22], [111,222]],
                   'B': [2, [33,44], float('nan')],
                   'C': [3, [55,66], [333,444]],
                   'D': [4, [77,88], float('nan')]
                  })

+-----------+---------+-----------+---------+
| A         | B       | C         | D       |
+-----------+---------+-----------+---------+
| 1         | 2       | 3         | 4       |
+-----------+---------+-----------+---------+
| [11,22]   | [33,44] | [55,66]   | [77,88] |
+-----------+---------+-----------+---------+
| [111,222] | NaN     | [333,444] | NaN     |
+-----------+---------+-----------+---------+

Typically if a column couldn't be exploded I'd just remove it from the column list like so:

colList = df.columns.values.tolist()
colList.remove("B")
colList.remove("D")
df = df.explode(colList)

But that would leave me with a dataframe that looks like:

+-----+---------+-----+---------+
| A   | B       | C   | D       |
+-----+---------+-----+---------+
| 1   | 2       | 3   | 4       |
+-----+---------+-----+---------+
| 11  | [33,44] | 55  | [77,88] |
+-----+---------+-----+---------+
| 22  | [33,44] | 66  | [77,88] |
+-----+---------+-----+---------+
| 111 | NaN     | 333 | NaN     |
+-----+---------+-----+---------+
| 222 | NaN     | 444 | NaN     |
+-----+---------+-----+---------+

I still need to explode those columns (B and D in example), but if I do, it'll throw an error due to the nulls. How can I successfully explode dataframes with this sort of problem?

Solution

One option could be to explode each column separately and deduplicate the index before concat:

def explode_dedup(s):
    s = s.explode()
    return s.set_axis(
        pd.MultiIndex.from_arrays([s.index, s.groupby(level=0).cumcount()])
    )


out = pd.concat({c: explode_dedup(df[c]) for c in df}, axis=1)

Output:

       A    B    C    D
0 0    1    2    3    4
1 0   11   33   55   77
  1   22   44   66   88
2 0  111  NaN  333  NaN
  1  222  NaN  444  NaN

To explode a subset of the columns:

cols = ['C', 'D']
others = df.columns.difference(cols)

out = (df[others]
       .join(pd.concat({c: explode_dedup(df[c])
                        for c in cols}, axis=1)
               .droplevel(-1)
            )[df.columns]
      )

Output:

            A         B    C    D
0           1         2    3    4
1    [11, 22]  [33, 44]   55   77
1    [11, 22]  [33, 44]   66   88
2  [111, 222]       NaN  333  NaN
2  [111, 222]       NaN  444  NaN

Reproducible input:

df = pd.DataFrame({'A': [1, [11,22], [111,222]],
                   'B': [2, [33,44], float('nan')],
                   'C': [3, [55,66], [333,444]],
                   'D': [4, [77,88], float('nan')]
                  })