In our problem, rows (index) and columns belong to the same category of objects. We want to enlarge a Pandas DataFrame, adding rows and columns filled with NaNs or predefined values, so that both the index and column sets are now the union of the original index and column sets. E.g. transform
| | A | C |
|---|---|---|
| B | 0 | 1 |
| C | 1 | 1 |
into
| | A | B | C |
|---|---|---|---|
| A | NaN | NaN | NaN |
| B | 0 | NaN | 1 |
| C | 1 | NaN | 1 |
A practical example: constructing the adjacency matrix of a directed graph, with vertex labels in both rows and columns. At some stage, the rows and columns for vertices with no directed edge from them or to them have to be filled in.
The core issue is how to do this efficiently. Since it is such a basic operation, it feels like it should be implemented as a standard method. Is there one?
The simple solution is to iterate over all the entries in index and columns that are not in the other set and add columns/rows (respectively) to the dataframe.
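For reference, the iterative approach described above might look like the following minimal sketch. The sample frame and the variable names (`df`, `out`) are assumptions made for illustration, and the final sort is added only so that the labels end up in a consistent order.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame matching the example in the question.
df = pd.DataFrame([[0, 1], [1, 1]], index=["B", "C"], columns=["A", "C"])

out = df.copy()

# Labels present in the index but missing from the columns: add as columns.
for label in df.index.difference(df.columns):
    out[label] = np.nan

# Labels present in the columns but missing from the index: add as rows.
for label in df.columns.difference(df.index):
    out.loc[label] = np.nan

# New labels are appended at the end, so sort both axes afterwards.
out = out.sort_index(axis=0).sort_index(axis=1)
```

Each assignment enlarges the frame one label at a time, which is why this gets slow for large label sets.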
The problem with a simple reindex and similar methods is that we are enlarging the dataframe along both axes at the same time, and the missing labels can fall in between existing columns.
I would get the index union and reindex:
idx = df.index.union(df.columns)
out = df.reindex(index=idx, columns=idx)
Output:
A B C
A NaN NaN NaN
B 0.0 NaN 1.0
C 1.0 NaN 1.0
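If the new cells should hold a predefined value instead of NaN, as in the adjacency-matrix case, the same call accepts a `fill_value` argument. The following is a small self-contained sketch; the sample frame and the name `adj` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample frame matching the example in the question.
df = pd.DataFrame([[0, 1], [1, 1]], index=["B", "C"], columns=["A", "C"])

# Union of row and column labels.
idx = df.index.union(df.columns)

# Reindex both axes at once; cells for newly introduced labels get 0.
adj = df.reindex(index=idx, columns=idx, fill_value=0)
print(adj)
```

Note that `fill_value` only applies to cells created by the reindexing; NaNs already present in the original frame are left untouched.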