python pandas dataframe

Best way to turn every cell in a dataframe into its own row in a new dataframe?


Suppose I have a dataframe Old with columns A, B, and C. I want a new dataframe New with two columns, D and E. Each cell in Old should become its own row in New: D holds the cell's value, and E holds the name of the column the cell came from.

I know that iterating directly over a dataframe is discouraged, but that's how I did it. Here, I only cared about some of the column names in Old, so any cell under a column I didn't care about got the label "other". The principle is the same.

entries, labels = [], []
for column in df.columns:
    for entry in df[column]:
        entries.append(entry)
        # column_labels is a dict mapping column name -> label; anything else is "other"
        labels.append(column_labels.get(column, "other"))

What are some better ways to do this? The nested loop will become very slow as the dataset grows.


Solution

  • You are probably looking for DataFrame.stack():

        import numpy as np
        import pandas as pd

        df = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=list("ABC"))
        
           A   B   C
        0  0   1   2
        1  3   4   5
        2  6   7   8
        3  9  10  11
        
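        res = (df.stack()                                   # Series indexed by (row, column)
                 .reset_index(level=1)                      # column names move into "level_1"
                 .sort_values(by="level_1", kind="stable")  # group by column, keep row order
                 .reset_index(drop=True)
                 .rename(columns={"level_1": "labels", 0: "entries"})
        )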
        
           labels  entries
        0       A        0
        1       A        3
        2       A        6
        3       A        9
        4       B        1
        5       B        4
        6       B        7
        7       B       10
        8       C        2
        9       C        5
        10      C        8
        11      C       11
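
The question also wanted columns outside a chosen set to share the label "other", which the stack() answer does not cover. A minimal sketch of one way to do that, assuming column_labels is a plain dict (the mapping below is made up for illustration) and using melt() instead of stack() so that no sort is needed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=list("ABC"))

# Hypothetical mapping for illustration: only A and B get their own label.
column_labels = {"A": "first", "B": "second"}

# melt() with no id_vars unpivots every column, going column by column,
# so the rows already come out grouped by source column.
res = df.melt(var_name="labels", value_name="entries")

# Column names missing from the mapping become NaN after map();
# fillna() then assigns them the catch-all label.
res["labels"] = res["labels"].map(column_labels).fillna("other")

print(res)
```

Here every cell from column C ends up labelled "other", while cells from A and B get the labels from the dict. Both melt() and map() are vectorized, so this avoids the Python-level loop from the question.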