[SOLVED] Best way to turn every cell in a dataframe into its own row in a new dataframe?

Suppose I have a dataframe Old with columns A, B, and C. I want a new dataframe New with two columns, D and E. For each cell in Old, New should have one row whose D value is the cell's value and whose E value is the name of the column the cell came from.

I know that iterating directly over a dataframe is bad practice, but that's how I did it. In my case I only cared about some of the column names in Old, so if a cell wasn't under a column I cared about, I assigned it the label "other". The principle is the same.

entries, labels = [], []
for column in df.columns:
    for entry in df[column]:
        entries.append(entry)
        # column_labels is a dict mapping the column names I care about to labels
        labels.append(column_labels.get(column, "other"))

My question is: what are some better ways to do this? This loop will become very slow as the dataset grows.

Solution

You are probably looking for stack(), which moves the column labels into an inner index level so that every cell gets its own row:

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=list("ABC"))
    
       A   B   C
    0  0   1   2
    1  3   4   5
    2  6   7   8
    3  9  10  11
    
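    res = (df.stack()                      # one row per cell; column name becomes index level 1
             .reset_index(level=1)         # turn that level into a "level_1" column
             .sort_values(by="level_1", kind="stable")  # group by column, keep the original row order
             .reset_index(drop=True)
             .rename(columns={"level_1":"labels", 0:"entries"})
    )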
    
       labels  entries
    0       A        0
    1       A        3
    2       A        6
    3       A        9
    4       B        1
    5       B        4
    6       B        7
    7       B       10
    8       C        2
    9       C        5
    10      C        8
    11      C       11
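
If you also need the "other" fallback from the question, here is a minimal sketch of one way to do it. It swaps stack() for melt(), which already emits the cells column by column, so the sort step can be skipped. The contents of column_labels below are made up for illustration (the question does not show the real mapping), so treat the names label_a and label_b as assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=list("ABC"))

# Hypothetical mapping: assume only columns A and B are of interest
column_labels = {"A": "label_a", "B": "label_b"}

# melt() produces one row per cell, walking the columns in order
res = df.melt(var_name="labels", value_name="entries")

# Translate column names to labels; columns missing from the dict become "other"
res["labels"] = res["labels"].map(column_labels).fillna("other")

print(res)
```

Both steps are vectorized, so this should scale much better than the nested loop, though it is worth timing on your own data.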