python pandas wide-format-data

Convert wide format data (separate dfs) to long format using Python


I have wide-format data spread across separate DataFrames and want to convert it to long format in a single DataFrame. Some values are NaN.

Minimal example:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "id": ["Mark", "Dave", "Ron"],
    "c2_A": [2, 3, np.nan],
    "c3_A": [1, np.nan, np.nan]})

df2 = pd.DataFrame({
    "id": ["Mark", "Dave", "Ron"],
    "c2_B": [1, 0, np.nan],
    "c3_B": [1, np.nan, 4]})

Required df:

dffinal = pd.DataFrame({
    "id": ["Mark", "Mark", "Dave", "Dave", "Ron", "Ron"],
    "cValue": ["A", "B", "A", "B", "A", "B"],
    "c2Value": [2, 1, 3, 0, np.nan, np.nan],
    "c3Value": [1, 1, np.nan, np.nan, np.nan, 4]})

Solution

  • You can try one of these two options:

    With split/stack:

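    dffinal = (
        pd.concat([df1, df2])
            .set_index("id", append=True)
            # split "c2_A" into a two-level column index: ("c2", "A")
            .pipe(lambda x: x.set_axis(x.columns.str.split("_", expand=True), axis=1))
            # move the A/B level into the row index, keeping all-NaN rows
            .stack(1, dropna=False)
            # collapse the NaN-padded duplicates left by concat
            .groupby(level=[1, 2], sort=False).first()
            .add_suffix("Value")
            .reset_index()
            .rename(columns={"level_1": "cValue"})
    )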
    

    With wide_to_long:

    dffinal = (
        # tag each frame so that (level_0, id) uniquely identifies a row
        pd.concat([df1, df2], keys=["1", "2"])
            .reset_index(level=0)
            .pipe(
                pd.wide_to_long, stubnames=["c2", "c3"],
                i=["level_0", "id"], j="cValue", sep="_", suffix=r"\w+")
            # keep the first non-null value per (id, cValue)
            .groupby(level=[1, 2], sort=False).first()
            .add_suffix("Value")
            .reset_index()
    )
    

    Output:

    print(dffinal)
    
         id cValue  c2Value  c3Value
    0  Mark      A     2.00     1.00
    1  Mark      B     1.00     1.00
    2  Dave      A     3.00      NaN
    3  Dave      B     0.00      NaN
    4   Ron      A      NaN      NaN
    5   Ron      B      NaN     4.00
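
A third route, offered only as a rough sketch and not as part of the two options above: melt each frame on its own, split the column names, and pivot the stubs back into columns. It avoids `stack(..., dropna=False)`, which recent pandas releases warn about. It assumes pandas 1.1 or later (for a list passed to `pivot(index=...)`) and that every value column follows the `stub_label` naming pattern. The names `long` and `stub` are placeholders introduced here for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "id": ["Mark", "Dave", "Ron"],
    "c2_A": [2, 3, np.nan],
    "c3_A": [1, np.nan, np.nan]})

df2 = pd.DataFrame({
    "id": ["Mark", "Dave", "Ron"],
    "c2_B": [1, 0, np.nan],
    "c3_B": [1, np.nan, 4]})

# Melt each frame to (id, variable, value) rows, then stack the frames.
long = pd.concat([d.melt(id_vars="id") for d in (df1, df2)], ignore_index=True)

# Split names such as "c2_A" into the stub ("c2") and the label ("A").
long[["stub", "cValue"]] = long["variable"].str.split("_", expand=True)

dffinal = (
    # one row per (id, cValue), one column per stub; all-NaN rows are kept
    long.pivot(index=["id", "cValue"], columns="stub", values="value")
        .add_suffix("Value")
        .rename_axis(columns=None)
        # pivot sorts the index, so restore the id order of df1
        .reindex(df1["id"].tolist(), level="id")
        .reset_index()
)

print(dffinal)
```

Because each frame is melted before concatenation, no NaN-padded duplicate rows appear, so the `groupby(...).first()` step is not needed here.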