pythonpandasdataframedata-scienceshift

Python / Pandas: Shift entities of a row to the right (end)


I have the following data frame (number of "Date" columns can vary):

Customer Date1 Date2 Date3 Date4 0 A 10 40.0 NaN 60.0

1 B 20 50.0 NaN NaN

2 C 30 NaN NaN NaN

If there is a "NaN" in the last column (as said, number of columns can vary), I want to right shift all the columns to the end of the data frame such that it then looks like this:

Customer Date1 Date2 Date3 Date4

0 A 10 40.0 NaN 60.0

1 B NaN NaN 20 50.0

2 C NaN NaN NaN 30

All the values which remain empty can be set to NaN.

How can I do that in Python?

I tried this code but didn't work:

import numpy as np
import pandas as pd

data = {
    'Customer': ['A', 'B', 'C'],
    'Date1': [10, 20, 30],
    'Date2': [40, 50, np.nan],
    'Date3': [np.nan, np.nan, np.nan],
    'Date4': [60, np.nan, np.nan]
}

df = pd.DataFrame(data)


for i in range(1, len(df.columns)):
    df.iloc[:, i] = df.iloc[:, i-1].shift(fill_value=np.nan)

print(df)

Solution

  • You can temporarily set the non-target columns as index (or drop them), then push the non-NaNs to the right with sorting, and only update the rows that are matching a specific mask (here NaN in the last column):

    out = (df
       .set_index('Customer', append=True)
       .pipe(lambda d: d.mask(d.iloc[:, -1].isna(),
                              d.transform(lambda x : sorted(x, key=pd.notnull), axis=1)
                             )
            )
       .reset_index('Customer')
    )
    

    Alternative:

    other_cols = ['Customer']
    out = df.drop(columns=other_cols)
    m = out.iloc[:, -1].isna()
    out.loc[m, :] = out.loc[m, :].transform(lambda x : sorted(x, key=pd.notnull), axis=1)
    out = df[other_cols].join(out)[df.columns]
    

    NB. there are several methods to shift non-NaNs, here is one, but non-sorting based methods are possible if this is a bottleneck.

    Output:

      Customer  Date1  Date2  Date3  Date4
    0        A   10.0   40.0    NaN   60.0
    1        B    NaN    NaN   20.0   50.0
    2        C    NaN    NaN    NaN   30.0