I have the following data frame (number of "Date" columns can vary):
Customer Date1 Date2 Date3 Date4 0 A 10 40.0 NaN 60.0
1 B 20 50.0 NaN NaN
2 C 30 NaN NaN NaN
If there is a "NaN" in the last column (as said, number of columns can vary), I want to right shift all the columns to the end of the data frame such that it then looks like this:
Customer Date1 Date2 Date3 Date4
0 A 10 40.0 NaN 60.0
1 B NaN NaN 20 50.0
2 C NaN NaN NaN 30
All the values which remain empty can be set to NaN.
How can I do that in Python?
I tried this code but didn't work:
import numpy as np
import pandas as pd
data = {
'Customer': ['A', 'B', 'C'],
'Date1': [10, 20, 30],
'Date2': [40, 50, np.nan],
'Date3': [np.nan, np.nan, np.nan],
'Date4': [60, np.nan, np.nan]
}
df = pd.DataFrame(data)
for i in range(1, len(df.columns)):
df.iloc[:, i] = df.iloc[:, i-1].shift(fill_value=np.nan)
print(df)
You can temporarily set the non-target columns as index (or drop them), then push the non-NaNs to the right with sorting, and only update the rows that are matching a specific mask (here NaN in the last column):
out = (df
.set_index('Customer', append=True)
.pipe(lambda d: d.mask(d.iloc[:, -1].isna(),
d.transform(lambda x : sorted(x, key=pd.notnull), axis=1)
)
)
.reset_index('Customer')
)
Alternative:
other_cols = ['Customer']
out = df.drop(columns=other_cols)
m = out.iloc[:, -1].isna()
out.loc[m, :] = out.loc[m, :].transform(lambda x : sorted(x, key=pd.notnull), axis=1)
out = df[other_cols].join(out)[df.columns]
NB. there are several methods to shift non-NaNs, here is one, but non-sorting based methods are possible if this is a bottleneck.
Output:
Customer Date1 Date2 Date3 Date4
0 A 10.0 40.0 NaN 60.0
1 B NaN NaN 20.0 50.0
2 C NaN NaN NaN 30.0