Edited to comply with the rules:
How can I chain the following code? I can't seem to add a new column inside a method chain.
Input columns of Dataset:
ORDER, ITEM_SERIAL, DATE
%%time
df = Dataset.groupby(['ORDER', pd.Grouper(key='DATE', freq='7d')]).agg(Min=('DATE','min'), Max=('DATE','max'), ITEM_SERIAL_Min=('ITEM_SERIAL','first'), ITEM_SERIAL_Max=('ITEM_SERIAL','last')).reset_index()
df['DIFFERENCE'] = df['Max'] - df['Min']
df = df[df['DIFFERENCE']!='0s'].sort_values('DIFFERENCE')
df = df.drop('DATE', axis=1)
df['ORDER'] = df['ORDER'] + '_' + df['Min'].map(str).str[:10]
df
Output columns of df:
ORDER, Min, Max, ITEM_SERIAL_Min, ITEM_SERIAL_Max, DIFFERENCE
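This is partly a matter of style, but you can chain all of the operations by wrapping the expression in parentheses. Use .assign with a lambda to add or overwrite a column and .loc with a lambda to filter; the lambda receives the intermediate DataFrame, so no temporary variable is needed (and add comments!):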
df = (
    Dataset
    # groupby and aggregate ...
    .groupby(['ORDER', pd.Grouper(key='DATE', freq='7d')], as_index=False)
    .agg(
        Min=('DATE', 'min'),
        Max=('DATE', 'max'),
        ITEM_SERIAL_Min=('ITEM_SERIAL', 'first'),
        ITEM_SERIAL_Max=('ITEM_SERIAL', 'last'),
    )
    # compute DIFFERENCE ...
    .assign(DIFFERENCE=lambda d: d['Max'] - d['Min'])
    # filter ...
    .loc[lambda d: d['DIFFERENCE'] != '0s']
    # sort rows by ascending DIFFERENCE
    .sort_values('DIFFERENCE')
    # remove DATE column
    .drop('DATE', axis=1)
    # compute ORDER ...
    .assign(ORDER=lambda d: d['ORDER'] + '_' + d['Min'].astype(str).str[:10])
)
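If you want to try the chain without the real data, here is a minimal self-contained sketch. The sample rows are made up (only the column names come from the question). It also makes a few substitutions that are my own assumptions rather than requirements: reset_index() as in the question's original code, pd.Timedelta(0) instead of the string '0s', and dt.strftime instead of slicing the string representation.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
Dataset = pd.DataFrame({
    "ORDER": ["A", "A", "A", "B", "B"],
    "ITEM_SERIAL": ["s1", "s2", "s3", "s4", "s5"],
    "DATE": pd.to_datetime([
        "2023-01-01", "2023-01-03", "2023-01-20", "2023-01-02", "2023-01-03",
    ]),
})

df = (
    Dataset
    # one group per ORDER and 7-day window
    .groupby(["ORDER", pd.Grouper(key="DATE", freq="7D")])
    .agg(
        Min=("DATE", "min"),
        Max=("DATE", "max"),
        ITEM_SERIAL_Min=("ITEM_SERIAL", "first"),
        ITEM_SERIAL_Max=("ITEM_SERIAL", "last"),
    )
    # turn the group keys (ORDER, DATE) back into columns
    .reset_index()
    # the lambda receives the intermediate frame, so Min/Max already exist here
    .assign(DIFFERENCE=lambda d: d["Max"] - d["Min"])
    # keep only groups that span more than a single timestamp
    .loc[lambda d: d["DIFFERENCE"] != pd.Timedelta(0)]
    .sort_values("DIFFERENCE")
    # DATE here is the start of the 7-day bin, not needed any more
    .drop(columns="DATE")
    # overwrite ORDER with ORDER + "_" + first date in the group
    .assign(ORDER=lambda d: d["ORDER"] + "_" + d["Min"].dt.strftime("%Y-%m-%d"))
)

print(df)
```

The group for order A that contains only 2023-01-20 has a zero DIFFERENCE and is filtered out, so it is an easy way to check that the .loc step behaves the same as the boolean mask in the unchained version.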