advanced logic with groupby, apply and transform - compare row value with previous value and create new column

I have the following pandas dataframe:

d= {'Time': [0,1,2,0,1,2,2,3,4], 'Price': ['Auction', 'Auction','800','900','By Negotiation','700','250','250','Make Offer'],'Item': ['Picasso', 'Picasso', 'Picasso', 'DaVinci', 'DaVinci', 'DaVinci', 'Dali', 'Dali', 'Dali']}
df = pd.DataFrame(data=d)

I would like to create a fourth column 'Listing-history' which would specify the following:

'first seen' if the listing is seen for the first time (this is not necessarily Time == 0)
'ongoing listing' if there is no change to the price field from one timepoint to the next
'Price->Auction' if the price field changes from a numeric value (which is actually encoded as a string in my dataframe) to the 'Auction' string, and vice versa if the price changes from a numeric value to the 'Auction' string.
I would like the code to be agnostic to the exact price field string, for example: 'Price->By Negotiation' if the price field changes from a numeric value to 'By Negotiation'

I want to group by Item, and then apply the above logic.

Finding whether a listing is 'first seen' is pretty straight forward using something like the following:

df['Price_coerced_to_numeric'] = pd.to_numeric(df['Price'], errors='coerce')
df['Price_diff'] = df.groupby(['Item'])['Price_coerced_to_numeric'].diff(1)

I suspect there is a way of using pandas apply and transform but I haven't been able to work it out. Any tips much appreciated.

Solution

You could use groupby.shift and numpy.select:

# replace numbers by "Price"
price = df['Price'].mask(pd.to_numeric(df['Price'], errors='coerce')
                           .notna(), 'Price')
# get previous price
prev_price = price.groupby(df['Item']).shift()

# identify first row per Item
m1 = ~df['Item'].duplicated()
# identify change in price
m2 = price.ne(prev_price)

# combine conditions
df['Listing-history'] = np.select([m1, m2],
                                  ['first seen', prev_price+'->'+price],
                                  'ongoing listing')

Output:

   Time           Price     Item        Listing-history
0     0         Auction  Picasso             first seen
1     1         Auction  Picasso        ongoing listing
2     2             800  Picasso         Auction->Price
3     0             900  DaVinci             first seen
4     1  By Negotiation  DaVinci  Price->By Negotiation
5     2             700  DaVinci  By Negotiation->Price
6     2             250     Dali             first seen
7     3             250     Dali        ongoing listing
8     4      Make Offer     Dali      Price->Make Offer

If you really want to use groupby.transform you code refactor the code a bit:

def history(col):
    price = col.mask(pd.to_numeric(col, errors='coerce').notna(), 'Price')
    prev_price = price.shift()
    return ((prev_price+'->'+price)
            .where(price.ne(prev_price), 'ongoing listing')
            .fillna('first seen') 
           )

df['Listing-history'] = df.groupby('Item')['Price'].transform(history)

variant if you can have NaNs in the original column:

def history(col):
    price = col.mask(pd.to_numeric(col, errors='coerce').notna(), 'Price')
    prev_price = price.shift()
    out = (prev_price+'->'+price).where(price.ne(prev_price), 'ongoing listing')
    out.iat[0] = 'first seen'
    return out

df['Listing-history'] = df.groupby('Item')['Price'].transform(history)