I have a dataframe and two lists as below:
seller1 = [5, 4, 3]
seller2 = [4, 2, 1]
df = {'customer': [1, 1, 1, 2, 2, 2], 'time': [1,2,3,1,2,3], 'location': [3,4,2,4,3,3], 'demand':[10,12,15,20,8,16], 'price':[3,4,4,5,2,1]}
df = pd.DataFrame(df)
Which results in the following table:
customer time location demand price
0 1 1 3 10 3
1 1 2 4 12 4
2 1 3 2 15 4
3 2 1 4 20 5
4 2 2 3 8 2
5 2 3 3 16 1
The seller1
and seller2
lists show where the sellers are at time 1,2, and 3. I want to know the demand and the price if one of the sellers is there at the exact time and mask the demand data otherwise. For example, at time 1, seller one is at location 5 and seller 2 is at location 4. Likewise, customer 1 is at location 3 and customer 2 is at location 4. So, the sellers meet the first customer but not the second at t=1.
The end table I want to have is
customer time location demand price
0 1 1 3 None None
1 1 2 4 12 4
2 1 3 2 None None
3 2 1 4 20 5
4 2 2 3 None None
5 2 3 3 16 1
So far, I have
for i in range(df.shape[0]):
if df["location"][i] != seller1[int(df["time"][i])-1] and df["location"][i] != seller2[int(df["time"][i])-1]:
df["demand"][i] = np.nan
df["price"][i] = np.nan
This is producing a SettingWithCopyWarning:
and it doesn't look efficient with the for loop, either.
Is there a way to do this with df.mask()?
# Create tuples corresponding to time and location for each seller
c = ['time', 'location']
sellers = [(i, x)
for s in (seller1, seller2)
for i, x in enumerate(s, 1)]
# Identify the rows where tuple pairs match with
# the time and location from the given dataframe
mask = df.set_index(c).index.isin(sellers)
# Mask the rows where codition doesn;t hold true
c = ['demand', 'price']
df.loc[~mask, c] = np.nan
Result
customer time location demand price
0 1 1 3 NaN NaN
1 1 2 4 12.0 4.0
2 1 3 2 NaN NaN
3 2 1 4 20.0 5.0
4 2 2 3 NaN NaN
5 2 3 3 16.0 1.0