Sorry for the mouthful title. I think this is best illustrated by an example. Let's say we have an item that has different rarity levels, all of which have dfferent prices in different shops. I want to know how much more expensive a given rarity is than the base "normal" rarity in each shop separately.
How could I add a new "premium" column that would give me the result of dividing the price of the item with a given rarity at a given shop, by the price of that item in "normal" quality at that shop in particular? The result can be seen in the table below.
item | quality | price | shop | premium (todo) | base price |
---|---|---|---|---|---|
bread | normal | 2.0 | Lumbridge | 1.0 | 2.0 |
bread | rare | 3.0 | Lumbridge | 1.5 | 2.0 |
bread | legendary | 5.0 | Lumbridge | 2.5 | 2.0 |
bread | normal | 1.5 | Varrock | 1.0 | 1.5 |
bread | rare | 4.5 | Varrock | 3.0 | 1.5 |
bread | legendary | 6.0 | Varrock | 4.0 | 1.5 |
bread | normal | 3.0 | Yanille | 1.0 | 3.0 |
bread | rare | 2.0 | Yanille | 0.66 | 3.0 |
bread | legendary | 4.0 | Yanille | 1.33 | 3.0 |
I thought about repeating the rows with normal quality as a new column (given as the "base price" column), but I don't see any mechanism that could allow for that. If this is possible could it be done if instead of just one "item" we had many (i.e. extending the filtering to multiple columns)?
Here's one approach, assuming a single 'normal' price per item / shop.
Data used
# adding another item, and deleting 1 shop
import pandas as pd
data = {'item': {0: 'bread', 1: 'bread', 2: 'bread', 3: 'bread', 4: 'bread',
5: 'bread'},
'quality': {0: 'normal', 1: 'rare', 2: 'legendary', 3: 'normal',
4: 'rare', 5: 'legendary'},
'price': {0: 2.0, 1: 3.0, 2: 5.0, 3: 1.5, 4: 4.5, 5: 6.0},
'shop': {0: 'Lumbridge', 1: 'Lumbridge', 2: 'Lumbridge', 3: 'Varrock',
4: 'Varrock', 5: 'Varrock', },
'premium (todo)': {0: 1.0, 1: 1.5, 2: 2.5, 3: 1.0, 4: 3.0, 5: 4.0},
'base price': {0: 2.0, 1: 2.0, 2: 2.0, 3: 1.5, 4: 1.5, 5: 1.5}}
df = pd.DataFrame(data)
df = pd.concat([df, df.assign(item=df['item'].replace('bread', 'milk'))],
ignore_index=True)
Code
df = (
df.merge(
df.query('quality == "normal"')[['item', 'shop', 'price']],
on=['item', 'shop'],
suffixes=('', '_base'),
how='left'
)
.assign(premium=lambda x: x['price'].div(x.pop('price_base')))
)
Output:
item quality price shop premium (todo) base price premium
0 bread normal 2.0 Lumbridge 1.0 2.0 1.0
1 bread rare 3.0 Lumbridge 1.5 2.0 1.5
2 bread legendary 5.0 Lumbridge 2.5 2.0 2.5
3 bread normal 1.5 Varrock 1.0 1.5 1.0
4 bread rare 4.5 Varrock 3.0 1.5 3.0
5 bread legendary 6.0 Varrock 4.0 1.5 4.0
6 milk normal 2.0 Lumbridge 1.0 2.0 1.0
7 milk rare 3.0 Lumbridge 1.5 2.0 1.5
8 milk legendary 5.0 Lumbridge 2.5 2.0 2.5
9 milk normal 1.5 Varrock 1.0 1.5 1.0
10 milk rare 4.5 Varrock 3.0 1.5 3.0
11 milk legendary 6.0 Varrock 4.0 1.5 4.0
Explanation / intermediates
df.query
to filter on 'normal' and select only columns ['item', 'shop', 'price']
.df.query('quality == "normal"')[['item', 'shop', 'price']]
item shop price
0 bread Lumbridge 2.0
3 bread Varrock 1.5
6 milk Lumbridge 2.0
9 milk Varrock 1.5
df
to df.merge
as right
, left joining on ['item', 'shop']
and adding a suffix for column 'price' coming from right
(_base
). This generates your 'base price' column:# df.merge(...)['price_base']
0 2.0
1 2.0
2 2.0
3 1.5
4 1.5
5 1.5
6 2.0
7 2.0
8 2.0
9 1.5
10 1.5
11 1.5
Name: price_base, dtype: float64
df.assign
and use Series.div
with Series.pop
for the divisor, assigning the result as new column 'premium'.Easier alternative, if:
df
is properly sorted on 'item', 'quality' (categorical, with 'normal' first), 'shop' (as in your example), and;NaN
value.df['premium'] = df['price'].div(
df['price'].where(df['quality'] == 'normal').ffill()
)
Using Series.div
+ Series.where
+ Series.ffill
.