I have a data frame like this:
Bus_no. City
3412. Kolkata
7658. Nagpur
3412. Mumbai
5516. Kolkata
7658. Chennai
3412. Mumbai
I need to do a dynamic calculation based on these columns. For example, Bus_no. 3412 appears 3 times in the whole dataset: once in Kolkata and twice in Mumbai. So for 3412 in Kolkata the value is (1/3)*100, and for Mumbai it is (2/3)*100.
Similarly, I need this calculation for each Bus_no. in each city.
I am using Python 3.6.
Output should be like:
Bus_no. | Kolkata | Mumbai | Chennai | Nagpur |
---|---|---|---|---|
3412. | Value | Value | Value | Value |
Kindly help
If you want to include all cities per bus (i.e. including the ones with 0%), you can use pd.crosstab with normalize='index', chaining df.mul and df.stack:
import pandas as pd

# Sample data from the question
data = {'Bus_no.': {0: '3412.', 1: '7658.', 2: '3412.', 3: '5516.', 4: '7658.',
                    5: '3412.'},
        'City': {0: 'Kolkata', 1: 'Nagpur', 2: 'Mumbai', 3: 'Kolkata', 4: 'Chennai',
                 5: 'Mumbai'}}
df = pd.DataFrame(data)

# Row-normalized cross-tabulation: each bus's share per city, as a percentage
out = (pd.crosstab(index=df['Bus_no.'],
                   columns=df['City'],
                   normalize='index')
       .mul(100)
       .stack()
       )
Output
Bus_no.  City
3412.    Chennai      0.000000
         Kolkata     33.333333
         Mumbai      66.666667
         Nagpur       0.000000
5516.    Chennai      0.000000
         Kolkata    100.000000
         Mumbai       0.000000
         Nagpur       0.000000
7658.    Chennai     50.000000
         Kolkata      0.000000
         Mumbai       0.000000
         Nagpur      50.000000
dtype: float64
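As a rough illustration of what normalize='index' is doing here, the sketch below computes the same percentages by hand: count each bus/city pair, divide every row by its own total, and multiply by 100. The names counts, manual_pct and builtin_pct are only illustrative (they are not part of the answer above), and the sample data is rebuilt from plain lists for brevity.

```python
import pandas as pd

# Sample data from the question, rebuilt from plain lists.
df = pd.DataFrame({
    "Bus_no.": ["3412.", "7658.", "3412.", "5516.", "7658.", "3412."],
    "City": ["Kolkata", "Nagpur", "Mumbai", "Kolkata", "Chennai", "Mumbai"],
})

# Raw counts: one row per bus, one column per city (0 where a pair never occurs).
counts = pd.crosstab(index=df["Bus_no."], columns=df["City"])

# Divide each row by its own total, then scale to a percentage.
manual_pct = counts.div(counts.sum(axis=1), axis=0).mul(100)

# The built-in row normalization, for comparison.
builtin_pct = pd.crosstab(index=df["Bus_no."], columns=df["City"],
                          normalize="index").mul(100)

print(manual_pct)
```

Both tables should agree, and each row should sum to 100, which is a quick sanity check on real data.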
Edit: to get this in the desired format, leave out .stack() and chain df.reset_index and df.rename_axis instead:
(pd.crosstab(index=df['Bus_no.'],
             columns=df['City'],
             normalize='index')
 .mul(100).reset_index().rename_axis(columns=None)
 )
  Bus_no.  Chennai     Kolkata     Mumbai  Nagpur
0   3412.      0.0   33.333333  66.666667     0.0
1   5516.      0.0  100.000000   0.000000     0.0
2   7658.     50.0    0.000000   0.000000    50.0
Alternative with df.groupby
If you don't need all cities per bus (combinations that never occur are simply omitted), you could also use df.groupby and get the normalized value_counts:
out2 = df.groupby('Bus_no.')['City'].value_counts(normalize=True).mul(100)
Output alternative
Bus_no.  City
3412.    Mumbai      66.666667
         Kolkata     33.333333
5516.    Kolkata    100.000000
7658.    Chennai     50.000000
         Nagpur      50.000000
Name: proportion, dtype: float64
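If the wide layout from the question is wanted with the groupby route as well, one possible sketch is to unstack the City level and fill the missing combinations with 0. The names pct_long and pct_wide are illustrative only, and the data is again rebuilt from plain lists.

```python
import pandas as pd

# Sample data from the question, rebuilt from plain lists.
df = pd.DataFrame({
    "Bus_no.": ["3412.", "7658.", "3412.", "5516.", "7658.", "3412."],
    "City": ["Kolkata", "Nagpur", "Mumbai", "Kolkata", "Chennai", "Mumbai"],
})

# Long format: percentage per (bus, city) pair that actually occurs.
pct_long = df.groupby("Bus_no.")["City"].value_counts(normalize=True).mul(100)

# Move the City level into columns; bus/city pairs that never occur become 0.
pct_wide = pct_long.unstack(fill_value=0)

print(pct_wide)
```

This should give one row per bus and one column per city, with the same values as the crosstab approach.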