I would like to create a stacked bar chart from this data frame, where the x axis shows each unique date and each bar stacks the num_youths values of the providers for that date.
When I create a pivot table, the data is aggregated for the columns that have the exact same name. If I pivot with 'provider' as the new columns, I get 5 columns and 14 rows. The issue is that Bokeh's vbar_stack does not seem to accept a different number of columns and rows; there apparently must be the same number of each. However, I cannot build the pivot table without the data being aggregated.
How can I transform this data and use the Bokeh package to create a stacked bar chart?
Code:
pivot_df = grouped_df.pivot_table(index=['date'], columns='provider', values='num_youths', aggfunc='first', fill_value=0)
pivot_df.reset_index(inplace=True)
source = ColumnDataSource(pivot_df)
providers = pivot_df.columns[1:]
# Create the figure
p = figure(x_range=pivot_df['date'].unique(), plot_height=350, title="Number of Youths Funded by Provider Each Month",
toolbar_location=None, tools="")
# Add stacked bars to the figure
p.vbar_stack(stackers=providers, x='date', width=0.9, color=["blue", "red"], source=source,
legend_label=providers)
Error message: ValueError: Keyword argument sequences for broadcasting must be the same length as stackers
You have to reshape your pandas DataFrame into the wide layout that Bokeh expects.
Below is a minimal example based on your data. I use groupby and unstack with fill_value=0, which adds zeros wherever a provider has no value on a given date.
Afterwards I drop the multi-index of the returned DataFrame.
import pandas as pd
df = pd.DataFrame({
    'date': ['Aug 23', 'Aug 23', 'Dec 23'],
    'provider': ['A', 'B', 'C'],
    'num_youths': [1, 3, 4]
})
df
>>> df
     date provider  num_youths
0  Aug 23        A           1
1  Aug 23        B           3
2  Dec 23        C           4
# group by date and provider, then fill missing combinations with zero
stacked = df.groupby(['date','provider']).sum().unstack(fill_value=0)
>>> stacked
         num_youths
provider          A  B  C
date
Aug 23            1  3  0
Dec 23            0  0  4
# drop multi index for columns and index
stacked.columns = stacked.columns.droplevel()
provider = list(stacked.columns)
stacked = stacked.reset_index()
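The same wide table can probably also be built with pivot_table, which is closer to the code in the question. The sketch below assumes that duplicate date/provider pairs should be summed (aggfunc='sum'), as the groupby version above does; the variable name wide is made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['Aug 23', 'Aug 23', 'Dec 23'],
    'provider': ['A', 'B', 'C'],
    'num_youths': [1, 3, 4],
})

# One row per date, one column per provider; missing pairs become 0.
# Assumption: duplicate (date, provider) rows should be added together.
wide = df.pivot_table(index='date', columns='provider',
                      values='num_youths', aggfunc='sum', fill_value=0)

# With a single column passed as `values`, the column labels are already flat,
# so only the axis name needs clearing before reset_index().
wide.columns.name = None
provider = list(wide.columns)
wide = wide.reset_index()
```

Using aggfunc='first' instead would keep only the first value of each duplicate pair, which may be why the totals in the question looked wrong.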
To get the data in the format Bokeh wants, call to_dict with orient="list".
data = stacked.to_dict(orient='list')
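For reference, the resulting dict maps each column name to a list with one entry per date, and all lists have the same length. The plain-Python sketch below builds an equivalent structure by hand from the example rows, only to show the shape; to_wide_dict and by_hand are hypothetical names, not part of pandas or Bokeh.

```python
from collections import defaultdict

def to_wide_dict(rows):
    """Turn (date, provider, value) rows into a dict of equal-length lists."""
    # Sorted alphabetically here, which matches what groupby does with strings.
    dates = sorted({d for d, _, _ in rows})
    providers = sorted({p for _, p, _ in rows})
    totals = defaultdict(int)  # missing (date, provider) pairs default to 0
    for d, p, v in rows:
        totals[(d, p)] += v    # duplicates are summed, like groupby().sum()
    data = {'date': dates}
    for p in providers:
        data[p] = [totals[(d, p)] for d in dates]
    return data

rows = [('Aug 23', 'A', 1), ('Aug 23', 'B', 3), ('Dec 23', 'C', 4)]
by_hand = to_wide_dict(rows)
# by_hand == {'date': ['Aug 23', 'Dec 23'], 'A': [1, 0], 'B': [3, 0], 'C': [0, 4]}
```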
The data now has the correct format, so just call figure() and vbar_stack. Most of this code comes from the stacked bar example in the docs.
from bokeh.plotting import figure, show, output_notebook
from bokeh.palettes import HighContrast3
output_notebook()
p = figure(x_range=data['date'], height=250,
           toolbar_location=None, tools="hover", tooltips="@date $name @$name")
p.vbar_stack(provider, x='date', width=0.9, color=HighContrast3, source=data,
             legend_label=provider)
show(p)
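The ValueError in the question most likely comes from color=["blue", "red"]: when a keyword such as color or legend_label is given as a list, vbar_stack expects one entry per stacker, and the pivot in the question has five provider columns. Below is a minimal sketch of one way to build a list of matching length, assuming it is acceptable to repeat colors when the base palette is shorter than the number of providers. colors_for is a hypothetical helper, not a Bokeh function, and the provider names are made up.

```python
from itertools import cycle, islice

def colors_for(stackers, palette):
    """Return one color per stacker, repeating the palette if it is too short."""
    return list(islice(cycle(palette), len(stackers)))

providers = ['A', 'B', 'C', 'D', 'E']  # made-up provider names
colors = colors_for(providers, ['blue', 'red', 'green'])
# colors == ['blue', 'red', 'green', 'blue', 'red']
```

The result can then be passed as color=colors together with legend_label=list(providers), so that both sequences have the same length as the stackers.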