I am new to both Bokeh and Pandas and I am trying to generate a grouped bar chart from some query results.
My data looks something like this:
Day Fruit Count
----------- -------- -------
2020-01-01 Apple 19
2020-01-01 Orange 8
2020-01-01 Banana 7
...
2020-02-23 Apple 15
2020-02-23 Orange 10
2020-02-23 Banana 12
2020-02-24 Apple 12
2020-02-24 Orange 17
2020-02-24 Banana 9
In the answers with the old deprecated bokeh.charts API this data layout seems trivial to deal with.
I am having a really hard time understanding what is going on in the grouped bar chart example from the current API, and how to get my data into the format shown in that example.
I tried generating a new column in my data frame that holds a tuple of (day, fruit) using a transform, but that fails with errors I don't understand. I don't even know if this is the right approach.
# add a grouped axis for grouping the bar chart
def grouped_axis(row):
    return (row['Day'], row['Fruit'])

data_frame['day_fruit'] = data_frame2.apply(lambda row: grouped_axis(row), axis=1)
Can someone point me to an example that uses this kind of data? Or failing that, explain the code I need to get Bokeh to understand my data as a grouped bar chart?
What you're looking for is a method called pivot.
But you don't really need it in this case - the Bokeh example you linked already deals with the pivoted data and that's why it has to massage it into an acceptable form. Whereas with the data shape that you already have, you don't need to do much.
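To make the difference between the two shapes concrete, here is a minimal sketch of what pivot does to a small slice of the question's data. The names long_df and wide_df are only illustrative.

```python
import pandas as pd

# Long-format data, one row per (day, fruit) pair, as in the question.
long_df = pd.DataFrame(
    [['2020-02-23', 'Apple', 15],
     ['2020-02-23', 'Orange', 10],
     ['2020-02-24', 'Apple', 12],
     ['2020-02-24', 'Orange', 17]],
    columns=['day', 'fruit', 'count'])

# Pivot to wide format: one row per fruit, one column per day.
wide_df = long_df.pivot(index='fruit', columns='day', values='count')
print(wide_df)

# The row labels and column labels of the pivoted frame are what a
# pivot-based approach has to recombine into (fruit, day) pairs.
fruits = list(wide_df.index)
days = list(wide_df.columns)
```

The wide frame is the shape the Bokeh example starts from, while the question's data is already in the long shape.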
Below you can find an example of both approaches. Notice how much simpler mk_src_2 is.
import pandas as pd
from bokeh.io import show
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure
data = pd.DataFrame([['2020-01-01', 'Apple', 19],
                     ['2020-01-01', 'Orange', 8],
                     ['2020-01-01', 'Banana', 7],
                     ['2020-02-23', 'Apple', 15],
                     ['2020-02-23', 'Orange', 10],
                     ['2020-02-23', 'Banana', 12],
                     ['2020-02-24', 'Apple', 12],
                     ['2020-02-24', 'Orange', 17],
                     ['2020-02-24', 'Banana', 9]],
                    columns=['day', 'fruit', 'count'])


def mk_src_1(d):
    # Pivoting implicitly orders values.
    d = d.pivot(index='fruit', columns='day', values='count')
    x = [(fruit, day) for fruit in d.index for day in d.columns]
    counts = sum(d.itertuples(index=False), ())
    return ColumnDataSource(data=dict(x=x, counts=counts))


def mk_src_2(d):
    # Bokeh's FactorRange requires the X values to be ordered.
    d = d.sort_values(['fruit', 'day'])
    return ColumnDataSource(data=dict(x=list(zip(d['fruit'], d['day'])),
                                      counts=d['count']))


# source = mk_src_1(data)
source = mk_src_2(data)
# `height` is used instead of `plot_height`, which was removed in Bokeh 3.0.
p = figure(x_range=FactorRange(*source.data['x']), height=250, title="Fruit Counts by Day",
           toolbar_location=None, tools="")
p.vbar(x='x', top='counts', width=0.9, source=source)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
show(p)
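If you would rather group by day with the fruits side by side inside each day (which is what the tuple column in the question was aiming for), a sketch along these lines should work. The helper name day_fruit_factors is hypothetical, and it assumes the day column may arrive from the query as datetime values, which have to be turned into strings because categorical factors must be strings.

```python
import pandas as pd


def day_fruit_factors(df):
    """Return (x, counts) with day as the outer group and fruit as the inner one."""
    d = df.copy()
    # Assumption: 'day' may hold datetime values; convert them to strings
    # so they can be used as categorical factors.
    d['day'] = pd.to_datetime(d['day']).dt.strftime('%Y-%m-%d')
    # Sort so that all rows for one day sit next to each other.
    d = d.sort_values(['day', 'fruit'])
    x = list(zip(d['day'], d['fruit']))
    counts = d['count'].tolist()
    return x, counts


# Small unsorted sample with real datetime values in the 'day' column.
sample = pd.DataFrame({
    'day': pd.to_datetime(['2020-02-24', '2020-02-23', '2020-02-23', '2020-02-24']),
    'fruit': ['Apple', 'Orange', 'Apple', 'Orange'],
    'count': [12, 10, 15, 17],
})
x, counts = day_fruit_factors(sample)
```

The result can then be used the same way as above: build the source with ColumnDataSource(data=dict(x=x, counts=counts)) and pass FactorRange(*x) as the x_range.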