
plot stacked bar chart using bokeh


I am trying to plot a stacked bar chart using bokeh by following this segment of the documentation, but my data frame is a tad more complex. It looks like this:

   events    count     Name
    a          2       jerry
    b          1       jerry
    a          8       joe
    c          1       joe 
    b          4       megan
    c          1       megan 
   ...        ...       ...

data.user.nunique() = 11 (the users will be the columns) and data.event.nunique() = 167 (the events will be the stacked segments within each column; note that not every user has raised every unique event).

So, according to the code from the docs, for the above segment of the dataframe:

output_file("stacked.html")
names = data.Name.unique()          # ['jerry','joe','megan']
events = data.events.unique()       # ['a','b','c']
colors =["#c9d9d3", "#718dbf", "#e84d60"]        

data = {'names' : names,
        'a'   : [2, 8, 0],   # a raised 2 times by jerry, 8 times by joe , 0 times by megan
        'b'   : [1, 0, 4],
        'c'   : [0, 1, 1]}  

My question is twofold: 1) How do I create the data dictionary from my actual dataset? 2) Is there any alternative approach to solving this problem?


Solution

  • Bokeh doesn't necessarily need a dictionary to work, so we can just use the DataFrame pivot method to achieve the desired transformation and plot the result directly.

    >>> import pandas as pd
    >>> df = pd.DataFrame({
        'events': ['a', 'b', 'a', 'c', 'b', 'c'],
        'count': [2, 1, 8, 1, 4, 1],
        'Name': ['jerry', 'jerry', 'joe', 'joe', 'megan', 'megan']})
    
    >>> df
      events  count   Name
    0  a      2      jerry
    1  b      1      jerry
    2  a      8      joe  
    3  c      1      joe  
    4  b      4      megan
    5  c      1      megan
    

    Transform the data:

    >>> df2 = df.pivot(index="Name", columns="events", values="count").fillna(0)
    >>> df2
    events  a   b   c
    Name            
    jerry   2.0 1.0 0.0
    joe     8.0 0.0 1.0
    megan   0.0 4.0 1.0
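
To answer the first part of the question directly, the dictionary from the docs example can also be rebuilt from the long-format frame. This is only a minimal sketch, assuming the column names of the sample frame (`Name`, `events`, `count`). It swaps `pivot` for `pivot_table` with `aggfunc="sum"`, which is an assumption on my part in case the real data contains repeated Name/event pairs (plain `pivot` raises an error on duplicates). The variable name `wide` is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "events": ["a", "b", "a", "c", "b", "c"],
    "count": [2, 1, 8, 1, 4, 1],
    "Name": ["jerry", "jerry", "joe", "joe", "megan", "megan"],
})

# Assumption: repeated (Name, events) pairs should be summed.
# fill_value=0 covers events that a given user never raised.
wide = df.pivot_table(index="Name", columns="events", values="count",
                      aggfunc="sum", fill_value=0)

# Same layout as the dictionary in the docs example:
# one "names" list plus one list of counts per event.
data = {"names": wide.index.tolist()}
for event in wide.columns:
    data[event] = wide[event].tolist()

print(data)
```

With 167 events the dictionary simply gets 167 count lists, one per event, each with one entry per user.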
    

    Plot the data:

    from bokeh.plotting import figure, show
    from bokeh.palettes import viridis
    
    names = df2.index.tolist()      # x-axis categories, one bar per name
    events = df2.columns.tolist()   # stacked segments, one per event
    color = viridis(len(events))    # one color per event
    
    p = figure(x_range=names)
    p.vbar_stack(events, x="Name", source=df2, width=.9, color=color, legend_label=events)
    show(p)
    


    An alternative way of plotting this is to use the holoviews library (I'm adding this simply because holoviews can produce far more concise code than bokeh). Holoviews takes care of the data transformation for you, so no extra effort is needed:

    import holoviews as hv
    hv.extension("bokeh")
    
    hv.Bars(df, kdims=["Name", "events"], vdims="count").opts(stacked=True)
    


    As for alternative solutions, I'm not entirely sure. I can't see visual comparisons being very easy with 167 types of events (that's 167 unique colors, so the colors may not be very discernible, not to mention an unwieldy legend with 167 entries). If this way of visualizing doesn't help, I would recommend using the Holoviews library to create a bar plot for each of your names. Then you can toggle through a plot for each individual in the data.

    import holoviews as hv
    hv.extension("bokeh")
    
    hv.Bars(df, kdims=["Name", "events"], vdims="count").groupby("Name")
    
