I am currently working on a data visualisation project. I need to create a stacked bar chart using Bokeh. The data is sourced from an Excel file that is updated regularly and contains multiple inputs and outputs. The structure of the data can vary, meaning that the number of inputs and outputs can change. My aim is to ensure that the Bokeh chart can adapt to these changes automatically without requiring manual code adjustments. The stacked bar chart should be able to dynamically adjust to changes in the number of inputs and outputs in the Excel file. It should be capable of visualising various combinations of fixed inputs (e.g. input 1 fixed, input 2 fixed, input 3 variable) while displaying all corresponding outputs. Ideally, the solution would automatically read the Excel file, detect the structure of the inputs and outputs, and update the chart accordingly.
I have tried to visualize the stacked bar charts for a set scenario ( 2 Inputs and 3 Outputs). in the example excel file the data was stored in a scheme like this this:
Example Scheme:
Input_1 | Input_2 | Output_1 | Output_2 | Output_3 |
---|---|---|---|---|
1 | 1 | 100 | 200 | 200 |
2 | 1 | 150 | 150 | 200 |
3 | 1 | 200 | 100 | 200 |
1 | 2 | 200 | 200 | 100 |
2 | 2 | 150 | 200 | 150 |
3 | 2 | 100 | 200 | 200 |
1 | 3 | 200 | 100 | 200 |
2 | 3 | 200 | 150 | 150 |
3 | 3 | 200 | 200 | 100 |
It worked for this static scenario:
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.layouts import gridplot, row
from bokeh.models import ColumnDataSource
data_frame = pd.read_excel("example.xlsx")
data_frame['Input_1'] = data_frame['Input_1'].astype(str)
data_frame['Input_2'] = data_frame['Input_2'].astype(str)
output_file("stacked_bar_charts.html")
unique_input_1 = data_frame['Input_1'].unique()
unique_input_2 = data_frame['Input_2'].unique()
plots_for_input_1 = []
plots_for_input_2 = []
for value in unique_input_1:
filtered_data = data_frame[data_frame["Input_1"] == value]
filtered_data = filtered_data.sort_values(by='Input_2')
source = ColumnDataSource(filtered_data)
plot = figure(title=f"Input_1 = {value} fixed",
x_range=filtered_data["Input_2"].unique(),
height=300,
width=500
)
plot.xaxis.axis_label = "Input_2"
plot.vbar_stack(stackers=["Output_1", "Output_2", "Output_3"],
x="Input_2",
width=0.9,
color=["orange", "gray", "brown"],
source=source,
legend_label=["Output 1", "Output 2", "Output 3"]
)
plots_for_input_1.append(plot)
for value in unique_input_2:
filtered_data = data_frame[data_frame['Input_2'] == value]
filtered_data = filtered_data.sort_values(by='Input_1')
source = ColumnDataSource(filtered_data)
plot = figure(title=f"Input_2 = {value} fixed",
x_range=filtered_data["Input_1"].unique(),
height=300,
width=500
)
plot.xaxis.axis_label = "Input_1"
plot.vbar_stack(stackers=["Output_1", "Output_2", "Output_3"],
x="Input_1",
width=0.9,
color=["orange", "gray", "brown"],
source=source,
legend_label=["Output 1", "Output 2", "Output 3"]
)
plots_for_input_2.append(plot)
grid_for_input_1 = gridplot(plots_for_input_1, ncols=1)
grid_for_input_2 = gridplot(plots_for_input_2, ncols=1)
final_layout = row(grid_for_input_1, grid_for_input_2)
show(final_layout)
Here are output example pictures to show how I aim to visualize the data:
bokeh example plot 1:
bokeh example plot 2:
I was unable to find a dynamic approach that works for changing inputs and outputs, such as varying two inputs while keeping the others constant and visualizing the impacts on all outputs through stacked bar charts. Additionally, the approach should efficiently adapt to data updates without requiring manual code adjustments for each new scenario.
You can use a groupby
inside a nested-loop to plot all the possible I/O combinations :
output_file("stacked_bar_charts.html")
grids = []
for inp_col in inp_cols:
inplots = []
for name, sub_df in data_frame.groupby(inp_col):
for inp_diff in inp_cols.difference([inp_col], sort=False):
plot = figure(
title=TITLE(inp_col, name),
x_range=data_frame[inp_diff].unique(),
height=H, width=W,
)
plot.xaxis.axis_label = inp_diff
_ = plot.vbar_stack(
stackers=out_cols, x=inp_diff,
width=BAR_WIDTH, color=COLORS,
source=ColumnDataSource(sub_df.sort_values(inp_diff)),
legend_label=out_cols.tolist(),
)
inplots.append(plot)
grids.append(gridplot(inplots, ncols=NCOLS))
final_layout = row(*grids)
show(final_layout)
Output (stacked_bar_charts.html
) :
Used configuration :
import pandas as pd
from bokeh.layouts import gridplot, row
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, output_file, show
# PD-PREPROCESS
data_frame = pd.read_excel("example.xlsx")
inp_cols = data_frame.filter(like="Input").columns
out_cols = data_frame.filter(like="Output").columns
data_frame = data_frame.astype(dict.fromkeys(inp_cols, str))
# BOKEH-CONFIG
TITLE = "{} = {} fixed".format
COLORS = ["orange", "gray", "brown"] # depends on outputs
BAR_WIDTH = 0.9
H, W = 300, 500
NCOLS = 1