I am simulating some data and trying to plot various samples of it using plotly and ipywidgets. I created dropdowns to let people choose the sample size and the number of samples that they want to collect from a population distribution generated with this:
from random import seed
from numpy.random import normal, negative_binomial, binomial
import pandas as pd
population_N = 1000000
population_data = pd.DataFrame({
    "data.normal": normal(0, 1, population_N),
    "data.poisson": negative_binomial(1, 0.5, population_N),
    "data.binomial": binomial(1, 0.5, population_N)
})
To let them sample (either with or without a condition) and average each variable across multiple samples, I created the following functions:
def custom_simple(df, sample_size, type = "random"):
    """
    Description
    ----
    Take the population data.frame and generate a sample with a specific sample size
    Parameters
    ----
    df(pd.DataFrame): the population dataset
    sample_size(int): the number of rows in the sample
    type(str): if random, pull a random sample
    Returns
    ----
    sample(pd.DataFrame)
    """
    if type == "random":
        single = df.sample(sample_size)
    else:
        condition = df["data.normal"] < 1
        single = df[condition].sample(sample_size)
    return sample
def mean_samples(df, sample_size, type = "random", num = 1):
    """
    Description
    ----
    Take each sample and calculate the mean for each variable in the sample
    Parameters
    ----
    df(pd.DataFrame): the population dataset
    sample_size(int): the number of rows in the dataset
    type(str): if random, then do random sample
    num(int): number of samples to take
    Returns
    ----
    sample_means(pd.DataFrame)
    """
    sample_means = pd.DataFrame()
    def repeated_sample(df, sample_size, type = type, num = num):
        """
        Description:
        ----
        Take the population dataset and come up with a specified number of samples
        Parameters
        ----
        df(pd.DataFrame): the population dataset
        sample_size(int): the number of rows in the dataset
        type(str): if random, then randomly sample
        num(int): the number of samples to generate
        Returns
        ----
        sample_list(list(pd.DataFrame)): a list of samples stored as DataFrames
        """
        sample_list = [custom_simple(df, sample_size, type) for n in range(num)]
        return sample_list
    raw_samples = repeated_sample(df, sample_size, type = type, num = num)
    for i in raw_samples:
        sample_mans = pd.concat([sample_means, i.mean(axis = 0).to_frame().T])
    return sample_means
Essentially, these functions take the population_data DataFrame, use pd.DataFrame.sample() to draw samples of a given size, and repeat the sampling a given number of times so that each sample can be averaged column by column.
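For reference, the sampling-and-averaging idea described above can be sketched compactly as follows. This is only an illustration under assumptions: sample_means_sketch, its seed parameter, and the smaller two-column population are hypothetical and are not the functions from this post.

```python
import numpy as np
import pandas as pd

# Small stand-in population (hypothetical; the post uses 1,000,000 rows).
rng = np.random.default_rng(0)
population = pd.DataFrame({
    "data.normal": rng.normal(0, 1, 10_000),
    "data.binomial": rng.binomial(1, 0.5, 10_000),
})

def sample_means_sketch(df, sample_size, num=1, seed=0):
    """Draw `num` random samples of `sample_size` rows and return one row of
    column means per sample. `seed` is an assumed extra parameter that makes
    the draws reproducible."""
    rows = [
        df.sample(sample_size, random_state=seed + k).mean(axis=0)
        for k in range(num)
    ]
    # A list of Series becomes a DataFrame with one row per Series.
    return pd.DataFrame(rows).reset_index(drop=True)

means = sample_means_sketch(population, sample_size=100, num=5)
print(means.shape)  # one row per sample, one column per variable
```

Collecting the per-sample means in a list and building the DataFrame once avoids having to reassign the result of pd.concat inside a loop.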
I run mean_samples in a nested for loop and store the result, the column averages for each combination of sample size and number of samples, as an entry in a dictionary:
sample_data = {}
sample_sizes = [20, 50, 100, 200, 500, 1000, 2000]
num_of_samples = [1, 2, 5, 10, 20, 50, 100]
for j in num_of_samples:
    for i in sample_sizes:
        sample_data['size_{}_sample_{}'.format(i,j)] = mean_samples(df = population_data, sample_size = i, type = "random", num = j)
sample_data["population_data"] = population_data
I then create a function that uses ipywidgets.Dropdown to give me dropdowns for the sample size and the number of samples, which should map to one of the entries of the sample_data dict. The function also passes the chosen entry to a plotly histogram.
import ipywidgets as widgets
import plotly.graph_objects as go
from IPython.display import display

def samples_histogram(dict, variable, type = "random"):
    sample_widget = widgets.Dropdown(
        options = ["20", "50", "100", "200", "500", "1000", "2000"],
        value = "100",
        description = "Sample size:"
    )
    sampling_widget = widgets.Dropdown(
        options = ["2", "5", "10", "20", "50", "100"],
        value = "2",
        description = "# of samples"
    )
    trace = go.Histogram(x = dict.get("population_data")[variable])
    fig = go.FigureWidget(data = trace)
    def response(change):
        if sample_widget.value == "20" and sampling_widget.value == "2":
            temp_df = dict.get("size_20_sample_1")
        temp_df = list(dict.items())
        temp_df = temp_df[temp_df[0]=='size_20_sample_1'][1]
        with fig.batch_update():
            fig.data = [temp_df[variable].tolist()]
    sample_widget.observe(response, names = "value")
    sampling_widget.observe(response, names = "value")
    box = widgets.VBox([
        sample_widget,
        sampling_widget,
        fig
    ])
    display(box)
The problem I am running into is with the dict object that I am passing, even when I convert it to a list (see these lines):
with fig.batch_update():
    fig.data = [temp_df[variable].tolist()]
It gives me this error:
ValueError: The data property of a figure may only be assigned
a list or tuple that contains a permutation of a subset of itself.
Received element value of type <class 'list'>
I am not sure if there is some way I can transform the objects I am passing to play nicer with plotly or if I am just missing something.
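If I understand correctly, you want to update the plot, but the way you update the figure is wrong. fig.data is a tuple of trace objects, and its only element here, fig.data[0], is a plotly.graph_objs._histogram.Histogram. You cannot assign a plain list of values to fig.data; to change what the histogram shows, modify the x attribute of that trace instead.
Try: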
fig.data[0].x = temp_df[variable].tolist()
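Also, your code has several typos. In custom_simple the sample is assigned to single but the function returns sample, so the two names need to match, and sample_mans should be sample_means. The indentation of response is also incorrect: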
def response(change):
    if sample_widget.value == "20" and sampling_widget.value == "2":
        temp_df = dict.get("size_20_sample_1")
    temp_df = list(dict.items())
    temp_df = temp_df[temp_df[0]=='size_20_sample_1'][1]
    with fig.batch_update():
        fig.data = [temp_df[variable].tolist()]
Change it to the following (the update then only runs when sample_widget.value == "20" and sampling_widget.value == "2"):
def response(change):
    if sample_widget.value == "20" and sampling_widget.value == "2":
        temp_df = dict.get("size_20_sample_1")
        temp_df = list(dict.items())
        temp_df = temp_df[temp_df[0]=='size_20_sample_1'][1]
        with fig.batch_update():
            fig.data[0].x = temp_df[variable].tolist()
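The callback above still handles only one combination of dropdown values. One possible way to generalise it, assuming the keys of your dict keep the 'size_{}_sample_{}' pattern from the question, is to build the key from the two dropdown values. The helper names sample_key and select_sample are hypothetical, and the demo dict uses plain lists in place of your DataFrames:

```python
def sample_key(sample_size, num_samples):
    """Build the dictionary key used in the question for one combination."""
    return "size_{}_sample_{}".format(sample_size, num_samples)

def select_sample(sample_data, sample_size, num_samples):
    """Return the stored entry for the chosen combination, or None if absent."""
    return sample_data.get(sample_key(sample_size, num_samples))

# Stand-in data: lists instead of DataFrames, values are made up.
demo = {
    "size_20_sample_1": [0.1, -0.2],
    "size_100_sample_2": [0.05, 0.01],
}

chosen = select_sample(demo, "100", "2")   # dropdown values are strings
missing = select_sample(demo, "20", "3")   # combination that was never stored
print(chosen, missing)
```

Inside response you could then call select_sample with sample_widget.value and sampling_widget.value, and only assign to fig.data[0].x when the result is not None.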