python plotly python-interactive

plotly error: ValueError: The data property of a figure may only be assigned a list or tuple that contains a permutation of a subset of itself


I am simulating some data and trying to plot various samples of it using plotly and ipywidgets. I created dropdowns to let people choose the sample size and the number of samples that they want to collect from a population distribution generated with this:

from random import seed
from numpy.random import normal, negative_binomial, binomial
import pandas as pd

population_N = 1000000
population_data = pd.DataFrame({
    "data.normal": normal(0, 1, population_N),
    "data.poisson": negative_binomial(1, 0.5, population_N),
    "data.binomial": binomial(1, 0.5, population_N)
})

To let them draw samples (either with or without a condition) and average each variable over multiple samples, I created the following functions:

def custom_simple(df, sample_size, type = "random"):
    """
    Description
    ----
    Take the population data.frame and generate a sample with a specific sample size
    
    Parameters
    ----
    df(pd.DataFrame): the population dataset
    sample_size(int): the number of rows in the sample
    type(str): if random, pull a random sample
    
    Returns
    ----
    sample(pd.DataFrame)
    """
    if type == "random":
        single = df.sample(sample_size)
    else:
        condition = df["data.normal"] < 1
        single = df[condition].sample(sample_size)
    return sample

def mean_samples(df, sample_size, type = "random", num = 1):
    """
    Description
    ----
    Take each sample and calculate the mean for each variable in the sample
    
    Parameters
    ----
    df(pd.DataFrame): the population dataset
    sample_size(int): the number of rows in the dataset
    type(str): if random, then do random sample
    num(int): number of samples to take
    
    Returns
    ----
    sample_means(pd.DataFrame)
    """
    sample_means = pd.DataFrame()
    def repeated_sample(df, sample_size, type = type, num = num):
        """
        Description:
        ----
        Take the population dataset and come up with a specified number of samples
        
        Parameters
        ----
        df(pd.DataFrame): the population dataset
        sample_size(int): the number of rows in the dataset
        type(str): if random, then randomly sample
        num(int): the number of samples to generate
        
        Returns
        ----
        sample_list(list(pd.DataFrame)): a list of samples stored as DataFrames
        """
        sample_list = [custom_simple(df, sample_size, type) for n in range(num)]
        return sample_list
    raw_samples = repeated_sample(df, sample_size, type = type, num = num)
    for i in raw_samples:
        sample_mans = pd.concat([sample_means, i.mean(axis = 0).to_frame().T])
    return sample_means

Essentially, these functions take the population_data DataFrame, use pd.DataFrame.sample() to draw samples of a given size, repeat the sampling a given number of times, and then average the columns of each sample.
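
As a compact illustration of the sampling-and-averaging step described above, here is a minimal sketch. The name sample_means_sketch and the small test population are hypothetical and not part of the original code; it only covers the unconditional "random" case.

```python
import numpy as np
import pandas as pd


def sample_means_sketch(df, sample_size, num=1):
    """Draw `num` random samples of `sample_size` rows and return one row of
    column means per sample (hypothetical helper, random case only)."""
    # Each df.sample(...).mean(axis=0) is a Series indexed by column name;
    # a list of such Series becomes one DataFrame row per sample.
    rows = [df.sample(sample_size).mean(axis=0) for _ in range(num)]
    return pd.DataFrame(rows).reset_index(drop=True)


# Small illustrative population (assumed data, not the original one).
population = pd.DataFrame({"a": np.arange(100.0), "b": np.ones(100)})
means = sample_means_sketch(population, sample_size=10, num=5)
```

The result has one row per sample and one column per variable, which is the shape the later plotting code expects when it selects a single column.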

I execute this function in a nested for loop that stores, for each combination of sample size and number of samples, the DataFrame of column averages as a dictionary entry:

sample_data = {}
sample_sizes = [20, 50, 100, 200, 500, 1000, 2000]
num_of_samples = [1, 2, 5, 10, 20, 50, 100]

for j in num_of_samples:
    for i in sample_sizes:
        sample_data['size_{}_sample_{}'.format(i,j)] = mean_samples(df = population_data, sample_size = i, type = "random", num = j)
sample_data["population_data"] = population_data

I then create a function that uses ipywidgets.Dropdown to provide dropdowns for the sample size and the number of samples, which should map to one of the entries in sample_data. The function also passes the chosen entry to a plotly histogram.

def samples_histogram(dict, variable, type = "random"):
    sample_widget = widgets.Dropdown(
        options = ["20", "50", "100", "200", "500", "1000", "2000"],
        value = "100",
        description = "Sample size:"
    )
    sampling_widget = widgets.Dropdown(
        options = ["2", "5", "10", "20", "50", "100"],
        value = "2",
        description = "# of samples"
    )
    trace = go.Histogram(x = dict.get("population_data")[variable])
    fig = go.FigureWidget(data = trace)
    def response(change):
        if sample_widget.value == "20" and sampling_widget.value == "2":
            temp_df = dict.get("size_20_sample_1")
            temp_df = list(dict.items())
            temp_df = temp_df[temp_df[0]=='size_20_sample_1'][1]
        with fig.batch_update():
            fig.data = [temp_df[variable].tolist()]
    sample_widget.observe(response, names = "value")
    sampling_widget.observe(response, names = "value")
    box = widgets.VBox([
        sample_widget, 
        sampling_widget, 
        fig
    ])
    display(box)

The problem I am running into is with the object I pass to the figure. Even when I convert it to a list, as in these lines:

        with fig.batch_update():
            fig.data = [temp_df[variable].tolist()]

plotly gives me this error:

ValueError: The data property of a figure may only be assigned 
a list or tuple that contains a permutation of a subset of itself.
    Received element value of type <class 'list'>

I am not sure if there is some way I can transform the objects I am passing to play nicer with plotly or if I am just missing something.


Solution

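  • If I understand correctly, you want to update the plotted histogram, but the way you update the figure is wrong. fig.data is a tuple of trace objects (here a single plotly.graph_objs._histogram.Histogram), and it can only be reassigned to a reordering or subset of the traces it already contains, not to a plain list of values. To modify the histogram, you need to modify the x attribute of the existing trace, fig.data[0].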

    Try

    fig.data[0].x = temp_df[variable].tolist()
    

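    Your code also has several typos. In custom_simple, the sample is stored in single but the function returns sample, so use the same name for both. In mean_samples, the loop assigns to sample_mans instead of sample_means. The indentation of response is also incorrect: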

    def response(change):
        if sample_widget.value == "20" and sampling_widget.value == "2":
            temp_df = dict.get("size_20_sample_1")
            temp_df = list(dict.items())
            temp_df = temp_df[temp_df[0]=='size_20_sample_1'][1]
        with fig.batch_update():
            fig.data = [temp_df[variable].tolist()]
    

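    Change it to the following, with the with block moved inside the if so that temp_df is always defined when it is used (this still only handles the case sample_widget.value == "20" and sampling_widget.value == "2"):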

    def response(change):
        if sample_widget.value == "20" and sampling_widget.value == "2":
            temp_df = dict.get("size_20_sample_1")
            temp_df = list(dict.items())
            temp_df = temp_df[temp_df[0]=='size_20_sample_1'][1]
            with fig.batch_update():
                fig.data[0].x = temp_df[variable].tolist()
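
Since the callback above only handles one combination, one possible way to cover every dropdown selection is to build the dictionary key from the widget values. The helpers sample_key and lookup_values below are hypothetical names, and the sketch assumes the dropdown values match the numbers used in the sample_data keys directly (unlike the hard-coded case above, which maps "2" to size_20_sample_1).

```python
def sample_key(sample_size, num_samples):
    """Build a key such as 'size_20_sample_2' from the two dropdown values
    (hypothetical helper)."""
    return "size_{}_sample_{}".format(sample_size, num_samples)


def lookup_values(samples, variable, sample_size, num_samples):
    """Return the chosen column as a list, or None if the key is missing."""
    frame = samples.get(sample_key(sample_size, num_samples))
    if frame is None:
        return None
    # frame[variable] works for a DataFrame as well as a plain dict of lists.
    return list(frame[variable])


# Inside response(change), under the same assumptions:
#     values = lookup_values(dict, variable,
#                            sample_widget.value, sampling_widget.value)
#     if values is not None:
#         with fig.batch_update():
#             fig.data[0].x = values
```

Returning None for a missing key keeps the callback from raising when a dropdown combination has no matching entry.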