[SOLVED] Python: draw random numbers from a beta distribution

I have a question about beta distributions and random variables. My data contains hourly power output from 2012 to 2016. I resampled it to monthly sums, so I have only one value per month. After that, I created a new DataFrame for each calendar month that holds that month's values from every year, as shown in my code sample.

import numpy as np
import pandas as pd
from scipy.stats import beta
import matplotlib.pyplot as plt

output = pd.read_csv("./data/external/power_output_hourly.csv", delimiter=",", parse_dates=True, index_col=[0])
print(output.head())

output_month = output.resample('1M').sum()
print(output_month.head())

# Every 12th row starting at row 0 is a January (assuming the data starts in January)
jan = output_month.iloc[0::12]
print(jan)

...

months = [jan, feb, mar, apr, mai, jun, jul, aug, sep, okt, nov, dez]

My next step is to draw random numbers from a beta distribution based on the past values of each month. For that, I want to use the scipy package and numpy.random. The problem is that I don't know how. I need only 20 numbers, but I don't know how to determine the a and b values. Do I just have to try arbitrary values, or can I estimate them from my past data? I am thankful for any help!

Solution

Try fitting (that is, estimating the parameters of) the beta distribution for each month using scipy.stats.beta.fit(MONTH). It returns the shape parameters a and b, followed by loc and scale. See here for a short description of its outputs, or read the source code for details (it is poorly documented, unfortunately).

FYI: there is more discussion about fitting a beta distribution in this post. I haven't used the function much myself.
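As a rough sketch of how the fit-then-sample idea could look: the helper name draw_from_fitted_beta, the bounds lower and upper, and the January numbers below are all made up for illustration and are not from the question. The beta distribution lives on the interval (0, 1), so this sketch assumes you can name bounds that strictly enclose your monthly values, rescales the data to (0, 1), and fixes loc and scale so that only a and b are estimated. With only five values per month the estimates will be quite uncertain.

```python
import numpy as np
from scipy.stats import beta


def draw_from_fitted_beta(values, lower, upper, size=20, seed=None):
    """Fit a beta distribution to `values` and draw `size` random numbers.

    `lower` and `upper` are assumed bounds (hypothetical, chosen by you)
    that must strictly enclose every value in `values`.
    """
    values = np.asarray(values, dtype=float)

    # Rescale the data to the open interval (0, 1), where the beta lives.
    scaled = (values - lower) / (upper - lower)

    # Fix loc=0 and scale=1 so that only the shape parameters a and b are fitted.
    a, b, _loc, _scale = beta.fit(scaled, floc=0, fscale=1)

    # Draw from Beta(a, b) with numpy and map the draws back to the data range.
    rng = np.random.default_rng(seed)
    samples = rng.beta(a, b, size=size)
    return a, b, lower + samples * (upper - lower)


# Made-up January totals for five years (placeholder data, not real output).
jan_values = [410.0, 455.0, 390.0, 520.0, 475.0]

a, b, draws = draw_from_fitted_beta(jan_values, lower=0.0, upper=1000.0, seed=42)
print("a =", a, "b =", b)
print(draws)
```

With the DataFrames from the question, you would probably pass a single column as values, for example jan.iloc[:, 0], and repeat the call for each entry of months. If you cannot name sensible bounds, calling beta.fit without floc and fscale also estimates loc and scale, but that means four parameters from five data points.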