I have a question about beta distributions and random variables. My data contains hourly performance data from 2012 to 2016. I resampled it to monthly sums, so I have only one value per month. After that, I created a new DataFrame for each calendar month containing that month's values from all years, as shown in my code sample.
import numpy as np
import pandas as pd
from scipy.stats import beta
import matplotlib.pyplot as plt
output = pd.read_csv("./data/external/power_output_hourly.csv", delimiter=",", parse_dates=True, index_col=[0])
print(output.head())
output_month = output.resample('1M').sum()
print(output_month.head())
# DataFrame.append was removed in pandas 2.0; pd.concat builds the same frame
jan = pd.concat([output_month[:1], output_month[12:13], output_month[24:25],
                 output_month[36:37], output_month[48:49]])
print(jan)
...
months = [jan, feb, mar, apr, mai, jun, jul, aug, sep, okt, nov, dez]
My next step is to draw random numbers from a beta distribution based on the past values of each month. Therefore, I want to use the scipy package and numpy.random. The problem is that I don't know how. I need only 20 numbers, but I don't know how to determine the a and b values. Do I just have to try random values, or can I extract the corresponding values from my past data? I am thankful for any help!
Try fitting the beta distribution (i.e. estimating its parameters) for each month using scipy.stats.beta.fit(MONTH). It returns the shape parameters a and b, followed by loc and scale. See here for a short description of its outputs, or read the source code for details (it is poorly documented, unfortunately).
FYI: there is more discussion about fitting a beta distribution in this post; I haven't used the function much myself.
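A minimal sketch of how the fit and the sampling could fit together for one month. The values in jan_values and the bounds lower and upper are made up for illustration; replace them with your own data and with limits that make sense for your system. Fixing loc and scale via floc and fscale is my assumption here, because with only five observations per month estimating all four parameters tends to be unstable.

```python
import numpy as np
from scipy.stats import beta

# Hypothetical monthly totals for one calendar month across five years.
jan_values = np.array([175.0, 250.0, 310.0, 225.0, 350.0])

# Assumed physical bounds of the monthly output (illustrative only).
lower, upper = 0.0, 500.0

# Rescale the data to the open interval (0, 1), where the standard
# beta distribution is defined.
scaled = (jan_values - lower) / (upper - lower)

# Fix loc=0 and scale=1 so that only the shape parameters a and b
# are estimated from the data.
a, b, loc, scale = beta.fit(scaled, floc=0, fscale=1)

# Draw 20 values with numpy.random and map them back to the original units.
rng = np.random.default_rng(seed=0)
samples = lower + (upper - lower) * rng.beta(a, b, size=20)

print("a =", a, "b =", b)
print(samples)
```

The same steps can be repeated in a loop over the months list from the question. Note that with floc and fscale fixed, every rescaled value has to lie strictly between 0 and 1, otherwise the fit raises an error.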