I am creating a Naive Bayes classifier in Python that will be able to guess which month it is based on some weather data of a single day.
Currently the mean and standard deviation are used to classify the month; however, I figured that adding skewness and kurtosis might help improve the accuracy.
I am currently using scipy.stats.norm.cdf to calculate the probability, but I cannot seem to find any cdf function in Python that takes skewness and kurtosis into account.
I feel like I might not be understanding skewness and kurtosis correctly. Skewness and kurtosis have an impact on the cdf function and therefore I expected them to be given as a parameter.
Is there something fundamentally wrong with my understanding of skewness, kurtosis and the cdf function? If not, then where can I find an implementation of the cdf function in Python that takes all these parameters into account?
The normal distribution, which you use (scipy.stats.norm) and which is typically used to model the one-dimensional conditional distributions in Naive Bayes, is fully defined by just two parameters: its mean and std. There is no point in specifying skewness/kurtosis, as they are constant for this distribution (skewness is 0 and kurtosis is 3, i.e. excess kurtosis is 0).
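As a quick check of that claim, here is a minimal sketch that asks scipy for the skewness and kurtosis of a few normal distributions. The helper name `normal_shape` and the example means and stds are made up for illustration. Note that scipy reports excess (Fisher) kurtosis, so a normal distribution shows 0 rather than 3.

```python
from scipy.stats import norm


def normal_shape(mean, std):
    """Return (skewness, excess kurtosis) of a normal distribution."""
    # moments="sk" asks scipy for skewness and excess (Fisher) kurtosis.
    skewness, excess_kurtosis = norm.stats(loc=mean, scale=std, moments="sk")
    return float(skewness), float(excess_kurtosis)


# Hypothetical (mean, std) pairs: the shape values do not depend on them.
for mean, std in [(0.0, 1.0), (15.0, 4.0), (-3.0, 0.5)]:
    print(mean, std, normal_shape(mean, std))
```

Whatever mean and std you pass in, the two shape values stay the same, which is why norm.cdf has no parameters for them.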
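What you are probably thinking of is the Pearson family of distributions, which can be fitted to more moments (mean, std, skewness and kurtosis). Note that the scipy implementation linked below, scipy.stats.pearson3, is the Pearson type III distribution: it takes skewness, loc and scale as parameters, and its kurtosis is then determined by the skewness rather than set independently.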
http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.pearson3.html
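A rough sketch of how this could look with scipy.stats.pearson3, assuming you match the sample mean, std and skewness of one feature within one month. The helper `fit_pearson3_by_moments` and the synthetic "temperature" data are hypothetical, not part of your code or of scipy.

```python
import numpy as np
from scipy.stats import pearson3, skew as sample_skew


def fit_pearson3_by_moments(values):
    """Freeze a Pearson type III distribution matching the sample moments."""
    values = np.asarray(values, dtype=float)
    # In scipy's parametrization, loc is the mean and scale is the std;
    # the shape parameter is the skewness.
    return pearson3(sample_skew(values),
                    loc=values.mean(),
                    scale=values.std(ddof=1))


# Synthetic, right-skewed stand-in for one month's daily temperatures.
rng = np.random.default_rng(0)
temperatures = 10.0 + rng.gamma(shape=4.0, scale=1.5, size=500)

dist = fit_pearson3_by_moments(temperatures)

x = 18.0
print("cdf:", dist.cdf(x))  # P(temperature <= x) under the fitted model
print("pdf:", dist.pdf(x))  # density at x
```

The frozen distribution exposes the same cdf/pdf methods as scipy.stats.norm, so it could be dropped into the existing per-month calculation. One caveat: a skewed Pearson type III distribution has bounded support on one side, so the density is zero for values beyond that bound.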