I have been trying to match the expected output, but I'm not getting the column names from df that I passed into statsmodels.
import pandas
import statsmodels.api as statmodel
df = pandas.read_csv('fastfood.csv')
df = df[['total_fat', 'sat_fat', 'cholesterol', 'sodium','calories']]
X = df[['total_fat', 'sat_fat', 'cholesterol', 'sodium']].values
Y = df[['calories']].values
X = statmodel.add_constant(X)
model = statmodel.OLS(Y, X).fit()
print(model.mse_total.round(2))
print(model.rsquared.round(2))
print(model.params.round(2))
print(model.pvalues.round(2))
Output I got:
79770.18
0.9
[71.73 9.1 0.6 0.21 0.16]
[0. 0. 0.64 0.07 0. ]
Output I need:
79770.18
0.9
-{0,}71.73
total_fat 9.10
sat_fat 0.60
cholesterol 0.21
sodium 0.16
dtype: float64
{0,}0.00
total_fat 0.00
sat_fat 0.64
cholesterol 0.07
sodium 0.00
dtype: float64
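For what it's worth, one possible way to get labels on results that come back as plain arrays is to wrap them in a pandas Series yourself. This is only a sketch: the numbers are copied from the "Output I got" section above, and the label const for the intercept is an assumption (it is the name that appears when add_constant is given a DataFrame).

```python
import numpy as np
import pandas as pd

# Coefficients as printed in "Output I got" (a plain ndarray carries no labels).
params = np.array([71.73, 9.1, 0.6, 0.21, 0.16])

# Assumed labels: 'const' for the intercept, then the predictors in the
# order they were selected from df.
names = ['const', 'total_fat', 'sat_fat', 'cholesterol', 'sodium']

# Attaching an index turns the bare array into a labelled Series.
labelled = pd.Series(params, index=names).round(2)
print(labelled)
```

The same wrapping would apply to the p-values array. It only adds labels after the fact, so the order of names has to match the column order of X exactly.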
I tried removing .values from the definitions of X and Y:
import numpy as np  # needed for np.random below
import pandas
import statsmodels.api as statmodel
# Random stand-in data, so the numbers will not match fastfood.csv
df = pandas.DataFrame({'total_fat': np.random.rand(100),
                       'sat_fat': np.random.rand(100),
                       'cholesterol': np.random.rand(100),
                       'sodium': np.random.rand(100),
                       'calories': np.random.rand(100)})
df = df[['total_fat', 'sat_fat', 'cholesterol', 'sodium','calories']]
X = df[['total_fat', 'sat_fat', 'cholesterol', 'sodium']]
Y = df[['calories']]
X = statmodel.add_constant(X)
model = statmodel.OLS(Y, X).fit()
print(model.mse_total.round(2))
print(model.rsquared.round(2))
print(model.params.round(2))
print(model.pvalues.round(2))
and it gives me the following output:
0.09
0.02
const 0.49
total_fat -0.07
sat_fat 0.14
cholesterol 0.02
sodium -0.01
dtype: float64
const 0.00
total_fat 0.54
sat_fat 0.21
cholesterol 0.83
sodium 0.89
dtype: float64