I have the coefficients and the constant (alpha). I want to multiply and add the values together as in the example below (it has to be done for 300000 rows):
Prediction = constant + (coef_col1 * col1) + (coef_col2 * col2) + (coef_col3 * col3) + (coef_col4 * col4) + (coef_col5 * col5)

For example:

Prediction = 222 + (555-07 * col1) + (-555-07 * col2) + (-66 * col3) + (55 * col4) + (777 * col5)
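For reference, the same calculation in matrix notation. The symbols are introduced here only for illustration: $x_{j,i}$ is the value of column $i$ in row $j$, $\beta_i$ is that column's coefficient and $\beta_0$ is the constant. Written this way, all 300000 predictions are a single matrix-vector product plus a constant, which is the kind of product `DataFrame.dot` computes:

$$
\hat{y}_j = \beta_0 + \sum_{i=1}^{5} \beta_i \, x_{j,i},
\qquad
\hat{\mathbf{y}} = X\boldsymbol{\beta} + \beta_0 \mathbf{1}
$$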
I have a one-row DataFrame that contains the coefficients and the constant, like this:
col1 | col2 | col3 | col4 | col5 | constant |
---|---|---|---|---|---|
2.447697e-07 | -5.214072e-07 | -0.000003 | 0.000006 | 555 | 222 |
and another DataFrame with exactly the same column names, but with monthly values:
col1 | col2 | col3 | col4 | col5 |
---|---|---|---|---|
16711 | 17961 | 0 | 20 | 55 |
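To get a reference value for checking any vectorised version (or Excel), the formula can first be evaluated term by term for the single example row. This is a minimal sketch that assumes the constant column is named `const` (as in the code below, although the table header says `constant`); the numbers are copied from the two tables, so they carry the tables' display rounding:

```python
import pandas as pd

# One-row coefficient frame, values copied from the first table (display-rounded)
mean_coefficients = pd.DataFrame(
    {"col1": [2.447697e-07], "col2": [-5.214072e-07], "col3": [-0.000003],
     "col4": [0.000006], "col5": [555.0], "const": [222.0]}
)

# One row of monthly values, copied from the second table
selected_columns = pd.DataFrame(
    {"col1": [16711], "col2": [17961], "col3": [0], "col4": [20], "col5": [55]}
)

# Spell the formula out: constant + sum of coefficient * value, matched by column name
row = selected_columns.iloc[0]
coef = mean_coefficients.iloc[0]
manual = coef["const"] + sum(coef[c] * row[c] for c in selected_columns.columns)
print(manual)  # roughly 30746.99
```

Because every coefficient is looked up by name, this version cannot mix up column order; it is far too slow for 300000 rows but useful as a correctness check.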
I already tried sorting the columns and then taking the product with `df.dot`:
```python
selected_columns = selected_columns.sort_index(axis=1)
# the 21st column of mean_coefficients (index 20) is the constant, so only the first 20 columns are used
selected_columns['predicted_Mcap'] = selected_columns.dot(mean_coefficients.iloc[:, 0:20]) + mean_coefficients['const']
```
The reason I use `mean_coefficients.iloc[:,0:20]` is that I don't want to include `const` in the multiplication; it just needs to be added at the end.
So I calculated the predicted values, but when I checked them in Excel they weren't the same. Am I calculating this correctly?
As mentioned in the `DataFrame.dot()` documentation, the column names of the DataFrame and the index of `other` must contain the same values, because they are aligned before the multiplication. Otherwise you'll get:

`ValueError: matrices are not aligned`
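A small reproduction of that check, assuming a one-row coefficient frame with the default row index `0`, like the one in the question. The frames here are made-up toy data: the data's columns are `col1, col2`, but the index of `other` is `[0]`, so the labels cannot be aligned:

```python
import pandas as pd

data = pd.DataFrame({"col1": [1.0, 2.0], "col2": [3.0, 4.0]})
coefs = pd.DataFrame({"col1": [10.0], "col2": [100.0]})  # one row, index [0]

message = None
try:
    # dot() compares data.columns (col1, col2) with coefs.index ([0])
    data.dot(coefs)
except ValueError as err:
    message = str(err)

print(message)  # matrices are not aligned
```

Passing the coefficients so that their column names become the index is what makes the labels line up.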
So you have two options:

1. Use `df.dot()` with `.T`, i.e. the transposed DataFrame. Transposing turns the coefficient column names into the index, so they are ready to be multiplied as a matrix. Remember that the column names in both DataFrames have to be the same; even one extra column raises the error.
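```python
# drop the constant by name, then transpose so the coefficient names become the index dot() aligns on
coefs = mean_coefficients.drop(columns='const')
selected_columns['predicted_MCAP'] = selected_columns.dot(coefs.T) + mean_coefficients['const']
```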
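A sketch of option 1 on the example data, assuming the constant column is named `const`. The data columns are deliberately shuffled to show that a labelled `dot` matches coefficients by name, so sorting the columns first is not needed. The constant is excluded with `drop(columns="const")` rather than by position, and the one-column result is flattened to a Series before adding the constant as a scalar:

```python
import pandas as pd

mean_coefficients = pd.DataFrame(
    {"col1": [2.447697e-07], "col2": [-5.214072e-07], "col3": [-0.000003],
     "col4": [0.000006], "col5": [555.0], "const": [222.0]}
)
# Example row from the question plus an all-zero row; columns deliberately out of order
selected_columns = pd.DataFrame(
    {"col5": [55, 0], "col3": [0, 0], "col1": [16711, 0],
     "col4": [20, 0], "col2": [17961, 0]}
)

coefs = mean_coefficients.drop(columns="const")  # 1 x 5, constant excluded by name
pred = selected_columns.dot(coefs.T)             # (n x 5) @ (5 x 1) -> n x 1 DataFrame, aligned by name
pred = pred.iloc[:, 0] + mean_coefficients["const"].iloc[0]  # Series + scalar constant
selected_columns["predicted_MCAP"] = pred
print(selected_columns["predicted_MCAP"])
```

The all-zero row should come out as exactly the constant, which is a quick sanity check on the real data too.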
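2. To work around the alignment check, pass a NumPy array instead. The labels are then ignored and values are matched by position, so `df2` must have one row per column of `df1`, in the same order:

```python
# no label alignment: row k of df2 is multiplied with column k of df1
result = df1.dot(df2.values)
```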
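Because the NumPy route matches by position, a column order that differs between the data and the coefficients silently produces wrong numbers instead of an error. A minimal sketch of option 2 that guards against this, assuming (as above) a constant column named `const`: it selects the data columns in the coefficients' order before dropping the labels, and for contrast also computes the unguarded product:

```python
import pandas as pd

mean_coefficients = pd.DataFrame(
    {"col1": [2.447697e-07], "col2": [-5.214072e-07], "col3": [-0.000003],
     "col4": [0.000006], "col5": [555.0], "const": [222.0]}
)
# Same values as the question's example row, plus a zero row, with shuffled columns
selected_columns = pd.DataFrame(
    {"col5": [55, 0], "col3": [0, 0], "col1": [16711, 0],
     "col4": [20, 0], "col2": [17961, 0]}
)

coef_row = mean_coefficients.drop(columns="const").iloc[0]  # Series: column name -> coefficient
beta = coef_row.to_numpy()

# Unguarded: to_numpy() drops the labels, so col5 is multiplied by col1's coefficient, etc.
wrong = selected_columns.to_numpy() @ beta

# Guarded: reorder the data columns to the coefficients' order, then multiply positionally
X = selected_columns[coef_row.index].to_numpy()
pred = X @ beta + mean_coefficients["const"].iloc[0]
selected_columns["predicted_MCAP"] = pred
```

Either way the whole column is computed with one vectorised product, so 300000 rows need no Python-level loop.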