I want to calculate the correlation coefficient by year in R and put the results in a data frame (then repeat the process for the coefficient of determination). The following code returns a single value which, I'm guessing, is for all years combined. The value appears in the console but not in the data frame.
xmasCount_Amt_Coef_Correlation <- xmasCount_Amt_df_ByCheckDate %>%
group_by(Year.x, YTD_Range.x)
cor(xmasCount_Amt_df_ByCheckDate$n, xmasCount_Amt_df_ByCheckDate$Amount)
A sample screenshot of my source table xmasCount_Amt_df_ByCheckDate is shown below. The complete table (data frame) contains data for 2020-2022. The output table looks identical to the source table, which is not what I want. I'm obviously missing a step or two but have no idea which. Any suggestions would be appreciated.
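In your code, the cor() call is not part of the pipe and pulls the full columns with $, so it ignores the grouping and prints one value for all years, while the assignment only stores a grouped copy of the source table. Can you adapt the code below to your project and let me know what you get?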
library(dplyr)
# Group by year, then calculate corr coeff for each group
xmasCount_Amt_Coef_Correlation <- xmasCount_Amt_df_ByCheckDate %>%
  group_by(Year.x) %>%
  summarise(correlation = cor(n, Amount))
# and the result:
xmasCount_Amt_Coef_Correlation
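To see the difference between the grouped summary above and a cor() call on the whole columns, here is a minimal sketch on made-up data. The data frame toy_df and its values are hypothetical stand-ins for xmasCount_Amt_df_ByCheckDate; only the column names Year.x, n and Amount are taken from the question.

```r
library(dplyr)

# Hypothetical toy data (not the real table): two years, four rows each.
# 2020 rises perfectly with n, 2021 falls perfectly with n.
toy_df <- data.frame(
  Year.x = rep(c(2020, 2021), each = 4),
  n      = c(1, 2, 3, 4, 1, 2, 3, 4),
  Amount = c(10, 20, 30, 40, 40, 30, 20, 10)
)

# Using $ pulls the full columns, so any grouping is ignored:
# this is a single correlation across all years combined.
overall <- cor(toy_df$n, toy_df$Amount)

# Inside summarise(), n and Amount refer to the rows of the current group,
# so this returns one row per year.
toy_cor <- toy_df %>%
  group_by(Year.x) %>%
  summarise(correlation = cor(n, Amount), .groups = "drop")

overall
toy_cor
```

With this toy data the combined correlation is essentially zero, while the per-year values are about 1 and -1, which shows how much a combined figure can hide.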
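For the coefficient of determination (R^2), one way is to square the correlation coefficient. For a simple linear relationship between two variables, R^2 equals the square of the Pearson correlation: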
# add a new column with the coefficient of determination
xmasCount_Amt_Coef_Correlation <- xmasCount_Amt_Coef_Correlation %>%
  mutate(determination = correlation^2)
# View the resulting data frame
xmasCount_Amt_Coef_Correlation
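If you want to convince yourself that squaring the correlation gives the same number as the R^2 of a one-predictor linear model, a rough check could look like the sketch below. The data in toy_df is made up, and the column r_squared_lm is a name introduced here only for the comparison; the check assumes a simple regression of Amount on n within each year.

```r
library(dplyr)

# Hypothetical toy data (not the real table): two years, five rows each.
toy_df <- data.frame(
  Year.x = rep(c(2020, 2021), each = 5),
  n      = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  Amount = c(12, 19, 33, 38, 52, 50, 41, 35, 18, 9)
)

toy_r2 <- toy_df %>%
  group_by(Year.x) %>%
  summarise(
    correlation   = cor(n, Amount),
    # Squared Pearson correlation, as in the mutate() step above
    determination = correlation^2,
    # R^2 reported by a simple linear model fitted to the same group
    r_squared_lm  = summary(lm(Amount ~ n))$r.squared,
    .groups = "drop"
  )

toy_r2
```

The determination and r_squared_lm columns should agree up to floating-point rounding. With more than one predictor this shortcut no longer applies, and you would take R^2 from the model itself.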