I'm somewhat of a beginner in R and am working on a relatively large dataset (for me at least) of around 500,000 rows.
I am trying to find the correlation between variables for various countries (measuring the effects of bullying specifically) in the PISA dataset (an education-based survey).
I am able to compute the correlation matrix for individual countries on a case-by-case basis.
What I want is to record the correlation between two variables (so not necessarily the entire matrix) across all these countries, automating this and storing the results in a tibble so that I don't need to do it manually.
correl_countries = tibble()
for (each in list_countries){
  countries_bullying %>% #tibble subset of the original data
    filter(CNTRYID == each)%>%
    select(reading_score, bullied_index)%>%
    correl = cor(use = "pairwise.complete.obs") #something to store the correlation values
  correl_countries %>% add_row(x = each, y = correl) #wanted to add these results to a tibble
}
Currently nothing seems to happen and I receive this error:
Error in is.data.frame(x) : argument "x" is missing, with no default
It may have something to do with the fact that "pairwise.complete.obs" generates a correlation matrix and not a single vector.
Grateful for your recommendations!
You don't really need the loop here; the tidyverse has got you covered. The following returns a tibble with one row per country and two columns, CNTRYID and correl:
library(tidyverse)
# get only the correlations
countries_bullying %>%
  group_by(CNTRYID) %>%
  summarise(correl = cor(reading_score, bullied_index, use = "pairwise.complete.obs"))
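If you would still like to keep the loop, here is a rough sketch of how it could be repaired. As far as I can tell, the error comes from writing `correl = cor(...)` inside the pipe: `cor()` ends up being called with no data at all, hence `argument "x" is missing`. On top of that, `add_row()` returns a new tibble rather than changing `correl_countries` in place, so its result would have to be assigned back. The toy data below is made up purely for illustration (the real `countries_bullying` is not shown in the question), and `results`, `country_data` and `grouped` are names I introduced for the example.

```r
library(dplyr)

# Made-up stand-in for countries_bullying; the values are random, for illustration only
set.seed(1)
countries_bullying <- tibble(
  CNTRYID       = rep(c("A", "B", "C"), each = 20),
  reading_score = rnorm(60, mean = 500, sd = 100),
  bullied_index = rnorm(60)
)
countries_bullying$bullied_index[c(3, 25)] <- NA  # a couple of missing values

list_countries <- unique(countries_bullying$CNTRYID)

# Build one single-row tibble per country, then bind them together once at the end
results <- vector("list", length(list_countries))
for (i in seq_along(list_countries)) {
  country_data <- countries_bullying %>%
    filter(CNTRYID == list_countries[i])

  results[[i]] <- tibble(
    CNTRYID = list_countries[i],
    # passing the two columns separately gives a single number, not a matrix
    correl  = cor(country_data$reading_score, country_data$bullied_index,
                  use = "pairwise.complete.obs")
  )
}
correl_countries <- bind_rows(results)

# The grouped version from above, for comparison
grouped <- countries_bullying %>%
  group_by(CNTRYID) %>%
  summarise(correl = cor(reading_score, bullied_index, use = "pairwise.complete.obs"))
```

Both `correl_countries` and `grouped` should hold the same correlations, one row per country. The grouped version is shorter and avoids growing a tibble inside a loop, which is why I would prefer it.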