I'm trying to split free-text fields into individual words/phrases while keeping each one's association with a group, so I can stratify by group when graphing later on.
My original code is below. I'm trying to add a "Year" variable so I can stratify the research interests by which year a student is in. I'd like a total n for each word, as well as an n for each year.
Example of my data set:
| Please.list.your.research.interests | Year |
|---|---|
| Vaccines, TB, HIV | 1st year |
| TB, Chronic Diseases | 2nd year |
library(tidyverse)
library(tidytext)
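# split every response on ", " and flatten into one character vector
# (this is where the link to each student's Year is lost)
data_research_words <- unlist(strsplit(data_research$Please.list.your.research.interests, ", "))
# one row per word; seq_along() avoids hard-coding the number of words (was 1:97)
text_df <- tibble(line = seq_along(data_research_words), data_research_words)
# total count for each word
text_count <- text_df %>%
  count(data_research_words, sort = TRUE)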
Something like this?
library(tidyverse)
# split on ", " to create a separate row for each list element
# (df is your data frame, e.g. data_research; the Year value is repeated on each new row)
df <- df |>
  separate_longer_delim("Please.list.your.research.interests", ", ")
# then get the count for each research interest
df |> count(Please.list.your.research.interests)
# ...and the same, but separated also by years
df |> count(Year, Please.list.your.research.interests)
Output:
  Please.list.your.research.interests n
1                    Chronic Diseases 1
2                                 HIV 1
3                                  TB 2
4                            Vaccines 1

      Year Please.list.your.research.interests n
1 1st year                                 HIV 1
2 1st year                                  TB 1
3 1st year                            Vaccines 1
4 2nd year                    Chronic Diseases 1
5 2nd year                                  TB 1
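If you'd rather have the per-year n and the overall n in one table (handy for graphing), here is a minimal sketch. It assumes the two-row sample from the question, rebuilt by hand below; `counts` and `total` are names I made up for illustration, not anything from your data.

```r
library(tidyverse)

# Hand-built copy of the sample data from the question (an assumption,
# standing in for the real data frame)
df <- tibble(
  Please.list.your.research.interests = c("Vaccines, TB, HIV", "TB, Chronic Diseases"),
  Year = c("1st year", "2nd year")
)

counts <- df |>
  # one row per research interest; Year is carried along on each row
  separate_longer_delim(Please.list.your.research.interests, delim = ", ") |>
  # n = how often each interest appears within each year
  count(Year, Please.list.your.research.interests) |>
  # total = the per-year n summed over all years for that interest
  add_count(Please.list.your.research.interests, wt = n, name = "total")

counts
```

With the sample data, both TB rows get `total` 2 and every other interest gets 1. Keeping it in this long format means you can map `Year` straight to a fill or facet in ggplot2.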