My goal is to check whether an article's title or its abstract contains any of a list of keywords, and to select or mark these articles somehow (for example by creating a new category such as contains / does not contain any keyword from the list). I collected these articles in an Excel file, which holds their titles and abstracts. What steps should I consider in this situation? I have very little experience, by the way: I have just started learning R and I come from the field of psychology.
I haven't started yet; I am just wondering how this could be solved.
Here is some sample data and code to get you started (with comments to guide you through the code):
library(tidyverse)

# define sample data
df <- tibble(
  title = c("Programming in R", "How to post on StackOverflow", "A pasta recipe"),
  abstract = c("This is an article about reproducible research.", "Have a question on coding? Learn how to ask anything.", "Ingredients: 2 eggs, a little flour.")
)

# define keywords as a single regular expression; "|" means "or"
keywords <- "programming|coding"

# create indicator column that is TRUE if either title or abstract contains any keyword
df |>
  mutate(about_programming = str_detect(title, regex(keywords, ignore_case = TRUE)) |
           str_detect(abstract, regex(keywords, ignore_case = TRUE)))
#> # A tibble: 3 × 3
#>   title                        abstract                        about_programming
#>   <chr>                        <chr>                           <lgl>
#> 1 Programming in R             This is an article about repro… TRUE
#> 2 How to post on StackOverflow Have a question on coding? Lea… TRUE
#> 3 A pasta recipe               Ingredients: 2 eggs, a little … FALSE
Created on 2023-04-15 with reprex v2.0.2
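Since the question mentions a keyword list and a contain / not contain category, here is a possible variation on the code above. It is only a sketch under a few assumptions: the names `keyword_list`, `pattern`, `contains_keyword` and `keyword_label` are made up for illustration, the keywords are plain words without regex special characters, and the file name in the commented-out `read_excel()` line is hypothetical.

```r
library(tidyverse)

# In practice the data would come from the Excel file, e.g. with the readxl package:
# df <- readxl::read_excel("articles.xlsx")  # hypothetical file name
df <- tibble(
  title = c("Programming in R", "How to post on StackOverflow", "A pasta recipe"),
  abstract = c("This is an article about reproducible research.",
               "Have a question on coding? Learn how to ask anything.",
               "Ingredients: 2 eggs, a little flour.")
)

# hypothetical keyword list; replace with your own terms
keyword_list <- c("programming", "coding")

# collapse the list into one pattern: \b(programming|coding)\b
# \b marks a word boundary, so only whole words are matched
pattern <- regex(
  str_c("\\b(", str_c(keyword_list, collapse = "|"), ")\\b"),
  ignore_case = TRUE
)

result <- df |>
  mutate(
    # TRUE if the pattern is found in the title or in the abstract
    contains_keyword = if_any(c(title, abstract), \(x) str_detect(x, pattern)),
    # readable category derived from the logical column
    keyword_label = if_else(contains_keyword, "contains", "does not contain")
  )

result
```

To keep only the marked articles you could then use `filter(result, contains_keyword)`. Drop the `\\b` parts if partial matches (for example "programmer" for the keyword "program") should also count.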