
How to write R code to search article titles and abstracts for keywords?


My goal is to check whether an article's title or abstract contains any keyword from a list, and to select or mark these articles somehow (for example, by creating a new category such as "contains / does not contain any of the keywords"). I collected these articles in an Excel file that includes their titles and abstracts. What are the steps to consider in this situation? I have very little experience, by the way: I just started learning R, and I come from the field of psychology.

I haven't started yet; I'm just wondering how it could be solved.


Solution

  • Here is some sample data and code to get you started (with comments to guide you through the code):

    library(tidyverse)  # attaches tibble, dplyr (mutate) and stringr (str_detect, regex)
    
    # define sample data
    df <- tibble(
      title = c("Programming in R", "How to post on StackOverflow", "A pasta recipe"),
      abstract = c("This is an article about reproducible research.", "Have a question on coding? Learn how to ask anything.", "Ingredients: 2 eggs, a little flour.")
    )
    
    # define keywords; "|" means "or" in a regular expression
    keywords <- "programming|coding"
    
    # create indicator column that is TRUE if either title or abstract contains any keyword
    # (ignore_case = TRUE makes the match case-insensitive, so "Programming" also counts)
    df |>
      mutate(about_programming = str_detect(title, regex(keywords, ignore_case = TRUE)) |
               str_detect(abstract, regex(keywords, ignore_case = TRUE)))
    #> # A tibble: 3 × 3
    #>   title                        abstract                        about_programming
    #>   <chr>                        <chr>                           <lgl>            
    #> 1 Programming in R             This is an article about repro… TRUE             
    #> 2 How to post on StackOverflow Have a question on coding? Lea… TRUE             
    #> 3 A pasta recipe               Ingredients: 2 eggs, a little … FALSE
    

    Created on 2023-04-15 with reprex v2.0.2
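The example above types the data into the script, while the question's articles live in an Excel file. A possible extension is sketched below; it is not part of the original answer. It assumes a hypothetical file `articles.xlsx` with columns named `title` and `abstract` (rename to match your sheet), and it assumes the keywords contain no regular-expression special characters. Instead of a hand-written `"a|b"` pattern, it builds the pattern from a plain keyword vector, adds word boundaries, treats a missing title or abstract as empty text, and records which keywords were found.

```r
library(tidyverse)  # attaches dplyr, stringr, tibble, purrr

# Reading the real data (hypothetical file and column names; adjust to your sheet):
# articles <- readxl::read_excel("articles.xlsx")  # needs columns `title` and `abstract`

# Stand-in data so the sketch runs without the Excel file (row 4 has a missing title)
articles <- tibble(
  title = c("Programming in R", "How to post on StackOverflow", "A pasta recipe", NA),
  abstract = c(
    "This is an article about reproducible research.",
    "Have a question on coding? Learn how to ask anything.",
    "Ingredients: 2 eggs, a little flour.",
    "A study on decoding speech in noise."
  )
)

# Keyword list as a plain character vector (assumed to contain no regex
# special characters such as . ( ) ? or *)
keywords <- c("programming", "coding")

# One case-insensitive pattern; \\b is a word boundary, so "coding" does not match "decoding"
pattern <- regex(str_c("\\b(", str_c(keywords, collapse = "|"), ")\\b"), ignore_case = TRUE)

articles_marked <- articles |>
  mutate(
    # search title and abstract together; missing values become empty strings
    text = str_c(coalesce(title, ""), coalesce(abstract, ""), sep = " "),
    contains_keyword = str_detect(text, pattern),
    category = if_else(contains_keyword, "contains keyword", "no keyword"),
    # which keywords were found (lower-cased, duplicates removed; "" if none)
    matched = map_chr(str_extract_all(text, pattern),
                      ~ paste(unique(str_to_lower(.x)), collapse = ", "))
  ) |>
  select(-text)

articles_marked
```

To keep only the matching articles, `filter(articles_marked, contains_keyword)` should work, and `count(articles_marked, category)` gives the size of each group. `readr::write_csv()` (part of the tidyverse) can save the result; writing back to `.xlsx` needs a separate package such as writexl. Because of the word boundaries, a keyword like `"program"` will not match "programming"; dropping the `\\b` parts makes the search match inside longer words instead.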