
Text mining in R, reading every row for a yes/no answer


I have a CSV file created from PubMed with the RISmed package, and I'm trying to find a way to search it in R for certain terms, for example "latino". I'd like to create a new variable, "Latino", that reads the whole row and records yes or no depending on whether the word is mentioned anywhere in it.

How would I be able to do this, and which package do you recommend?

Here is a sample of my code:

library(RISmed)
library(dplyr) # tibble and other functions

RCT_topic <- 'randomized clinical trial'
RCT_query <- EUtilsSummary(RCT_topic, mindate=2016, maxdate=2017, retmax=100)
summary(RCT_query)
RCT_records <- EUtilsGet(RCT_query)
# tibble() replaces the deprecated data_frame()
RCT_data <- tibble('PMID' = PMID(RCT_records),
                   'Title' = ArticleTitle(RCT_records),
                   'Abstract' = AbstractText(RCT_records),
                   'YearPublished' = YearPubmed(RCT_records),
                   'Month.Published' = MonthPubmed(RCT_records),
                   'Country' = Country(RCT_records),
                   'Grant' = GrantID(RCT_records),
                   'Acronym' = Acronym(RCT_records),
                   'Agency' = Agency(RCT_records),
                   'Mesh' = Mesh(RCT_records))

Solution

  • Why not use grepl to add a column indicating whether a search term appears in the Abstract column of your search results? grepl returns a logical vector: TRUE where the pattern is found and FALSE where it is not.

    # There are no mentions of "Latino" or "latino" in your df.
    RCT_data$Latino <- grepl("Latino|latino", RCT_data$Abstract)

    # There are several mentions of the word "pain":
    RCT_data$Pain <- grepl("pain", RCT_data$Abstract)
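
The grepl calls above only look at the Abstract column and return TRUE/FALSE. If you want to search the whole row and store "yes"/"no", as the question asks, one possible approach is to paste each row into a single string first. The sketch below uses base R only. The data frame toy and its contents are made up for illustration and stand in for RCT_data. It assumes every column is a plain character or numeric vector, so list-type columns (Mesh may be one) would need to be flattened or dropped first.

```r
# Hypothetical stand-in for RCT_data; the values are invented, not PubMed records
toy <- data.frame(
  PMID     = c("1", "2", "3"),
  Title    = c("Pain management in Latino adults",
               "A trial of aspirin",
               "Sleep and diet"),
  Abstract = c("We enrolled 200 adults.",
               "Participants were Hispanic/Latino.",
               NA),
  stringsAsFactors = FALSE
)

# Paste all columns of each row into one string per row
row_text <- do.call(paste, c(toy, sep = " "))

# ignore.case = TRUE matches "Latino", "latino", "LATINO", ...
# ifelse() turns the logical result into "yes"/"no"
toy$Latino <- ifelse(grepl("latino", row_text, ignore.case = TRUE), "yes", "no")

print(toy[, c("PMID", "Latino")])
```

Missing values are pasted as the text "NA", so they never match the search term here. With ignore.case = TRUE there is also no need to spell out both "Latino" and "latino" in the pattern.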