r nlp udpipe

R - Parsing keywords from udpipe RAKE per article back to dataframe


I'm trying to use udpipe's RAKE implementation to extract the top 25 RAKE keywords for each document in a dataframe and write those keywords (plus a simple str_count) back to the dataframe. I wrote a for loop to handle this, but it writes the same result to every row instead of a different result to each row.

Packages installed and used are udpipe, dplyr, stringi, stringr, data.table.

annotation$length <- nchar(annotation$token)

annotation <- annotation %>% filter(length >= 3 )

counter <- textdf$doc_id

for (i in counter) {
  subannotation <- annotation %>% filter(doc_id == i)
  stats <-
    keywords_rake(
      x = subannotation,
      term = "token", #token or lemma
      group = "doc_id",
      ngram_max = 3,
      n_min = 1,
      relevant = subannotation$upos %in% c("NOUN", "VERB", "ADV", "ADJ")
    )
  stats <- stats %>% top_n(25,rake)
  checktopics <- paste(stats$keyword, collapse =  " ")
  textdf$topics <- checktopics
  textdf$score <- str_count(checktopics,"cheese")

}

The intended outcome should be something like:

id score topics
1  12    chocolate chocoholics cheese
2  1     plastic waste cheese
3  3     neuroscientists data system

The current outcome is:

id score topics
1  3     neuroscientists data system
2  3     neuroscientists data system
3  3     neuroscientists data system

What am I doing wrong?

Thank you!


Solution

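  • The fix is to index the assignment by i inside the loop. Without the index, textdf$topics <- checktopics recycles the single value across the whole column on every iteration, so only the last document's result survives. Derp. Note that this assumes doc_id holds the row numbers 1, 2, 3, ..., as in the example output.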

    textdf$topics[i] <- checktopics
    textdf$score[i] <- str_count(checktopics,"cheese")
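
If doc_id is not simply the row number (for example, character ids), indexing by i may not land on the intended row. Below is a minimal sketch of a loop over row positions instead. The toy textdf and its text column are made up for illustration, the line that fills checktopics is a placeholder for the keywords_rake() / top_n() / paste() step from the question, and base R's gregexpr stands in for str_count so the example runs without extra packages.

```r
# Toy stand-in for textdf; the doc_id values and texts are made up for illustration.
textdf <- data.frame(
  doc_id = c("a", "b", "c"),
  text = c("cheese cheese chocolate", "plastic waste cheese", "data system"),
  stringsAsFactors = FALSE
)

# Preallocate the result columns so each row has its own slot.
textdf$topics <- NA_character_
textdf$score <- NA_integer_

# Loop over row positions, not doc_id values, so indexing works for any id type.
for (row in seq_len(nrow(textdf))) {
  id <- textdf$doc_id[row]

  # Placeholder for the per-document keywords_rake() + top_n() + paste() step.
  checktopics <- textdf$text[textdf$doc_id == id]

  # Write to one row only; assigning without [row] would overwrite every row.
  textdf$topics[row] <- checktopics

  # Base R equivalent of str_count(checktopics, "cheese") for a fixed pattern:
  # gregexpr returns -1 when there is no match, so count positive positions.
  hits <- gregexpr("cheese", checktopics, fixed = TRUE)[[1]]
  textdf$score[row] <- sum(hits > 0)
}

print(textdf[, c("doc_id", "score", "topics")])
```

In the real loop, filter the annotation with doc_id == id and keep the rest of the body as in the question; only the loop variable and the two indexed assignments change.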