I'm using the R tm package for text analysis on a facebook group, and find that the removewords function isn't working for me. I tried to combine the french stopwords with my own, but they are still appearing. So I create a file named "french.txt" with my own list as in the following command:
nom_fichier <- "Analyse textuelle/french.txt"
my_stop_words <- readLines(nom_fichier, encoding="UTF-8")
Here is the data for text mining:
text <- readLines(groupe_fb_ief, encoding="UTF-8")```
docs <- Corpus(VectorSource(text))
inspect(docs)
Here are the tm_map commands:
docs <- tm_map(docs, tolower)
docs <- tm_map(docs, stripWhitespace)
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs, removeWords, my_stop_words)
Applying that, it's still not working and I don't understand why. I even try to change to order of the commands with no result.
Do you have any idea ? Is it possible to change the french stopwords within R ? Where this list is located ?
Thanks!!
Rather than use RemoveWords, I typically use an anti_join() to remove all stop words.
library(tidytext)
my_stop_words <- my_stop_words %>%
unnest_tokens(output = word, input = text, token = "words")
# anti_join
anti_join(docs,my_stop_words, by = "word")
That is if the the column that contains your corpus is called "word". Hope this helps.