I have a data frame with this structure:
Note.Reco   Review          Review.clean.lower
10          Good Products   good products
9           Nice film       nice film
.... ....
The first column is the rank of the film, the second is the customer's review, and the third is the review in lowercase.
I am now trying to remove stop words with this:
Data_clean$Raison.Reco.clean1 <- Corpus(VectorSource(Data_clean$Review.clean.lower))
Data_clean$Review.clean.lower1 <- tm_map(Data_clean$Review.clean.lower1, removeWords, stopwords("english"))
But RStudio crashes.
Can you help me resolve this problem, please?
Thank you
EDIT:
#clean up
# remove grammar/punctuation
Data_clean$Review.clean.lower <- tolower(gsub('[[:punct:]0-9]', ' ', Data_clean$Review))
Data_corpus <- Corpus(VectorSource(Data_clean$Review.clean.lower))
Data_clean <- tm_map(Data_corpus, removeWords, stopwords("french"))
train <- Data_clean[train.index, ]
test <- Data_clean[test.index, ]
I get an error when I run the last two instructions.
Try the code below. Do the cleaning on the corpus rather than on the data frame column directly.
Data_corpus <- Corpus(VectorSource(Data_clean$Review.clean.lower))
Data_clean <- tm_map(Data_corpus, removeWords, stopwords("english"))
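Note that tm_map() returns a corpus, not a data frame, so row indexing such as Data_clean[train.index, ] no longer works once Data_clean has been overwritten with the result. One possible way around this, sketched below, is to leave the data frame alone and call removeWords() on the character column itself, since removeWords() also accepts a plain character vector. The sample rows and the index values are made up for illustration; only the column names come from the question.

```r
library(tm)

# Hypothetical stand-in for the asker's data frame
Data_clean <- data.frame(
  Note.Reco = c(10, 9),
  Review = c("Good Products", "This is a nice film"),
  stringsAsFactors = FALSE
)
Data_clean$Review.clean.lower <- tolower(gsub("[[:punct:]0-9]", " ", Data_clean$Review))

# removeWords() works on a plain character vector, so no corpus is needed here
cleaned <- removeWords(Data_clean$Review.clean.lower, stopwords("english"))

# Squeeze the gaps left by the removed words and trim the ends (base R)
Data_clean$Review.clean.lower1 <- trimws(gsub("\\s+", " ", cleaned))

# Data_clean is still a data frame, so row indexing works
train.index <- 1  # hypothetical indices, replace with your own split
test.index <- 2
train <- Data_clean[train.index, ]
test <- Data_clean[test.index, ]
```

For French reviews, stopwords("french") can be passed in place of stopwords("english").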
EDIT: Since you mentioned that you want to access the output after removing stop words, try the following instead of the above:
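library(tm)
# English stop-word list shipped with tm
stopWords <- stopwords("en")
# Make sure the column is character, not factor
Data_clean$Review.clean.lower <- as.character(Data_clean$Review.clean.lower)
# Define a "not in" operator
`%nin%` <- Negate(`%in%`)
# vapply returns a character vector, so the new column is not a list column
Data_clean$Review.clean.lower1 <- vapply(Data_clean$Review.clean.lower, function(x) {
  chk <- unlist(strsplit(x, " "))   # split the review into words
  p <- chk[chk %nin% stopWords]     # keep only the words that are not stop words
  paste(p, collapse = " ")          # rebuild the review as one string
}, character(1), USE.NAMES = FALSE)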
Sample output of the above code:
> print(Data_clean)
> note Note.Reco.Review Review.clean.lower Review.clean.lower1
> 1 10 Good Products good products good products
> 2 9 Nice film is a nice film nice film
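If you want to try the idea without installing tm, here is a minimal base R sketch of the same token-filtering approach. The sample reviews, the tiny stop-word list, the helper name remove_stop_words, and the train/test indices are all assumptions made up for illustration; in practice you would use stopwords("en") or stopwords("french") from tm. It splits on runs of whitespace so the extra spaces left by the punctuation clean-up do not turn into empty tokens.

```r
# Hypothetical sample data using the column names from the question
Data_clean <- data.frame(
  Note.Reco = c(10, 9, 7),
  Review = c("Good Products!", "This is a nice film", "It was not the best one"),
  stringsAsFactors = FALSE
)

# Tiny illustrative stop-word list (an assumption, not the tm list)
stopWords <- c("this", "is", "a", "it", "was", "the", "not")

# Same clean-up as in the question: drop punctuation and digits, then lowercase
Data_clean$Review.clean.lower <- tolower(gsub("[[:punct:]0-9]", " ", Data_clean$Review))

# Hypothetical helper: drop stop words from a single string
remove_stop_words <- function(x, stop_words) {
  tokens <- unlist(strsplit(x, "\\s+"))
  tokens <- tokens[nzchar(tokens) & !(tokens %in% stop_words)]
  paste(tokens, collapse = " ")
}

# vapply keeps the result a plain character column
Data_clean$Review.clean.lower1 <- vapply(
  Data_clean$Review.clean.lower, remove_stop_words, character(1),
  stop_words = stopWords, USE.NAMES = FALSE
)

# The result is still a data frame, so the train/test split works
train.index <- c(1, 3)  # hypothetical indices
test.index <- 2
train <- Data_clean[train.index, ]
test <- Data_clean[test.index, ]
```

Because Data_clean is never replaced by a corpus, the two indexing lines from the question run without error.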
See also: R remove stopwords from a character vector using %in%