Keeping certain stopwords when doing natural language processing in R


I'm doing natural language processing in R with the code below. I noticed that the line that removes stopwords also removes the word 'no'. Can I make it keep that word? Is there a way to view all the words it removes?

# Pre-processing chain (assumes `corpus` is an existing tm corpus)
library(tm)
corpus <- tm_map(corpus, content_transformer(tolower)) # wrap base functions so documents stay tm documents
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
cleanset <- tm_map(corpus, removeWords, stopwords('english')) # do not remove the word 'no'
cleanset <- tm_map(cleanset, stemDocument)
cleanset <- tm_map(cleanset, stripWhitespace)
inspect(cleanset[1:25])

Solution

  • stopwords('english') simply returns a character vector, so printing it shows every word that removeWords will strip. To keep "no", remove it from that vector before passing it to removeWords:

    tm_map(corpus, removeWords, setdiff(stopwords('english'), "no"))
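
A slightly fuller sketch of the same idea, in case it helps. The one-sentence sample corpus and the names `sw`, `keep`, `custom_stopwords`, and `docs` are made up for illustration, and keeping "not" alongside "no" is only an assumption about what you might want to preserve:

```r
library(tm)

# View the full list of words that removeWords would strip
sw <- stopwords("english")
print(sw)
"no" %in% sw   # check whether a particular word is on the list

# Words to keep even though they appear on the stopword list (illustrative choice)
keep <- c("no", "not")
custom_stopwords <- setdiff(sw, keep)

# Tiny sample corpus, only for demonstration
docs <- VCorpus(VectorSource("There is no way this is not working"))

# Lowercase first: removeWords matches case-sensitively and the list is lowercase
docs <- tm_map(docs, content_transformer(tolower))
cleanset <- tm_map(docs, removeWords, custom_stopwords)
cleanset <- tm_map(cleanset, stripWhitespace)

# Inspect the cleaned text of the first document
content(cleanset[[1]])
```

Since setdiff works on plain character vectors, the same pattern extends to any number of words, and c(custom_stopwords, "someword") would go the other way and add extra words to remove.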