I have a task where I need to predict a continuous variable, the odometer reading, from a text field that describes the issues faced by the customer. This field is not a drop-down menu; it is filled in with the customer's verbatim description. For example:
| **Text** | **Odometer Reading** |
|---|---|
| Clutch problem | 20,000 |
| Axle Issue | 150,000 |

Edit:
I am building a linear model using unigrams, but I get this warning during data pre-processing:
> corp <- Corpus(VectorSource(ISSUES$CUSTOMER_VOICE))
>
> corp <- tm_map(corp,tolower)
Warning message:
In tm_map.SimpleCorpus(corp, tolower) : transformation drops documents
> corp <- tm_map(corp,removePunctuation)
Warning message:
In tm_map.SimpleCorpus(corp, removePunctuation) :
transformation drops documents
> corp <- tm_map(corp,removeWords,stopwords('english'))
Warning message:
In tm_map.SimpleCorpus(corp, removeWords, stopwords("english")) :
transformation drops documents
> corp <- tm_map(corp,stemDocument)
Warning message:
In tm_map.SimpleCorpus(corp, stemDocument) : transformation drops documents
Could someone please tell me how to fix this warning?
This is just one way to do it, and it may not be the optimal solution: run text mining on the text column to extract unigrams and bigrams, convert them into a document-term matrix (DTM), and then use any linear model to predict the odometer reading.
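A rough sketch of that pipeline in R with `tm`, unigrams only. The toy `ISSUES` data frame and its `ODOMETER` column are made up for illustration, since the real data is not shown. As far as I know, the "transformation drops documents" warning comes from `Corpus()` building a `SimpleCorpus` for a `VectorSource`, and building a `VCorpus` instead (with `tolower` wrapped in `content_transformer`) should avoid it; treat that as an assumption to check against your `tm` version.

```r
library(tm)         # corpus handling and DocumentTermMatrix
library(SnowballC)  # stemmer used by stemDocument

# Hypothetical stand-in for the real ISSUES data; ODOMETER is an assumed column name.
ISSUES <- data.frame(
  CUSTOMER_VOICE = c("Clutch problem", "Axle Issue", "Clutch slipping",
                     "Axle noise", "Clutch problem!", "Axle issue"),
  ODOMETER = c(20000, 150000, 25000, 140000, 22000, 155000),
  stringsAsFactors = FALSE
)

# VCorpus rather than Corpus, so tm_map works on a volatile corpus
corp <- VCorpus(VectorSource(ISSUES$CUSTOMER_VOICE))

# Base functions such as tolower must be wrapped in content_transformer,
# otherwise the documents are turned into plain character vectors
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, removeWords, stopwords("english"))
corp <- tm_map(corp, stemDocument)
corp <- tm_map(corp, stripWhitespace)

# One row per document, one column per (stemmed) unigram
dtm <- DocumentTermMatrix(corp)
features <- as.data.frame(as.matrix(dtm))

# Fit a plain linear model on the term counts.
# On data this small some coefficients may come back NA (rank-deficient fit).
train <- cbind(odometer = ISSUES$ODOMETER, features)
fit <- lm(odometer ~ ., data = train)
print(coef(fit))
```

On real data you would probably want to drop very rare terms first (for example with `removeSparseTerms`) and hold out a test set before trusting the fit.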
I hope this solves your issue.