r text-analytics-api

Predicting continuous variable using text in R


I need to predict a continuous variable, the odometer reading, from a free-text field that describes the issues a customer faced. The field is not a drop-down menu; it is filled in with the customer's verbatim comments. For example:

**Text**                     **Odometer Reading**
Clutch problem               20,000 
Axle Issue                   150,000

Edit:

I am building a linear model using unigrams, but I get this warning at every step of the data pre-processing:

> corp <- Corpus(VectorSource(ISSUES$CUSTOMER_VOICE))
> 
> corp <- tm_map(corp,tolower)
Warning message:
In tm_map.SimpleCorpus(corp, tolower) : transformation drops documents
> corp <- tm_map(corp,removePunctuation)
Warning message:
In tm_map.SimpleCorpus(corp, removePunctuation) :
transformation drops documents
> corp <- tm_map(corp,removeWords,stopwords('english'))
Warning message:
In tm_map.SimpleCorpus(corp, removeWords, stopwords("english")) :
transformation drops documents
> corp <- tm_map(corp,stemDocument)
Warning message:
In tm_map.SimpleCorpus(corp, stemDocument) : transformation drops documents

Could someone please tell me how to fix this warning?


Solution

  • This is just one way to do it, and it may not be the optimal solution. Apply text mining to the text column to extract unigrams and bigrams, convert them to a document-term matrix (DTM), and then use any linear model to predict the odometer reading.

    I hope this solves your issue.
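
    Here is a rough sketch of those steps, assuming the tm package is installed (and optionally SnowballC for stemming). Apart from the two rows quoted in the question, the toy data is made up, and the ODOMETER column name and the uni_bigram_tokenizer helper are hypothetical names chosen for illustration. The warning in the question comes from tm_map.SimpleCorpus, which is the kind of corpus Corpus() builds from a VectorSource; building a VCorpus instead and wrapping tolower in content_transformer() is a commonly suggested workaround, though I have not checked it against your tm version.

```r
library(tm)  # also attaches NLP, which provides words() and ngrams()

# Toy data: the first two rows are from the question, the rest are made up.
ISSUES <- data.frame(
  CUSTOMER_VOICE = c("Clutch problem", "Axle Issue", "Clutch slipping problem",
                     "Rear axle noise issue", "Brake pedal problem",
                     "Brake noise issue", "Clutch noise", "Axle problem",
                     "Brake issue", "Clutch issue"),
  ODOMETER = c(20000, 150000, 25000, 140000, 60000,
               65000, 30000, 145000, 55000, 22000),
  stringsAsFactors = FALSE
)

# Use VCorpus so tm_map does not dispatch to the SimpleCorpus method.
corp <- VCorpus(VectorSource(ISSUES$CUSTOMER_VOICE))
corp <- tm_map(corp, content_transformer(tolower))  # wrap base functions
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, removeWords, stopwords("english"))
corp <- tm_map(corp, stripWhitespace)
if (requireNamespace("SnowballC", quietly = TRUE)) {
  corp <- tm_map(corp, stemDocument)  # stemming needs SnowballC
}

# Hypothetical helper: returns the unigrams and bigrams of one document.
uni_bigram_tokenizer <- function(x) {
  w <- words(x)
  bigrams <- unlist(lapply(ngrams(w, 2L), paste, collapse = " "),
                    use.names = FALSE)
  c(w, bigrams)
}

# Document-term matrix; keep only terms that occur in at least 2 documents.
dtm <- DocumentTermMatrix(
  corp,
  control = list(tokenize = uni_bigram_tokenizer,
                 bounds = list(global = c(2, Inf)))
)

# Turn the DTM into a data frame with syntactic column names for lm().
train <- as.data.frame(as.matrix(dtm))
names(train) <- make.names(names(train), unique = TRUE)
train$ODOMETER <- ISSUES$ODOMETER

# Ordinary linear regression of odometer reading on term counts.
fit <- lm(ODOMETER ~ ., data = train)
print(coef(fit))
print(fitted(fit))
```

    On real data the vocabulary will be much larger than the number of useful terms, so it is worth tightening the bounds setting (or calling removeSparseTerms() on the DTM) before fitting, otherwise the model can end up with more columns than rows.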