
Are some data sets just not predictive?


Are some types of data sets just not predictive?

A current real-life example of my own: my goal is to build a predictive model for cross-selling insurance products, e.g. from car insurance to health insurance.

My data set consists mainly of characteristic data such as the state they live in, age, gender, type of car, etc.

I've tried various models, from gradient-boosted trees (XGBoost) to regularised logistic regression, and the AUC won't go above 0.65.


So this leads me to ask: are some types of data sets just not predictive? And how do you help stakeholders understand this?
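One way to make an AUC of 0.65 concrete for stakeholders is its pairwise reading: AUC is the probability that a randomly chosen customer who did cross-buy is scored higher than a randomly chosen customer who did not, with ties counted as half. So 0.65 means the model orders such a pair correctly about 65% of the time, versus 50% for random guessing. The sketch below computes AUC directly from that definition in plain Python; the scores and labels are made-up illustration data, not results from the question.

```python
from itertools import product

def pairwise_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs where the positive
    example gets the higher score; tied pairs count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical model scores: 5 customers who bought health insurance (1)
# and 4 who did not (0). These numbers are invented for illustration.
scores = [0.9, 0.6, 0.4, 0.3, 0.25, 0.7, 0.5, 0.2, 0.1]
labels = [1,   1,   1,   1,   1,    0,   0,   0,   0]
print(pairwise_auc(scores, labels))  # 13 of 20 pairs ordered correctly -> 0.65

# A model that gives everyone the same score is no better than a coin flip.
print(pairwise_auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))  # -> 0.5
```

This pairwise loop is O(positives × negatives), which is fine for explaining the metric but not meant for large data sets.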


Solution

  • Some datasets may not be very predictive, especially if you're missing variables that account for much of the variance. It's hard to say whether that is the case here without talking to subject matter experts. That said, trying different models is fine, but I would also make sure you're spending a significant amount of time engineering features. Often, representing the data the right way is the difference between a working model and a poor one, especially with tree models.
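As a rough illustration of what feature engineering might look like for this kind of characteristic data, here is a minimal sketch that derives a few features from a single customer record. The field names (`age`, `state`, `car_type`, `car_year`, `policy_start`), the age-band cut-offs and the example values are all hypothetical; which derived features actually help would depend on the real data and on input from subject matter experts.

```python
from datetime import date

def engineer_features(customer, as_of):
    """Derive features from raw characteristic fields.
    All field names and cut-offs here are hypothetical examples."""
    features = {}
    # Age bands as a proxy for life stage (e.g. young family vs. near retirement)
    age = customer["age"]
    features["age_band"] = (
        "under_25" if age < 25 else
        "25_39" if age < 40 else
        "40_59" if age < 60 else
        "60_plus"
    )
    # Vehicle age, which may say more about the customer than the raw model year
    features["car_age_years"] = as_of.year - customer["car_year"]
    # Approximate tenure with the insurer in whole years (ignores leap days)
    features["tenure_years"] = (as_of - customer["policy_start"]).days // 365
    # Explicit interaction of two raw fields; a plain logistic regression
    # cannot form this combination on its own
    features["state_x_car_type"] = f'{customer["state"]}|{customer["car_type"]}'
    return features

# Hypothetical customer record
customer = {"age": 34, "state": "TX", "car_type": "SUV",
            "car_year": 2015, "policy_start": date(2018, 3, 1)}
print(engineer_features(customer, as_of=date(2024, 1, 1)))
```

The derived columns would then be fed to the same models (logistic regression, XGBoost) and compared on held-out AUC against the raw-feature baseline.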