I have a doubt: datasets often store the Age column as either int or float (e.g. Titanic). Suppose the column contains only float values: should you convert them all to int, or leave them as they are when feeding them to an ML model? Does either choice have any harmful or adverse effect on the prediction results, and what's the right way?
`age` is a continuous variable: with every moment that passes you age; you don't age in one-year increments. So the data type that most closely reflects reality is a `float`, not an `integer`.
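To make the point concrete, here is a minimal sketch in plain Python (the sample ages are made up for illustration) showing what a cast to int does to float ages: `int()` truncates, so two people of different ages can end up with the same value.

```python
# Hypothetical ages in years, stored as floats (values chosen for illustration)
ages = [59.9, 59.1, 15.9]

# int() truncates toward zero, so distinct ages can collapse to the same value
as_int = [int(a) for a in ages]

print(ages)    # [59.9, 59.1, 15.9]
print(as_int)  # [59, 59, 15]

# The float values still tell the two 59-year-olds apart; the ints do not
print(ages[0] > ages[1])      # True
print(as_int[0] > as_int[1])  # False
```

A pandas `astype(int)` on a float column truncates in the same way, so the information loss is the same whichever tool does the cast.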
However, whether to use a `float` or an `integer` depends on the use case, e.g.:
- `age` as a feature describing how old people are? Better to use a `float` (e.g. a person who is 59.9 is older than a person who is 59.1 and may be more likely to develop certain medical conditions, or may be less physically fit and less likely to survive a shipwreck).
- `age` groups? You might be better off rounding to the nearest integer (e.g. 39.9 -> 40, 34.2 -> 34) and potentially binning (e.g. 25-34, 35-45).
- Legal age? Use the truncated `int` value (e.g. if the legal drinking age is 16 and a person is 15.9, legally they are 15, so their drinking is underage drinking).

As a general remark, you'll often find that there is no single "right way" of dealing with data; it all depends on the use case.
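The three cases above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the bin edges (25-34, 35-45) and the legal threshold of 16 are taken from the examples in the answer, and the helper names `age_group`, `legal_age` and `is_of_legal_age` are hypothetical.

```python
import math
from typing import Optional

# Hypothetical bin edges, inclusive on both ends, taken from the example ranges
AGE_BINS = [(25, 34), (35, 45)]


def age_group(age: float) -> Optional[str]:
    """Round to the nearest whole year, then place the result in a bin."""
    # Note: Python's round() uses round-half-to-even, so round(34.5) == 34
    years = round(age)
    for lo, hi in AGE_BINS:
        if lo <= years <= hi:
            return f"{lo}-{hi}"
    return None  # outside every bin


def legal_age(age: float) -> int:
    """Truncate to completed years, which is how legal age is counted."""
    return math.floor(age)


def is_of_legal_age(age: float, minimum: int = 16) -> bool:
    # Hypothetical threshold of 16, matching the example in the answer
    return legal_age(age) >= minimum


print(round(39.9), round(34.2))                # 40 34
print(age_group(34.2), age_group(39.9))        # 25-34 35-45
print(legal_age(15.9), is_of_legal_age(15.9))  # 15 False
```

For the plain feature case no helper is needed: the raw float is passed to the model unchanged. In pandas, the same ideas map onto `Series.round()`, `pd.cut()` for binning and `numpy.floor()` for truncation.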