
When should you convert an Age column to float or int?


I have a question: datasets often store the Age column as either int or float (e.g. Titanic). Suppose the column holds only float values. Should you convert them all to int, or leave them as they are, before feeding the data to an ML model? Does either choice harm the prediction results, and what is the right way?


Solution

  • Age is a continuous variable: you age with every moment that passes, not in one-year increments, so the data type that most closely reflects reality is a float rather than an integer.

    However, whether to use a float or an integer depends on the use case.

    As a general remark, you'll often find that there is no single "right way" of dealing with data; it all depends on the use case.
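
To make the trade-off concrete, here is a minimal sketch of three ways to handle such a column in pandas. The sample values, the variable names, and the bin edges and labels are made up for illustration (they only mimic a Titanic-style Age column with fractional ages and a missing value); which option fits depends on your model and use case.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: fractional ages and one missing value
df = pd.DataFrame({"Age": [22.0, 38.5, 0.42, np.nan, 54.0]})

# Option 1: keep the float column unchanged (preserves fractional ages)
age_float = df["Age"]

# Option 2: completed years as integers.
# Floor first, then cast to the nullable "Int64" dtype so the missing
# value survives (a plain astype(int) would raise on NaN).
age_years = np.floor(df["Age"]).astype("Int64")

# Option 3: coarse age groups (bin edges and labels are arbitrary assumptions)
age_group = pd.cut(
    df["Age"],
    bins=[0, 12, 18, 60, 120],
    labels=["child", "teen", "adult", "senior"],
)

print(pd.DataFrame({"float": age_float, "years": age_years, "group": age_group}))
```

Note that flooring discards information (0.42 becomes 0 and 38.5 becomes 38), so converting to int is only worthwhile when whole years are what the use case actually needs.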