python, numpy, pandas, scikit-learn, data-scrubbing

What is python's equivalent of R's NA?



To be more specific: R has NaN, NA, NULL, Inf and -Inf. NA is generally used when there is missing data. What is python's equivalent?

How do libraries such as numpy and pandas handle missing values?

How does scikit-learn handle missing values?

Is it different for python 2.7 and python 3?


Solution

  • Scikit-learn doesn't currently handle missing values. For most machine learning algorithms it is unclear how missing values should be treated, so we rely on the user to handle them before passing the data to the algorithm. Numpy doesn't have a dedicated "missing" value. Pandas uses NaN, but NaN is an ordinary floating-point value that propagates through arithmetic, so inside numeric algorithms that might lead to confusion. It is possible to use masked arrays, but we don't do that in scikit-learn (yet).
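
To make the points above concrete, here is a minimal sketch of how NaN, masked arrays and pandas behave, and of one way to clean the data before it reaches an estimator. The toy arrays, the column names `a` and `b`, and the choice of mean-filling are illustrative assumptions, not a recommendation from the answer; whether to drop or fill depends on your data.

```python
import numpy as np
import pandas as pd

# NaN is a regular float, not a dedicated "missing" marker:
# it never compares equal to itself and it propagates through arithmetic.
x = np.array([1.0, np.nan, 3.0])
nan_equals_itself = bool(x[1] == x[1])
plain_mean = np.mean(x)          # NaN, because NaN propagates
nan_aware_mean = np.nanmean(x)   # NaN-aware variant skips the NaN entry

# A masked array keeps the data plus a separate boolean mask of invalid entries.
masked = np.ma.masked_invalid(x)
masked_mean = masked.mean()      # computed over unmasked entries only

# pandas uses NaN as its missing marker; None in a float Series becomes NaN.
s = pd.Series([1.0, None, 3.0])
missing_flags = s.isna().tolist()
pandas_mean = s.mean()           # pandas skips NaN by default

# Handle missing values yourself before passing data to an estimator
# (illustrative toy frame; mean-filling is just one possible choice).
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})
dropped = df.dropna()            # option 1: drop rows containing any NaN
filled = df.fillna(df.mean())    # option 2: fill NaN with the column mean
X = filled.to_numpy()            # plain float array with no NaN left
```

After this, `X` contains no NaN and can be passed to an algorithm that expects complete numeric input. Dropping rows is simpler but discards data (only one row survives in this toy frame), which is why filling is often tried instead.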