I found weird behavior in sklearn.preprocessing.MinMaxScaler (and the same in sklearn.preprocessing.RobustScaler): when the data's max value is very small (< 10^(-16)), the transformer doesn't change the values at all. Why? df_small.dtypes is float64, and that type can represent much smaller numbers. How can I fix this without the handcrafted data = data / data.max()?
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df_small = pd.DataFrame(np.arange(5) * 10.0**(-16))
scaler_small = MinMaxScaler()
small_transformed = scaler_small.fit_transform(df_small)
print(small_transformed)
[[0.e+00]
[1.e-16]
[2.e-16]
[3.e-16]
[4.e-16]]
df_not_small = pd.DataFrame((np.arange(5)*10.0**(-15)))
scaler_not_small = MinMaxScaler()
not_small_transformed = scaler_not_small.fit_transform(df_not_small)
print(not_small_transformed)
[[0. ]
[0.25]
[0.5 ]
[0.75]
[1. ]]
When applying the scaling, MinMaxScaler calls the _handle_zeros_in_scale() function, which has the check:
constant_mask = scale < 10 * np.finfo(scale.dtype).eps
For np.float64, the value of 10 * np.finfo(scale.dtype).eps is 2.220446049250313e-15, which is larger than the range of 4e-16 in your first example (but smaller than the range of 4e-15 in the second). If the scale is smaller than this threshold, it is set to 1 (see this line):
scale[constant_mask] = 1.0
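You can verify the threshold yourself with the same float64 constants, and see which of your two examples falls below it:

```python
import numpy as np

# The guard value used by _handle_zeros_in_scale for float64 data
threshold = 10 * np.finfo(np.float64).eps
print(threshold)  # 2.220446049250313e-15

# First example: range 4e-16 is below the threshold,
# so the scale is replaced with 1.0 and the data passes through unchanged.
print(4e-16 < threshold)  # True

# Second example: range 4e-15 is above the threshold, so scaling proceeds.
print(4e-15 < threshold)  # False
```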
Unfortunately, you'll either have to scale the data manually yourself, or edit scikit-learn to allow samples with smaller overall ranges.
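If you go the manual route, a minimal min-max rescaling without the small-range guard might look like this (a sketch; minmax_scale_manual is a hypothetical helper, and note it has no protection against constant columns, which would divide by zero):

```python
import numpy as np
import pandas as pd

def minmax_scale_manual(df):
    # Plain (x - min) / (max - min) per column, with no
    # small-range guard: tiny but nonzero ranges still scale to [0, 1].
    # Caution: a constant column (range 0) produces division by zero here.
    col_min = df.min()
    col_range = df.max() - col_min
    return (df - col_min) / col_range

df_small = pd.DataFrame(np.arange(5) * 10.0**(-16))
print(minmax_scale_manual(df_small))
# maps to [0, 1] even though the raw range is only ~4e-16
```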