Tags: python, dask, dask-dataframe, dask-ml

dask-ml preprocessing raises AttributeError


I use Dask DataFrames and dask-ml to process my data. When I use the dask-ml MinMaxScaler, I get the error below. Is there a way to avoid this error and make it work?

import dask.dataframe as dd
from dask_ml.preprocessing import MinMaxScaler

df = dd.read_csv('path to csv', parse_dates=['CREATED_AT'],
                 dtype={'ODI_UPDATED_AT': 'object'})
scaler = MinMaxScaler()
print(scaler.fit_transform(df['M']))

AttributeError: 'Scalar' object has no attribute 'copy'


Solution

  • Because the error message is ambiguous, an issue was opened: "Better error message when using invalid 'MinMAxScaler.fit()' inputs".

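    The actual fix is to pass input of the appropriate type. Like scikit-learn's scalers, MinMaxScaler expects 2-D input (one column per feature), but df['M'] is a 1-D Series. Its min() reduces to a dask Scalar, which is most likely where the "'Scalar' object has no attribute 'copy'" error comes from. Something like this works: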

    import dask_ml.preprocessing

    scaler = dask_ml.preprocessing.MinMaxScaler()
    col_1 = df['col_1'].values                 # 1-D dask array
    col_1_np = col_1.compute().reshape(-1, 1)  # 2-D NumPy array, shape (n, 1)
    scaler.fit(col_1_np)
    col_1 = scaler.transform(col_1_np)
    

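    The second line gives you a dask array, and col_1.compute().reshape(-1, 1) turns it into a NumPy array with a single column, which is the 2-D shape the scaler expects. Note that .compute() loads the whole column into memory. Repeat this for each column (col_2, col_3, ...), then concatenate the transformed columns into a new Dask DataFrame: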

    import dask.dataframe as dd

    names = ['col_1', 'col_2', 'col_3']
    ddf = dd.concat([dd.from_array(c, columns=[n]) for c, n in zip([col_1, col_2, col_3], names)], axis=1)