I use Dask DataFrame and dask-ml to manipulate my data. When I apply dask-ml's MinMaxScaler to a single column, I get the error below. Is there a way to prevent this error and make it work?
import dask.dataframe as dd
from dask_ml.preprocessing import MinMaxScaler
df = dd.read_csv('path to csv', parse_dates=['CREATED_AT'],
                 dtype={'ODI_UPDATED_AT': 'object'})
scaler = MinMaxScaler()
print(scaler.fit_transform(df['M']))
AttributeError: 'Scalar' object has no attribute 'copy'
Since the error message is ambiguous, an issue was opened: Better error message when using invalid 'MinMaxScaler.fit()' inputs
The way to solve this problem is to pass input of an appropriate type and shape: a two-dimensional array with a single column rather than a one-dimensional Series. Something like this:
scaler = MinMaxScaler()  # imported via: from dask_ml.preprocessing import MinMaxScaler
col_1 = df['col_1'].values  # Dask array
col_1 = col_1.compute().reshape(-1, 1)  # NumPy array of shape (n, 1)
col_1 = scaler.fit_transform(col_1)  # fit the scaler, then scale the column
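The second line gives you a Dask array, and col_1.compute().reshape(-1, 1) gives you a NumPy array of shape (n, 1). Note that compute() loads the whole column into memory. Finally, you can concatenate multiple transformed columns (col_2 and col_3 are scaled the same way as col_1) to get a new Dask DataFrame.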
ddf = dd.concat([dd.from_array(c) for c in [col_1, col_2, col_3]], axis=1)
ddf.columns = ['col_1', 'col_2', 'col_3']  # placeholder names; use your own, and keep them unique
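To make the reshape step above more concrete, here is a rough NumPy-only sketch of what min-max scaling to the default [0, 1] range does to a column shaped (n, 1). The helper `min_max_scale` and the sample values are made up for illustration and are not part of dask-ml; the formula is the usual (x - min) / (max - min), with a guard for constant columns.

```python
import numpy as np

def min_max_scale(column):
    """Scale a 1-D sequence to [0, 1]; hypothetical helper, not part of dask-ml."""
    # Scalers expect 2-D input, so turn the column into shape (n, 1).
    x = np.asarray(column, dtype=float).reshape(-1, 1)
    data_min = x.min(axis=0)
    data_range = x.max(axis=0) - data_min
    # Avoid dividing by zero when every value in the column is identical.
    data_range = np.where(data_range == 0, 1.0, data_range)
    return (x - data_min) / data_range

values = [10.0, 20.0, 40.0]  # made-up sample data
scaled = min_max_scale(values)
print(scaled.shape)    # (3, 1)
print(scaled.ravel())
```

The result keeps the (n, 1) shape, so it can be passed to dd.from_array in the same way as the scaler output above.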