Tags: arrays, python-3.x, pandas, numpy, machine-learning

How to convert a sparse NumPy array to a DataFrame?


Below is the code snippet:

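import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# x_data is the input feature data (not shown); columns 2, 3 and 4 are one-hot encoded
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [2, 3, 4])], remainder='passthrough')
X = np.array(ct.fit_transform(x_data))
X.shape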

I get the following output for the shape:

()

When I print X, I get the following output:

array(<8820x35 sparse matrix of type '<class 'numpy.float64'>'
    with 41527 stored elements in Compressed Sparse Row format>, dtype=object)

Now, when I try to convert this array to a DataFrame:

X = pd.DataFrame(X)

I get the following error:

ValueError: Must pass 2-d input

How do I convert my NumPy array to a DataFrame?


Solution

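  • It looks like ct.fit_transform(x_data) produces a SciPy sparse matrix. np.array(...) does not convert it; it just wraps the sparse matrix in a 0-d object-dtype array, which is why X.shape is () and pd.DataFrame(X) fails: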

    array(<8820x35 sparse matrix of type '<class 'numpy.float64'>'
        with 41527 stored elements in Compressed Sparse Row format>, dtype=object)
    

    Use toarray() (or its shorthand .A, available on sparse matrices) to convert it to a dense NumPy array, which pd.DataFrame accepts:

    X = ct.fit_transform(x_data).toarray()
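
To see the difference without the original data, here is a minimal sketch. It assumes a small hand-made SciPy CSR matrix (the name sparse_X and its values are made up) standing in for the output of ct.fit_transform(x_data). It also shows pd.DataFrame.sparse.from_spmatrix as one possible alternative if you would rather keep the data sparse inside pandas; that is a suggestion, not part of the original answer.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical stand-in for the sparse output of ct.fit_transform(x_data).
sparse_X = sparse.csr_matrix(np.array([[1.0, 0.0, 0.0],
                                       [0.0, 0.0, 2.0]]))

# np.array() does not densify: it wraps the sparse object in a 0-d object array.
wrapped = np.array(sparse_X)
print(wrapped.shape, wrapped.dtype)

# toarray() returns a real 2-D ndarray, which pd.DataFrame accepts.
dense_X = sparse_X.toarray()
df_dense = pd.DataFrame(dense_X)
print(df_dense.shape)

# Alternative: build a DataFrame with sparse columns directly from the matrix.
df_sparse = pd.DataFrame.sparse.from_spmatrix(sparse_X)
print(df_sparse.shape)
```

The dense route is the simplest and is fine for a matrix of the size in the question (8820x35). The sparse route may be worth considering only when the dense array would be too large for memory.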