[SOLVED] Transforming dataframe to sparse matrix and reset index

I have a data set with the ratings that users (by user ID) gave to products (by product ID). There are only 5,000 products and 10,000 users, but their IDs are not consecutive numbers. I would like to transform my dataframe into a coo_matrix((data, (row, col)), shape), but with row and col running over the actual number of users and products, not over the raw IDs. Is there any way to do that? Below is an illustration:

Data frame:

User ID	Product ID	Rating
1	14	0.1
1	15	0.2
2	14	0.3
2	16	0.3
5	19	0.4

and I expect to get a matrix (in sparse COO form) like this:

ProductID	14	15	16	19
UserID
1	0.1	0.2	0	0
2	0.3	0	0.3	0
5	0	0	0	0.4

because if I pass the raw IDs to coo_matrix directly, it gives a needlessly large matrix, with columns 0 to 19 for the product IDs and rows 0 to 5 for the user IDs, most of them empty.
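A quick way to see the size problem, as a minimal sketch using the sample data above (the variable names are just for illustration): when row and col are raw IDs and no shape is given, scipy.sparse.coo_matrix infers the shape as (largest index + 1) along each axis.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Sample data from the question
user_ids = np.array([1, 1, 2, 2, 5])
product_ids = np.array([14, 15, 14, 16, 19])
ratings = np.array([0.1, 0.2, 0.3, 0.3, 0.4])

# Raw IDs used directly as row/column indices: the inferred shape is
# (max user ID + 1, max product ID + 1)
raw = coo_matrix((ratings, (user_ids, product_ids)))
print(raw.shape)  # (6, 20)

# Only 3 distinct users and 4 distinct products actually occur
print(len(np.unique(user_ids)), len(np.unique(product_ids)))  # 3 4
```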

This is for my thesis.

Solution

Hi, hope this helps, and good luck with your thesis:

import pandas as pd
from scipy.sparse import coo_matrix

dataframe=pd.DataFrame(data={'User ID':[1,1,2,2,5], 'Product ID':[14,15,14,16,19], 'Rating':[0.1,0.2,0.3,0.3,0.4]})

# Use the raw IDs as row/column indices
row=dataframe['User ID']
col=dataframe['Product ID']
data=dataframe['Rating']

# Build the sparse matrix, then convert it to a dense array
coo=coo_matrix((data, (row, col))).toarray()
new_dataframe=pd.DataFrame(coo)

#Drop non-existing Product IDs (all-zero columns) --optional, delete if not intended
new_dataframe=new_dataframe.loc[:, (new_dataframe != 0).any(axis=0)]

#Drop non-existing User IDs (all-zero rows) --optional, delete if not intended
new_dataframe=new_dataframe.loc[(new_dataframe != 0).any(axis=1)]

print(new_dataframe)

Output:

    14   15   16   19
1  0.1  0.2  0.0  0.0
2  0.3  0.0  0.3  0.0
5  0.0  0.0  0.0  0.4
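The solution above calls toarray(), so it builds a dense array with (max User ID + 1) x (max Product ID + 1) cells before dropping the empty rows and columns; with real IDs that array can be much larger than 10,000 x 5,000. A sketch that stays sparse, assuming it is fine to number users and products in ascending ID order: pd.factorize maps each distinct ID to a consecutive index, and those indices go into coo_matrix with an explicit shape. The names user_codes, user_labels, product_codes, product_labels and ratings_df are only illustrative.

```python
import pandas as pd
from scipy.sparse import coo_matrix

dataframe = pd.DataFrame({'User ID': [1, 1, 2, 2, 5],
                          'Product ID': [14, 15, 14, 16, 19],
                          'Rating': [0.1, 0.2, 0.3, 0.3, 0.4]})

# Map each distinct ID to a consecutive index 0..n-1, in ascending ID order.
# user_codes[i] is the matrix row of rating i; user_labels[k] is the real ID of row k.
user_codes, user_labels = pd.factorize(dataframe['User ID'], sort=True)
product_codes, product_labels = pd.factorize(dataframe['Product ID'], sort=True)

# The shape now counts distinct users/products instead of max ID + 1
coo = coo_matrix((dataframe['Rating'].to_numpy(), (user_codes, product_codes)),
                 shape=(len(user_labels), len(product_labels)))
print(coo.shape)  # (3, 4)

# Optional: a labelled DataFrame that stays sparse, with the real IDs as index/columns
ratings_df = pd.DataFrame.sparse.from_spmatrix(coo, index=user_labels, columns=product_labels)
print(ratings_df.sparse.to_dense())
```

To translate a matrix position back to the original IDs, look it up in the labels: row k is user user_labels[k] and column j is product product_labels[j].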