pythonpython-3.xpandascsvunsupervised-learning

Need help transforming 1 categorical column into separate categorical columns in Python


I have a large csv file named 'grocery store transactions' loaded into Python that contains only 2 columns: a) transaction # and b) products bought during each transaction.

I am trying to do some Association Rule Mining, but before I can do that, I have to transform the dataset below:

txns = pd.read_csv('transactions.csv')
txns.head(15)

enter image description here

*** My dilemma is that I am trying to transform the one "Product" column into its corresponding categorical columns with 1s and 0s ***

*** Does anyone know how to transform my "Product" column into separate binary categorical columns?

Below is an example of how this should look like but using a different dataset from class. This is a bank csv file that had a single 'bank features' column transformed into separate categorical binary columns. 1 represents the customer had the bank feature and 0 means they do not.

enter image description here

Any help would be greatly appreciated !!


Solution

  • You can use either pandas or scikit-learn.

    Pandas:

    pd.get_dummies(df.Product, prefix='Product')
    

    Scikit Learn:

    from sklearn.preprocessing import LabelBinarizer
    y = LabelBinarizer().fit_transform(df.Product)
    

    For multicolumn data you can also use scikit-learn's OneHotEncoder, but I won't give an example here since you don't need it.