pythonpandasdataframecsv

How to preprocess multivalue attributes in a dataframe?


Description:

  1. Input is a CSV file

  2. CSV file contains columns of different data types: Ordinal Values, Nominal Values, Numerical Values and Multi Value

  3. For the multivalue columns. Minimum is 1, maximum is 5 values. The input is similar to this:

Job Perks Insurance Benefits
Online Courses; Certification Programs; Cross Training Life Insurance; Dental Insurance
Leadership Development Programs; Online Courses Life Insurance; Accident Insurance
  1. Multivalue Expected Output:
Job Perks_Online Courses Job Perks_Certification Programs Job Perks_Cross Training Job Perks_Leadership Development Programs Insurance Benefits_Life Insurance Insurance Benefits_Dental Insurance Insurance Benefits_Accident Insurance
1 1 1 0 1 1 0
1 0 0 1 1 0 1
  1. How do I preprocess the CSV input and save it to a dataframe with the above expected output? I am able to preprocess nominal attributes to the expected output(sample code below), but finding it hard to convert multivalues

Input:

CSV Dataset: https://github.com/omnislayer/WorkDataSet/blob/main/ECP_Unedited.csv

Sample Code:

#For the nominal:
import pandas as pd
import numpy as np
import dtale #Better STDOUT for dataframes

nominalColumns = ["Gender", "Marital Status", "Educational Attainment", "Employment Status", "Company Bonus Structure", "Company Medical Plan Type"]
multivalueColumns = ["Job Perks", "Professional Development Opportunities", "Insurance Benefits"]

df = pd.read_csv('ECP_Unedited.csv')

#Convert Nominal Columns
newCols = pd.get_dummies(df[nominalColumns], dtype=int)
df = df.drop(columns=nominalColumns)
df = pd.concat([df, newCols], axis=1)
dtale.show(df)

#Convert Multivalue Columns
#INSERT CODE HERE!


Solution

  • You could combine str.get_dummies, add_prefix, and pd.concat with a generator:

    out = pd.concat(
        (
            df[col].str.get_dummies(sep='; ').add_prefix(f'{col}_')
            for col in multivalueColumns
        ),
        axis=1,
    )
    

    Output:

       Job Perks_Certification Programs  Job Perks_Cross Training  Job Perks_Leadership Development Programs  Job Perks_Online Courses  Insurance Benefits_Accident Insurance  Insurance Benefits_Dental Insurance  Insurance Benefits_Life Insurance
    0                                 1                         1                                          0                         1                                      0                                    1                                  1
    1                                 0                         0                                          1                         1                                      1                                    0                                  1