Description:
Input is a CSV file
CSV file contains columns of different data types: Ordinal Values, Nominal Values, Numerical Values and Multi Value
For the multivalue columns. Minimum is 1, maximum is 5 values. The input is similar to this:
Job Perks | Insurance Benefits |
---|---|
Online Courses; Certification Programs; Cross Training | Life Insurance; Dental Insurance |
Leadership Development Programs; Online Courses | Life Insurance; Accident Insurance |
Job Perks_Online Courses | Job Perks_Certification Programs | Job Perks_Cross Training | Job Perks_Leadership Development Programs | Insurance Benefits_Life Insurance | Insurance Benefits_Dental Insurance | Insurance Benefits_Accident Insurance |
---|---|---|---|---|---|---|
1 | 1 | 1 | 0 | 1 | 1 | 0 |
1 | 0 | 0 | 1 | 1 | 0 | 1 |
Input:
CSV Dataset: https://github.com/omnislayer/WorkDataSet/blob/main/ECP_Unedited.csv
Sample Code:
#For the nominal:
import pandas as pd
import numpy as np
import dtale #Better STDOUT for dataframes
nominalColumns = ["Gender", "Marital Status", "Educational Attainment", "Employment Status", "Company Bonus Structure", "Company Medical Plan Type"]
multivalueColumns = ["Job Perks", "Professional Development Opportunities", "Insurance Benefits"]
df = pd.read_csv('ECP_Unedited.csv')
#Convert Nominal Columns
newCols = pd.get_dummies(df[nominalColumns], dtype=int)
df = df.drop(columns=nominalColumns)
df = pd.concat([df, newCols], axis=1)
dtale.show(df)
#Convert Multivalue Columns
#INSERT CODE HERE!
You could combine str.get_dummies
, add_prefix
, and pd.concat
with a generator:
out = pd.concat(
(
df[col].str.get_dummies(sep='; ').add_prefix(f'{col}_')
for col in multivalueColumns
),
axis=1,
)
Output:
Job Perks_Certification Programs Job Perks_Cross Training Job Perks_Leadership Development Programs Job Perks_Online Courses Insurance Benefits_Accident Insurance Insurance Benefits_Dental Insurance Insurance Benefits_Life Insurance
0 1 1 0 1 0 1 1
1 0 0 1 1 1 0 1