I could've sworn `df = pd.get_dummies(df, columns=categorical_cols)` used to output binary values (0 and 1). It even says so when I hover over `get_dummies`.
But why is my output Boolean (True/False)?
Here is my code:
import os
import pandas as pd
# loading data
script_dir = os.path.dirname(__file__)
data_path = os.path.join(script_dir, "path/to/my/raw/data.csv")
df = pd.read_csv(data_path)
# Sample
feature_cols = ["list", "of", "feature", "cols"]
categorical_cols = ["list", "of", "categorical", "cols"]
target = "target_col"
X = df[feature_cols].copy()
y = df[target]
# Convert categorical columns to category type
X[categorical_cols] = X[categorical_cols].astype("category")
# Apply get_dummies
X = pd.get_dummies(X, columns=categorical_cols)
Expected Output: Dummy variables to be in binary form (0 and 1).
Actual Output: The output contains Booleans (True/False).
Steps taken:
I had to convert it to 0 and 1 by adding `dtype=int`:
`df = pd.get_dummies(df, columns=categorical_cols, dtype=int)`
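To illustrate the workaround on something small, here is a minimal sketch using a toy DataFrame. The data and the column names `color` and `size` are made up for illustration and are not from my real dataset. Passing `dtype=int` explicitly should make the result independent of whatever default the installed pandas version uses.

```python
import pandas as pd

# Toy data; the column names are hypothetical, not from the real dataset.
toy = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})

# Passing dtype explicitly avoids relying on the version-dependent default.
encoded = pd.get_dummies(toy, columns=["color"], dtype=int)

print(encoded)
print(encoded.dtypes)
```

Only the columns listed in `columns` are encoded; `size` is passed through unchanged.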
My Questions:
Do I have to add `dtype=int` all the time?
I found it in whatsnew/v2.0.0:
Default value of dtype in get_dummies() is changed to bool from uint8 (GH45848)
Exactly, new versions return `True`, `False`; old versions return `1`, `0`.
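If you cannot control the call to `get_dummies` (for example, it happens inside someone else's code), one option is to normalize the result afterwards. The sketch below is only an illustration: the toy data is hypothetical, and it assumes the dummy columns arrive as either `bool` (newer pandas) or `uint8` (older pandas), as the changelog entry describes.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.DataFrame({"color": ["red", "blue", "red"]})

# Default dtype depends on the installed pandas version.
default_encoded = pd.get_dummies(toy, columns=["color"])
print(pd.__version__, default_encoded.dtypes.unique())

# Pick up dummy columns regardless of which default produced them,
# then cast them to plain integers.
dummy_cols = default_encoded.select_dtypes(include=["bool", "uint8"]).columns
normalized = default_encoded.astype({col: int for col in dummy_cols})

print(normalized.dtypes)
```

Passing `dtype=int` directly is simpler when you own the call; the post-hoc cast is mainly useful when you do not.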
Exactly.