python pandas boolean dtype

Was pd.get_dummies() changed in newer versions of Pandas to default to Booleans (True/False) instead of 0/1?


I could've sworn df = pd.get_dummies(df, columns=categorical_cols) used to output binary values (0 and 1). It even says it when I hover over get_dummies.


But why is my output Boolean (True/False)?

Here is my code:

import os

import pandas as pd

# loading data
script_dir = os.path.dirname(__file__)
data_path = os.path.join(script_dir, "path/to/my/raw/data.csv")
df = pd.read_csv(data_path)

# Sample
feature_cols = ["list", "of", "feature", "cols"]
categorical_cols = ["list", "of", "categorical", "cols"]
target = "target_col"

X = df[feature_cols].copy()
y = df[target]

# Convert categorical columns to category type
X[categorical_cols] = X[categorical_cols].astype("category")

# Apply get_dummies
X = pd.get_dummies(X, columns=categorical_cols)

Expected Output: Dummy variables to be in binary form (0 and 1).

Actual Output: The output contains Booleans (True/False).

Steps taken:

I had to convert it to 0 and 1 by adding dtype=int.

`df = pd.get_dummies(df, columns=categorical_cols, dtype=int)`
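
For illustration, here is a minimal sketch of the difference on a small made-up DataFrame. The `toy` frame and its `color` and `size` columns are hypothetical and are not part of the data above. The default call is expected to give bool columns on pandas 2.0.0 or later, while `dtype=int` gives integer columns either way.

```python
import pandas as pd

# Toy data; the column names and values are illustrative assumptions.
toy = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})

# Default dtype: bool on pandas >= 2.0.0, uint8 on older versions.
default_dummies = pd.get_dummies(toy, columns=["color"])
print(default_dummies.dtypes)

# Explicit dtype=int produces 0/1 integer columns regardless of version.
int_dummies = pd.get_dummies(toy, columns=["color"], dtype=int)
print(int_dummies)
```

Printing `default_dummies.dtypes` is a quick way to check which default the installed pandas version uses.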

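My Questions:

1. Why is pd.get_dummies() defaulting to Booleans (True/False) instead of (0/1)?
2. Is this updated in newer versions of Pandas, or am I doing something wrong?
3. Should I use dtype=int all the time?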


Solution

    1. Why is pd.get_dummies() defaulting to Booleans (True/False) instead of (0/1)?

    I found it in whatsnew/v2.0.0:

    Default value of dtype in get_dummies() is changed to bool from uint8 (GH45848)

    More information: GH45848.

    2. Is this updated in newer versions of Pandas, or am I doing something wrong?

    Yes, it was updated, and you are not doing anything wrong. Pandas 2.0.0 and later return True/False; older versions return 1/0.

    3. Should I use dtype=int all the time?

    Yes, if you need 0/1 output, pass dtype=int explicitly.
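
If the dummy columns have already been created as Booleans, they can also be cast afterwards. The sketch below is one possible approach, not the only one; the `toy` frame and its `color` column are made up for illustration, and `dtype=bool` is passed explicitly so the example behaves the same on older and newer pandas versions.

```python
import pandas as pd

# Toy data; names and values are illustrative assumptions.
toy = pd.DataFrame({"color": ["red", "blue", "red"]})

# Request bool explicitly so the result does not depend on the pandas version.
dummies = pd.get_dummies(toy, columns=["color"], dtype=bool)

# Booleans already behave like 0/1 in arithmetic: True counts as 1.
red_count = dummies["color_red"].sum()

# Cast only the bool columns to int, leaving any other columns untouched.
bool_cols = dummies.select_dtypes(include="bool").columns
dummies = dummies.astype({col: int for col in bool_cols})
print(dummies)
```

Casting through `select_dtypes` limits the conversion to the Boolean columns, which matters when the frame also holds numeric or string features.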