I'm trying to create a pairplot of my dataset, in which the variables have vastly different ranges (some are in the 0-1 range, while others, like age and MonthlyIncome, go much higher). I want to scale the variables that go above 1 to the 0-1 range using the following code:
from sklearn.preprocessing import MinMaxScaler
scale_vars = ['MonthlyIncome', 'age', 'NumberOfTime30-59DaysPastDueNotWorse', 'DebtRatio',
              'NumberOfOpenCreditLinesAndLoans', 'NumberOfTimes90DaysLate',
              'NumberRealEstateLoansOrLines', 'NumberOfTime60-89DaysPastDueNotWorse',
              'NumberOfDependents']
# Rescale each listed column of train2 to the 0-1 range
scaler = MinMaxScaler(copy=False)
train2[scale_vars] = scaler.fit_transform(train2[scale_vars])
My problem is that after scaling the variables and creating the pairplot again, the plot doesn't change at all. What might be causing this? Here's the code I use to create the pairplot:
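import seaborn as sns
# 'bw' is deprecated since seaborn 0.11; 'bw_method' is its replacement
g = sns.pairplot(train2, hue='SeriousDlqin2yrs', diag_kws={'bw_method': 0.2})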
where SeriousDlqin2yrs is the Y variable.
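This is expected: the plots should look the same, except that the tick labels change. MinMaxScaler applies a linear (strictly speaking, affine) transformation to each column, x' = (x - min) / (max - min), and seaborn chooses the axis limits based on the range of the values, so the arrangement of the points in the scatter plots does not change. The same goes for the KDE curves on the diagonal: a scalar bandwidth such as 0.2 is a multiplier on each column's own spread, so it rescales together with the data.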
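To see why only the tick labels change, here is a small numeric check on made-up data (the array below is hypothetical, not taken from the question's dataset). It relies on the `scale_` and `min_` attributes of a fitted MinMaxScaler, which hold the per-column factor and offset of the transform; it is a sketch of the reasoning, not a reproduction of the plots.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: two columns on very different scales, loosely
# mimicking "age" and "MonthlyIncome". The numbers are made up.
X = np.array([[23.0, 1200.0],
              [45.0, 5400.0],
              [31.0, 3100.0],
              [67.0, 9800.0]])

scaler = MinMaxScaler()  # default feature_range=(0, 1)
X_scaled = scaler.fit_transform(X)

# The fitted transform is affine per column: x' = x * scale_ + min_,
# with scale_ = 1 / (max - min) and min_ = -min / (max - min).
affine = X * scaler.scale_ + scaler.min_
print(np.allclose(X_scaled, affine))  # True

# A positive affine map keeps the shape of the point cloud: the ordering
# of points and the correlation between columns are unchanged, so each
# scatter panel only gets new tick labels.
print(np.array_equal(np.argsort(X[:, 1]), np.argsort(X_scaled[:, 1])))  # True
r_before = np.corrcoef(X, rowvar=False)[0, 1]
r_after = np.corrcoef(X_scaled, rowvar=False)[0, 1]
print(np.isclose(r_before, r_after))  # True
```

If the scaled pairplot looked genuinely different (points rearranged), that would point to a bug somewhere else, not to the scaler.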
Since I do not have your data, here is the same effect with Ronald Fisher's classic iris dataset:
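import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
# Load iris as a DataFrame and attach the class labels as a column
iris_dict = load_iris(as_frame=True)
iris = iris_dict['data']
iris['species'] = iris_dict['target']
# Pairplot of the raw measurements (in centimeters)
g = sns.pairplot(iris, hue='species', diag_kws={'bw_method': 0.2})
# Min-max scale the four measurement columns to the 0-1 range
scale_vars = ['sepal length (cm)', 'sepal width (cm)',
              'petal length (cm)', 'petal width (cm)']
scaler = MinMaxScaler(copy=False)
iris[scale_vars] = scaler.fit_transform(iris[scale_vars])
# Same pairplot after scaling: only the tick labels differ
g = sns.pairplot(iris, hue='species', diag_kws={'bw_method': 0.2})
plt.show()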
Note that the column names should really be changed after scaling, because the values are no longer in centimeters.
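One way to do that is to rename the columns right after scaling. The sketch below assumes a " (scaled)" suffix is an acceptable label; that suffix is a hypothetical choice, not something from the original answer. It rebuilds the iris frame so it runs on its own and leaves out the plotting call; the renamed frame can be passed to the same `sns.pairplot` call as before.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

# Rebuild the iris DataFrame with the class labels as a column
iris_dict = load_iris(as_frame=True)
iris = iris_dict['data']
iris['species'] = iris_dict['target']

# Scale every measurement column (the ones whose name carries the cm unit)
scale_vars = [c for c in iris.columns if c.endswith(' (cm)')]
iris[scale_vars] = MinMaxScaler().fit_transform(iris[scale_vars])

# Hypothetical naming scheme: replace the unit with a "scaled" marker
iris = iris.rename(columns={c: c.replace(' (cm)', ' (scaled)') for c in scale_vars})

# Sanity check that the scaling took effect: every min should be 0, every max 1
print(iris.drop(columns='species').agg(['min', 'max']))
```

Checking the min/max like this is also a quick way to confirm, independently of the plot, that the scaler really modified the DataFrame.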