Suppose we have an array like this:
import numpy as np
X_train = np.array([[ 1., -1.,  2.],
                    [ 2.,  0.,  0.],
                    [ 0.,  1., -1.]])
We fit a StandardScaler from sklearn on it and read its `.scale_` attribute with this code:
from sklearn import preprocessing
scaler = preprocessing.StandardScaler().fit(X_train)
scaler.scale_
and this result is shown:
array([0.81649658, 0.81649658, 1.24721913])
Do you know how it is calculated?
If so, please give the formula used to calculate it.
I assumed that `.scale_` is the interquartile range (IQR), but when I calculate the IQR manually I get `array([2, 2, 3])` rather than `array([0.81649658, 0.81649658, 1.24721913])`.
I also think `array([0.81649658, 0.81649658, 1.24721913])` might be a normalized form of `array([2, 2, 3])`, but I don't know how it would have been normalized.
Please help me figure this out.
The three main statistics (mean, variance, and standard deviation) are available as attributes of the fitted scaler:
mean = preprocessing.StandardScaler().fit(X_train).mean_
variance = preprocessing.StandardScaler().fit(X_train).var_
Standard_deviation = preprocessing.StandardScaler().fit(X_train).scale_
For the array in the question:
X_train = np.array([[ 1., -1.,  2.],
                    [ 2.,  0.,  0.],
                    [ 0.,  1., -1.]])
mean = preprocessing.StandardScaler().fit(X_train).mean_
print(mean)
array([1. , 0. , 0.33333333])
variance = preprocessing.StandardScaler().fit(X_train).var_
print(variance)
array([0.66666667, 0.66666667, 1.55555556])
Standard_deviation = preprocessing.StandardScaler().fit(X_train).scale_
print(Standard_deviation)
array([0.81649658, 0.81649658, 1.24721913])
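In other words, `.scale_` is not the IQR. It is the per-column standard deviation, computed with the population formula (dividing by n, not n - 1), which is the square root of the variance: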
scaler.scale_ = np.sqrt(scaler.var_)
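
To check this by hand, here is a minimal sketch that reproduces the numbers with NumPy alone. It assumes the population form of the variance (divide by n), which is consistent with the values shown above. The names `manual_mean`, `manual_var`, and `manual_std` are only illustrative.

```python
import numpy as np

X_train = np.array([[ 1., -1.,  2.],
                    [ 2.,  0.,  0.],
                    [ 0.,  1., -1.]])

n_samples = X_train.shape[0]

# Per-column mean: sum of each column divided by the number of rows
manual_mean = X_train.sum(axis=0) / n_samples

# Population variance: mean squared deviation from the column mean
# (assumption: divide by n, not n - 1)
manual_var = ((X_train - manual_mean) ** 2).sum(axis=0) / n_samples

# Standard deviation is the square root of the variance
manual_std = np.sqrt(manual_var)

print(manual_mean)
print(manual_var)
print(manual_std)

# np.std with ddof=0 computes the same population standard deviation
assert np.allclose(manual_std, np.std(X_train, axis=0, ddof=0))
```

For the first column, for example, the values are 1, 2, 0 with mean 1, so the squared deviations are 0, 1, 1, the variance is 2/3 ≈ 0.6667, and its square root is ≈ 0.8165, matching the first entry of `scale_`.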