python pandas dataframe normalize

Normalize columns of a dataframe


I have a dataframe in pandas where each column has a different value range. For example:

df:

A     B   C
1000  10  0.5
765   5   0.35
800   7   0.09

Any idea how I can normalize the columns of this dataframe so that each value is between 0 and 1?

My desired output is:

A     B    C
1     1    1
0.765 0.5  0.7
0.8   0.7  0.18 (which is 0.09/0.5)

Solution

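  • You can use the sklearn package and its preprocessing utilities to normalize the data. MinMaxScaler rescales each column as (x - min) / (max - min), so the smallest value in a column becomes 0 and the largest becomes 1. This differs from the divide-by-maximum output shown in the question, where the smallest value stays above 0.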

    import pandas as pd
    from sklearn import preprocessing
    
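    x = df.values  # returns a NumPy array
    min_max_scaler = preprocessing.MinMaxScaler()
    x_scaled = min_max_scaler.fit_transform(x)  # scales each column to [0, 1]
    # pass columns and index so the labels from df are not lost
    df = pd.DataFrame(x_scaled, columns=df.columns, index=df.index)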
    

    For more information, see the section on scaling features to a range in the scikit-learn documentation on preprocessing data.
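
If pulling in sklearn feels like too much for this, the same thing can probably be done in plain pandas. Below is a minimal sketch, assuming the sample dataframe from the question and non-negative values. The names df_max_scaled and df_min_max are made up for illustration. Dividing by the column maximum reproduces the desired output from the question, while the second expression is the min-max formula that MinMaxScaler applies.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "A": [1000, 765, 800],
    "B": [10, 5, 7],
    "C": [0.5, 0.35, 0.09],
})

# Divide each column by its own maximum. df.max() is a Series indexed by
# column name, so pandas aligns it with the columns of df automatically.
# This matches the desired output in the question (e.g. 0.09 / 0.5 = 0.18).
df_max_scaled = df / df.max()

# Min-max scaling without sklearn: (x - min) / (max - min) per column.
# Here the smallest value in each column maps to 0 and the largest to 1.
df_min_max = (df - df.min()) / (df.max() - df.min())

print(df_max_scaled)
print(df_min_max)
```

Both versions keep the original column names and index. Note that the min-max expression divides by zero for a column where every value is identical, so such columns would need separate handling.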