Tags: python, pandas, numpy

Normalization in pandas via function


I have this dataframe:

    age
0   48
1   7
2   62
3   48
4   51

This code:

import pandas as pd
import numpy as np

def normalizar(x):
  # Convert x to a numpy array to allow for vectorized operations.
  x = np.array(x)
  # Calculate the minimum and maximum values of x.
  xmin = x.min()
  xmax = x.max()
  # Normalize the array x using vectorized operations.
  return (x - xmin) / (xmax - xmin)

df["age_n"] = df["age"].apply(normalizar)
df

and I get:

   age  age_n
0   48    NaN
1    7    NaN
2   62    NaN
3   48    NaN
4   51    NaN

How can I solve this issue?

The expected result would be values in the range [0, 1].


Solution

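  • The problem is the .apply() call, which is not needed here. Series.apply() calls normalizar once per element, so inside the function x is a single number, xmin and xmax are equal, and the result is 0 / 0, which is NaN. Your function is already vectorized, so you just need to pass it the whole column: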

    df["age_n"] = normalizar(df["age"])
    

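    The OP also asked how this could be done with .apply(), for the case where all columns need to be normalized. That is not really a good approach, since element-wise .apply() is generally slower than the vectorized version, but here is an answer for it too. The min and max have to be computed once, outside the function, and passed in: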

    def normalizar(x, xmin, xmax):
        return (x - xmin) / (xmax - xmin)
    
    # Compute the min/max once, over the whole column
    xmin = df['age'].min()
    xmax = df['age'].max()
    
    # Apply the normalization element by element
    df["age_n"] = df["age"].apply(lambda x: normalizar(x, xmin, xmax))
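To see both behaviours side by side, here is a minimal self-contained sketch. It rebuilds the dataframe from the question; the income column is hypothetical and is there only to show that the same arithmetic normalizes every numeric column at once. np.errstate is used just to silence the 0 / 0 warning.

```python
import numpy as np
import pandas as pd

# "age" comes from the question; "income" is a hypothetical extra column.
df = pd.DataFrame({
    "age": [48, 7, 62, 48, 51],
    "income": [10, 20, 30, 40, 50],
})

def normalizar(x):
    # Min-max scaling; expects a whole column, not a single value.
    x = np.array(x)
    return (x - x.min()) / (x.max() - x.min())

# Series.apply hands the function one scalar at a time, so min == max
# and every element becomes 0 / 0, i.e. NaN.
with np.errstate(invalid="ignore"):
    per_element = df["age"].apply(normalizar)

# Passing the whole column lets min/max see all the values.
age_n = normalizar(df["age"])

# Same idea for all columns at once, using plain pandas arithmetic:
# df.min() and df.max() are computed per column and broadcast by label.
scaled = (df - df.min()) / (df.max() - df.min())
```

If a column is constant, max - min is zero and that column will come out as NaN, so it may be worth handling that case separately.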