I have this dataframe:
age
0 48
1 7
2 62
3 48
4 51
This code:
import pandas as pd
import numpy as np

def normalizar(x):
    # Convert x to a numpy array to allow for vectorized operations.
    x = np.array(x)
    # Calculate the minimum and maximum values of x.
    xmin = x.min()
    xmax = x.max()
    # Normalize the array x using vectorized operations.
    return (x - xmin) / (xmax - xmin)

df["age_n"] = df["age"].apply(normalizar)
df
and I get:
age age_n
0 48 NaN
1 7 NaN
2 62 NaN
3 48 NaN
4 51 NaN
How can I solve this issue?
The expected result would be values in the range [0, 1].
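The problem is the .apply() call, which is not needed because your function already handles a whole column. Series.apply calls normalizar once per element, so inside the function x is a single value, xmin equals xmax, and the result is 0 / 0, which is NaN. Just pass the whole column to the function instead: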
df["age_n"] = normalizar(df["age"])
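To make the difference concrete, here is a minimal sketch that rebuilds the dataframe from the question and calls normalizar both ways. The variable names single and whole are only for illustration, and np.errstate is used just to silence the divide warning.

```python
import numpy as np
import pandas as pd

# Rebuild the dataframe from the question.
df = pd.DataFrame({"age": [48, 7, 62, 48, 51]})

def normalizar(x):
    x = np.array(x)
    xmin = x.min()
    xmax = x.max()
    return (x - xmin) / (xmax - xmin)

# What Series.apply does: one scalar per call.
# For a single value min == max, so this computes 0 / 0.
with np.errstate(invalid="ignore"):
    single = normalizar(48)
print(single)  # nan

# Passing the whole column: min and max are taken over all rows.
whole = normalizar(df["age"])
df["age_n"] = whole
print(df)
```

With the whole column, the smallest age (7) maps to 0 and the largest (62) maps to 1.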
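The OP also asked how this could be done with .apply(), for the case where all columns need to be normalized. Calling a function once per element is not really a good approach, but I am sharing an answer for it just in case. The min and max have to be computed once, outside the function, and passed in: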
def normalizar(x, xmin, xmax):
    # x is a single value here; xmin and xmax come from the whole column.
    return (x - xmin) / (xmax - xmin)

# Find the min/max values of the column once.
xmin = df['age'].min()
xmax = df['age'].max()

# Apply the normalization element by element.
df["age_n"] = df["age"].apply(lambda x: normalizar(x, xmin, xmax))
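If the goal is to normalize every column, one option is DataFrame.apply, which (with the default axis=0) passes each column to the function as a whole Series rather than one value at a time. The sketch below assumes all columns are numeric; the function name min_max and the "height" column are made up for illustration and are not part of the question.

```python
import pandas as pd

# "height" is a hypothetical second column, added only for illustration.
df = pd.DataFrame({
    "age": [48, 7, 62, 48, 51],
    "height": [170, 120, 165, 180, 175],
})

def min_max(col):
    # col is a whole column (a Series), so min() and max() see every row.
    return (col - col.min()) / (col.max() - col.min())

# DataFrame.apply calls min_max once per column.
normalized = df.apply(min_max)
print(normalized)
```

Each column is scaled independently, so every column ends up with a minimum of 0 and a maximum of 1. A constant column would still produce NaN, since its max equals its min.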