python pandas normalization

Reverse z-score on a pandas DataFrame


I am using this to compute the z score of my dataframe:

from scipy.stats import zscore  # assuming zscore is SciPy's
df_z=df.apply(zscore)

Is there a reverse operation that can give me the original values?


Solution

  • There is no built-in way to go from df_z (z scores) back to df (original values). However, you can do it fairly easily as follows:

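    Step 1: Keep track of the mean and standard deviation of each original column. If zscore is scipy.stats.zscore, note that it uses the population standard deviation (ddof=0) by default, while pandas' std() defaults to ddof=1, so pass ddof=0 to get an exact round trip. Perhaps like this:

    mean_std={}
    for var in df.columns:
        # ddof=0 matches the default of scipy.stats.zscore
        mean_std[var]=(df[var].mean(), df[var].std(ddof=0))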
    

    Step 2: Convert the z scores back to the original values

    def reverse_zscore(pandas_series, mean, std):
        '''Mean and standard deviation should be of original variable before standardization'''
        return pandas_series*std+mean

    # var is the name of the column to restore; repeat for each column in df_z.columns
    original_mean, original_std = mean_std[var]
    original_var_series = reverse_zscore(df_z[var], original_mean, original_std)
    

    Alternatively, just keep a copy of your original dataframe somewhere.
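
The same round trip can also be done for the whole dataframe at once, without a per-column loop. The following is a minimal sketch, not the only way to do it: the example data and the names means, stds and df_restored are made up for illustration, and the z scores are computed by hand with ddof=0, which is assumed to match what scipy.stats.zscore does by default.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column names are made up for illustration.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 40.0, 80.0]})

# Save the per-column statistics before standardizing.
# ddof=0 (population std) is the default used by scipy.stats.zscore.
means = df.mean()
stds = df.std(ddof=0)

# Standardize, then invert. pandas aligns each Series with the
# dataframe's columns by label, so no explicit loop is needed.
df_z = (df - means) / stds
df_restored = df_z * stds + means

# Compare with a tolerance, since floating-point round-off can
# leave tiny differences from the original values.
print(np.allclose(df_restored, df))
```

Only means and stds need to be stored to undo the transformation later, which is much smaller than keeping a full copy of the original dataframe.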