python pandas normalize xrange

Normalize columns in a pandas data frame while one column is in a specific range


I have a pandas data frame that contains my experimental data. It looks like this:

KE  BE  EXP_DATA  COL_1  COL_2  COL_3 ...
10  1   5         1      2      3   
9   2   .         .      .      .
8   3   .         .
7   4
6   5
.
.   

The column KE is not used. BE holds the values for the x-axis, and all other columns are y-axis values. For normalization I use the idea that is also presented here (Normalize), in the post by Michael Aquilina. Therefore I need to find the maximum and the minimum of my data. I do it like this:

    minBE = self.data['EXP_DATA'].min()
    maxBE = self.data['EXP_DATA'].max()

Now I want to find the maximum and minimum of the EXP_DATA column, but only over the rows where the BE column lies within a certain range. In essence, I want to normalize the data only within a certain x-range.

Solution

Thanks to the solution Milo gave me I now use this function:

    def normalize(self, BE="Exp", NRANGE=False):
        """
        Scale all columns to [0, 1] using the min and max of column `BE`.

        If NRANGE=[a, b] is given, the min and max are taken only from rows
        whose index (the x values) lies strictly between a and b, and all
        rows outside that range are masked (set to NaN).
        """
        if BE not in self.data.columns:
            raise NameError("'{}' is not an existing column. ".format(BE) +
                            "Try list_columns()")
        if NRANGE and len(NRANGE) == 2:
            lower_be, upper_be = min(NRANGE), max(NRANGE)
            # Build the mask once; this relies on the x values being the index.
            msk = (self.data.index > lower_be) & (self.data.index < upper_be)
            minBE = self.data[BE][msk].min()
            maxBE = self.data[BE][msk].max()
            # Mask rows outside NRANGE so the data inside it is really scaled to [0, 1].
            for col in self.data.columns:
                self.data[col] = self.data[col][msk]
        else:
            minBE = self.data[BE].min()
            maxBE = self.data[BE].max()

        for col in self.data.columns:
            self.data[col] = (self.data[col] - minBE) / (maxBE - minBE)

If I call the function with the parameter NRANGE=[a,b], where a and b are also the x limits of my plot, it automatically scales the visible y-values between 0 and 1, because the rest of the data is masked. If the function is called without the NRANGE parameter, the whole range of the data passed to the function is scaled from 0 to 1.
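For anyone who wants to try this outside of my class, here is a minimal standalone sketch of the same idea. The name `normalize_in_range`, the sample values, and the choice to return a new frame instead of modifying `self.data` in place are assumptions made for illustration; as in the method above, the x values (BE) are assumed to be the index, and the index is assumed to be unique.

```python
import pandas as pd

def normalize_in_range(data, column, nrange=None):
    """Min-max scale every column using the limits of `column`.

    With nrange=(a, b), the limits come only from rows whose index lies
    strictly between a and b, and rows outside that window become NaN.
    """
    if nrange is not None:
        lower, upper = min(nrange), max(nrange)
        in_range = (data.index > lower) & (data.index < upper)
        # Keep the in-range rows, then restore the original index so the
        # dropped rows come back as NaN (assumes a unique index).
        data = data.loc[in_range].reindex(data.index)
    col_min = data[column].min()  # min()/max() skip NaN rows by default
    col_max = data[column].max()
    return (data - col_min) / (col_max - col_min)

# Hypothetical sample data with BE as the index.
df = pd.DataFrame(
    {"Exp": [5.0, 9.0, 7.0, 3.0, 8.0], "COL_1": [1.0, 4.0, 6.0, 2.0, 0.0]},
    index=pd.Index([1, 2, 3, 4, 5], name="BE"),
)

scaled = normalize_in_range(df, "Exp", nrange=(1, 5))
full = normalize_in_range(df, "Exp")
```

Note that only the reference column is guaranteed to end up in [0, 1]; the other columns are scaled with the same limits and can fall outside that interval.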

Thank you for your help!


Solution

  • You can use boolean indexing. For example, to select the max and min values in column EXP_DATA where BE is larger than 2 and less than 5:

    lower_be = 2
    upper_be = 5
    
    max_in_range = self.data['EXP_DATA'][(self.data['BE'] > lower_be) & (self.data['BE'] < upper_be)].max()
    min_in_range = self.data['EXP_DATA'][(self.data['BE'] > lower_be) & (self.data['BE'] < upper_be)].min()
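
The same selection can be tried on a small, self-contained frame. This is only a sketch: the sample values and the names `df`, `in_range`, `y_cols`, and `normalized` are made up for illustration, and BE is assumed to be an ordinary column rather than the index. The mask is built once and reused, and the limits found inside the window are then used to min-max scale the y columns.

```python
import pandas as pd

# Hypothetical sample data shaped like the table in the question.
df = pd.DataFrame({
    "BE": [1, 2, 3, 4, 5],
    "EXP_DATA": [5.0, 9.0, 7.0, 3.0, 8.0],
    "COL_1": [1.0, 4.0, 6.0, 2.0, 0.0],
})

lower_be = 2
upper_be = 5

# Strict inequalities on both ends, matching the snippet above.
in_range = (df["BE"] > lower_be) & (df["BE"] < upper_be)

min_in_range = df.loc[in_range, "EXP_DATA"].min()
max_in_range = df.loc[in_range, "EXP_DATA"].max()

# Min-max scale the y columns with the limits found inside the window.
y_cols = ["EXP_DATA", "COL_1"]
normalized = (df[y_cols] - min_in_range) / (max_in_range - min_in_range)
```

Rows outside the window are still scaled here, so their values can fall outside [0, 1]; mask or drop them afterwards if only the window should be shown.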