pandas dataframe lambda range noise

Applying normal noise to a column, if in range. Pandas / Python


I want to add noise to a column of values in the range of 0-1.

But the noisy values shouldn't leave this range, so my plan was to check whether adding the noise would push a value outside the range and, if it would, leave that value unchanged.

I tried:

df['val_x'].apply(lambda x: (x + np.random.normal(0, 0.2)) if (0 < x + np.random.normal(0, 0.2) < 1) else x)

at first, but I assume each call to np.random.normal draws a separate random value, so a value can pass the check with one draw and then be written to the data frame with a different one.

I feel like I need something like:

df['val_x'].apply(lambda x, withNoise = x + np.random.normal(0, 0.2): withNoise if (0 < withNoise < 1) else x)

which defines the noisy value beforehand as a default argument, but a lambda's default arguments can't refer to its other arguments.

I want to do this without creating another function, but if it is the only way, I can.

Thanks in advance.


Solution

  • What about clipping? Note that this moves out-of-range results to 0 or 1 rather than keeping the original value:

    df['val_x'] = df['val_x'].add(np.random.normal(0, 0.2, size=len(df))).clip(0, 1)
    

    Or add the noise and only update the values that stay in range:

    s = df['val_x'].add(np.random.normal(0, 0.2, size=len(df)))
    df['val_x'] = s.where(s.between(0, 1), df['val_x'])
    
    # or
    df.loc[s.between(0, 1), 'val_x'] = s
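
If you would rather keep the apply/lambda form from the question, one possible sketch is to bind the noisy value once with an assignment expression (Python 3.8+), so the value that is checked is the same one that is returned. The second option is the vectorized version with the strict bounds used in the question; note that between is inclusive by default, and the inclusive="neither" spelling assumes pandas 1.3 or later. The sample values, the seed, and the names rng, with_lambda, noisy and vectorized are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'val_x' follows the question, the values are made up.
df = pd.DataFrame({"val_x": [0.05, 0.5, 0.95, 0.3]})
rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible

# Option 1: keep apply + lambda, but draw the noise once per value.
# The condition is evaluated first, so y is already bound when the
# "if" branch returns it.
with_lambda = df["val_x"].apply(
    lambda x: y if 0 < (y := x + rng.normal(0, 0.2)) < 1 else x
)

# Option 2: vectorized, one draw per row, strict bounds as in the question.
# Assumes pandas >= 1.3 for inclusive="neither".
noisy = df["val_x"] + rng.normal(0, 0.2, size=len(df))
vectorized = noisy.where(noisy.between(0, 1, inclusive="neither"), df["val_x"])
```

Both variants leave a value untouched when its noisy version falls outside the range. The vectorized form avoids a Python-level call per row, which usually matters on larger frames.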