r standardization

Standardizing a vector in R so that values shift towards boundaries


I have a vector as follows:

a <- c(0.211, 0.028, 0.321, 0.072, -0.606, -0.364, -0.066, 0.172, 
-0.917, 0.062, 0.117, -0.136, -0.296, 0.022, 0.046, -0.19, 0.057, 
-0.625, -0.01, 0.158, 0.407, -0.328, -0.347, -0.512, -0.101, 
0.008, -0.406, -0.014, 0.517, 0.085, -0.525, -0.635, -0.603, 
-0.105, 0.643, -0.094, -0.26, 0.348, -0.106, 0.608, 0.146, -0.343, 
-0.537, -0.661, 0.166, -0.037, -0.224, -0.269, -0.221, -0.623, 
-0.025, 0.382, 0.201, -0.281, -0.699, -0.373, -0.146, -0.273, 
-0.354, -0.138, -0.098, 0.312, 0.467, 0.156, 0.264, -0.108, -0.707, 
-1, -0.423, -0.708, -0.235, -0.219, -0.645, 0.081, 0.704, -0.639, 
0.368, -0.578, 0.158, -0.04, -0.071, -0.125, 0.006, 0.423, 0.112, 
1, 0.373, -0.554, -0.092, 0.509, -0.535, -0.619, -0.31, -0.082, 
-0.367, -0.574, 0.029, 0.391, 0.062, -0.476)

The vector ranges from -1 to 1, and it looks like this:

> plot(a)

Is there a way to standardize vector a so that all the values move away from zero and shift towards 1 or -1 (near the red lines)?

It would be great if I could also control how far the values move towards 1 or -1.


Solution

  • You can use min-max standardization. Min-max scaling is usually used to map values onto the interval from 0 to 1, but you can map values onto any range [a, b] with the following equation:

    X_Scaled = a + (x - min(x)) * (b-a) / (max(x) - min(x))
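
    As a minimal sketch, the equation can be wrapped in a small R function. The name rescale_minmax and the arguments lo and hi (standing in for a and b, so they do not collide with the vector a from the question) are illustrative and not part of any package:

    ```r
    # Hypothetical helper: linearly map x onto the interval [lo, hi].
    # Assumes x has no NAs and at least two distinct values; otherwise
    # the denominator max(x) - min(x) is zero.
    rescale_minmax <- function(x, lo = 0, hi = 1) {
      lo + (x - min(x)) * (hi - lo) / (max(x) - min(x))
    }

    # The smallest input lands on lo, the largest on hi
    rescale_minmax(c(-1, 0, 1), lo = 0.7, hi = 0.8)
    ```

    Everything between the minimum and the maximum keeps its relative spacing, so the order of the values is preserved.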
    

    In your case, let's break it down into two steps.

    First, say you want the positive values centered around 0.75 and the negative values centered around -0.75. We can start by splitting the data into its positive and negative values.

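    # Example data; replace with your own vector (e.g. data <- a)
    data <- runif(100, -1, 1)
    
    # Split by sign; values that are exactly 0 fall into neither group
    positive_vals <- data[data > 0]
    negative_vals <- data[data < 0]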
    

    Second, you want to control how closely the values cluster around 0.75, so you can define a center and a range on either side of it. For example, a center of 0.75 and a range of 0.05 give a = 0.7 and b = 0.8. We do the same for the negative center, which gives a = -0.8 and b = -0.7.

    range <- 0.05
    upper_center <- 0.75
    lower_center <- -0.75
    
    b1 <- upper_center + range
    a1 <- upper_center - range
    
    b2 <- lower_center + range
    a2 <- lower_center - range
    

    Finally, we apply the min-max equation for both cases, taking care to preserve the original positions of the positive and negative values in the original array.

    # normalize them using, say, min-max
    positive_vals <- a1 + ((positive_vals - min(positive_vals)) * (b1 - a1)) / (max(positive_vals) - min(positive_vals))
    negative_vals <- a2 + ((negative_vals - min(negative_vals)) * (b2 - a2)) / (max(negative_vals) - min(negative_vals))
    
    new_data <- data
    new_data[data > 0] <- positive_vals
    new_data[data < 0] <- negative_vals
    
    # Plot the results!
    plot(data)
    points(new_data, col = "red")
    

    If the values end up too tightly packed around 0.75, just increase the range. You can also move the centers by choosing different values for upper_center and lower_center.

    Using the data you provided, the values in red are the new data.
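
    To make the amount of movement easy to tune, the steps above can be collected into one function. This is only a sketch: the name push_to_poles and the arguments center and half_width (playing the role of the range variable above) are made up for illustration, and the input is assumed to contain no NAs.

    ```r
    # Hypothetical wrapper around the steps above (names are illustrative).
    # center:     where each group should cluster (mirrored for negatives)
    # half_width: how far values may spread on either side of the center
    push_to_poles <- function(x, center = 0.75, half_width = 0.05) {
      # Min-max rescale v onto [lo, hi]; a group with a single distinct
      # value has zero spread, so it is placed at the interval midpoint.
      rescale <- function(v, lo, hi) {
        spread <- max(v) - min(v)
        if (spread == 0) return(rep((lo + hi) / 2, length(v)))
        lo + (v - min(v)) * (hi - lo) / spread
      }
      out <- x
      pos <- x > 0
      neg <- x < 0
      if (any(pos)) out[pos] <- rescale(x[pos], center - half_width, center + half_width)
      if (any(neg)) out[neg] <- rescale(x[neg], -center - half_width, -center + half_width)
      out  # values that are exactly 0 are left unchanged
    }

    x <- c(-1, -0.5, -0.01, 0, 0.2, 0.6, 1)  # small made-up example
    tight <- push_to_poles(x, center = 0.75, half_width = 0.05)
    loose <- push_to_poles(x, center = 0.6, half_width = 0.3)
    ```

    A smaller half_width packs the values more tightly around center and -center, while a larger one keeps more of the original spread.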