r normalization

How to normalise a dataset unevenly


I have a dataset of life expectancy values between 30 and 100. I want to normalise them to the range 0-1, but unevenly.

Basically I have four defined interval/breakpoint values creating 5 classes:

Breakpoint   Age   Normalised value
Min           30   0
BP1           57   0.2
BP2           62   0.4
BP3           66   0.6
BP4           71   0.8
Max          100   1

I can easily reclassify them into those five classes, but I don't know how to calculate the normalised values using those breakpoints. All five classes span 0.2 on the normalised scale, but the age range in each class is different, e.g. class one, 0-0.2, has an age range of 27 years, but class two, 0.2-0.4, has a range of just 5 years.

Example data:

ages <- floor(runif(50, min = 30, max = 100))

Edit:

On this graph, x is life expectancy in years and y is the normalised value. I want to calculate the exact value of y for each x value.

[Figure: LE to Normalised Values graph]


Solution

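  • One standard approach is to use logistic regression. Note that this fits a smooth S-shaped curve, so the predictions only approximate the breakpoint values (for example about 0.196 at age 57 rather than 0.2):

    # Assumption: df holds the breakpoint table from the question
    df <- data.frame(Age = c(30, 57, 62, 66, 71, 100),
                     Normalised_value = c(0, 0.2, 0.4, 0.6, 0.8, 1))

    # glm() warns about non-integer successes because the response is fractional
    glmfit <- glm(Normalised_value ~ Age, data = df, family = binomial())
    plot(glmfit)  # diagnostic plots

    predict(glmfit, newdata = data.frame(Age = 30:100), type = "response")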
    
              1           2           3           4           5           6           7 
    0.001038676 0.001270899 0.001554960 0.001902391 0.002327270 0.002846769 0.003481828 
              8           9          10          11          12          13          14 
    0.004257952 0.005206175 0.006364213 0.007777825 0.009502425 0.011604952 0.014166036 
             15          16          17          18          19          20          21 
    0.017282440 0.021069773 0.025665399 0.031231419 0.037957508 0.046063272 0.055799610 
             22          23          24          25          26          27          28 
    0.067448397 0.081319557 0.097744402 0.117063988 0.139611296 0.165686430 0.195524878 
             29          30          31          32          33          34          35 
    0.229260322 0.266885403 0.308216030 0.352866476 0.400242921 0.449561353 0.499891678 
             36          37          38          39          40          41          42 
    0.550224198 0.599549040 0.646935614 0.691599168 0.732945010 0.770586518 0.804338777 
             43          44          45          46          47          48          49 
    0.834193745 0.860284578 0.882846413 0.902179147 0.918615680 0.932497075 0.944154716 
             50          51          52          53          54          55          56 
    0.953898634 0.962010835 0.968742352 0.974312922 0.978912346 0.982702836 0.985821857 
             57          58          59          60          61          62          63 
    0.988385104 0.990489415 0.992215484 0.993630305 0.994789335 0.995738372 0.996515163 
             64          65          66          67          68          69          70 
    0.997150770 0.997670717 0.998095963 0.998443694 0.998728001 0.998960424 0.999150415 
             71 
    0.999305707 
    
    plot(df$Age,df$Normalised_value)
    lines(30:100,predict(glmfit,newdata=data.frame(Age=30:100),type="response"))
    

    Original:

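    We can use splines::bs with lm to fit a piecewise linear spline regression. With degree = 1 and knots at the four interior breakpoints, the fit is linear within each class and reproduces the breakpoint values: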

    library(splines)
    
    spline_fit <- lm(Normalised_value ~ bs(Age, knots = c(57, 62, 66, 71), degree = 1), data = df)
    newdata <- data.frame(Age = 30:100)
    newdata$normalized <- predict(spline_fit, newdata = newdata)
    newdata
    
       Age   normalized
    1   30 1.416396e-16
    2   31 7.407407e-03
    3   32 1.481481e-02
    4   33 2.222222e-02
    5   34 2.962963e-02
    6   35 3.703704e-02
    7   36 4.444444e-02
    8   37 5.185185e-02
    9   38 5.925926e-02
    10  39 6.666667e-02
    11  40 7.407407e-02
    12  41 8.148148e-02
    13  42 8.888889e-02
    14  43 9.629630e-02
    15  44 1.037037e-01
    16  45 1.111111e-01
    17  46 1.185185e-01
    18  47 1.259259e-01
    19  48 1.333333e-01
    20  49 1.407407e-01
    21  50 1.481481e-01
    22  51 1.555556e-01
    23  52 1.629630e-01
    24  53 1.703704e-01
    25  54 1.777778e-01
    26  55 1.851852e-01
    27  56 1.925926e-01
    28  57 2.000000e-01
    29  58 2.400000e-01
    30  59 2.800000e-01
    31  60 3.200000e-01
    32  61 3.600000e-01
    33  62 4.000000e-01
    34  63 4.500000e-01
    35  64 5.000000e-01
    36  65 5.500000e-01
    37  66 6.000000e-01
    38  67 6.400000e-01
    39  68 6.800000e-01
    40  69 7.200000e-01
    41  70 7.600000e-01
    42  71 8.000000e-01
    43  72 8.068966e-01
    44  73 8.137931e-01
    45  74 8.206897e-01
    46  75 8.275862e-01
    47  76 8.344828e-01
    48  77 8.413793e-01
    49  78 8.482759e-01
    50  79 8.551724e-01
    51  80 8.620690e-01
    52  81 8.689655e-01
    53  82 8.758621e-01
    54  83 8.827586e-01
    55  84 8.896552e-01
    56  85 8.965517e-01
    57  86 9.034483e-01
    58  87 9.103448e-01
    59  88 9.172414e-01
    60  89 9.241379e-01
    61  90 9.310345e-01
    62  91 9.379310e-01
    63  92 9.448276e-01
    64  93 9.517241e-01
    65  94 9.586207e-01
    66  95 9.655172e-01
    67  96 9.724138e-01
    68  97 9.793103e-01
    69  98 9.862069e-01
    70  99 9.931034e-01
    71 100 1.000000e+00
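
Since the spline above amounts to linear interpolation between neighbouring breakpoints, the same values can probably be obtained more directly with base R's approx(). Within each class the value is y_lo + 0.2 * (x - x_lo) / (x_hi - x_lo), where x_lo and x_hi are the class's age limits and y_lo is its lower normalised value. The following is a minimal sketch, assuming the breakpoint table from the question; the names bp_age and bp_norm and the set.seed() call are illustrative additions, not part of the original post:

```r
# Breakpoint table from the question (age -> normalised value)
bp_age  <- c(30, 57, 62, 66, 71, 100)
bp_norm <- c(0, 0.2, 0.4, 0.6, 0.8, 1)

# Example data as in the question; the seed is an added assumption
# so that the sample is reproducible
set.seed(1)
ages <- floor(runif(50, min = 30, max = 100))

# Linear interpolation between neighbouring breakpoints
normalised <- approx(x = bp_age, y = bp_norm, xout = ages)$y

head(data.frame(ages, normalised))

# Spot checks at ages that are easy to verify by hand:
# 60 -> 0.2 + 0.2 * (60 - 57) / (62 - 57) = 0.32
# 64 -> 0.4 + 0.2 * (64 - 62) / (66 - 62) = 0.50
approx(bp_age, bp_norm, xout = c(30, 57, 60, 64, 100))$y
```

With the default rule = 1, approx() returns NA for ages outside 30-100; passing rule = 2 would clamp them to 0 or 1 instead. If the mapping is needed repeatedly, approxfun(bp_age, bp_norm) returns it as a reusable function.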