r dplyr linear-regression rescale

Within a simple linear regression in R, how do I rescale age to estimate its beta-coefficient per year/5 years/10 years?


This might be a bit of a dumb question, but roaming around SO and other websites I can't find a straightforward answer: I've got data on the relationship between age and a continuous outcome:

library(dplyr)
library(tidyverse)
library(magrittr)

mydata <- 
structure(list(ID = c(104, 157, 52, 152, 114, 221, 320, 125, 
75, 171, 80, 76, 258, 82, 142, 203, 37, 92, 202, 58, 194, 38, 
4, 137, 25, 87, 40, 117, 21, 255, 277, 315, 96, 134, 185, 94, 
3, 153, 172, 65, 279, 209, 60, 13, 154, 160, 24, 29, 159, 213, 
127, 74, 48, 126, 184, 132, 61, 141, 27, 49, 8, 39, 164, 162, 
34, 205, 179, 119, 77, 135, 138, 165, 103, 253, 14, 20, 310, 
84, 30, 273, 22, 105, 262, 116, 86, 83, 145, 31, 95, 51, 81, 
271, 36, 50, 189, 2, 115, 7, 197, 54), age = c(67.1, 70.7, 53, 
61.7, 66.1, 57.7, 54.1, 67.2, 60.9, 55.8, 40.7, 57.6, 64.1, 70.7, 
47.5, 46.3, 66.7, 55, 63.3, 68.2, 61.2, 60.5, 52, 65.3, 48.9, 
56.9, 62.7, 75.2, 61.4, 57.9, 53.6, 58.1, 51, 67.3, 63.9, 57, 
43.2, 64.7, 62.8, 56.3, 51.7, 39.4, 45.2, 57.8, 55.7, 69.6, 61.5, 
50.1, 73.7, 55.5, 65.2, 54.6, 49, 35.2, 52.9, 46.3, 55, 52.5, 
54.2, 61, 57.4, 56.5, 53.6, 47.7, 64.2, 53.4, 60.9, 58.2, 60.7, 
50.3, 48.3, 74.7, 52.1, 59.9, 52.4, 70.8, 61.2, 66.5, 55.4, 57.5, 
59.2, 60.1, 52.3, 60.2, 54.8, 36.3, 61.5, 48.6, 56, 62, 64.8, 
40.4, 68.3, 60, 69.1, 56.6, 45.3, 58.5, 52.3, 52), continuous_outcome = c(3636.6, 
1128.2, 2007.5, 802.9, 332.3, 2636.1, 169.5, 67.9, 3261.8, 1920.3, 
155.2, 1677.2, 198.2, 11189.7, 560.9, 633.1, 196.1, 13.9, 100.7, 
7594.5, 1039.8, 83.9, 2646.8, 284.6, 306, 1135.6, 1883.1, 5681.4, 
1706.2, 2241.1, 97.7, 1106.8, 1107.1, 290.8, 2123.4, 267, 115.3, 
138.5, 152.7, 1338.9, 6709.8, 561.7, 1931.7, 3112.4, 1876.3, 
3795.9, 5706.7, 7.4, 1324.9, 4095.4, 205.4, 1886, 177.3, 304.4, 
1319.1, 415.9, 537.2, 3141.1, 740, 1976.7, 624.8, 983.1, 1163.5, 
1432.6, 3730.4, 2023.4, 498.2, 652.5, 982.7, 1345.3, 138.4, 1505.1, 
3528.1, 11.9, 884.5, 10661.6, 1911.4, 2800.8, 81.5, 396.4, 409.1, 
417.3, 186, 1892.4, 1689.7, 0, 210.1, 210.5, 3484.5, 3196.8, 
57.2, 20.2, 947, 540, 1603.1, 1571.8, 9.1, 149.2, 122, 63.2)), row.names = c(NA, 
-100L), class = c("tbl_df", "tbl", "data.frame"))

As you can see in the tibble, age is a continuous variable measured to a precision of 1 decimal place:

 head(mydata)

# A tibble: 6 x 3
     ID   age continuous_outcome
  <dbl> <dbl>              <dbl>
1   104  67.1              3637.
2   157  70.7              1128.
3    52  53                2008.
4   152  61.7               803.
5   114  66.1               332.
6   221  57.7              2636.

When I fit a simple linear regression (for now assuming that none of the model assumptions are violated) I get the following beta-coefficient:

fit <- 
  lm(formula=continuous_outcome ~ age, 
     data=mydata)
fit

Call:
lm(formula = continuous_outcome ~ age, data = mydata)

Coefficients:
(Intercept)          age  
   -3400.12        86.06  

The beta-coefficient for age is 86.06. Does this mean that, because age is measured to 1 decimal place, my outcome increases by 86.06 for every 0.1-year increase in age? If so, how do I rescale age so that I am measuring the effect of age per, for example, 5 years or 10 years?

Thanks in advance!


Solution

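  • The beta coefficient is the average amount by which the dependent variable (DV, in this case continuous_outcome) changes for every one-unit increase in your independent variable (IV, in this case age in years). The number of decimal places age is recorded to does not change its unit, so 86.06 is the estimated increase per 1 year, not per 0.1 years.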

    If you want to show the relationship per 1/10th of a year, multiply your age column by 10 before fitting the model (so that one unit equals 0.1 years), or equivalently divide the beta coefficient by 10, which gives 8.606.

    For your specific requests, since the beta coefficient is 86.06 per year, you can multiply it by the number of years to get the expected increase in the outcome over that span.

    To answer the last question, the estimated effect of age per 5 years is 86.06 * 5 = 430.3. So for every 5 years that a person's age increases, continuous_outcome increases by 430.3 on average. Per 10 years, the same calculation gives 86.06 * 10 = 860.6.
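
If you would rather have R report the rescaled coefficient directly, one option is to rescale age inside the model formula with I(). The sketch below uses a small simulated data frame (`toy`, hypothetical and not the data from the question) so that it runs on its own. With your data you would replace `toy` with `mydata`.

```r
# Simulated stand-in for `mydata`; the values are made up for illustration only.
set.seed(1)
toy <- data.frame(age = round(runif(100, min = 35, max = 75), 1))
toy$continuous_outcome <- 50 * toy$age + rnorm(100, sd = 200)

# Baseline: age is in years, so the slope is the change per 1 year.
fit_1yr <- lm(continuous_outcome ~ age, data = toy)

# I(age / 5) expresses age in 5-year units, so the slope is the change per 5 years.
fit_5yr <- lm(continuous_outcome ~ I(age / 5), data = toy)

# I(age / 10) expresses age in 10-year units, so the slope is the change per 10 years.
fit_10yr <- lm(continuous_outcome ~ I(age / 10), data = toy)

# Extract the slopes (the second coefficient, after the intercept).
b1  <- coef(fit_1yr)[[2]]
b5  <- coef(fit_5yr)[[2]]
b10 <- coef(fit_10yr)[[2]]

# The rescaled slopes are the per-year slope multiplied by 5 and by 10.
print(c(per_1_year = b1, per_5_years = b5, per_10_years = b10))

# Confidence intervals rescale the same way as the point estimate.
ci_1yr <- confint(fit_1yr)[2, ]
ci_5yr <- confint(fit_5yr)[2, ]
print(rbind(per_1_year_times_5 = ci_1yr * 5, per_5_years = ci_5yr))
```

Refitting with a rescaled predictor changes only the units of the slope, its standard error and its confidence interval. The t-statistic, p-value and R-squared stay the same, so multiplying the original coefficient by 5 or 10 by hand is equally valid.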