Calculate difference between multiple changing columns in R


I have a data frame (df) in R with 7 columns:

ID Year1 Year2 Year3 Size1 Size2 Size3
A 2021 2022 NA 10 15 NA
B 2022 2023 2024 20 21 25
C 2021 2022 NA 5 20 NA

I want to add an 8th column called 'difference' that gives the growth rate for each individual, calculated as (Size3 - Size1) / (Year3 - Year1). I was able to do that with this code:

library(dplyr)
df2 <- df %>%
  mutate(difference = (Size3 - Size1) / (Year3 - Year1))  # refer to columns directly inside mutate(), not via df$

However, some individuals only have Year1 and Year2. How can I write a condition so that, when Year3 is NA, the 'difference' column is calculated as (Size2 - Size1) / (Year2 - Year1) instead?

Desired output:

ID Year1 Year2 Year3 Size1 Size2 Size3 Difference
A 2021 2022 NA 10 15 NA 5
B 2022 2023 2024 20 21 25 2.5
C 2021 2022 NA 5 20 NA 15

Solution

  • You can try ifelse, which picks the formula row by row depending on whether Year3 is NA

    transform(
      df,
      Difference = ifelse(
        is.na(Year3),
        (Size2 - Size1) / (Year2 - Year1),
        (Size3 - Size1) / (Year3 - Year1)
      )
    )
    

    which gives

      ID Year1 Year2 Year3 Size1 Size2 Size3 Difference
    1  A  2021  2022    NA    10    15    NA        5.0
    2  B  2022  2023  2024    20    21    25        2.5
    3  C  2021  2022    NA     5    20    NA       15.0
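
Since the question already uses a dplyr pipeline, the same idea can be sketched there as well. This is a minimal example, not the only way to do it: the data frame is rebuilt from the table in the question, and the column name difference2 is made up here purely to show a second variant next to the first. The coalesce() variant assumes that Size3 and Year3 are always missing together; if one can be NA without the other, stick with the explicit if_else().

```r
library(dplyr)

# Example data rebuilt from the table in the question
df <- data.frame(
  ID    = c("A", "B", "C"),
  Year1 = c(2021, 2022, 2021),
  Year2 = c(2022, 2023, 2022),
  Year3 = c(NA, 2024, NA),
  Size1 = c(10, 20, 5),
  Size2 = c(15, 21, 20),
  Size3 = c(NA, 25, NA)
)

df2 <- df %>%
  mutate(
    # Explicit condition: fall back to the Year2/Size2 pair when Year3 is NA
    difference = if_else(
      is.na(Year3),
      (Size2 - Size1) / (Year2 - Year1),
      (Size3 - Size1) / (Year3 - Year1)
    ),
    # Variant (assumes Size3 and Year3 are NA in the same rows):
    # coalesce() takes the first non-NA value, element by element
    difference2 = (coalesce(Size3, Size2) - Size1) /
      (coalesce(Year3, Year2) - Year1)
  )

print(df2)
```

Both columns should agree on this data. if_else() is stricter than base ifelse() about the two branches having the same type, which is not an issue here because all the columns are numeric.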