rdataframedatasetpanel

Creating a new row in a longitudinal dataset


I have a longitudinal dataset in R comprising several countries observed over multiple time points. Let's simplify things and consider the following example

set.seed(123)
df=data.frame(Country=c(rep("DEU",16),rep("FRA",16),rep("ITA",16)),Year=rep(c(rep(1,4),rep(2,4),rep(3,4),rep(4,4)),3),industry=rep(c("A","B","C","D"),12),h_emp=rnorm(48,15,3.5))

The objective is to create a new row for each country and year, always labeled in the industry column as "C+D". The corresponding cell in h_emp should be equal to the sum of the values for h_emp in industries "C" and "D" for that country in that specific year. How can I achieve this?


Solution

  • Using dplyr, create a summarized df including sums for C+D, then bind back to your original df. Note your example data has multiple entries for some industries in each year/country; I assumed this was an error so I created new sample data.

    set.seed(123)
    library(dplyr)
    
    df <- expand.grid(
      Country = c("DEU", "FRA", "ITA"),
      Year = 1:4,
      industry = c("A", "B", "C", "D")
    )
    df$h_emp <- rnorm(48, 15, 3.5)
    
    df <- df %>%
      filter(industry %in% c("C", "D")) %>%
      summarize(
        industry = "C+D",
        h_emp = sum(h_emp),
        .by = c(Country, Year)
      ) %>%
      bind_rows(df, .) %>%
      arrange(Country, Year)
    

    Result:

    #> head(df, 15)
       Country Year industry     h_emp
    1      DEU    1        A 13.038335
    2      DEU    1        B 16.402700
    3      DEU    1        C 12.812363
    4      DEU    1        D 16.938712
    5      DEU    1      C+D 29.751074
    6      DEU    2        A 15.246779
    7      DEU    2        B 21.254196
    8      DEU    2        C 15.536806
    9      DEU    2        D 13.668351
    10     DEU    2      C+D 29.205157
    11     DEU    3        A 16.613207
    12     DEU    3        B 17.454746
    13     DEU    3        C 16.492625
    14     DEU    3        D 10.571113
    15     DEU    3      C+D 27.063738