rdataframejoinmergepanel

construct a pseudo panel based on similar values for some variables in R


I Have a 2 questions in one. I have 20 data frames. Each one is subject to a given year (from 2000 to 2020). They all have the same columns. 1) I want to merge them based on similar observations for a list of variables (columns), so I can construct a panel. 2) Plus when merging I want to rename the columns by adding a suffixes indicating the date. For example, let take 3 dataframes

df1

year_sample   birth_date    country    work_establishment   Wage      
   2014          1995        US            X2134            1700       
   2014          1996        US            X26              1232       
   2014          1992        CANADA        X26              2553      
   2014          1990        FRANCE        X4T346           6574      
   2014          1983        BELGIUM       X2E43            1706       
   2014          1975        US            X2134            1000      
   2014          1969        CHINA         XXZT55           996       

df2

year_sample   birth_date    country    work_establishment   Wage      
   2015          1995        US            X2134            1756       
   2015          1996        US            X26              1230       
   2015          1992        CANADA        X26              2700       
   2015          1990        FRANCE        X4T346           6574       
   2015          1975        US            X2134            1000       
   2015          1979        GERMANY       X35555           2435      

df3

year_sample   birth_date    country    work_establishment   Wage      
   2016          1995        US            X2134            1750       
   2016          1996        US            X26              1032       
   2016          1992        CANADA        X26              2353       
   2016          1990        FRANCE        X4T346           6574        
   2016          1955        MALI          X2244            1000       
   2016          1979        GERMANY       X35555           2435  

If an observation have similar values for c(birth_date; country ; work_establisment) then I will considere it as the same person. I want therefore:

df_final

 id   birth_date    country    work_establishment   Wage_2014      Wage_2015  Wage_2016 
  1   1995           US            X2134            1700           1756        1750
  2   1996           US            X26              1232           1230        1032
  3   1992           CANADA        X26              2553           2700        2353
  4   1990           FRANCE        X4T346           6574           6574        6574

I know that if I had just two dataframes I can do :

df_final <- transform(merge(df1,df2, by=c("birth_date", "country", "work_establishment"), suffixes=c("_2014", "_2015")))

But I can't manage to do it for several dataframes at once.

Thank you!


Solution

  • You can get all the dataframes in a list.

    list_df <- mget(paste0('df', 1:3))
    #OR
    #list_df <- list(df1, df2, df3)
    

    Then add suffix to 'Wage' column in each of the dataframe from the year_sample value and drop the year column and use Reduce to merge the dataframes into one.

    result <- Reduce(function(x, y)
                 merge(x, y, by=c("birth_date", "country", "work_establishment")), 
                 lapply(list_df, function(x) 
                  {names(x)[5] <- paste('Wage', x$year_sample[1], sep = '_');x[-1]}))
    
    result
    
    #  birth_date country work_establishment Wage_2014 Wage_2015 Wage_2016
    #1       1990  FRANCE             X4T346      6574      6574      6574
    #2       1992  CANADA                X26      2553      2700      2353
    #3       1995      US              X2134      1700      1756      1750
    #4       1996      US                X26      1232      1230      1032