I Have a 2 questions in one. I have 20 data frames. Each one is subject to a given year (from 2000 to 2020). They all have the same columns. 1) I want to merge them based on similar observations for a list of variables (columns), so I can construct a panel. 2) Plus when merging I want to rename the columns by adding a suffixes indicating the date. For example, let take 3 dataframes
df1
year_sample birth_date country work_establishment Wage
2014 1995 US X2134 1700
2014 1996 US X26 1232
2014 1992 CANADA X26 2553
2014 1990 FRANCE X4T346 6574
2014 1983 BELGIUM X2E43 1706
2014 1975 US X2134 1000
2014 1969 CHINA XXZT55 996
df2
year_sample birth_date country work_establishment Wage
2015 1995 US X2134 1756
2015 1996 US X26 1230
2015 1992 CANADA X26 2700
2015 1990 FRANCE X4T346 6574
2015 1975 US X2134 1000
2015 1979 GERMANY X35555 2435
df3
year_sample birth_date country work_establishment Wage
2016 1995 US X2134 1750
2016 1996 US X26 1032
2016 1992 CANADA X26 2353
2016 1990 FRANCE X4T346 6574
2016 1955 MALI X2244 1000
2016 1979 GERMANY X35555 2435
If an observation have similar values for c(birth_date; country ; work_establisment) then I will considere it as the same person. I want therefore:
df_final
id birth_date country work_establishment Wage_2014 Wage_2015 Wage_2016
1 1995 US X2134 1700 1756 1750
2 1996 US X26 1232 1230 1032
3 1992 CANADA X26 2553 2700 2353
4 1990 FRANCE X4T346 6574 6574 6574
I know that if I had just two dataframes I can do :
df_final <- transform(merge(df1,df2, by=c("birth_date", "country", "work_establishment"), suffixes=c("_2014", "_2015")))
But I can't manage to do it for several dataframes at once.
Thank you!
You can get all the dataframes in a list.
list_df <- mget(paste0('df', 1:3))
#OR
#list_df <- list(df1, df2, df3)
Then add suffix to 'Wage'
column in each of the dataframe from the year_sample
value and drop the year column and use Reduce
to merge
the dataframes into one.
result <- Reduce(function(x, y)
merge(x, y, by=c("birth_date", "country", "work_establishment")),
lapply(list_df, function(x)
{names(x)[5] <- paste('Wage', x$year_sample[1], sep = '_');x[-1]}))
result
# birth_date country work_establishment Wage_2014 Wage_2015 Wage_2016
#1 1990 FRANCE X4T346 6574 6574 6574
#2 1992 CANADA X26 2553 2700 2353
#3 1995 US X2134 1700 1756 1750
#4 1996 US X26 1232 1230 1032