r dataframe subset rename recycle

How can I make a function in R to create subsets of columns?


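I have a data frame in R with 40 years of data (1980 to 2019), one column per year. I need a function that creates a subset for each year, keeping the same identifier columns plus that year's column, renames the year column, and then stacks the subsets one below the other in a single data frame. Currently I do it by hand: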

X1980 <- subset(all_data,select=c("Pais","RubroId","X1980"))
names(X1980)[names(X1980) == 'X1980'] <- 'Valor'
X1980$ANIO <- 1980

X1981 <- subset(all_data,select=c("Pais","RubroId","X1981"))
names(X1981)[names(X1981) == 'X1981'] <- 'Valor'
X1981$ANIO <- 1981

X1982 <- subset(all_data,select=c("Pais","RubroId","X1982"))
names(X1982)[names(X1982) == 'X1982'] <- 'Valor'
X1982$ANIO <- 1982

final_data <- rbind(X1980,X1981,X1982)

Solution

  • We can write a function that takes the columns to select (cols_select) as a character vector and the new column name (names_to_change, 'Valor' or any other name). It subsets the dataset ('dat') to those columns, renames the year column, and adds a new column 'ANIO' holding the year.

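    f1 <- function(dat, cols_select, names_to_change) {
      # the year column is the one named like "X1980"
      yearcol <- grep("^X\\d{4}$", cols_select, value = TRUE)
      tmpdat <- subset(dat, select = cols_select)
      # rename the year column, e.g. to 'Valor'
      names(tmpdat)[names(tmpdat) == yearcol] <- names_to_change
      # store the year as an integer, dropping the leading "X"
      tmpdat$ANIO <- as.integer(sub("^X", "", yearcol))
      tmpdat
    }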
    

    and call it once per year column with lapply, binding the results with rbind

    nm1 <- paste0("X", 1980:1982)   # use 1980:2019 for the full range
    out <- do.call(rbind, lapply(nm1, function(x)
             f1(all_data, cols_select = c("Pais", "RubroId", x), "Valor")))
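To check the approach end to end, here is a minimal sketch on a toy data frame. The values in all_data are made up for illustration, and the year_cols helper (which picks up every column named like "X1980" instead of typing the range by hand) is an assumption, not part of the original answer.

```r
# Toy stand-in for all_data; the values are invented for illustration only
all_data <- data.frame(
  Pais    = c("AR", "BR"),
  RubroId = c(1L, 2L),
  X1980   = c(10, 20),
  X1981   = c(11, 21),
  X1982   = c(12, 22)
)

# Same idea as f1 above: keep the id columns plus one year column,
# rename the year column, and record the year in ANIO
f1 <- function(dat, cols_select, names_to_change) {
  yearcol <- grep("^X\\d{4}$", cols_select, value = TRUE)
  tmpdat <- dat[, cols_select, drop = FALSE]
  names(tmpdat)[names(tmpdat) == yearcol] <- names_to_change
  tmpdat$ANIO <- as.integer(sub("^X", "", yearcol))
  tmpdat
}

# Hypothetical helper: detect all year columns present in the data
year_cols <- grep("^X\\d{4}$", names(all_data), value = TRUE)

# One subset per year, stacked into a single long data frame
final_data <- do.call(rbind, lapply(year_cols, function(x)
  f1(all_data, cols_select = c("Pais", "RubroId", x), "Valor")))

str(final_data)
```

The result should have one row per (Pais, RubroId, year) combination, with the columns Pais, RubroId, Valor and ANIO, which matches what the hand-written rbind(X1980, X1981, X1982) in the question produces.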