rpanel-data

Quickly split a dataframe by year in R


I have a panel that looks like this

country <- c("A","B","C","A","B","C","A","B","C")
industry<- c("X","Y","Z","X","Y","Z","X","Y","Z")
x2006<- sample(1000:100000,9)
x2007<- sample(1000:100000,9)
x2008<- sample(1000:100000,9)
dat <- data.frame (country,industry,x2006,x2007,x2008)  

I am doing something very simple like

dat2006 <- dat%>%
    select(country,industry,x2006)

then using write.csv to save it as its own file

What is the best way to do this if I wanted to repeat that and save a separate file for each year (i.e. column) in the data set?


Solution

  • You could use sapply:

    sapply(grep("x", names(dat)), function(y)
           write.csv(dat[, c(1, 2, y)], 
                     paste0(names(dat[y]), ".csv"),
                     row.names = FALSE)
    )
    

    grep finds the columns with x, sapply loops through them. This will name your csv file the column name selected and save it in the working directory.

    Note, you could specify the columns in other ways too. A few alternatives:

    # using column locations (numbers) directly
    sapply(3:5, function(y)
      write.csv(dat[, c(1, 2, y)], 
                paste0(names(dat[y]), ".csv"),
                row.names = FALSE)
    )
    
    # using column names
    sapply(c("x2006", "x2007", "x2008"), function(y)
           write.csv(dat[, c("country", "industry", y)], 
                     paste0(names(dat[y]), ".csv"),
                     row.names = FALSE))