rorganization

Data file organized in block per year: add column for year to combine all data


I have large quantities of datasets in Excel that I would like to analyze in R. The files have a format that organizes all information per block of the same year, which looks like:

Group    <- c(2010, 'Group', 'A', 'B', 'C', 2011, 'Group', 'A', 'B', 'E', 2012, 'Group', 'A', 'B')
Value <- c(NA,'Value', 1, 2, 9, NA, 'Value', 3, 5, 2, NA, 'Value', 9, 1)

df <- cbind(Group, Value)
        
  Group   Value  
 1: 2010  NA     
 2: Group Value
 3: A     1    
 4: B     2    
 5: C     9    
 6: 2011  NA     
 7: Group Value
 8: A     3    
 9: B     5    
10: E     2    
11: 2012  NA     
12: Group Value
13: A     9  
14: B     1  

To be able to analyze the data, I would like to automatically add a column for the year so that all data can be combined, as follows:

   Year Group Value  
 1: 2010  A  1       
 2: 2010  B  2
 3: 2010  C  9        
 4: 2011  A  3     
 5: 2011  B  5        
 6: 2011  E  2
 7: 2012  A  9 
 8: 2012  B  1     

Solution

  • library(data.table)
    dt <- data.table(df)
    dt[, Year := Group[1], cumsum(is.na(Value))][Value != 'Value']
    
       Group Value Year
    1:     A     1 2010
    2:     B     2 2010
    3:     C     9 2010
    4:     A     3 2011
    5:     B     5 2011
    6:     E     2 2011
    7:     A     9 2012
    8:     B     1 2012
    

    in Base R:

    subset(transform(df, Year = ave(Group, cumsum(is.na(Value)), FUN=\(x)x[1])), Value != 'Value')
       Group Value Year
    3      A     1 2010
    4      B     2 2010
    5      C     9 2010
    8      A     3 2011
    9      B     5 2011
    10     E     2 2011
    13     A     9 2012
    14     B     1 2012
    

    Note that the above columns are character. You can use type.convert(new_df, as.is = TRUE) where new_df is the resultant df to convert the columns to respective classes