[SOLVED] Data file organized in block per year: add column for year to combine all data

Data file organized in block per year: add column for year to combine all data

I have large quantities of datasets in Excel that I would like to analyze in R. The files have a format that organizes all information per block of the same year, which looks like:

Group    <- c(2010, 'Group', 'A', 'B', 'C', 2011, 'Group', 'A', 'B', 'E', 2012, 'Group', 'A', 'B')
Value <- c(NA,'Value', 1, 2, 9, NA, 'Value', 3, 5, 2, NA, 'Value', 9, 1)

df <- cbind(Group, Value)
        
  Group   Value  
 1: 2010  NA     
 2: Group Value
 3: A     1    
 4: B     2    
 5: C     9    
 6: 2011  NA     
 7: Group Value
 8: A     3    
 9: B     5    
10: E     2    
11: 2012  NA     
12: Group Value
13: A     9  
14: B     1

To be able to analyze the data, I would like to automatically add a column for the year so that all data can be combined, as follows:

   Year Group Value  
 1: 2010  A  1       
 2: 2010  B  2
 3: 2010  C  9        
 4: 2011  A  3     
 5: 2011  B  5        
 6: 2011  E  2
 7: 2012  A  9 
 8: 2012  B  1

Solution

library(data.table)
dt <- data.table(df)
dt[, Year := Group[1], cumsum(is.na(Value))][Value != 'Value']

   Group Value Year
1:     A     1 2010
2:     B     2 2010
3:     C     9 2010
4:     A     3 2011
5:     B     5 2011
6:     E     2 2011
7:     A     9 2012
8:     B     1 2012

in Base R:

subset(transform(df, Year = ave(Group, cumsum(is.na(Value)), FUN=\(x)x[1])), Value != 'Value')
   Group Value Year
3      A     1 2010
4      B     2 2010
5      C     9 2010
8      A     3 2011
9      B     5 2011
10     E     2 2011
13     A     9 2012
14     B     1 2012

Note that the above columns are character. You can use type.convert(new_df, as.is = TRUE) where new_df is the resultant df to convert the columns to respective classes