r  calculated-columns  mutate

How to create these columns in R?


Here is the sample data:

 stfips <- c("39","39","39")
 year <- c("2023", "2023","2023")
 industry_code <- c(112, 113, 114)
 first_quarter_establishments <- c(987,654,321)
 county <- data.frame(stfips, year, industry_code, first_quarter_establishments)

The task at hand is to create a new column named period with a value of "01". The reason for the "01" is that the fourth column's name contains "first", so it represents the first quarter. If the fourth column had the word "second" in its name, the period would be "02", and so on. Below is what I got from ChatGPT, followed by the error it produces. Any idea how I can create this period column based on the wording of a column name?

first_columns <- grepl("first", names(county), ignore.case = TRUE)
county$period <- ifelse(first_columns, "01", "") 
  Error in `$<-.data.frame`(`*tmp*`, period, value = c("", "", "", "01")) : 
  replacement has 4 rows, data has 3

Desired end result

 stfips     year    industry_code    first_quarter_establishments   period
  39        2023         112                 987                    01
  39        2023         113                 654                    01
  39        2023         114                 321                    01

Solution

  • I'm sure there are more elegant solutions, but in base R you can use match in combination with gsub to identify the quarter from the names of your data frame:

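    # Map the ordinal word at the start of a column name to its quarter number
    quarters <- c("first" = 1, "second" = 2,
                  "third" = 3, "fourth" = 4)

    # Keep the text before the first underscore in the 4th column's name
    # ("first"), then look it up in quarters; the single value fills every row
    county$quarter <- quarters[match(gsub("(.+?)(\\_.*)", "\\1", names(county[4])),
                                     names(quarters))]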
    

    Output:

    #   stfips year industry_code first_quarter_establishments quarter
    # 1     39 2023           112                          987       1
    # 2     39 2023           113                          654       1
    # 3     39 2023           114                          321       1
    

    If the column name were changed to start with second:

    second_quarter_establishments <- c(987,654,321)
    county <- data.frame(stfips, year, industry_code, second_quarter_establishments)
    
    county$quarter <- quarters[match(gsub("(.+?)(\\_.*)", "\\1", names(county[4])), 
                                     names(quarters))]
    
    #   stfips year industry_code second_quarter_establishments quarter
    # 1     39 2023           112                           987       2
    # 2     39 2023           113                           654       2
    # 3     39 2023           114                           321       2
    

    Note that you could do away with gsub by naming the quarters vector after the full column names, though this assumes the names in the data frame match them exactly:

    quarters <- c("first_quarter_establishments" = 1, "second_quarter_establishments" = 2, 
                  "third_quarter_establishments" = 3, "fourth_quarter_establishments" = 4)
    
    county$quarter <- quarters[match(names(county[4]), names(quarters))]
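
The question asked for a zero-padded character value ("01") rather than the number 1, and the original attempt failed because grepl() on names(county) returns one logical per column (4 values) while the data frame has only 3 rows. Below is a minimal base R sketch that addresses both points, assuming the quarter word always starts the column name and is followed by an underscore. The helper quarter_period() is a hypothetical name introduced here for illustration, not part of the answer above.

```r
# Sample data from the question.
county <- data.frame(
  stfips = c("39", "39", "39"),
  year = c("2023", "2023", "2023"),
  industry_code = c(112, 113, 114),
  first_quarter_establishments = c(987, 654, 321)
)

# Lookup from the ordinal word in a column name to the quarter number.
quarters <- c(first = 1, second = 2, third = 3, fourth = 4)

# Hypothetical helper: find the first column whose name starts with one of
# the ordinal words plus "_" and return the period as "01" .. "04".
quarter_period <- function(df) {
  pattern <- paste0("^(", paste(names(quarters), collapse = "|"), ")_")
  hit <- grep(pattern, names(df), ignore.case = TRUE, value = TRUE)
  if (length(hit) == 0) stop("no column name starts with a quarter word")
  word <- tolower(sub("_.*$", "", hit[1]))  # text before the first underscore
  sprintf("%02d", quarters[[word]])         # zero-pad to two digits
}

# The helper returns a single string, which R recycles to every row,
# so the row-count mismatch from the original attempt does not arise.
county$period <- quarter_period(county)
county
```

Because the helper searches all column names instead of relying on position 4, it should keep working if the columns are reordered, as long as exactly one name carries a quarter word.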