Here is the sample data:
stfips <- c("39","39","39")
year <- c("2023", "2023","2023")
industry_code <- c(112, 113, 114)
first_quarter_establishments <- c(987,654,321)
county <- data.frame(stfips, year, industry_code, first_quarter_establishments)
The task at hand is to create a new column named period with the value "01", which represents the first quarter. If the fourth column had the word "second" in its name, the period would be "02", and so on. Below is what I got from ChatGPT, followed by the error it produces. Any idea how I could create this period column based on the wording of a column name?
first_columns <- grepl("first", names(county), ignore.case = TRUE)
county$period <- ifelse(first_columns, "01", "")
Error in `$<-.data.frame`(`*tmp*`, period, value = c("", "", "", "01")) :
replacement has 4 rows, data has 3
Desired end result
stfips year industry_code first_quarter_establishments period
39     2023 112           987                          01
39     2023 113           654                          01
39     2023 114           321                          01
I'm sure there are more elegant solutions, but in base R you can use match() in combination with gsub() to identify the quarter from the names() of your data frame:
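# Map the quarter word at the start of the column name to its number
quarters <- c("first" = 1, "second" = 2,
              "third" = 3, "fourth" = 4)
# Keep only the text before the first underscore, then look that word up
county$quarter <- quarters[match(gsub("(.+?)(\\_.*)", "\\1", names(county[4])),
                                 names(quarters))]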
Output:
# stfips year industry_code first_quarter_establishments quarter
# 1 39 2023 112 987 1
# 2 39 2023 113 654 1
# 3 39 2023 114 321 1
If the column name were changed to second:
second_quarter_establishments <- c(987,654,321)
county <- data.frame(stfips, year, industry_code, second_quarter_establishments)
county$quarter <- quarters[match(gsub("(.+?)(\\_.*)", "\\1", names(county[4])),
                                 names(quarters))]
# stfips year industry_code second_quarter_establishments quarter
# 1 39 2023 112 987 2
# 2 39 2023 113 654 2
# 3 39 2023 114 321 2
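The question asks for a zero-padded character column named period ("01", "02", ...) rather than a numeric quarter. A minimal sketch of one way to get there, assuming the quarter word is always the part of the column name before the first underscore and the column of interest is always the fourth one. The helper name quarter_word is made up for illustration, and sub() is used here as a simpler stand-in for the gsub() pattern above:

```r
stfips <- c("39", "39", "39")
year <- c("2023", "2023", "2023")
industry_code <- c(112, 113, 114)
first_quarter_establishments <- c(987, 654, 321)
county <- data.frame(stfips, year, industry_code, first_quarter_establishments)

quarters <- c("first" = 1, "second" = 2, "third" = 3, "fourth" = 4)

# Assumption: the quarter word is everything before the first underscore
# in the name of the fourth column ("quarter_word" is a hypothetical helper name)
quarter_word <- sub("_.*$", "", names(county)[4])

# Look up the quarter number by name and zero-pad it to two digits;
# the single resulting string is recycled to every row
county$period <- sprintf("%02d", quarters[[quarter_word]])
```

Because quarters[[quarter_word]] is used, a column name that does not start with one of the four quarter words stops with an error instead of silently filling the column with NA.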
Note that you could do away with gsub() if you renamed the elements of your quarters vector to the full column names, though this assumes the names in the data frame match them exactly:
quarters <- c("first_quarter_establishments" = 1, "second_quarter_establishments" = 2,
              "third_quarter_establishments" = 3, "fourth_quarter_establishments" = 4)
county$quarter <- quarters[match(names(county[4]), names(quarters))]
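As for the error in the question: grepl() is applied to names(county), so it returns one TRUE/FALSE per column (four values), while a new data frame column needs one value per row (three) or a single value that can be recycled. That is my reading of the "replacement has 4 rows, data has 3" message. A minimal sketch of the original attempt repaired with any(), assuming only the first quarter needs to be detected:

```r
county <- data.frame(
  stfips = c("39", "39", "39"),
  year = c("2023", "2023", "2023"),
  industry_code = c(112, 113, 114),
  first_quarter_establishments = c(987, 654, 321)
)

# grepl() tests each column NAME, so the result has one element per column
first_columns <- grepl("first", names(county), ignore.case = TRUE)
length(first_columns)  # 4 columns, but the data frame has only 3 rows

# Collapse to a single TRUE/FALSE with any(); the length-1 value is
# then recycled to all rows when assigned
county$period <- if (any(first_columns)) "01" else NA_character_
```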