Below is the sample data. The task at hand is creating two new columns that would designate something by zip code. The first new column would be titled Las_Vegas and the second would be Laughlin. The first eight zip codes would have a value of 1 for Las Vegas and the second eight would have a value of 1 for Laughlin. The purpose of this is that I want to sum the employment for Las Vegas and Laughlin.
First question: Would it be best to use ifelse or case_when? Second question: Making the two new columns into defacto dummy variables... is this the best approach?
zipcode <-c(89102,89103,89104,89105,89106,89107,89108,89109,89110,89111,89112,89113,89114,89115,89116,89117)
naicstest<-c(541213,541213,541213,541213,541213,541213,541213,541213,541213,541213,541213,541213,541213,541212,541215,541214)
emptest <-c(2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32)
county <- data.frame(zipcode,naicstest,emptest)
End result. This end result would have sixteen rows. I kept it short for sake of simplicity. one row for Las_Vegas and one row for Laughlin but there would be eight rows for Las_Vegas and eight for Laughlin. I know how to do the summarise (summing employment) but struggling how to make the two columns.
zipcode naicstest emptest Las_Vegas Laughlin
89102 541213 2 1 0
89110 541213 18 0 1
We can use tidyverse
match
the 'zipcode' by unique(zipcode)
to get a numeric index for each unique zipcode.%/%
vector
of valuesslice_head
with n = 1pivot_wider
library(dplyr)
library(tidyr)
county %>%
group_by(un1 = c("Las_Vegas", "Laughlin")[
(match(zipcode, unique(zipcode)) -1) %/% 8 + 1]) %>%
slice_head(n = 1) %>%
mutate(n = 1) %>%
pivot_wider(names_from = un1, values_from = n, values_fill = 0)
-output
# A tibble: 2 x 5
zipcode naicstest emptest Las_Vegas Laughlin
<dbl> <dbl> <dbl> <dbl> <dbl>
1 89102 541213 2 1 0
2 89110 541213 18 0 1
If we want to return all the rows, then don't do the slice_head
, instead create a sequence column - row_number()
county %>%
group_by(un1 = c("Las_Vegas", "Laughlin")[
(match(zipcode, unique(zipcode)) -1) %/% 8 + 1]) %>%
mutate(n = 1, rn = row_number()) %>%
ungroup %>%
pivot_wider(names_from = un1, values_from = n, values_fill = 0) %>%
select(-rn)
-ouptut
# A tibble: 16 x 5
zipcode naicstest emptest Las_Vegas Laughlin
<dbl> <dbl> <dbl> <dbl> <dbl>
1 89102 541213 2 1 0
2 89103 541213 4 1 0
3 89104 541213 6 1 0
4 89105 541213 8 1 0
5 89106 541213 10 1 0
6 89107 541213 12 1 0
7 89108 541213 14 1 0
8 89109 541213 16 1 0
9 89110 541213 18 0 1
10 89111 541213 20 0 1
11 89112 541213 22 0 1
12 89113 541213 24 0 1
13 89114 541213 26 0 1
14 89115 541212 28 0 1
15 89116 541215 30 0 1
16 89117 541214 32 0 1