I have a sensitive dataset, so I created a mock one here for illustration.
df <- data.frame(
Year = c("2010", "2010", "2010", "2011", "2011", "2012", "2013", "2013", "2013"),
Race = c("White", "White", "Asian", "White", "Black", "Black", "Unknown", "Unknown", "White"),
Ethnicity = c("Hispanic", "Hispanic", "Not Hispanic", "Hispanic", "Not Hispanic", "Not Hispanic", "Unknown", "Hispanic", "Not Hispanic")
)
Year Race Ethnicity
1 2010 White Hispanic
2 2010 White Hispanic
3 2010 Asian Not Hispanic
4 2011 White Hispanic
5 2011 Black Not Hispanic
6 2012 Black Not Hispanic
7 2013 Unknown Unknown
8 2013 Unknown Hispanic
9 2013 White Not Hispanic
In reality, my dataset covers 2010-2021, so 12 years total. There are also 6 or 7 racial categories and 3 possible answers for ethnicity (Hispanic/Latino, not Hispanic/Latino, unknown).
I am trying to obtain counts for each combination of year, race, and ethnicity (for example, 2010 white Hispanic, 2010 white non-Hispanic, 2010 Asian Hispanic, 2010 Asian non-Hispanic, etc.). I am currently using this function to pull the counts:
library(dplyr)

# Count the rows matching one race (x), ethnicity (y), and year (z)
raceethfunc <- function(x, y, z) {
  df %>% filter(Race == x & Ethnicity == y & Year == z) %>%
    nrow()
}
H_white2010 <- raceethfunc(x = "White", y = "Hispanic or Latino", z = "2010")
H_white2011 <- raceethfunc(x = "White", y = "Hispanic or Latino", z = "2011")
H_white2012 <- raceethfunc(x = "White", y = "Hispanic or Latino", z = "2012")
Etc...
I have to do this for each year, race, and ethnicity, which means copying and pasting 200+ lines of code just to change the year in one line or the race in another. It is a very inefficient way of going about it.
I am new to coding, and to functions especially. I tried using a for() loop but could not work out how to get it to run. Any guidance on a loop, or on a more efficient way to go about this, would be greatly appreciated.
PS: This is also my first post ever here. If I am doing something incorrectly, please let me know how I can improve my future posts!
You can use group_by and count from the {dplyr} package, like:
df <- data.frame(
Year = c("2010", "2010", "2010", "2011", "2011", "2012", "2013", "2013", "2013"),
Race = c("White", "White", "Asian", "White", "Black", "Black", "Unknown", "Unknown", "White"),
Ethnicity = c("Hispanic", "Hispanic", "Not Hispanic", "Hispanic", "Not Hispanic", "Not Hispanic", "Unknown", "Hispanic", "Not Hispanic")
)
df |>
  dplyr::group_by(Year, Race, Ethnicity) |>
  dplyr::count()
#> # A tibble: 8 × 4
#> # Groups: Year, Race, Ethnicity [8]
#> Year Race Ethnicity n
#> <chr> <chr> <chr> <int>
#> 1 2010 Asian Not Hispanic 1
#> 2 2010 White Hispanic 2
#> 3 2011 Black Not Hispanic 1
#> 4 2011 White Hispanic 1
#> 5 2012 Black Not Hispanic 1
#> 6 2013 Unknown Hispanic 1
#> 7 2013 Unknown Unknown 1
#> 8 2013 White Not Hispanic 1
Created on 2023-06-30 with reprex v2.0.2
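Note that the grouped count above only lists combinations that actually occur in the data. If the zero-count combinations matter too (for example, 2010 Asian Hispanic), one possible alternative is base R's table(), which cross-tabulates every observed level of each column. This is only a sketch on the mock data from the question, not the method used in the answer above; the names counts and white_hisp_2010 are made up for illustration.

```r
# Mock data copied from the question
df <- data.frame(
  Year = c("2010", "2010", "2010", "2011", "2011", "2012", "2013", "2013", "2013"),
  Race = c("White", "White", "Asian", "White", "Black", "Black", "Unknown", "Unknown", "White"),
  Ethnicity = c("Hispanic", "Hispanic", "Not Hispanic", "Hispanic", "Not Hispanic", "Not Hispanic", "Unknown", "Hispanic", "Not Hispanic")
)

# Cross-tabulate the three columns, then flatten to one row per combination.
# Unlike the grouped count, combinations with no rows are kept with n = 0.
counts <- as.data.frame(
  table(df[c("Year", "Race", "Ethnicity")]),
  responseName = "n",
  stringsAsFactors = FALSE
)

# Look up a single combination instead of calling a filter function each time
# ("counts" and "white_hisp_2010" are illustrative names, not from the post)
white_hisp_2010 <- counts$n[
  counts$Year == "2010" & counts$Race == "White" & counts$Ethnicity == "Hispanic"
]
white_hisp_2010
#> [1] 2
```

With one table holding every combination, there is no need for a separate variable per year, race, and ethnicity; any single count can be pulled out by subsetting as shown. On the real data, the ethnicity labels would need to match the actual values (such as "Hispanic or Latino").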