Okay, here is a sample of what my data looks like:
GRADE_LVL | COURSE_NAME | COURSE_CODE | STUDENT_GENDER | ETHNICITY | OUTCOME |
---|---|---|---|---|---|
12 | Physics | 03165 | Male | White | Pass |
12 | Physics | 03165 | Female | White | Pass |
12 | Physics | 03165 | Nonbinary | Black or African American | Pass |
9 | Algebra I | 02052 | Female | Multiracial | Pass |
10 | Algebra I | 02052 | Female | White | Fail |
I need to report on 3 genders (male, female, nonbinary), and 7 ethnicities (Hispanic or Latino, American Indian or Alaska Native, Asian, Native Hawaiian/Other Pacific Islander, Black or African American, White, and Multiracial).
I am trying to write a function in R that will produce a table of the demographic facts for a set of parameters passed to it. I want the output of the function to produce a tibble that looks like this :
High School Students Who Passed Algebra I
Hispanic | Native American | Asian | Black | White | Multiracial | Total | |
---|---|---|---|---|---|---|---|
Male | 0 | 0 | 7 | 2 | 13 | 4 | 26 |
Female | 1 | 0 | 3 | 1 | 12 | 3 | 20 |
Nonbinary | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
Total | 1 | 0 | 10 | 3 | 26 | 7 | 47 |
Notes: This contains only dummy values that aren't related to those in my sample data snippet above. I have shortened the column names to save space on the screen.
Here is the code I have so far:
data <- dbGetQuery('secrets')
highSchool = c('09', '10', '11', '12')
passingOnly <- quo(OUTCOME == 'Pass')
algebra1 <- quo(COURSE_CODE == '02052')
ethnicCategories <- factor(c(
'Hispanic or Latino',
'American Indian or Alaska Native',
'Asian',
'Native Hawaiian/Other Pacific Islander',
'Black or African American',
'White',
'Multiracial'
))
genderCategories <- factor(c('Female', 'Male', 'Nonbinary'))
demographicBreakout <- function(filterConditions, gradeLevels) {
data %>%
filter( {{ filterConditions }} ) %>%
filter(GRADE_LVL %in% gradeLevels) %>%
select(STUDENT_GENDER, ETHNIC_DESC) %>%
group_by(STUDENT_GENDER, ETHNIC_DESC) %>%
summarise(COUNT = n()) %>%
pivot_wider(
names_from = ETHNIC_DESC,
values_from = COUNT,
values_fill = 0
) %>%
rename_at("STUDENT_GENDER", ~"Gender")
}
report <- demographicBreakout(
filterConditions = !!quo(!!algebra1 & !!passingOnly),
gradeLevels = highSchool
)
This code produces a tibble that looks like this:
Gender | Hispanic | Asian | Black | White | Multiracial |
---|---|---|---|---|---|
Female | 2 | 15 | 2 | 26 | 9 |
Male | 12 | 23 | 1 | 43 | 11 |
This is looking good so far, but I need to have all demographic categories present in the table even if the count is 0. I tried adding the following code snippet to my demographicBreakout
function in between the summarize
and pivot_wider
statements:
complete(
ETHNIC_DESC = ethnicCategories,
STUDENT_GENDER = genderCategories,
fill = list(COUNT = 0)
) %>%
Adding this code caused the following error:
Error in `reframe()`:
ℹ In argument: `complete(data = pick(everything()), ..., fill = fill, explicit = explicit)`.
ℹ In group 1: `STUDENT_GENDER = Female`.
Caused by error in `dplyr::full_join()`:
! Join columns in `y` must be present in the data.
✖ Problem with `STUDENT_GENDER`.
I have been unable to get this error resolved. In addition to this complete()
statement, I also need to run something similar on the rows so that nonbinary
counts appear. On top of all that, I still need to add in the row and column totals.
Any help getting over my current hurdle would be greatly appreciated.
The function can be simplified a lot, when we combine the factor conversion idea from Stefan's (now-deleted) answer, and the .drop = FALSE
argument for the count()
function:
ethnicCategories <- c('Hispanic or Latino',
'American Indian or Alaska Native',
'Asian',
'Native Hawaiian/Other Pacific Islander',
'Black or African American',
'White',
'Multiracial')
genderCategories <- c('Female', 'Male', 'Nonbinary')
demographicBreakout <- function(filterConditions, gradeLevels) {
data |>
filter({{filterConditions}} & GRADE_LVL %in% gradeLevels) |>
mutate(Gender = factor(STUDENT_GENDER, levels = genderCategories),
e = factor(ETHNIC_DESC, levels = ethnicCategories)) |>
count(Gender, e, .drop = FALSE) |>
pivot_wider(names_from = e, values_from = n)
}
Example:
demographicBreakout(
filterConditions = !!quo(COURSE_CODE == '02052'),
gradeLevels = c('09', '10', '11', '12')
) |> print(width = Inf)
Output:
# A tibble: 3 × 8
Gender `Hispanic or Latino` `American Indian or Alaska Native` Asian
<fct> <int> <int> <int>
1 Female 0 0 0
2 Male 0 0 0
3 Nonbinary 0 0 0
`Native Hawaiian/Other Pacific Islander` `Black or African American` White
<int> <int> <int>
1 0 0 1
2 0 0 0
3 0 0 0
Multiracial
<int>
1 0
2 0
3 0