A seemingly straightforward task has proven to be really difficult in R. I'm working with survey data containing propositions asking respondents to indicate their level of agreement or disagreement on a 7-point Likert Scale. I'm trying to create frequency tables (see Table 1 below) that show the proportion of respondents in each sex/age group category that selected strongly agreed, slightly agreed, agreed, neutral, etc., in response to each proposition.
How can I create Table 1 from Table 2 (this is the way my current dataframe is setup in R)? In Table 2, the columns headed Q31, Q32, Q33, and Q34 are each separate propositions and the values in the cells are numerical codes for the 7 options on the Likert scale.
Table 1
Sex | Age Group | Question | Strongly Agree |
---|---|---|---|
1 | 30 - 39 | 31 | 0.21 |
2 | 30 - 39 | 31 | 0.48 |
1 | 40 - 49 | 31 | 0.12 |
2 | 40 - 49 | 31 | 0.65 |
Table 2
Sex | Age Group | Q31 | Q32 | Q33 | Q34 |
---|---|---|---|---|---|
1 | 30 - 39 | 1 | 7 | 1 | 5 |
2 | 30 - 39 | 3 | 5 | 2 | 6 |
1 | 40 - 49 | 4 | 6 | 3 | 2 |
2 | 40 - 49 | 2 | 2 | 4 | 2 |
Open to any suggestions, as I've been banging my head against this for too long! Thanks!
The first step might be to transform your wide data into a long format (age, sex and than one column for the kind of question and one column for the answer to that question). With this long format or tidy data you can easily group by question, age and sex and calculate the proportion for each answer.
Code
library(tidyverse)
df %>%
pivot_longer(cols = -c(Sex, `Age Group`),
names_to = "Question",
values_to = "Value") %>%
group_by(Question, Sex, `Age Group`) %>%
summarise(`Strongly Agree` = sum(Value == 7)/n(),
`Slightly Agree` = sum(Value == 6)/n(),
Agree = sum(Value == 5)/n(),
Neutral = sum(Value == 4)/n(),
Disagree = sum(Value == 3)/n(),
`Slightly Disagree` = sum(Value == 2)/n(),
`Strongly Disagree` = sum(Value == 1)/n())
Output
# A tibble: 16 x 10
# Groups: Question, Sex [8]
Question Sex `Age Group` `Strongly Agree` `Slightly Agree` Agree Neutral Disagree `Slightly Disagree` `Strongly Disagree`
<chr> <int> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Q31 1 30-39 0 0 0 0 0 0 1
2 Q31 1 40-49 0 0 0 1 0 0 0
3 Q31 2 30-39 0 0 0 0 1 0 0
4 Q31 2 40-49 0 0 0 0 0 1 0
Note: In your example table 2 each sex x age combination exists once, therefore the proportions for your example are 0 or 1.
Data
df <- structure(list(Sex = c(1L, 2L, 1L, 2L), `Age Group` = structure(c(1L,
1L, 2L, 2L), .Label = c("30-39", "40-49"), class = "factor"),
Q31 = c(1L, 3L, 4L, 2L), Q32 = c(7L, 5L, 6L, 2L), Q33 = 1:4,
Q34 = c(5L, 6L, 2L, 2L)), class = "data.frame", row.names = c(NA,
-4L))