I am trying to calculate the dissimilarity index for several schools in a country using the segregation package. My dataset currently looks like this:
```
# A tibble: 948 × 4
   ethnicity              school   acyear      n
   <chr>                  <chr>    <chr>   <dbl>
 1 White                  school 1 2010/11  3245
 2 Unknown/not applicable school 1 2010/11   675
 3 Other                  school 1 2010/11     5
 4 Mixed                  school 1 2010/11    50
 5 Black                  school 1 2010/11    40
 6 Asian                  school 1 2010/11    95
 7 White                  school 2 2010/11  5905
 8 Unknown/not applicable school 2 2010/11  1060
 9 Other                  school 2 2010/11    15
10 Mixed                  school 2 2010/11   115
# … with 938 more rows
```
The command I am using is very similar to the one I used to calculate the Mutual Information Index and Theil's Entropy Index:
```r
library(segregation)

dissimilarity(data,
              group = 'ethnicity',
              unit = 'school',
              weight = 'n')
```
However, I am getting the following error:

```
Error in dissimilarity(acyear1, group = "ethnicity", unit = "school", weight = "n") :
  The D index only allows two distinct groups
```
I tried creating a dummy variable for ethnicity, but I still get the same error.
Can someone help me?
Thank you :)
In this case, the dissimilarity index calculation fails because, by definition, the index compares exactly two groups (in the literature, this is most often a Black-White dissimilarity index). Your data has six race/ethnicity categories, so you can:

a) calculate the index for each pair of race/ethnicity groups (e.g., White-Black, White-Asian, Black-Asian, etc.);
b) choose one race/ethnicity as the reference group and collapse all other categories together (e.g., White vs. non-White, where non-White = Black + Asian + Mixed + Other + Unknown); or
c) use a different segregation index that is designed for multiple race/ethnicity groups.
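To make options (a) and (b) concrete, here is a minimal base-R sketch of the two-group index, D = 0.5 * sum over schools i of |a_i / A - b_i / B|, where a_i and b_i are the counts of the two groups in school i and A and B are their totals. It does not use the segregation package at all, so it runs without extra dependencies. The `toy` data frame (with made-up counts) and the `dissim_two()` helper are hypothetical illustrations, not part of any package; the data is assumed to be in the same long format as in the question and already filtered to a single `acyear`.

```r
# Hypothetical toy data in the question's long format (counts are made up).
# With several academic years, filter to a single `acyear` first.
toy <- data.frame(
  ethnicity = rep(c("White", "Black", "Asian"), times = 3),
  school    = rep(c("school 1", "school 2", "school 3"), each = 3),
  n         = c(80, 10, 10,
                50, 30, 20,
                20, 60, 20),
  stringsAsFactors = FALSE
)

# Hypothetical helper: two-group dissimilarity index for groups g1 and g2
# of column `group`, with schools as units and `n` as the head count.
dissim_two <- function(df, group, g1, g2, unit = "school", weight = "n") {
  # unit x group matrix of summed counts; a missing cell means no pupils
  tab <- tapply(df[[weight]], list(df[[unit]], df[[group]]), sum)
  tab[is.na(tab)] <- 0
  a <- tab[, g1]
  b <- tab[, g2]
  # D = 0.5 * sum_i |a_i / A - b_i / B|
  0.5 * sum(abs(a / sum(a) - b / sum(b)))
}

# Option (a): one D per pair of groups
groups <- sort(unique(toy$ethnicity))
pairs  <- combn(groups, 2)
pairwise <- data.frame(
  group1 = pairs[1, ],
  group2 = pairs[2, ],
  D      = apply(pairs, 2, function(p) dissim_two(toy, "ethnicity", p[1], p[2])),
  stringsAsFactors = FALSE
)
print(pairwise)

# Option (b): keep the reference group and collapse all other groups
toy$white_nonwhite <- ifelse(toy$ethnicity == "White", "White", "non-White")
d_white_nonwhite <- dissim_two(toy, "white_nonwhite", "White", "non-White")
print(d_white_nonwhite)
```

With the segregation package itself, option (a) amounts to subsetting the data to two groups before calling `dissimilarity()` with the same arguments as in the question, e.g. `dissimilarity(subset(data, ethnicity %in% c("White", "Black")), group = "ethnicity", unit = "school", weight = "n")`, and its estimate should agree with the hand computation above. For option (c), the Mutual Information and Theil indices mentioned in the question already handle all six groups at once.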