I want to test voters' motives for casting their vote for a given party 'XY' and whether residence in a region makes a significant difference. The question is: do voters from region A (coded '1') differ significantly in their motives from voters in region B (coded '0')?
Here is how my data are structured (over-simplified):
region_AB motive voter_attribute vote_for_party_XY
1 1 1 1
1 0 1 1
1 1 0 0
0 0 0 0
0 0 1 0
0 1 0 0
My guess would be to run a hierarchical binary logistic model in R, but then how would I find out whether different motives and voter characteristics play a role for citizens in regions A and B? I don't simply want to test for regional effects but for the difference region makes within the overall model.
Or can I simply throw interaction terms into a standard logistic regression model (e.g. region*motive1, region*motive2, ...)?
But in that case, how many interaction terms can I add? Would I have to recode the zeros in 'region' into something like 0.0000000001, as multiplication with zero would distort the results? Lastly, would I have to include the interaction terms in the model alongside the two components of the interaction (e.g. region, motive, and region*motive), or would this only result in multicollinearity?
Thank you!
The outcome is binary, so the natural modeling framework would be logistic regression. I do not see a hierarchical structure in the data gathering, so I would simply recommend using an interaction term formed between region and motivation using the * operator.
glm(vote_for_party_XY ~ region_AB * motive + voter_attribute, family = "binomial")
Note that the R formula interface includes both "main effect" terms when the "*" operator is used. You get the same effect with:
region_AB + motive + region_AB : motive
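As a rough check of that equivalence, here is a minimal sketch on simulated data. The data frame `dat`, the seed, and the true coefficients are made up for illustration; only the column names come from the question. It also shows one common way (not the only one) to ask whether the interaction matters at all: a likelihood-ratio comparison against the model without it.

```r
# Simulated stand-in data; values are invented, column names follow the question.
set.seed(1)
n <- 200
dat <- data.frame(
  region_AB       = rbinom(n, 1, 0.5),
  motive          = rbinom(n, 1, 0.5),
  voter_attribute = rbinom(n, 1, 0.5)
)
# Assumed true model on the log-odds scale, including a region x motive interaction.
eta <- -0.5 + 0.4 * dat$region_AB + 0.6 * dat$motive +
  0.8 * dat$region_AB * dat$motive + 0.3 * dat$voter_attribute
dat$vote_for_party_XY <- rbinom(n, 1, plogis(eta))

# "*" expands to both main effects plus their interaction ...
fit_star <- glm(vote_for_party_XY ~ region_AB * motive + voter_attribute,
                family = binomial, data = dat)
# ... so spelling the terms out gives the same fit.
fit_long <- glm(vote_for_party_XY ~ region_AB + motive + region_AB:motive +
                  voter_attribute, family = binomial, data = dat)
same_fit <- all.equal(coef(fit_star), coef(fit_long)[names(coef(fit_star))])
print(same_fit)

# Likelihood-ratio test of the interaction: compare with the main-effects-only model.
fit_main <- glm(vote_for_party_XY ~ region_AB + motive + voter_attribute,
                family = binomial, data = dat)
lrt <- anova(fit_main, fit_star, test = "Chisq")
print(lrt)
```

No recoding of the zeros is needed for this: with 0/1 coding, the interaction column is simply 1 for rows with region_AB = 1 and motive = 1 and 0 otherwise.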
There will be (at least) three coefficients describing the region-motivation results: one for region alone (the effect of region=1 for persons with motivation=0), a second for motivation alone (the effect of motivation=1 for persons in region 0), and a third, the interaction, which is the additional shift for persons with both region=1 and motivation=1. The interaction coefficient is the one that tells you whether the effect of motivation differs between the two regions. All estimates are on the log-odds scale and are relative to an intercept term that applies to persons with all factors at the 0 (reference) level. To calculate the voting rate for persons with region=1 and motivation=1 (and voter_attribute=0), you add the coefficients for Intercept, region, motivation, and the interaction, then apply the inverse logit to turn the log-odds into a probability. If there are more than two levels of 'region' (say n) and 'motivation' (say m), the number of coefficients for these terms will be 1 + (n-1) + (m-1) + (n-1)*(m-1), which comes out to n*m (including the Intercept).
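The arithmetic can be sketched as follows, again on simulated data (the data frame `dat`, the seed, and the true coefficients are assumptions for illustration). The first part sums the coefficients by hand for region=1, motivation=1, voter_attribute=0 and compares the result with `predict()`; the second part counts the columns of a design matrix for a hypothetical 3-level region and 4-level motivation.

```r
# Simulated stand-in data; values are invented, column names follow the question.
set.seed(42)
n <- 500
dat <- data.frame(
  region_AB       = rbinom(n, 1, 0.5),
  motive          = rbinom(n, 1, 0.5),
  voter_attribute = rbinom(n, 1, 0.5)
)
eta <- -0.5 + 0.4 * dat$region_AB + 0.6 * dat$motive +
  0.8 * dat$region_AB * dat$motive + 0.3 * dat$voter_attribute
dat$vote_for_party_XY <- rbinom(n, 1, plogis(eta))

fit <- glm(vote_for_party_XY ~ region_AB * motive + voter_attribute,
           family = binomial, data = dat)
b <- coef(fit)

# Log-odds for region = 1, motive = 1, voter_attribute = 0:
# intercept + both main effects + interaction.
log_odds_11 <- b["(Intercept)"] + b["region_AB"] + b["motive"] +
  b["region_AB:motive"]
p_by_hand <- plogis(unname(log_odds_11))  # inverse logit gives a probability

# The same quantity from predict() should agree.
p_predict <- predict(
  fit,
  newdata = data.frame(region_AB = 1, motive = 1, voter_attribute = 0),
  type = "response"
)
print(c(by_hand = p_by_hand, predict = unname(p_predict)))

# Coefficient count with multi-level factors: n = 3 regions, m = 4 motives.
grid <- expand.grid(region = factor(1:3), motive = factor(1:4))
mm <- model.matrix(~ region * motive, data = grid)
print(ncol(mm))  # 1 + (3-1) + (4-1) + (3-1)*(4-1) = 3 * 4 columns
```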