r logistic-regression hierarchical-data regional

Testing for regional interaction effects using a hierarchical model in R


I want to test voters' motives for casting a vote for a given party 'XY', and whether residence in a region makes a significant difference. The question is: do voters from region A (coded '1') differ significantly in their motives from voters in region B (coded '0')?

Here is how my data are structured (over-simplified):

region_AB   motive   voter_attribute  vote_for_party_XY
1           1        1                1
1           0        1                1
1           1        0                0
0           0        0                0
0           0        1                0
0           1        0                0

My guess would be to run a binary logistic, hierarchical model in R, but then how would I find out whether different motives and voter characteristics play a role for citizens in regions A and B? I don't simply want to test for regional effects, but for the difference region makes within the overall model.

Or can I simply throw interaction terms in a standard logistic regression model (e.g. region*motive1, region*motive2...)?

But in that case, how many interaction terms can I add? Would I have to recode the zeros in 'region' into something like 0.0000000001 as multiplication with zero would distort the results? Lastly, would I have to throw the interaction terms into the model alongside the two components of this interaction term (e.g. region, motive, and region*motive) or would this only result in multicollinearity?

Thank you!


Solution

  • The outcome is binary, so the natural modeling framework is logistic regression. I do not see a hierarchical structure in how the data were gathered, so I would simply recommend using an interaction term formed between region and motivation using the *-operator.

     glm(vote_for_party_XY ~ region_AB * motive + voter_attribute, family = "binomial", data = voters)  # 'voters' is a placeholder name for your data frame
    

    Note that the R formula interface automatically includes both "main effect" terms when the "*" operator is used, so there is no need to add them separately. The same terms can be written out explicitly as:

     region_AB + motive + region_AB : motive 
    

    There will be (at least) three coefficients describing the region-motivation results, all on the log-odds scale: one for region alone (the region effect among persons with motivation=0), a second for motivation alone (the motivation effect among persons in region=0), and a third, the interaction, which is the additional shift for persons with both region=1 and motivation=1. The test of the interaction coefficient is what addresses whether the effect of motivation differs between the regions. All estimates are relative to an intercept term that applies to persons with all factors at the 0 (reference) level. To calculate the voting rate for persons with region=1 and motivation=1 (and voter_attribute=0), add the coefficients for the Intercept, region, motivation, and the interaction, then apply the inverse logit to convert the log-odds to a probability. If there are more than two levels of 'region' (say n) and 'motivation' (say m), the number of coefficients will be 1 + (n-1) + (m-1) + (n-1)*(m-1), which comes out to n*m (including the Intercept).
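The points above can be checked with a small sketch. Everything here runs on synthetic data: the data frame name `voters`, the sample size, and the "true" log-odds used to simulate votes are assumptions made up for illustration, not estimates from the real survey. The sketch fits the model with and without the interaction, compares the two with a likelihood-ratio test (one reasonable way to ask whether region changes the effect of motive), rebuilds a predicted probability by hand from the coefficients, and confirms the n*m coefficient count for factors with more levels.

```r
# Synthetic data for illustration only; column names follow the question.
set.seed(1)
n_obs <- 2000
voters <- data.frame(
  region_AB       = rbinom(n_obs, 1, 0.5),
  motive          = rbinom(n_obs, 1, 0.5),
  voter_attribute = rbinom(n_obs, 1, 0.5)
)

# Assumed data-generating log-odds (arbitrary values, chosen so that
# motive matters more in region 1 than in region 0).
eta <- -1 + 0.3 * voters$region_AB + 0.5 * voters$motive +
  0.8 * voters$region_AB * voters$motive + 0.4 * voters$voter_attribute
voters$vote_for_party_XY <- rbinom(n_obs, 1, plogis(eta))

# Model without and with the region-by-motive interaction.
fit_main <- glm(vote_for_party_XY ~ region_AB + motive + voter_attribute,
                family = binomial, data = voters)
fit_int  <- glm(vote_for_party_XY ~ region_AB * motive + voter_attribute,
                family = binomial, data = voters)

# Writing the terms out explicitly gives the same fit as the "*" shorthand.
fit_long <- glm(vote_for_party_XY ~ region_AB + motive + region_AB:motive +
                  voter_attribute,
                family = binomial, data = voters)
same_fit <- isTRUE(all.equal(coef(fit_int),
                             coef(fit_long)[names(coef(fit_int))]))

# Likelihood-ratio test of the interaction: does the motive effect
# differ between the two regions?
lrt <- anova(fit_main, fit_int, test = "Chisq")
print(lrt)

# Predicted probability for region = 1, motive = 1, voter_attribute = 0,
# built by hand: Intercept + region + motive + interaction, then inverse logit.
b <- coef(fit_int)
log_odds_11 <- b["(Intercept)"] + b["region_AB"] + b["motive"] +
  b["region_AB:motive"]
p_manual <- plogis(unname(log_odds_11))

# The same quantity from predict(), as a cross-check.
p_predict <- predict(fit_int,
                     newdata = data.frame(region_AB = 1, motive = 1,
                                          voter_attribute = 0),
                     type = "response")

# Coefficient count for factors with n = 3 and m = 4 levels:
# 1 + (n-1) + (m-1) + (n-1)*(m-1) = n*m columns in the design matrix.
grid <- expand.grid(region = factor(1:3), motive = factor(1:4))
n_coef <- ncol(model.matrix(~ region * motive, data = grid))
```

With 0/1 predictors there is no need to recode the zeros: a zero in `region_AB` simply switches the interaction term off for region B, which is exactly what makes the `motive` coefficient interpretable as the motive effect in that region.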