I have a panel dataset on which I'd like to run a difference-in-differences (DiD) analysis. This is my current regression:
fit3 <- glm(empstat ~ factor(year) + factor(stateicp) + migrant_category +
              treated*post + treated*migrant_category +
              post*migrant_category + treated*post*migrant_category +
              race + educ + age + marst,
            data = df, weights = perwt, family = 'gaussian')
Will this make R assume that the observations are independent of one another? If so, what should I do to make R recognize that this is panel data?
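Yes: glm() knows nothing about the panel structure and treats every row as an independent observation. If you are interested in fixed effects models and difference-in-differences, use the plm package, which lets you declare the unit and time indices of the panel. Here is an example from Christopher Zorn: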
# Packages used below
library(readr)   # read_csv()
library(dplyr)   # group_by(), filter(), %>%
library(plm)     # pdata.frame(), plm()

# Panel data
WDI <- read_csv("https://github.com/PrisonRodeo/GSERM-Ljubljana-APD-git/raw/main/Data/WDI3.csv")

# Add a "Cold War" variable:
WDI$ColdWar <- with(WDI, ifelse(Year < 1990, 1, 0))

# Keep a numeric year variable (for -panelAR-):
WDI$YearNumeric <- WDI$Year

# Pull out *only* those countries that, at some
# point during the observed periods, instituted
# a paid parental leave policy:
WDI <- WDI %>%
  group_by(ISO3) %>%
  filter(any(PaidParentalLeave == 1)) %>%
  ungroup()

# Create a better trend variable:
WDI$Time <- WDI$YearNumeric - 1950

# Make the data a panel dataframe. This is done after the
# dplyr steps so the result keeps its pdata.frame class and index:
WDI <- pdata.frame(as.data.frame(WDI), index = c("ISO3", "Year"))

# FE models...
# Unit (country) fixed effects, no controls:
fe1 <- plm(ChildMortality ~ PaidParentalLeave + Time +
             PaidParentalLeave*Time, data = WDI,
           effect = "individual", model = "within")
# Unit fixed effects, with controls:
fe2 <- plm(ChildMortality ~ PaidParentalLeave + Time +
             PaidParentalLeave*Time + log(GDPPerCapita) +
             log(NetAidReceived) + GovtExpenditures,
           data = WDI, effect = "individual", model = "within")
# Two-way (country and year) fixed effects, no controls:
fe3 <- plm(ChildMortality ~ PaidParentalLeave + Time +
             PaidParentalLeave*Time, data = WDI,
           effect = "twoways", model = "within")
# Two-way fixed effects, with controls:
fe4 <- plm(ChildMortality ~ PaidParentalLeave + Time +
             PaidParentalLeave*Time + log(GDPPerCapita) +
             log(NetAidReceived) + GovtExpenditures,
           data = WDI, effect = "twoways", model = "within")
# TABLE TIME
library(stargazer)
stargazer(fe1, fe2, fe3, fe4,
          title = "DiD Models of log(Child Mortality)",
          column.separate = c(1, 1, 1), align = TRUE,
          dep.var.labels.include = FALSE,
          dep.var.caption = "",
          # Labels must follow the order of the coefficient rows;
          # the interaction term is listed last.
          covariate.labels = c("Paid Parental Leave", "Time (1950=0)",
                               "ln(GDP Per Capita)",
                               "ln(Net Aid Received)",
                               "Government Expenditures",
                               "Paid Parental Leave x Time"),
          header = FALSE, model.names = FALSE,
          model.numbers = FALSE, multicolumn = FALSE,
          object.names = TRUE, notes.label = "",
          column.sep.width = "-15pt",
          omit.stat = c("f", "ser"), type = "text")

DiD Models of log(Child Mortality)
=======================================================================
                            fe1        fe2        fe3        fe4
-----------------------------------------------------------------------
Paid Parental Leave         -15.500*** -26.200*** -12.500*** -17.300*
                            (2.420)    (7.220)    (2.960)    (9.360)
Time (1950=0)               -0.838***  -1.480***
                            (0.025)    (0.094)
ln(GDP Per Capita)                     -7.110***             -4.910*
                                       (2.290)               (2.600)
ln(Net Aid Received)                   -1.780***             -3.020***
                                       (0.471)               (0.552)
Government Expenditures                0.873***              0.842***
                                       (0.139)               (0.146)
Paid Parental Leave x Time  0.310***   0.524***   0.247***   0.319*
                            (0.044)    (0.128)    (0.056)    (0.169)
-----------------------------------------------------------------------
Observations                2,360      622        2,360      622
R2                          0.496      0.717      0.009      0.143
Adjusted R2                 0.485      0.701      -0.035     0.014
=======================================================================
                                            *p<0.1; **p<0.05; ***p<0.01
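
To make the link between the regression and the DiD estimator concrete, here is a minimal base-R sketch on simulated data. Everything in it is made up for illustration and is not part of the WDI example: the panel itself, the column names unit, year, treated, post, d and y, and the true effect of 2. Assuming a balanced panel with a single common treatment date, it checks that the pooled interaction coefficient, the two-way fixed effects coefficient, and a hand-rolled within (demeaned) regression all give the same DiD estimate. It then computes a standard error clustered by unit, which is one way to relax the independence assumption the question asks about.

```r
set.seed(42)

# Hypothetical balanced panel: 40 units observed over 6 years
n_units <- 40
years   <- 2001:2006
panel   <- expand.grid(unit = seq_len(n_units), year = years)

panel$treated <- as.integer(panel$unit <= n_units / 2)  # first half is treated
panel$post    <- as.integer(panel$year >= 2004)         # policy starts in 2004
panel$d       <- as.numeric(panel$treated * panel$post) # DiD indicator

# Outcome: unit heterogeneity + common trend + treatment effect of 2 + noise
unit_effect <- rnorm(n_units)[panel$unit]
panel$y <- 1 + unit_effect + 0.3 * (panel$year - 2001) +
  2 * panel$d + rnorm(nrow(panel))

# (1) Pooled DiD: coefficient on the treated:post interaction
pooled <- lm(y ~ treated * post, data = panel)

# (2) The same quantity from the four cell means
cell_means <- with(panel, tapply(y, list(treated, post), mean))
did_means  <- (cell_means["1", "1"] - cell_means["1", "0"]) -
              (cell_means["0", "1"] - cell_means["0", "0"])

# (3) Two-way fixed effects via unit and year dummies
twfe <- lm(y ~ d + factor(unit) + factor(year), data = panel)

# (4) Within transformation by hand: remove unit and year means
#     (this simple formula is valid for balanced panels only)
demean2 <- function(x, unit, year) x - ave(x, unit) - ave(x, year) + mean(x)
y_w <- demean2(panel$y, panel$unit, panel$year)
d_w <- demean2(panel$d, panel$unit, panel$year)
within_fit <- lm(y_w ~ 0 + d_w)

# (5) Cluster-robust (CR0, no small-sample correction) standard error
#     for the within estimate, clustering on unit
e             <- resid(within_fit)
score_by_unit <- tapply(d_w * e, panel$unit, sum)
se_cluster    <- sqrt(sum(score_by_unit^2)) / sum(d_w^2)

estimates <- c(pooled = unname(coef(pooled)["treated:post"]),
               twfe   = unname(coef(twfe)["d"]),
               within = unname(coef(within_fit)["d_w"]),
               means  = unname(did_means))
print(estimates)
print(se_cluster)
```

With real data you would normally let plm (or a sandwich-type variance estimator) handle the demeaning and clustering. The sketch is only meant to show roughly what a within model with two-way effects does, and why the clustered standard error depends on which rows belong to the same unit.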