I am reproducing in R
some survival analysis results published in a journal. The original results were produced in Stata. Here are the original results:
Here is the code to produce these results in R
:
# load packages
library(dplyr)
library(foreign)
library(msm)
library(stargazer)
# load Svolik's original data
data = read_stata("leaders, institutions, covariates, updated tvc.dta")
# set a t0 for each row
data = mutate(data,t0 = lag(t,default=0), .by=leadid)
# coup survival object original
survobj_coup = Surv(data[["t0"]], data[["_t"]], data$c_coup)
# coup model original
coups_original <- coxph(survobj_coup~legislature + lgdp_1+ growth_1 +exportersoffuelsmainlyoil_EL2008+ ethfrac_FIXED+ communist+ mil+ cw+ age,
data=data, ties="breslow")
# revolt survival object original
survobj_revolt = Surv(data[["t0"]], data[["_t"]], data$c_revolt)
# revolt model original
revolt_original <- coxph(survobj_revolt~legislature + lgdp_1+ growth_1 +exportersoffuelsmainlyoil_EL2008+ ethfrac_FIXED+ mil+ cw+ age,
data=data, ties="breslow")
# natural survival object original
survobj_natural = Surv(data[["t0"]], data[["_t"]], data$c_natural)
# natural model original
natural_original <- coxph(survobj_natural~legislature + lgdp_1+ growth_1 +exportersoffuelsmainlyoil_EL2008+ ethfrac_FIXED+ communist+ mil+ cw+ age,
data=data, ties="breslow")
# Define a function to exponentiate coefficients
exp_coef <- function(x) {exp(x) }
# Create the table using stargazer
stargazer(natural_original, coups_original, revolt_original, apply.coef = exp_coef, p.auto = FALSE)
While I am able to produce the exact same coefficients (save for slight differences in rounding) with the exact same significance levels, the standard errors do not match. For example, in Model 1 in the figure (first column in Natural Causes
), I obtain a standard error of 0.414 rather than 0.198 for the coefficient on Legislature
(0.456*
). I was reading that the differences may be due to how the standard errors are transformed (something to do with the delta method perhaps). Does anyone have any advice? Thanks.
Quick answer is they measure different things. In your example, Stata reports approximate standard errors of HRs, while R reports SE of coefficient se(coef)
. I used an example from Stata manual:
library(survival)
library(webuse)
webuse("drugtr")
m_1 <- coxph(Surv(studytime, died) ~ drug + age, drugtr)
summary(m_1)
SE for 'drug' is .0477017 in Stata (SE for HR), but 0.46052 in R (SE for coefficient).
To get the same (or similar) SE values as Stata, you could borrow some functions from R packages. I used deltaMethod
in car
package.:
library(car)
deltaMethod(m_1, "exp(drug)")
Now it is 0.044426, similar to the value using Stata. There may be other alternatives.