I am running into problems post-hoc testing (package 'emmeans', functions 'emmeans'/'contrast') a survival model (package 'survival', function 'survreg') I've previously fitted to some experimental data. This experiment treated alfalfa plants of 4 'generations' with bacteria or salinity. There are 6 replicates for every possible treatment combination. Missing values stem from lost samples.
Alfalfa generation: G1, G2, G3, 'Halo'
Salinity: 0, 8
Bacteria: C, CN, E, H, H+E
Data subset (only showing one bacterial treatment):
Bac Salinity Generation PRC MPR
E 0 G1 0 6.761500689
E 8 G1 1 N/A
E 0 G2 0 6.761500689
E 8 G2 0 6.761500689
E 0 G3 1 6.67360118
E 8 G3 0 6.761500689
E 0 Halo 0 6.761500689
E 8 Halo 0 6.761500689
T8PSR is my data,'MPR' is proline content, "PRC" is a 0=censored/1=uncensored identifier.The response variable ("MPR" i.e., proline content) is right-censored for many samples (especially treatment Bacteria='E', Bacteria='H+E', and Salinity='8'), as the spectrophotometer I used couldn't give absorbance values >3 (which translated to proline concentrations/'MPR' values >6.76).
Code:
s2<-Surv(T8PSR$"MPR",T8PSR$"PRC", type="right")
P2<-survreg(s2~Bac*Generation*Salinity, data=T8PSR)
pa2<-anova(P2)
Prr<-emmeans(P2, ~Bac)
contrast(Prr, method="pairwise")
Anova
Df Deviance Resid. Df -2*LL Pr(>Chi)
NULL 183 661.90
Bac 4 27.9079 179 634.00 <0.01
Generation 3 1.1563 176 632.84 0.76
Salinity 1 27.8662 175 604.97 <0.01
Bac:Generation 12 3.9519 163 601.02 0.98
Bac:Salinity 4 7.4869 159 593.53 0.11
Generation:Salinity 3 2.9534 156 590.58 0.39
Bac:Generation:Salinity 12 4.2484 144 586.33 0.97
Survreg/emmeans gave satisfactory results for shoot proline content, however the confidence interval estimates in emmeans for bacterial root proline content (shown) and salinity (not shown) were huge in certain treatments (I believe this is because there is more censored data in roots than in shoots):
Bac emmean SE df lower.CL upper.CL
C 1.488 0.239 144 1.015 1.96
C+N 1.517 0.256 144 1.012 2.02
E 4.979 763.340 144 -1503.819 1513.78
H 0.688 0.218 144 0.257 1.12
H+E 4.786 763.340 144 -1504.012 1513.58
My problem is that: contrasts between bacterial treatments did not find significant differences despite 'E' and 'H+E' having much higher means & being more right-censored (therefore having obviously higher proline content) than other treatments.
contrast estimate SE df t.ratio p.value
C - (C+N) -0.0294 0.350 144 -0.084 1.0000
C - E -3.4913 763.340 144 -0.005 1.0000
C - H 0.8001 0.324 144 2.471 0.1031
C - (H+E) -3.2981 763.340 144 -0.004 1.0000
(C+N) - E -3.4619 763.340 144 -0.005 1.0000
(C+N) - H 0.8295 0.336 144 2.469 0.1035
(C+N) - (H+E) -3.2687 763.340 144 -0.004 1.0000
E - H 4.2914 763.340 144 0.006 1.0000
E - (H+E) 0.1932 1079.525 144 0.000 1.0000
H - (H+E) -4.0982 763.340 144 -0.005 1.0000
How do I get 'contrast' to show me the obviously significant result I know is here?
I've looked through stack overflow without being able to find a similar problem (however please direct me to one if I missed it). I've looked through the vignettes for 'emmeans' and instructions for 'survreg' but haven't found any material addressing this issue either.
Consolidating the comment discussion:
survreg::survreg
. This is a plausible default for survival times (usually skewing heavily to the right, Weibull is a generalization of the exponential distribution), but not for your data. Switching to a (log)normal likelihood improved fit significantly.Something still to do is to actually confirm model goodness of fit, this being a parametric censored model they tend to be even more sensitive to assumptions (homoscedasticity) than their non-censored counterparts. Inspect your residuals or predicted vs. observed outcomes!