rsurvival-analysisemmeans

Using/post-hoc testing 'survreg' with 'emmeans' in r when certain experimental treatments are 75% censored


I am running into problems post-hoc testing (package 'emmeans', functions 'emmeans'/'contrast') a survival model (package 'survival', function 'survreg') I've previously fitted to some experimental data. This experiment treated alfalfa plants of 4 'generations' with bacteria or salinity. There are 6 replicates for every possible treatment combination. Missing values stem from lost samples.

Alfalfa generation: G1, G2, G3, 'Halo'

Salinity: 0, 8

Bacteria: C, CN, E, H, H+E

Data subset (only showing one bacterial treatment):

Bac Salinity    Generation  PRC MPR
E   0                 G1    0   6.761500689
E   8                 G1    1   N/A
E   0                 G2    0   6.761500689
E   8                 G2    0   6.761500689
E   0                 G3    1   6.67360118
E   8                 G3    0   6.761500689
E   0                Halo   0   6.761500689
E   8                Halo   0   6.761500689

T8PSR is my data,'MPR' is proline content, "PRC" is a 0=censored/1=uncensored identifier.The response variable ("MPR" i.e., proline content) is right-censored for many samples (especially treatment Bacteria='E', Bacteria='H+E', and Salinity='8'), as the spectrophotometer I used couldn't give absorbance values >3 (which translated to proline concentrations/'MPR' values >6.76).

Code:

s2<-Surv(T8PSR$"MPR",T8PSR$"PRC", type="right")
P2<-survreg(s2~Bac*Generation*Salinity, data=T8PSR)
pa2<-anova(P2)
Prr<-emmeans(P2, ~Bac)
contrast(Prr, method="pairwise")

Anova

                        Df Deviance Resid. Df  -2*LL Pr(>Chi)    
NULL                                      183 661.90             
Bac                      4  27.9079       179 634.00    <0.01 
Generation               3   1.1563       176 632.84    0.76   
Salinity                 1  27.8662       175 604.97    <0.01 
Bac:Generation          12   3.9519       163 601.02    0.98    
Bac:Salinity             4   7.4869       159 593.53    0.11    
Generation:Salinity      3   2.9534       156 590.58    0.39    
Bac:Generation:Salinity 12   4.2484       144 586.33    0.97    

Survreg/emmeans gave satisfactory results for shoot proline content, however the confidence interval estimates in emmeans for bacterial root proline content (shown) and salinity (not shown) were huge in certain treatments (I believe this is because there is more censored data in roots than in shoots):

 Bac emmean      SE  df  lower.CL upper.CL
 C    1.488   0.239 144     1.015     1.96
 C+N  1.517   0.256 144     1.012     2.02
 E    4.979 763.340 144 -1503.819  1513.78
 H    0.688   0.218 144     0.257     1.12
 H+E  4.786 763.340 144 -1504.012  1513.58

My problem is that: contrasts between bacterial treatments did not find significant differences despite 'E' and 'H+E' having much higher means & being more right-censored (therefore having obviously higher proline content) than other treatments.

 contrast      estimate       SE  df t.ratio p.value
 C - (C+N)      -0.0294    0.350 144  -0.084  1.0000
 C - E          -3.4913  763.340 144  -0.005  1.0000
 C - H           0.8001    0.324 144   2.471  0.1031
 C - (H+E)      -3.2981  763.340 144  -0.004  1.0000
 (C+N) - E      -3.4619  763.340 144  -0.005  1.0000
 (C+N) - H       0.8295    0.336 144   2.469  0.1035
 (C+N) - (H+E)  -3.2687  763.340 144  -0.004  1.0000
 E - H           4.2914  763.340 144   0.006  1.0000
 E - (H+E)       0.1932 1079.525 144   0.000  1.0000
 H - (H+E)      -4.0982  763.340 144  -0.005  1.0000

How do I get 'contrast' to show me the obviously significant result I know is here?

I've looked through stack overflow without being able to find a similar problem (however please direct me to one if I missed it). I've looked through the vignettes for 'emmeans' and instructions for 'survreg' but haven't found any material addressing this issue either.


Solution

  • Consolidating the comment discussion:

    1. You are fitting a log-linked Weibull likelihood, the default in survreg::survreg. This is a plausible default for survival times (usually skewing heavily to the right, Weibull is a generalization of the exponential distribution), but not for your data. Switching to a (log)normal likelihood improved fit significantly.
    2. Your model is quite complex, with a three-way interaction interpretation is not straightforward (though you didn't show parameter estimates themselves) and you increase the risk of overfitting. Focusing on simpler effects - removing some of the higher-order interactions - resulted in a better and more interpretable fit.

    Something still to do is to actually confirm model goodness of fit, this being a parametric censored model they tend to be even more sensitive to assumptions (homoscedasticity) than their non-censored counterparts. Inspect your residuals or predicted vs. observed outcomes!