Tags: r, quantile-regression

summary.rq output depends on sample size


I found out (see below) that the function summary.rq (page 88) from the quantreg package prints different output depending on whether the sample size is less than 1001 or not.

I am aware that rq() uses a different method depending on whether the sample size is less than 1001 or not, and I assume this is the reason for the behaviour.

An MWE showing the described behaviour:

> library(quantreg)
> x <- seq(0, 100, length.out = 1000)     
> e <- rnorm(1000, mean = 0, sd = 1.2)
> y <- 6 + 0.1 * x + e  
> summary(rq(y ~ x, tau = 0.025))

Call: rq(formula = y ~ x, tau = 0.025)

tau: [1] 0.025

Coefficients:
            coefficients lower bd upper bd
(Intercept) 3.67733      2.92776  3.88165 
x           0.10061      0.09578  0.10675 
Warning message:
In rq.fit.br(x, y, tau = tau, ci = TRUE, ...) : Solution may be 
nonunique

> x <- seq(0, 100, length.out = 1001)     
> e <- rnorm(1001, mean = 0, sd = 1.2)
> y <- 6 + 0.1 * x + e  
> summary(rq(y ~ x, tau = 0.025))

Call: rq(formula = y ~ x, tau = 0.025)

tau: [1] 0.025

Coefficients:
            Value    Std. Error t value  Pr(>|t|)
(Intercept)  3.61744  0.28052   12.89559  0.00000
x            0.10033  0.00477   21.04017  0.00000

Is this desired behaviour? How can I get the first form of output for sample sizes larger than 1000?

My issue is that I am using the summary.rq function in a loop over multiple sample sizes and would like to use the lower and upper bound values, roughly as sketched below.
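
For reference, a minimal sketch of the kind of loop I have in mind (the sample sizes and the simulated model are only placeholders):

    library(quantreg)

    sample_sizes <- c(500, 1000, 2000)   # placeholder sizes

    bounds <- lapply(sample_sizes, function(n) {
      x <- seq(0, 100, length.out = n)
      y <- 6 + 0.1 * x + rnorm(n, mean = 0, sd = 1.2)
      cf <- summary(rq(y ~ x, tau = 0.025))$coefficients
      # Breaks once n > 1000: summary.rq then reports Value / Std. Error
      # columns instead of the lower bd / upper bd columns used here.
      cf[, c("lower bd", "upper bd")]
    })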


Solution

  • The difference between the outputs comes not from rq() but from summary.rq(). quantreg uses "rank" inference for sample sizes up to 1000 (i.e. smaller than 1001); otherwise "nid" is used. The help file indicates that the "rank" method can be extremely slow for larger sample sizes. If you insist on having the former output for all of your looped studies, you can specify

    summary(rq(y ~ x, tau = 0.025), se = "rank")

    However, you may be better served by looking into this more closely. For example, if some of your studies have very large sample sizes, the computation may become extremely slow; in that case you may want to specify se = "nid" for all of your studies instead and compute the upper and lower bounds manually (upper = Value + 1.96 * Std. Error and lower = Value - 1.96 * Std. Error). A sketch of both options is given below.
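
    A rough sketch of both options (the simulated data are only illustrative; the column names are taken from the outputs shown above, and 1.96 gives approximate 95% bounds under a normal approximation):

    library(quantreg)

    n <- 5000                                   # illustrative large sample
    x <- seq(0, 100, length.out = n)
    y <- 6 + 0.1 * x + rnorm(n, mean = 0, sd = 1.2)

    ## (a) Force the rank-based intervals (can be very slow for large n)
    s_rank  <- summary(rq(y ~ x, tau = 0.025), se = "rank")
    ci_rank <- s_rank$coefficients[, c("lower bd", "upper bd")]

    ## (b) Keep se = "nid" and build approximate bounds by hand
    s_nid  <- summary(rq(y ~ x, tau = 0.025), se = "nid")
    est    <- s_nid$coefficients[, "Value"]
    stderr <- s_nid$coefficients[, "Std. Error"]
    ci_nid <- cbind(lower = est - 1.96 * stderr,
                    upper = est + 1.96 * stderr)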