r regression ordinal

More robust and faster polr for ordinal regression


I'm looking for a more robust and faster alternative to polr for fitting ordinal regression on high-dimensional data (similar to the relationship between lm() and .lm.fit()).

Example datasets: https://filebin.net/e1qz05qy9qo6zpwa

library(tictoc)
library(MASS)
custom_data <- read.csv(file.choose())
tic()
polr(LH_info ~ ., data = custom_data[,1:100])
toc() #0.61 seconds

EDIT: issues found with the current polr and orm approaches:

Using this dataset to reproduce the orm issue: https://filebin.net/hnpbkrw4gc9a5pn9

custom_data2 <- read.csv(file.choose())
custom_data2$OC_info <- factor(custom_data2$OC_info, ordered = TRUE,
                            levels=c("Extreme Low Open Close (<-40)","Common Lower Open Close (-40-0)", 
                                     "Common Higher Open Close (0-40)","Extreme High Open Close (>40)"))
test_model <- orm(OC_info ~ ., data = custom_data2[,1:101])
test_model2 <- orm(OC_info ~ ., data = custom_data2[,1:102])

Using this dataset to reproduce the polr issue: https://filebin.net/hg7irb8al8pfs9sd

custom_data3 <- read.csv(file.choose())
custom_data3$OC_info <- factor(custom_data3$OC_info, ordered = TRUE,
                            levels=c("Extreme Low Open Close (<-40)","Common Lower Open Close (-40-0)", 
                                     "Common Higher Open Close (0-40)","Extreme High Open Close (>40)"))
test_model3 <- polr(OC_info ~ ., data = custom_data3)
  1. polr: Error in optim(s0, fmin, gmin, method = "BFGS", ...) initial value in 'vmmin' is not finite --> this happens intermittently with certain combinations of independent variables

  2. orm: Error in .local(x, ...) : Increase tmpmax --> this always happens when modeling the dataset with 100 or more independent variables
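For the polr error, a common cause of "initial value in 'vmmin' is not finite" is badly scaled predictors, which can make the default starting values produce a non-finite deviance. A hedged workaround (assuming OC_info is the first column and all other columns are numeric) is to standardize the predictors and, if the error persists, supply explicit start values:

```r
library(MASS)

# Standardize the numeric predictors (column 1 is assumed to be the
# response); this often keeps the initial deviance finite for BFGS.
custom_data3[, -1] <- scale(custom_data3[, -1])

fit <- polr(OC_info ~ ., data = custom_data3, Hess = TRUE)

# If the error persists, pass explicit starting values in the format
# c(coefficients, zeta): one 0 per coefficient, then strictly
# increasing values for the nlevels - 1 thresholds.
p <- ncol(custom_data3) - 1
k <- nlevels(custom_data3$OC_info) - 1
fit <- polr(OC_info ~ ., data = custom_data3,
            start = c(rep(0, p), seq_len(k)))
```

Whether scaling alone is enough depends on the particular combination of predictors that triggers the error.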


Solution

  • You could use the clm function from the ordinal package or the orm function from the rms package to fit an ordinal regression. Both also offer *.fit-style interfaces. Since speed matters to you, here is a benchmark:

    library(microbenchmark)
    library(MASS)
    library(ordinal)
    library(rms)
    
    set.seed(7)
    custom_data <- read.csv("dataset_example.csv")
    custom_data$LH_info <- as.factor(custom_data$LH_info)
    
    m = microbenchmark(
      "polr" = {
        polr(LH_info ~ ., data = custom_data[,1:100])
      },
      "clm" = {
        clm(LH_info ~ ., data = custom_data[,1:100])
      }, 
      "orm" = {
        orm(LH_info ~ ., data = custom_data[,1:100])
      }, times = 100
    )
    
    m
    #> Unit: milliseconds
    #>  expr      min       lq     mean   median       uq      max neval cld
    #>  polr 174.6823 183.0839 194.1672 188.6606 195.7334 327.6748   100 a  
    #>   clm 340.8700 354.7288 365.2914 360.8585 366.6671 485.0190   100   c
    #>   orm 251.0034 261.5099 276.0913 266.3175 273.9440 405.5983   100  b
    library(ggplot2)
    autoplot(m)
    

    Created on 2023-02-03 with reprex v2.0.2

    Your polr option is already pretty fast.
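    On the *.fit point: clm.fit from the ordinal package takes the response factor and a design matrix directly, skipping the formula and model-frame machinery (analogous to .lm.fit()). A minimal sketch, assuming LH_info is the first of the 100 columns:

    ```r
    library(ordinal)

    # Ordered response and design matrix built once, up front;
    # clm.fit() then fits without re-parsing a formula each call.
    y <- factor(custom_data$LH_info, ordered = TRUE)
    X <- model.matrix(~ ., data = custom_data[, 2:100])
    fit <- clm.fit(y, X)
    ```

    This mainly pays off when refitting many models on the same design matrix, since formula processing is a fixed per-call overhead.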


    More information about both functions: