Looking for a workaround: is there a more robust and faster alternative to polr for fitting high-dimensional data in an ordinal regression context? (Something similar to how .lm.fit() relates to lm().)
Example datasets: https://filebin.net/e1qz05qy9qo6zpwa
library(tictoc)
library(MASS)
custom_data <- read.csv(file.choose())
tic()
polr(LH_info ~ ., data = custom_data[,1:100])
toc() #0.61 seconds
ADDED EDIT: Issues found using the current polr and orm methods:
Specifically, using this dataset for the orm issues: https://filebin.net/hnpbkrw4gc9a5pn9
library(rms)  # provides orm()
custom_data2 <- read.csv(file.choose())
custom_data2$OC_info <- factor(custom_data2$OC_info, ordered = TRUE,
levels=c("Extreme Low Open Close (<-40)","Common Lower Open Close (-40-0)",
"Common Higher Open Close (0-40)","Extreme High Open Close (>40)"))
test_model <- orm(OC_info ~ ., data = custom_data2[,1:101])
test_model2 <- orm(OC_info ~ ., data = custom_data2[,1:102])
Specifically, using this dataset for the polr issues: https://filebin.net/hg7irb8al8pfs9sd
custom_data3 <- read.csv(file.choose())
custom_data3$OC_info <- factor(custom_data3$OC_info, ordered = TRUE,
levels=c("Extreme Low Open Close (<-40)","Common Lower Open Close (-40-0)",
"Common Higher Open Close (0-40)","Extreme High Open Close (>40)"))
test_model3 <- polr(OC_info ~ ., data = custom_data3)
polr: Error in optim(s0, fmin, gmin, method = "BFGS", ...) : initial value in 'vmmin' is not finite
--> This happens sometimes, with some combinations of independent variables.
orm: Error in .local(x, ...) : Increase tmpmax
--> This always happens when trying to model the dataset with 100 or more independent variables.
You could use the clm function from the ordinal package or the orm function from the rms package to fit an ordinal regression. Both packages also offer lower-level *.fit variants (clm.fit and orm.fit). Since you want to check the speed, here is a benchmark:
library(microbenchmark)
library(MASS)
library(ordinal)
library(rms)
set.seed(7)
custom_data <- read.csv("dataset_example.csv")
custom_data$LH_info <- as.factor(custom_data$LH_info)
m = microbenchmark(
"polr" = {
polr(LH_info ~ ., data = custom_data[,1:100])
},
"clm" = {
clm(LH_info ~ ., data = custom_data[,1:100])
},
"orm" = {
orm(LH_info ~ ., data = custom_data[,1:100])
}, times = 100
)
m
#> Unit: milliseconds
#> expr min lq mean median uq max neval cld
#> polr 174.6823 183.0839 194.1672 188.6606 195.7334 327.6748 100 a
#> clm 340.8700 354.7288 365.2914 360.8585 366.6671 485.0190 100 c
#> orm 251.0034 261.5099 276.0913 266.3175 273.9440 405.5983 100 b
library(ggplot2)
autoplot(m)
Created on 2023-02-03 with reprex v2.0.2
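Your polr option is already pretty fast: in this benchmark it has the lowest median time (about 189 ms, versus about 266 ms for orm and about 361 ms for clm).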
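If you want to try the *.fit route mentioned above, here is a minimal sketch for clm.fit. It uses the wine data that ships with the ordinal package as a stand-in for your data (an assumption, since your CSV is not reproduced here), and it follows the pattern of building the design objects first and then calling the fitter on them. Whether this is actually faster on your data is something you would need to benchmark yourself.

```r
library(ordinal)

# 'wine' ships with the ordinal package; it stands in for custom_data here
fm <- clm(rating ~ contact + temp, data = wine)

# method = "design" returns the design objects (response y, model matrix X)
# instead of a fitted model
design <- update(fm, method = "design")

# clm.fit() skips the formula handling and works directly on y and X
fit <- clm.fit(design$y, design$X)

# Compare the coefficients of the two fits
all.equal(coef(fit), coef(fm))
```

The idea is the same as with lm() versus .lm.fit(): the formula interface does extra work (building the model frame, checking arguments) that you can skip if you already have the response and design matrix.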
More information about both functions:
ordinal package: Cumulative Link Models for Ordinal Regression with the R package ordinal
orm function (Ordinal Regression Model)
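Regarding the polr error "initial value in 'vmmin' is not finite": a commonly suggested workaround is to standardise the predictors and/or pass explicit starting values through polr's start argument, which takes the form c(coefficients, zeta). The sketch below assumes the error comes from poorly scaled predictors or bad starting values, which I could not confirm on your dataset. The data are simulated and every variable name is hypothetical.

```r
library(MASS)

set.seed(1)
n <- 500
p <- 5

# Simulated predictors on a deliberately large scale (hypothetical data)
X <- as.data.frame(matrix(rnorm(n * p, mean = 1000, sd = 200), n, p))

# Ordinal response with four levels, driven by the first two predictors
latent <- as.numeric(scale(X$V1)) - as.numeric(scale(X$V2)) + rlogis(n)
X$y <- cut(latent, breaks = c(-Inf, -1, 0, 1, Inf),
           labels = c("low", "mid_low", "mid_high", "high"),
           ordered_result = TRUE)

# Standardise each predictor to mean 0 and sd 1
X_scaled <- X
X_scaled[1:p] <- lapply(X[1:p], function(v) as.numeric(scale(v)))

# Explicit starting values: zero slopes, then strictly increasing cutpoints
k <- nlevels(X_scaled$y)
start <- c(rep(0, p), seq(-1, 1, length.out = k - 1))

fit <- polr(y ~ ., data = X_scaled, start = start)
coef(fit)
```

The cutpoints in start have to be strictly increasing and its length has to equal the number of coefficients plus the number of levels minus one, otherwise polr stops with an error.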