r, function-fitting, cumulative-distribution-function

Fit a smoothed/cumulative distribution function to data and predict x from a given y


I need to fit a smoothed/cumulative distribution function to my data and afterwards be able to predict the x-value for a given y. This is what my code looks like at the moment, but it doesn't work as expected: loess probably isn't the right method (the fit even goes below y = 0), and the prediction doesn't seem to work either. Any help would be highly appreciated!

library(ggplot2)

test<-data.frame("xvar"=c(0.01,0.86,2,6.3,20),"yvar"=c(0.14,0.16,5.16,89.77,100))


(testplot <- ggplot(test,aes(x=xvar,y=yvar)) + 
    geom_point(size=1) +
    geom_line(col="red") +      
    geom_smooth(method = "loess") +    
    scale_x_continuous(trans='log10') +
    xlab("X") + 
    ylab("Y") +
    labs(title="Test"))

testf<-stats::loess(yvar~xvar, data = test)
predict(testf, 10)

Solution

  • Just eyeballing it, but your data looks like it follows a logistic(ish) function. What about fitting one with nls and the self-starting SSlogis model:

    library(tidyverse)
    
    test<-data.frame("xvar"=c(0.01,0.86,2,6.3,20),"yvar"=c(0.14,0.16,5.16,89.77,100))
    
    fit <- nls(yvar ~ SSlogis(xvar, Asym, xmid, scal), data = test)
    new_dat <- tibble(xvar = seq(0.01, 20, by = 0.01))
    new_dat$yvar <- predict(fit, new_dat)
    
    test |>
      ggplot(aes(xvar, yvar))+
      geom_point()+
      geom_line(data = new_dat)
    

    predict(fit, tibble(xvar = 10))[1]
    #> [1] 99.83301
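
    Regarding the concern about the curve dropping below y = 0: SSlogis evaluates Asym / (1 + exp((xmid - x) / scal)), so with a positive Asym and scal the fitted curve stays between 0 and Asym and never decreases. The following is only a minimal sketch to check this on the example data; the 0.01 grid step and the names `grid`, `pred` and `manual` are my own choices.

    ```r
    test <- data.frame(xvar = c(0.01, 0.86, 2, 6.3, 20),
                       yvar = c(0.14, 0.16, 5.16, 89.77, 100))
    fit <- nls(yvar ~ SSlogis(xvar, Asym, xmid, scal), data = test)
    cf <- coef(fit)  # named vector: Asym, xmid, scal

    # Evaluate the fitted curve on a fine grid (grid step is an arbitrary choice)
    grid <- data.frame(xvar = seq(0.01, 20, by = 0.01))
    pred <- as.numeric(predict(fit, grid))  # as.numeric() drops the gradient attribute

    # Same curve written out by hand from the SSlogis formula
    manual <- cf[["Asym"]] / (1 + exp((cf[["xmid"]] - grid$xvar) / cf[["scal"]]))

    all.equal(pred, manual)  # predictions match the hand-written formula
    range(pred)              # lies between 0 and Asym
    all(diff(pred) >= 0)     # never decreasing along x
    ```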
    

    EDIT:

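    I see that you then want to calculate x for a given y. SSlogis fits y = Asym / (1 + exp((xmid - x) / scal)), which can be solved for x as x = xmid - scal * log(Asym / y - 1):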

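    # Invert the fitted logistic curve: return the x at which the model predicts y.
    # Only defined for 0 < y < Asym.
    calc_x <- function(y, model){
      cfs <- coef(model)  # named vector: Asym, xmid, scal
      cfs[["xmid"]] - cfs[["scal"]] * log(cfs[["Asym"]] / y - 1)
    }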
    
    calc_x(y = 10, model = fit)
    #> [1] 2.666598
    
    # check: predicting at the computed x should give back y = 10
    predict(fit, tibble(xvar = 2.666598))[1]
    #> [1] 9.999995
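
    If you later switch to a model without a closed-form inverse, a numerical root finder is a possible alternative. This is only a sketch under some assumptions: `invert_fit` is a hypothetical helper name, the search interval 0.01 to 20 is simply the range of the example data, and the target y has to lie between the predictions at the two interval ends, otherwise `uniroot` stops with an error.

    ```r
    test <- data.frame(xvar = c(0.01, 0.86, 2, 6.3, 20),
                       yvar = c(0.14, 0.16, 5.16, 89.77, 100))
    fit <- nls(yvar ~ SSlogis(xvar, Asym, xmid, scal), data = test)

    # Hypothetical helper: find the x in [lower, upper] where the model predicts y
    invert_fit <- function(y, model, lower, upper) {
      f <- function(x) as.numeric(predict(model, data.frame(xvar = x))) - y
      uniroot(f, lower = lower, upper = upper, tol = 1e-8)$root
    }

    x10 <- invert_fit(10, fit, lower = 0.01, upper = 20)

    # Closed-form inverse of the logistic curve, for comparison
    cf <- coef(fit)
    x10_closed <- cf[["xmid"]] - cf[["scal"]] * log(cf[["Asym"]] / 10 - 1)

    x10         # numerical solution
    x10_closed  # should agree closely with the numerical solution
    as.numeric(predict(fit, data.frame(xvar = x10)))  # should be close to 10
    ```

    For the logistic fit the closed form is simpler and exact; the root finder is mainly useful because it works for any monotone fitted curve.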