rintegrationt-test

Integrating the PDF of a t-distribution in R via integrate()


I am trying to replicate the t-test with mathematical formulas and it somehow wont work with may current attempt below:

# PDF t-distribution:
t_distr = function(x,df){
  t_1 = gamma((df+1)/2)/(sqrt(df*pi)*gamma(df/2))
  t_2 = (1 + (seq^2/df))^(-(df+1)/2)
  t_distr = t_1*t_2
  return(t_distr)
} # End of function

# Initialize Sequence: 
seq = seq(-4,4,by = .1)

# Integrating up to t-value of 1 (lower tail t-test):
integrate(t_distr, df = length(seq)-1, lower = -Inf, upper = 1 )

# For df = length(seq)-2 (e.g. for a lin. reg):
integrate(t_distr, df = length(seq)-2, lower = -Inf, upper = 1 )

The above code will result in the following error message:

Error in integrate(t_distr, df = length(seq) - 1, lower = -Inf, upper = 1) : 
  evaluation of function gave a result of wrong length

Any suggestions how to make it possible to integrate the pdf of a t-distribution using the integrate function? It also does not work with a "df" of just n...


Solution

  • A couple of issues:

    1. Your t_distr function uses seq which is not defined (actually it is defined to be function, later you overwrite it with a vector, avoid that). You want your t_distr (the density of the t distribution) to take one argument x:

      t_distr <- function(x, df){
        t_1 <- gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
        t_2 <- (1 + (x ^ 2 / df)) ^ (-(df + 1) / 2)
        t_1 * t_2
      }
      

      (you can check that your t density is correctly defined by all.equal(t_distr(2.123, 12), dt(2.123, 12)).

    2. For integrate (to move from the desnity to the cdf) you need to provide your function and limits. The function will then be integrated in its first argument:

      integrate(t_distr, df = 12, lower = -Inf, upper = 1)
      

      this will integrate your density from -Inf to 1 with 12 degrees of freedom. Again we can check whether we got the correct results by comparing that to the built-in cdf of the t-distribution:

      all.equal(integrate(t_distr, df = 12, lower = -Inf, upper = 1)$value, pt(1, 12))
      
    3. The degrees of freedom have actually nothing to do with the integration itself and should be seen as an additional parameter. Maybe that's were the confusion came from.