I am playing around with linear discriminant analysis and trying to plot the data in plotly in R and then plot the discriminating plane. When I try this I can get the data and the plane, but the plane does not match with the the 3D plotted data. I get the data "floating" above the plane rather than the plane cutting through the data. Using the coefficients the lda function gives us with parameters SB.lda, OPS.lda, and tSho.lda I try to solve for z under the assumption that the equation is SBx + OPSy + tShoz = 0 because the lda function does not give us a constant.
When I reference this post (https://stats.stackexchange.com/questions/166942/why-are-discriminant-analysis-results-in-r-lda-and-spss-different-constant-t) and use the method to generate the constant I get a constant of -13.82201 which when plotted does not match with the data.
What am I missing?
library(plotly)
library(MASS)
train <- data.frame(read.table("https://raw.githubusercontent.com/gunsnfloyd/algorithmdata/main/train.tsv",header=TRUE))
head(train)
model <- lda(WS~ SB + OPS + tSho ,data = train)
SB.lda <- 0.008056699
OPS.lda <- 15.174308879
tSho.lda <- 0.205465030
groupmean<-(model$prior%*%model$means)
constant<-(groupmean%*%model$scaling)
-constant
fig <- plot_ly(train, x = ~SB, y = ~OPS, z = ~tSho, color = ~WS, colors = c('#BF382A', '#0C4B8E'))
fig <- fig %>% add_markers()
fig <- fig %>% layout(scene = list(xaxis = list(title = 'SB'),
yaxis = list(title = 'OPS'),
zaxis = list(title = 'tSho')))
x.lda <- seq(0, 400, length.out = 100) #Range based on SB values
y.lda <- seq(0.5, 0.9, length.out = 100) #Range based on OPS values
z.lda <- outer(x.lda, y.lda, function(x, y) ( (-SB.lda * x - OPS.lda * y) / tSho.lda + 13.82201))
fig <- fig %>% add_surface(x = x.lda, y = y.lda, z = z.lda,colors = "#0C4B8E")
fig
I get a different result from solving the equation:
# general equation: A * x + B * y + C * z + c = 0
# solve for z:
SB * x + OPS * y + tSho * z - 13.82201 = 0
SB * x + OPS * y - 13.82201 = - tSho * z
-SB * x - OPS * y + 13.82201 = tSho * z
(-SB * x - OPS * y + 13.82201) / tSho = z
# calculate values of plane:
x.lda <- seq(0, 400, length.out = 100) #Range based on SB values
y.lda <- seq(0.5, 0.9, length.out = 100) #Range based on OPS values
z.lda <- outer(x.lda, y.lda, function(x, y) (-SB.lda * x - OPS.lda * y + 13.82201) / tSho.lda )
The resulting plot: