I have a rather unconventional problem and having a hard time finding a solution to this. Would really appreciate your help.
I have 4 genes(features) and my classification here is binary(0 and 1). After a lot of back and forth, I have finalized on using LDA to do my classification. I have different studies each comparing the same two classes and I trained my model using these 4 genes on each of these studies.
I want to visualize the LDA scores in the form of points plot. Something like below, where each section represents a different study/dataset. Samples of that dataset on the X axis and the LD1 value I get using -
lda_model = lda(formula = class ~ ., data = train)
predict(lda_model,train)
on the Y axis.
Since I trained a different model on each dataset, we can clearly see the the decision boundary (which I assume is the black line) for each dataset is different and on a different scale. However, I want to scale the values on the Y axis is such a way that all my datasets are on the same scale and I can represent this plot with a single decision boundary( again, something I can clearly draw on the plot, like the red line).
The LD1 values here are - a(GeneA) + b(GeneB) + c(GeneC) + d(GeneD) - mean(a(GeneA) + b(GeneB) + c(GeneC) + d(GeneD)). This is done for each dataset individually. However, this is not exactly equal to (a(GeneA) + b(GeneB) + c(GeneC) + d(GeneD) + intercept) which we can get using logistic regression. I am trying to find that value or some method which can scale my Y axis across all the datasets using LDA.
Thanks for your help!
I did a min-max scaling and that seemed to work. It scaled all my data points across all datasets with decision boundary at zero.