rclassificationldalinear-discriminant

Scale LDA decision boundary


I have a rather unconventional problem and having a hard time finding a solution to this. Would really appreciate your help.

I have 4 genes(features) and my classification here is binary(0 and 1). After a lot of back and forth, I have finalized on using LDA to do my classification. I have different studies each comparing the same two classes and I trained my model using these 4 genes on each of these studies.

I want to visualize the LDA scores in the form of points plot. Something like below, where each section represents a different study/dataset. Samples of that dataset on the X axis and the LD1 value I get using - lda_model = lda(formula = class ~ ., data = train) predict(lda_model,train) on the Y axis.

enter image description here Since I trained a different model on each dataset, we can clearly see the the decision boundary (which I assume is the black line) for each dataset is different and on a different scale. However, I want to scale the values on the Y axis is such a way that all my datasets are on the same scale and I can represent this plot with a single decision boundary( again, something I can clearly draw on the plot, like the red line).

The LD1 values here are - a(GeneA) + b(GeneB) + c(GeneC) + d(GeneD) - mean(a(GeneA) + b(GeneB) + c(GeneC) + d(GeneD)). This is done for each dataset individually. However, this is not exactly equal to (a(GeneA) + b(GeneB) + c(GeneC) + d(GeneD) + intercept) which we can get using logistic regression. I am trying to find that value or some method which can scale my Y axis across all the datasets using LDA.

Thanks for your help!


Solution

  • I did a min-max scaling and that seemed to work. It scaled all my data points across all datasets with decision boundary at zero.