r scatter-plot linear-regression best-fit

Why does my linear regression fit line look wrong?


I have plotted a 2-D histogram in such a way that I can add lines, points, etc. to the plot. Now I want to fit a linear regression to the region of dense points, but my regression line seems to be nowhere near where it should be. To demonstrate, here is my plot on the left with both a lowess fit and a linear fit.

lines(lowess(na.omit(a),na.omit(b),iter=10),col='gray',lwd=3)

abline(lm(b[cc]~a[cc]),lwd=3)

Here a and b are my values and cc indexes the points within the densest parts (i.e. where most points lie): red + yellow + blue.

[Image: the 2-D histogram with the lowess and linear fits (left) and a hand-drawn fit (right)]

Why doesn't my regression line look more like the one on the right (hand-drawn fit)? If I were drawing a line of best fit by eye, that is where I would put it.

I have numerous plots similar to this, but I still get the same result.

[Image: a second, similar plot]

Are there any alternative linear regression fits that could prove to be better for me?


Solution

  • A linear regression fits a linear function y = a + b*x to a set of points (observations) by minimizing the sum of squared vertical distances between the points and the line. Only the error in y counts; x is treated as exact.

    Now imagine a heatmap whose shape suggests that a vertical line would fit best. Turn your heatmap about 10 degrees counter-clockwise and you have exactly that.

    Now, how would a vertical line be written as a linear function of x? Exactly: it cannot be, because its slope would be infinite. For the same reason, the steeper the cloud of points, the further the regression line falls below the axis you would draw by eye.

    The upshot of this little thought experiment is that linear regression does not do what you expect it to. What you most likely want is, as Gavin Simpson already indicated, the first principal component vector.
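To make the difference concrete, here is a minimal sketch that compares the least-squares line with the line along the first principal component. The data are simulated stand-ins (the a and b below are not the asker's values, and the noise levels are assumptions chosen to give a steep, elongated cloud); prcomp() from base R's stats package supplies the principal components.

```r
# Simulated stand-in data: a steep, elongated cloud of points.
# 'a' and 'b' here are hypothetical, not the values from the question.
set.seed(42)
n <- 2000
s <- rnorm(n)                       # position along the cloud's long axis
a <- 0.2 * s + rnorm(n, sd = 0.3)   # horizontal spread is mostly noise
b <- 1.0 * s + rnorm(n, sd = 0.3)   # vertical spread is mostly signal

# Ordinary least squares of b on a: minimises vertical distances only.
ols <- lm(b ~ a)
ols_slope <- unname(coef(ols)[2])

# First principal component: the direction of largest variance, which
# minimises perpendicular distances to the line through the centroid.
pc <- prcomp(cbind(a, b))           # prcomp centres the columns by default
v <- pc$rotation[, 1]               # loadings of PC1, named "a" and "b"
pca_slope <- unname(v["b"] / v["a"])
pca_intercept <- mean(b) - pca_slope * mean(a)

# Draw both lines over the points for comparison.
plot(a, b, pch = 16, cex = 0.4, col = "gray60")
abline(ols, lwd = 3)                                    # least-squares fit
abline(pca_intercept, pca_slope, lwd = 3, col = "red")  # PC1 (major axis)
```

With these simulated values the least-squares line comes out noticeably shallower than the principal-component line, which is the same effect as in the question. For the real data the same idea would presumably be applied to the dense subset, e.g. prcomp(cbind(a[cc], b[cc])), after dropping incomplete pairs, since prcomp() does not accept missing values.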