I'm trying to add the r-squared to my scatterplot. I also have a lowess trendline. This is my code so far (I'm also attaching a picture):
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
color_map = {'PORTUGAL': 'green', 'FRANCIA': 'royalblue', 'REINO UNIDO': 'lightgray', 'BELGICA': 'red'}
fig = px.scatter(df_datos, x="EAD", y="RAR", size="Margen_bruto", color="País", color_discrete_map=color_map, hover_name="Cliente", hover_data={'País':True, 'EAD':':.0f', 'RAR':':.0%', 'Margen_bruto':':.0f'}, size_max=30, trendline='lowess', trendline_color_override='black', trendline_scope='overall', title="dic 2021 - 219 clientes")
fig.add_hline(y=0.13, line_dash="solid", line_color="orange", annotation_text="13% RAR", annotation_position="bottom right", annotation_font_color="orange")
fig.add_vline(x=33452378, line_dash="dash", line_color="orange", annotation_text="50% EAD", annotation_position="top right", annotation_font_color="orange")
fig.add_vline(x=97227035, line_dash="dot", line_color="orange", annotation_text="80% EAD", annotation_position="top right", annotation_font_color="orange")
In case I can't find a function, something that also works for me is putting a label in the lowess trendline, so that it appears at the bottom right, where the line ends. This is because I can calculate the number separately and just plug it in there. I'm sure this should be easier.
Plotly doesn't calculate r-squared values for lowess or other non-parametric trendlines (like rolling average).
I think it's important to understand how lowess is meant to be used. Since lowess is non-parametric (it doesn't assume that the data behaves according to a mathematical model), there isn't an explicit mathematical formula (see here).
I would think about lowess like a more sophisticated rolling average – a rolling average just shows you local patterns and trends in the data. If you try to calculate r-squared for a rolling average function, it doesn't tell you anything meaningful – it can be high or low and won't necessarily fall between 0 and 1.
On the other hand, when you use an ordinary least squares trendline, you assume that the data behaves according to an explicit mathematical formula. In that case, the r-squared value shows you how much of the variation in the actual data points is explained by the ordinary least squares model.
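To make that concrete, here is a minimal sketch of the usual definition, r-squared = 1 - SS_res / SS_tot, in plain Python. The data and the helper names (`r_squared`, `ols_fit`) are made up for illustration and are not part of plotly. The second set of predictions is an arbitrary curve, not a lowess fit; it is only there to show that the formula is not bounded below by 0 once the predictions don't come from a least squares fit.

```python
def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot for any set of predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def ols_fit(x, y):
    """Slope and intercept of the simple least squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Made-up data, purely illustrative
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = ols_fit(x, y)
ols_pred = [intercept + slope * xi for xi in x]
r2_ols = r_squared(y, ols_pred)      # about 0.6, always within [0, 1] for OLS

# An arbitrary curve that was NOT fitted by least squares
other_pred = [5, 5, 2, 5, 2]
r2_other = r_squared(y, other_pred)  # negative here

print(r2_ols, r2_other)
```

For the least squares line the result is guaranteed to land between 0 and 1, which is what makes it readable as "fraction of variance explained". For any other curve the same arithmetic still runs, but that interpretation no longer holds.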
This is probably the reason why plotly calculates r-squared values for ordinary least squares, but not lowess or rolling average.