I am trying to get the standard errors of the betas in a linear regression in bigquery ML, sorry if I have missed something basic, but I cannot find the answer to this question
#standard sql
CREATE OR REPLACE MODEL `DATASET.test_lm`
OPTIONS(model_type='LINEAR_REG', input_label_cols= ["y"]) AS
select * from unnest(ARRAY<STRUCT<y INT64, x float64>> [(1,2.028373),
(2,2.347660),(3,3.429958),(4,5.250539),(5,5.976455)])
you can get weights without variance with
select * from ml.weights(model `DATASET.test_ml`)
Also, you can calculate the standard errors directly like this
with dat as (
select * from unnest(ARRAY<STRUCT<y INT64, x float64>> [(1,2.028373), (2,2.347660),(3,3.429958),(4,5.250539),(5,5.976455)])),
#get the residual standard error, using simple df-2
rse_dat as (
select sqrt(sum(e2)/((select count(1) from dat)-2)) as rse from (
select pow(y - predicted_y, 2) as e2 from ml.predict(model `DATASET.test_lm`,
(select * from dat)))),
#get the variance of x
xvar_dat as (
select sum(pow(x - (select avg(x) as xbar from dat),2)) as xvar from dat)
#calulate standard error
select sqrt((select pow(rse,2) from rse_dat)/(select xvar from xvar_dat) as beta_x_se )
But this gets to be heavy lift for many covariates. Is there a direct way to get this get this pretty basic statistic for confidence intervals?
You could use ML.ADVANCED_WEIGHTS now, which gives standard errors.
https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-advanced-weights