sqlazureazure-databricks

Unable to match the results when converting the Excel Trend formula to SQL


We have a requirement to convert Excel Trend logic to SQL using Azure SQL Editor, but the SQL results do not match. When I used the below logic getting null for Trend, also i have tried the multiple conditions(using slope & intercept,Linear_Regression) but results are not matched. If anyone has an idea, please respond to this.

Data [enter image description here] (https://i.sstatic.net/xtepxhiI.png)

Quantiy_in_KG 76.1,11.4,3.6,1.1

Customer_Achieved_avg_price_in_kg 651.9,652.67,666.94,1050.18

Trend_to_sql 682.91,742.07,749.2,751.48

Example logic1:

WITH RegressionStats AS (
SELECT
CORR(quantity_in_kg, Customer_achieved_price) / VAR(quantity_in_kg) AS slope,
AVG(Customer_achieved_price) - slope \* AVG(quantity_in_kg) AS intercept
FROM
your_table_name
)

-- Calculate predicted values

SELECT
quantity_in_kg,
Customer_achieved_price,
slope \* quantity_in_kg + intercept AS predicted_price
FROM
your_table_name
CROSS JOIN RegressionStats;

Example Logic 2nd

Select
quantity_in_kg,
Customer_achieved_price,
(Slope \* quantity_in_kg + INTERCEPT) as Trend
From
(SELECT
quantity_in_kg,
Customer_achieved_price,
REGR_SLOPE(Customer_achieved_price,quantity_in_kg) OVER() AS SLOPE,
REGR_INTERCEPTCustomer_achieved_price,quantity_in_kg) OVER() AS INTERCEPT
From data
)

Expected Results enter image description here

Expecting assistance from the Stack Community to move forward.


Solution

  • I have used your data and created the trend in excel and in databricks sql, both results are matching.

    Below is the excel trend results.

    enter image description here

    and in databricks using below code.

    SELECT
    regr_slope(Customer_Achieved_avg_price_in_kg,Quantity_in_KG) OVER () as slope,
    regr_intercept(Customer_Achieved_avg_price_in_kg,Quantity_in_KG) OVER () as intercept,
    Quantity_in_KG,
    Customer_Achieved_avg_price_in_kg,
    slope * Quantity_in_KG + intercept AS trend
    FROM
    your_table_name
    

    Output:

    enter image description here

    Here, both the results are matching. You are getting different results, because either you are not using the TREND function correctly or testing on different values of Quantity_in_KG, please check it properly.