mysqlgroupwise-maximum

Query (HAVING Clause) does not behave the same between Mysql Version 5.7 and 8.0


So recently I was asked to upgrade Mysql Version from 5.7 to Mysql 8. After the upgrade, I noticed a strange behavior where the results produced is completely different in both version. Below are the structure and sample data.

Create Table

DROP TABLE IF EXISTS `Schema`.`Table`;
CREATE TABLE  `Schema`.`Table` (
  `ExchNo` int NOT NULL,
  `StkID` int NOT NULL,
  `Date` date NOT NULL DEFAULT '1899-12-31',
  `Price` decimal(18,10) NOT NULL DEFAULT '0.0000000000',
  PRIMARY KEY (`ExchNo`,`StkID`,`Date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;

Insert Data

INSERT INTO `Schema`.`Table` (`ExchNo`,`StkID`,`Date`,`Price`) 
VALUES ('1','1','2014-12-16 00:00:00','0.3200000000'),('1','1','2014-12-17 00:00:00','0.3200000000'),('1','1','2014-12-18 00:00:00','0.3350000000'),('1','1','2014-12-19 00:00:00','0.3400000000'),('1','1','2014-12-22 00:00:00','0.4200000000'),('1','1','2014-12-23 00:00:00','0.4100000000'),('1','1','2014-12-24 00:00:00','0.4250000000'),('1','1','2014-12-26 00:00:00','0.4000000000'),('1','1','2014-12-29 00:00:00','0.4150000000'),('1','1','2014-12-30 00:00:00','0.4150000000'),('1','1','2014-12-31 00:00:00','0.4250000000'),('1','6','2014-12-16 00:00:00','9.5500000000'),('1','6','2014-12-17 00:00:00','9.4300000000'),('1','6','2014-12-18 00:00:00','9.6200000000'),('1','6','2014-12-19 00:00:00','9.7500000000'),('1','6','2014-12-22 00:00:00','9.6300000000'),('1','6','2014-12-23 00:00:00','9.8500000000'),('1','6','2014-12-24 00:00:00','9.8900000000'),('1','6','2014-12-26 00:00:00','10.0000000000'),('1','6','2014-12-29 00:00:00','10.0800000000'),('1','6','2014-12-30 00:00:00','10.1600000000'),('1','6','2014-12-31 00:00:00','10.0000000000'),('1','8','2014-12-16 00:00:00','2.6900000000'),('1','8','2014-12-17 00:00:00','2.8100000000'),('1','8','2014-12-18 00:00:00','2.8000000000'),('1','8','2014-12-19 00:00:00','2.9700000000'),('1','8','2014-12-22 00:00:00','2.9400000000'),('1','8','2014-12-23 00:00:00','2.9400000000'),('1','8','2014-12-24 00:00:00','2.9500000000'),('1','8','2014-12-26 00:00:00','2.9700000000'),('1','8','2014-12-29 00:00:00','2.9100000000'),('1','8','2014-12-30 00:00:00','2.9900000000'),('1','8','2014-12-31 00:00:00','2.9000000000');

In Mysql 5.7, using the query Below (using user variables), allows me to obtain the latest Price of each StkID extremely fast considering if the data is large (1mllion rows+)

SELECT  *,(@Count0:= if(@TempID0 = `StkID`, @Count0 +1, 1)) Counter,
            (@TempID0:=`StkID`) Tempid
    FROM  `Schema`.`Table`, 
          ( SELECT  @Count0:=0,@TempID0:=0 ) sqlvar
    WHERE  `Date` <= '2018-12-31 00:00:00'
    having  Counter<=1
    ORDER BY  `StkID`,`Date` DESC;

Output:

ExchNo StkID Date Price @Count0:=0 @TempID0:=0 Counter Tempid
1 1 2014-12-31 0.42500 0 0 1 1
1 6 2014-12-31 10.0000 0 0 1 6
1 8 2014-12-31 2.90000 0 0 1 8

When attempting the same query on Mysql Version 8.0. It just returns the entire table without considering the Having Clause of Counter <= 1.

I could Wrap the entire query into a subquery and get the desired results but it does affect the performance substantially.

Select  *
    From  
        ( SELECT  *,
                  (@Count0:= if(@TempID0 = `StkID`, @Count0 +1, 1)) Counter,
                  (@TempID0:=`StkID`) Tempid
            FROM  `Schema`.`Table`, 
                  ( SELECT  @Count0:=0,@TempID0:=0 ) sqlvar
            WHERE  `Date` <= '2018-12-31 00:00:00'
            ORDER BY  `StkID`,`Date` DESC
        ) a
    Where  Counter<=1 

So my question is, is this a bug? If it isn't, is there a way I could efficiently achieve the results without compromising the performance? Also a warning message appeared and I am not sure if that has got anything to do with the outcome.

Setting user variables within expressions is deprecated and will be removed in a future release. Consider alternatives: 'SET variable=expression, ...', or 'SELECT expression(s) INTO variables(s)'.

I hope someone could could shed some light to my problem, thank you in advance!


Solution

  • I believe that session/user variables have been deprecated as of MySQL 8+. But, you don't really need them anymore as MySQL 8+ has introduced analytic functions. You may use ROW_NUMBER here:

    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY StkID ORDER BY Date DESC) rn
        FROM yourTable
        WHERE Date <= '2018-12-31'
    )
    
    SELECT *
    FROM cte
    WHERE rn = 1
    ORDER BY stkID;