mysqlcorrelated-subqueryderived-table

Unable to Aggregate Results From Query Containing Correlated SubQuery without Timeout - mysql 5.7


My initial query below runs fairly quickly around 9s and returns about 250K records. I want to group the results by month_year with a count of IDs, however, any attempt to group the results by a date, or to wrap the query as a derived table and query again, results in a timeout after about 6 minutes. I think it's possible that the correlated subquery is re-running. I am not able to use CTEs in mysql 5.7 and am unable to create a temporary table with the initial results.

Initial Query - Runs Fine

select
        valid_sellers.uid
        ,(select min(si.created) as 'temp_name' 
            from sell_item si where si.uid = valid_sellers.uid and si.status in ("For Sale", "Sold")  
            having temp_name >= "2023-01-01" and temp_name <= "2023-12-31") as 'first_sales_activity'
from 
    (
        select
            u.uid
        from
            user u
        where
            u.uid in (
                    select
                        t1.uid
                     from
                        (select distinct si.uid from sell_item si) t1)
            and
            u.since < "2023-04-24"
            and
            u.can_sell = "Y"
            and
            u.last_alive_time >= "2023-01-01"
    ) valid_sellers 

Attempt to Aggregate Leads To Timeout

select
        count(valid_sellers.uid)
        ,DATE_FORMAT((select min(si.created) as 'temp_name' 
            from sell_item si where si.uid = valid_sellers.uid and si.status in ("For Sale", "Sold")  
            having temp_name >= "2023-01-01" and temp_name <= "2023-12-31"), '%Y-%m') as 'year_month'
from 
    (
        select
            u.uid
        from
            user u
        where
            u.uid in (
                    select
                        t1.uid
                     from
                        (select distinct si.uid from sell_item si) t1)
            and
            u.since < "2023-04-24"
            and
            u.can_sell = "Y"
            and
            u.last_alive_time >= "2023-01-01"
    ) valid_sellers

group by
    2
    

Second Attempt To Aggregate Causes Timeout

select
    count(first_listings.uid)
    ,DATE_FORMAT(first_listings.first_sales_activity, '%Y-%m') as 'first_listing'
    
from (
        select
                valid_sellers.uid
                ,(select min(si.created) as 'temp_name' 
                    from sell_item si where si.uid = valid_sellers.uid and si.status in ("For Sale", "Sold")  
                    having temp_name >= "2023-01-01" and temp_name <= "2023-12-31") as 'first_sales_activity'
        from 
            (
                select
                    u.uid
                from
                    user u
                where
                    u.uid in (
                            select
                                t1.uid
                             from
                                (select distinct si.uid from sell_item si) t1)
                    and
                    u.since < "2023-04-24"
                    and
                    u.can_sell = "Y"
                    and
                    u.last_alive_time >= "2023-01-01"
            ) valid_sellers
    
    ) first_listings 

group by
    2

Solution

  • I don't think MySQL implements correlated subqueries well. Do it as a JOIN instead.

    Also, it's really bad at optimizing WHERE uid IN (subquery) -- that should almost always be a join.

    WHen you join with the subquery, it will filter out users who aren't returned by that subquery, so you don't even need the WHERE IN check.

    WITH d AS (
        SELECT uid, MIN(created) AS first_sales_activity
        FROM sell_item
        WHERE status IN ("For Sale", "Sold") AND created BETWEEN '2023-01-01' AND '2023-12-31'
        GROUP BY uid
    )
    
    SELECT u.uid, DATE_FORMAT(d.first_sales_activity, '%Y-%m') AS first_sales_activity
    FROM users AS u
    JOIN d ON u.uid = d.uid
    WHERE u.since < '2023-04-014' AND u.can_sell = 'Y' AND u.last_alive_time >= "2023-01-01"