sqlsql-servert-sqlleft-join

SQL Server Left Join With 'Or' Operator


I have a four tables, TopLevelParent, two mid level tables MidParentA and MidParentB, and a Child table which can have a parent of MidParentA or MidParentB (One or the other midParent must be in place). Both mid level tables have a parent table of TopLevelParent.

The Top Level table look like this:

TopLevelId | Name
--------------------------
1          | name1   
2          | name2   

The MidParent tables look like this:

MidParentAId | TopLevelParentId |           MidParentBId | TopLevelParentId |
------------------------------------       ------------------------------------
1            |        1         |           1            |        1         |
2            |        1         |           2            |        1         |

The Child table look like this:

ChildId | MidParentAId | MidParentBId
--------------------------------
1       |     1        |   NULL
2       |    NULL      |     2

I have used the following left join in a larger stored procedure which is timing out, and it looks like the OR operator on the last left join is the culprit:

SELECT *    
FROM TopLevelParent tlp
LEFT JOIN MidParentA a ON tlp.TopLevelPatientId = a.TopLevelPatientId
LEFT JOIN MidParentB a ON tlp.TopLevelPatientId = b.TopLevelPatientId
LEFT JOIN Child c ON c.ParentAId = a.ParentAId OR c.ParentBId = b.ParentBId

Is there a more performant way to do this join?


Solution

  • Here is what I did in the end, which got the execution time down from 52 secs to 4 secs.

    SELECT * 
    FROM (
        SELECT tpl.*, a.MidParentAId as 'MidParentId', 1 as 'IsMidParentA' 
        FROM TopLevelParent tpl 
        INNER JOIN MidParentA  a ON a.TopLevelParentId = tpl.TopLevelParentID
    UNION
        SELECT tpl.*, b.MidParentBId as 'MidParentId', 0 as 'IsMidParentA'  
        FROM TopLevelParent tpl 
        INNER JOIN MidParentB b ON b.TopLevelParentId = tpl.TopLevelParentID
    UNION
        SELECT tpl.*, 0 as 'MidParentId', 0 as 'IsMidParentA'  
        FROM TopLevelParent tpl 
        WHERE tpl.TopLevelParentID NOT IN (
           SELECT pa.TopLevelParentID 
           FROM TopLevelParent tpl
           INNER JOIN MidParentA  a ON a.TopLevelParentId = tpl.TopLevelParentID
        UNION
           SELECT pa.TopLevelParentID 
           FROM TopLevelParent tpl
           INNER JOIN MidParentB b ON h.TopLevelParentId = tpl.TopLevelParentID
        )
    ) tpl
    LEFT JOIN MidParentA a ON a.TopLevelParentId = tpl.TopLevelParentID
    LEFT JOIN MidParentB b ON b.TopLevelParentId = tpl.TopLevelParentID
    LEFT JOIN 
    (
            SELECT  [ChildId]
                    ,[MidParentAId] as 'MidParentId'
                    ,1 as 'IsMidParentA'
            FROM Child c
            WHERE c.MidParentAId IS NOT NULL
       UNION
            SELECT [ChildId]
                   ,[MidParentBId] as 'MidParentId'
                   ,0 as 'IsMidParentA'
            FROM Child c
            WHERE c.MidParentBId IS NOT NULL
    ) AS c
    ON c.MidParentId = tpl.MidParentId  AND c.IsMidParentA = tpl.IsMidParentA
    

    This eliminates the table scanning that was happening, as I have matched the top level record to its midlevel parent up front if it exists, and stamped it on that record.

    I have also done the same with the child record meaning I can then just join the child record to the top level record on the MidParentId, and I use the IsMidParentA bit flag to differentiate where there are two identical MidParentIds (ie an Id of 1 for IsMidParentA and IsMidParentB).

    Thanks to all who took the time to answer.