t-sqlsubquery

Delete (or Select) data with subquery using T-SQL


I am trying to come up with T-SQL query to delete data based on these conditions:

  1. Find data that have same Email.

  2. If that user has both AllLocations 'Yes' or 'No', delete entry/entries that are not today's (assuming today is 4/9/2025).

     UserName    Email           AllLocations    Location    DateLoad
     John Doe    jdoe@gmail.com  No              Cerritos    4/9/2025
     John Doe    jdoe@gmail.com  No              Cerritos    4/9/2025
     John Doe    jdoe@gmail.com  Yes                         2/21/2025
     Jane Sue    jsue@yahoo.com  No              Los Angeles 3/31/2025
    

So, expected outcome is bottom (after deletion):

UserName    Email           AllLocations    Location    DateLoad
John Doe    jdoe@gmail.com  No              Cerritos    4/9/2025
John Doe    jdoe@gmail.com  No              Cerritos    4/9/2025
Jane Sue    jsue@yahoo.com  No              Los Angeles 3/31/2025

I tried something like bottom, but obviously, it is wrong:

 delete 
 from [table] a
 inner join [table] b
 on a.Email = b.Email
 where 
    (
    a.AllLocations = 'Yes'
    and 
    b.AllLocations = 'No'
    )
   or 
    (
    b.AllLocations = 'Yes'
    and 
    a.AllLocations = 'No'
    )
   and 
   (
        cast( a.DateLoad as date) = cast( GETDATE() as date) 
        or 
        cast( b.DateLoad as date) = cast( GETDATE() as date) 
   )

Solution

  • for your condition Find data that have same Email , you can use count() with window function

    count(*) over (partition by Email)
    

    For condition : If that user has both AllLocations 'Yes' or 'No'

    Again use window function to get min() and max() value and compare

    min(AllLocations) over (partition by Email)
    <>
    max(AllLocations) over (partition by Email)
    

    And your last condition is simply check for DateLoad

    with 
    cte as
    (
        select *, 
               email_cnt = count(*) over (partition by Email),
               both_location = case when min(AllLocations) over (partition by Email)
                                    <>   max(AllLocations) over (partition by Email)
                                    then 1
                                    else 0
                                    end
        from   [table]
    )
    delete c
    from   cte c
    where  email_cnt > 1
    and    both_location = 1
    and    DateLoad <> '20250409'