sqlpervasivepervasive-sqlactian

What am I doing wrong in this select query with left join and filters?


With Actian PSQL v13:

I'm attempting to join two tables and filter the result set by a few criteria. One table is customer sales history. The other table associates customers to buying groups. Not all customers are in a buying group, so I'm doing a left join to get all possible sales history results. I do NOT want the results to include anyone in buying group 'SALREP', but I DO want to see customers that are not in any buying groups. Also, the so-called-part 'FREIGHT' is being filtered out of the results.

The problem I'm having is that when I filter out the group 'SALREP' only customers that are in a buying group are being selected; all customers not in a buying group are missing.

I've tested this by commenting out the filter for 'SALREP' and the result set does indeed include customers who are not in a buying group. I also tried using a different filter approach, in case Actian PSQL is picky. For example, I tried different methods of "not equals" such as !=, not() and <>; the problem persists. Also, I've used left join and left outer join with the same problem.

Here is the query:

select T2.GROUP_CUST, T1.DATE_INVOICE, T1.SALESPERSON, T1.CUSTOMER, T1.PRODUCT_LINE, T1.PART, T1.DESCRIPTION, T1.QTY_SHIPPED, T1.EXTENSION 
from ORDER_HIST_LINE T1 
left join BUYING_GROUP T2 on T1.CUSTOMER=T2.CUSTOMER 
where DATE_INVOICE > '2019-06-13' and PART != 'FREIGHT'
and T2.GROUP_CUST != 'SALREP' 
and T1.CUSTOMER in ('ABC', 'DEF', 'GHI')
order by T1.CUSTOMER;

The expected result set should include anything invoiced after 2019-06-13, not include a part called 'FREIGHT' and not have customers in the group called 'SALREP'. However, the actual result set is incomplete. For example:

CUSTOMER | GROUP_CUST | DATE_INVOICE  | PART   | etc.
-----------------------------------------------------
ABC      |  A12       |  2019-06-14   | WIDGET
DEF      |  A12       |  2019-06-14   | GEAR

Basically, all customers who are not in any buying group are left out.

Comment out the portion and T2.GROUP_CUST != 'SALREP' and expected results are found. For example:

CUSTOMER | GROUP_CUST | DATE_INVOICE  | PART   | etc.
-----------------------------------------------------
ABC      |  A12       |  2019-06-14   | WIDGET
DEF      |  A12       |  2019-06-14   | GEAR
GHI      |            |  2019-06-15   | WIDGET

I was thinking about creating the left join to a select query that removes 'SALREP' from the buying groups in the first place, but that doesn't allow the result set to identify and remove anyone from that group. Ex.: left join (select * from BUYING_GROUP where GROUP_CUST != 'SALREP') T2

8/7/19 Further Attempts: I'm finding the same dilemma on MySQL v5.0.12. I can left join tables to yield mismatched results. I can filter the left table on those mismatched results without unexpectedly losing anything. Yet, I cannot filter the right table on those mismatched results without having all mismatched rows disappear.


Solution

  • Thank you to https://mode.com/resources/sql-tutorial/sql-joins-where-vs-on/. The article mentions, "filtering in the WHERE clause can also filter null values, so we added an extra line to make sure to include the nulls."

    This led me to testing and realizing that when filtering against the right table, it is necessary to explicitly state that NULL values (unmatched rows) for that filtered column are to be included, otherwise those unmatched rows are dropped from the result set. I also found that it's necessary to put the filter pair in parentheses or the result set "explodes". I tested and found this to be necessary only when filtering against the right table; filtering against the left table doesn't have this issue (obviously).

    Final answer:

    select T2.GROUP_CUST, T1.DATE_INVOICE, T1.SALESPERSON, T1.CUSTOMER, T1.PRODUCT_LINE,             
    T1.PART, T1.DESCRIPTION, T1.QTY_SHIPPED, T1.EXTENSION 
    from ORDER_HIST_LINE T1 
    left join BUYING_GROUP T2 on T1.CUSTOMER=T2.CUSTOMER 
    where DATE_INVOICE > '2019-06-13' and PART != 'FREIGHT' 
    and (T2.GROUP_CUST != 'SALREP' or T2.GROUP_CUST is null) 
    and T1.CUSTOMER in ('ABC', 'DEF', 'GHI') 
    order by T1.CUSTOMER;
    

    Notice the line and (T2.GROUP_CUST != 'SALREP' or T2.GROUP_CUST is null).