sqlamazon-redshifttableau-apiaginity

How to relate two columns from separate tables, if the values are SIMILAR not EXACT?


I have two columns that I'm trying to join, but am getting a bunch of nulls because instead of it saying "SWEATER" in both columns, it says "SWEATER" in one and "SWEAT" in another. I did a FULL OUTER JOIN, and although a lot have matched, there are still a bunch of nulls because of the above issue.

Using Tableau to do the FULL OUTER JOIN on two queries: one query pulls the "number of items sold" while the other pulls the "number of items returned", matching against the item name/item description (which, in a perfect world, would match exactly. But that's my problem, they don't).

SELECT 
    item_description_1, SUM(quantity_ordered) "units_sold_OCT2019"
FROM
    l_dmw_order_report
WHERE
    quantity_ordered > 0
    AND oms_order_date BETWEEN '2019-10-01' AND '2019-10-31'

GROUP BY item_description_1
    HAVING item_description_1 NOT IN ('Freight')

ORDER BY item_description_1

SELECT
    item_name, SUM(return_qty) "#_of_returns_OCT2019"
FROM 
    l_nvr_ec_returns
WHERE 
    return_created_date BETWEEN '2019-10-01' AND '2019-10-31'
    AND return_status NOT IN ('Cancelled', 'cancelled')

GROUP BY item_name
    HAVING item_name NOT IN ('')
ORDER BY item_name

[Here's an image of where exactly I'm stuck at, where the values don't match up and I don't know how to proceed in order to match them...][1] [1]: https://i.sstatic.net/B61Gr.png


Solution

  • Here is what i can suggest. l_dmw_order_report.item_description_1 gives you the items sold and you want to match this with l_nvr_ec_returns.item_name.

    unfortunately they are not matching.

    Step1: check your filter condition (HAVING statement). don't you think you should also keep NOT IN ('Freight') for table with l_nvr_ec_returns

    Step2: both tables can have different naming description/items and you can match them by writing CASE statement.

    example:

    Item_name 
    --------
    SWEATER
    JACKET
    
    item_description_1
    -------
    SWEAT
    JACK
    

    Then

    SELECT 
        (case when item_description_1='SWEAT' then 'SWEATER'
          when item_description_1='JACK' then 'JACKET'
    END) as item_descrption_1, SUM(quantity_ordered) "units_sold_OCT2019"
    FROM
        l_dmw_order_report
    WHERE
        quantity_ordered > 0
        AND oms_order_date BETWEEN '2019-10-01' AND '2019-10-31'
    
    GROUP BY item_description_1
        HAVING item_description_1 NOT IN ('Freight')
    
    ORDER BY item_description_1
    

    try this.