sqlsql-servertableau-desktop

Only return job numbers that contain certain suppliers but return all parts in that job number?


*Edited to include a reproducible query.

I've got a list of job numbers that have multiple part numbers per job with different suppliers for each part. I need to return only jobs that contain parts from suppliers A and B, but I also need to return the rest of the parts in the jobs that were returned. I can do this in Tableau by creating a calculated field with: {FIXED [JOB_NUM]: COUNTD(IF [SUPPLIER]="SUPLR_A" OR [SUPPLIER]="SUPLR_B" THEN [JOB_NUM] END)} (Titled Calc_1), then create another calculated field with: IF [Calc_1] = 1 THEN [JOB_NUM] END, and then filter out the NULLs. I'd like to do everything in SQL so I'm not pulling and manipulating data that I don't need.

I was thinking maybe I could emulate Tableau and do a distinct count of each JOB_NUM that contains one of the suppliers, but it only returns data in rows that have those suppliers. Which I suppose could be fine, but I'm having trouble figuring out how to duplicate that data across each row with the same JOB_NUM.

DROP TABLE IF EXISTS #JOB_DATA
DROP TABLE IF EXISTS #SUPPLIER_DATA

CREATE TABLE #JOB_DATA (
    JOB_NUM varchar(255),
    PART_NUM varchar(255),
    QTY varchar(255)
);

CREATE TABLE #SUPPLIER_DATA (
    PART_NUM varchar(255),
    SUPPLIER_NUM varchar(255),
    SUPPLIER_NAME VARCHAR(255)
);

INSERT INTO #JOB_DATA
VALUES 
    ('A-1','PN-004','1'),
    ('A-1','PN-009','1'),
    ('A-1','PN-015','1'),
    ('A-1','PN-005','3'),
    ('A-1','PN-006','1'),
    ('B-22','PN-004','1'),
    ('B-22','PN-007','2'),
    ('B-22','PN-009','1'),
    ('C-333','PN-004','5'),
    ('C-333','PN-009','1'),
    ('C-333','PN-010','1');

INSERT INTO #SUPPLIER_DATA
VALUES 
    ('PN-001','A13582','PARTS_EMPORIUM'),
    ('PN-002','A13582','PARTS_EMPORIUM'),
    ('PN-003','V23451','LABEL_KING'),
    ('PN-004','W69851','PLASTICS_INC'),
    ('PN-005','A13582','PARTS_EMPORIUM'),
    ('PN-006','A13582','PARTS_EMPORIUM'),
    ('PN-007','V23451','LABEL_KING'),
    ('PN-008','V23451','LABEL_KING'),
    ('PN-009','W69851','PLASTICS_INC'),
    ('PN-010','W69851','PLASTICS_INC'),
    ('PN-011','W69851','PLASTICS_INC'),
    ('PN-012','L68529','METALWORKS'),
    ('PN-013','L68529','METALWORKS'),
    ('PN-014','L68529','METALWORKS'),
    ('PN-015','V23451','LABEL_KING');

WITH M AS (
    SELECT A.JOB_NUM, A.PART_NUM, A.QTY, B.SUPPLIER_NAME
    FROM #JOB_DATA AS A
    LEFT JOIN (SELECT PART_NUM, SUPPLIER_NAME FROM #SUPPLIER_DATA WHERE SUPPLIER_NAME IN ('PARTS_EMPORIUM','LABEL_KING')) AS B
        ON A.PART_NUM = B.PART_NUM
), T AS (
    SELECT SUPPLIER_NAME, JOB_NUM, COUNT(DISTINCT JOB_NUM) AS JobCount
    FROM M
    GROUP BY JOB_NUM, SUPPLIER_NAME
)
SELECT T.JobCount, M.*
FROM M
LEFT JOIN T
    ON M.JOB_NUM = T.JOB_NUM
    AND M.SUPPLIER_NAME = T.SUPPLIER_NAME

This is what the returned data currently looks like:

JobCount JOB_NUM PART_NUM QTY SUPPLIER_NAME
NULL A-1 PN-004 1 NULL
NULL A-1 PN-009 1 NULL
1 A-1 PN-015 1 LABEL_KING
1 A-1 PN-005 3 PARTS_EMPORIUM
1 A-1 PN-006 1 PARTS_EMPORIUM
NULL B-22 PN-004 1 NULL
1 B-22 PN-007 2 LABEL_KING
NULL B-22 PN-009 1 NULL
NULL C-333 PN-004 5 NULL
NULL C-333 PN-009 1 NULL
NULL C-333 PN-010 1 NULL

This is what I need it to look:

JobCount JOB_NUM PART_NUM QTY SUPPLIER_NAME
1 A-1 PN-004 1 NULL
1 A-1 PN-009 1 NULL
1 A-1 PN-015 1 LABEL_KING
1 A-1 PN-005 3 PARTS_EMPORIUM
1 A-1 PN-006 1 PARTS_EMPORIUM
1 B-22 PN-004 1 NULL
1 B-22 PN-007 2 LABEL_KING
1 B-22 PN-009 1 NULL
NULL C-333 PN-004 5 NULL
NULL C-333 PN-009 1 NULL
NULL C-333 PN-010 1 NULL

At this point I'd just filter out the Nulls from JobCount using WHERE JobCount = 1.

Any help would be appreciated.


Solution

  • You can identify which jobs involve specific supplier(s) through a subquery, or via a window function e.g:

    Use a subquery for supplier involvement, the inner join then filters the final result

    SELECT
          A.JOB_NUM
        , A.PART_NUM
        , CASE 
            WHEN B.SUPPLIER IN ('SUPLR_A', 'SUPLR_B')
                THEN B.SUPPLIER
            END AS SUPPLIER
    FROM TABLE_A AS A
    INNER JOIN TABLE_B ON A.PART_NUM = B.PART_NUM
    INNER JOIN (
        SELECT DISTINCT job_num
        FROM TABLE_A
        WHERE supplier IN ('SUPLR_A', 'SUPLR_B')
        ) s ON A.JOB_NUM = s.JOB_NUM;
    

    Use window function for supplier involvement, filter by that result

    SELECT
          A.JOB_NUM
        , A.PART_NUM
        , CASE 
            WHEN B.SUPPLIER IN ('SUPLR_A', 'SUPLR_B')
                THEN B.SUPPLIER
            END AS SUPPLIER
    FROM (
        SELECT *
            , MAX(CASE 
                    WHEN supplier IN ('SUPLR_A', 'SUPLR_B')
                        THEN 1
                    ELSE 0
                    END) OVER (PARTITION BY JOB_NUM) AS s_indicator
        FROM TABLE_A
        ) AS A
    INNER JOIN TABLE_B ON A.PART_NUM = B.PART_NUM
    WHERE A.S_INDICATOR = 1
    

    It's possible that the window function approach is better for performance but you would need to test that assertion in your environment.