*Edited to include a reproducible query.
I've got a list of job numbers that have multiple part numbers per job with different suppliers for each part. I need to return only jobs that contain parts from suppliers A and B, but I also need to return the rest of the parts in the jobs that were returned. I can do this in Tableau by creating a calculated field with: {FIXED [JOB_NUM]: COUNTD(IF [SUPPLIER]="SUPLR_A" OR [SUPPLIER]="SUPLR_B" THEN [JOB_NUM] END)}
(Titled Calc_1), then create another calculated field with: IF [Calc_1] = 1 THEN [JOB_NUM] END
, and then filter out the NULLs. I'd like to do everything in SQL so I'm not pulling and manipulating data that I don't need.
I was thinking maybe I could emulate Tableau and do a distinct count of each JOB_NUM that contains one of the suppliers, but it only returns data in rows that have those suppliers. Which I suppose could be fine, but I'm having trouble figuring out how to duplicate that data across each row with the same JOB_NUM.
DROP TABLE IF EXISTS #JOB_DATA
DROP TABLE IF EXISTS #SUPPLIER_DATA
CREATE TABLE #JOB_DATA (
JOB_NUM varchar(255),
PART_NUM varchar(255),
QTY varchar(255)
);
CREATE TABLE #SUPPLIER_DATA (
PART_NUM varchar(255),
SUPPLIER_NUM varchar(255),
SUPPLIER_NAME VARCHAR(255)
);
INSERT INTO #JOB_DATA
VALUES
('A-1','PN-004','1'),
('A-1','PN-009','1'),
('A-1','PN-015','1'),
('A-1','PN-005','3'),
('A-1','PN-006','1'),
('B-22','PN-004','1'),
('B-22','PN-007','2'),
('B-22','PN-009','1'),
('C-333','PN-004','5'),
('C-333','PN-009','1'),
('C-333','PN-010','1');
INSERT INTO #SUPPLIER_DATA
VALUES
('PN-001','A13582','PARTS_EMPORIUM'),
('PN-002','A13582','PARTS_EMPORIUM'),
('PN-003','V23451','LABEL_KING'),
('PN-004','W69851','PLASTICS_INC'),
('PN-005','A13582','PARTS_EMPORIUM'),
('PN-006','A13582','PARTS_EMPORIUM'),
('PN-007','V23451','LABEL_KING'),
('PN-008','V23451','LABEL_KING'),
('PN-009','W69851','PLASTICS_INC'),
('PN-010','W69851','PLASTICS_INC'),
('PN-011','W69851','PLASTICS_INC'),
('PN-012','L68529','METALWORKS'),
('PN-013','L68529','METALWORKS'),
('PN-014','L68529','METALWORKS'),
('PN-015','V23451','LABEL_KING');
WITH M AS (
SELECT A.JOB_NUM, A.PART_NUM, A.QTY, B.SUPPLIER_NAME
FROM #JOB_DATA AS A
LEFT JOIN (SELECT PART_NUM, SUPPLIER_NAME FROM #SUPPLIER_DATA WHERE SUPPLIER_NAME IN ('PARTS_EMPORIUM','LABEL_KING')) AS B
ON A.PART_NUM = B.PART_NUM
), T AS (
SELECT SUPPLIER_NAME, JOB_NUM, COUNT(DISTINCT JOB_NUM) AS JobCount
FROM M
GROUP BY JOB_NUM, SUPPLIER_NAME
)
SELECT T.JobCount, M.*
FROM M
LEFT JOIN T
ON M.JOB_NUM = T.JOB_NUM
AND M.SUPPLIER_NAME = T.SUPPLIER_NAME
This is what the returned data currently looks like:
JobCount | JOB_NUM | PART_NUM | QTY | SUPPLIER_NAME |
---|---|---|---|---|
NULL | A-1 | PN-004 | 1 | NULL |
NULL | A-1 | PN-009 | 1 | NULL |
1 | A-1 | PN-015 | 1 | LABEL_KING |
1 | A-1 | PN-005 | 3 | PARTS_EMPORIUM |
1 | A-1 | PN-006 | 1 | PARTS_EMPORIUM |
NULL | B-22 | PN-004 | 1 | NULL |
1 | B-22 | PN-007 | 2 | LABEL_KING |
NULL | B-22 | PN-009 | 1 | NULL |
NULL | C-333 | PN-004 | 5 | NULL |
NULL | C-333 | PN-009 | 1 | NULL |
NULL | C-333 | PN-010 | 1 | NULL |
This is what I need it to look:
JobCount | JOB_NUM | PART_NUM | QTY | SUPPLIER_NAME |
---|---|---|---|---|
1 | A-1 | PN-004 | 1 | NULL |
1 | A-1 | PN-009 | 1 | NULL |
1 | A-1 | PN-015 | 1 | LABEL_KING |
1 | A-1 | PN-005 | 3 | PARTS_EMPORIUM |
1 | A-1 | PN-006 | 1 | PARTS_EMPORIUM |
1 | B-22 | PN-004 | 1 | NULL |
1 | B-22 | PN-007 | 2 | LABEL_KING |
1 | B-22 | PN-009 | 1 | NULL |
NULL | C-333 | PN-004 | 5 | NULL |
NULL | C-333 | PN-009 | 1 | NULL |
NULL | C-333 | PN-010 | 1 | NULL |
At this point I'd just filter out the Nulls from JobCount using WHERE JobCount = 1
.
Any help would be appreciated.
You can identify which jobs involve specific supplier(s) through a subquery, or via a window function e.g:
Use a subquery for supplier involvement, the inner join then filters the final result
SELECT
A.JOB_NUM
, A.PART_NUM
, CASE
WHEN B.SUPPLIER IN ('SUPLR_A', 'SUPLR_B')
THEN B.SUPPLIER
END AS SUPPLIER
FROM TABLE_A AS A
INNER JOIN TABLE_B ON A.PART_NUM = B.PART_NUM
INNER JOIN (
SELECT DISTINCT job_num
FROM TABLE_A
WHERE supplier IN ('SUPLR_A', 'SUPLR_B')
) s ON A.JOB_NUM = s.JOB_NUM;
Use window function for supplier involvement, filter by that result
SELECT
A.JOB_NUM
, A.PART_NUM
, CASE
WHEN B.SUPPLIER IN ('SUPLR_A', 'SUPLR_B')
THEN B.SUPPLIER
END AS SUPPLIER
FROM (
SELECT *
, MAX(CASE
WHEN supplier IN ('SUPLR_A', 'SUPLR_B')
THEN 1
ELSE 0
END) OVER (PARTITION BY JOB_NUM) AS s_indicator
FROM TABLE_A
) AS A
INNER JOIN TABLE_B ON A.PART_NUM = B.PART_NUM
WHERE A.S_INDICATOR = 1
It's possible that the window function approach is better for performance but you would need to test that assertion in your environment.