sqlsql-serverjoin

Correct way to join in order to compare 2 tables


I have 2 queries like this:

SELECT * 
FROM ZPAY_VIEW_ACTIVITIES_TO_TABLEAU
WHERE SUBSTRING([CC Receiver], 1, 2) = 'BN'
  AND YEAR(Date) = 2024
  AND pupuk_type = 'CPD-HIK'
  AND SUBSTRING(unique_identity, 3, 1) != '3'
  AND [Employee Code] = '0056263'
  AND Date = '2024-07-12';

and this is the output

Date Mandor Code Employee Code Employee Name BLOCK_LAPANGAN
2024-07-12 0059326 0056263 EMILIANA RANYE Q29
2024-07-12 0059326 0056263 EMILIANA RANYE P39
2024-07-12 0059326 0056263 EMILIANA RANYE S21
2024-07-12 0059326 0056263 EMILIANA RANYE S31

another query:

SELECT * 
FROM [elhm_bpn_2024]
WHERE SUBSTRING([Afdeling SKU], 3, 1) != '3'
  AND [Employee Code] = '0056263'
  AND Tanggal = '2024-07-12';

with the output:

Tanggal Employee Code Blok
2024-07-12 0056263 P39
2024-07-12 0056263 Q29
2024-07-12 0056263 S22
2024-07-12 0056263 S30

The objective is I want to compare both tables and find out whether BLOCK_LAPANGAN = Blok on the same employee code and date.

As an internal audit team, I came up with 2 conditions:

'Normal' if BLOCK_LAPANGAN = Blok on the same date and employee code 'Anomali' with note 'Blok' if Blok != BLOCK_LAPANGAN on the same date and employee code,

As you can see based on the example employee code, it should be both normal because the block = BLOCK_LAPANGAN

Based on the example, I come up with the query:

;WITH CTE AS 
(
    SELECT * 
    FROM ZPAY_VIEW_ACTIVITIES_TO_TABLEAU
    WHERE SUBSTRING([CC Receiver], 1, 2) = 'BN'
      AND YEAR(Date) = 2024
      AND pupuk_type = 'CPD-HIK'
      AND SUBSTRING(unique_identity, 3, 1) != '3'
),
CTE2 AS 
(
    SELECT * 
    FROM [elhm_bpn_2024]
    WHERE SUBSTRING([Afdeling SKU], 3, 1) != '3'
)
SELECT 
    a.[Employee Code] AS SKU_SAP,
    a.BLOCK_LAPANGAN AS block_posting_sap,
    b.Blok AS block_posting_elhm,
    CASE 
        WHEN a.[Employee Name] = b.[Employee Name]
             AND (
                (CASE WHEN a.BLOCK_LAPANGAN < b.Blok THEN a.BLOCK_LAPANGAN ELSE b.Blok END) = 
                (CASE WHEN a.BLOCK_LAPANGAN > b.Blok THEN a.BLOCK_LAPANGAN ELSE b.Blok END)
             )
             AND a.[Activity Name] = b.[Jenis Pekerjaan]
        THEN 'Normal' 
        ELSE 'Anomali' 
    END AS status_anomaly,
    CASE 
        WHEN a.[Employee Name] != b.[Employee Name]
             AND (CASE WHEN a.BLOCK_LAPANGAN < b.Blok THEN a.BLOCK_LAPANGAN ELSE b.Blok END) = 
                 (CASE WHEN a.BLOCK_LAPANGAN > b.Blok THEN a.BLOCK_LAPANGAN ELSE b.Blok END)
             AND a.[Activity Name] = b.[Jenis Pekerjaan] 
        THEN 'SKU'

        WHEN a.[Employee Name] = b.[Employee Name]
             AND (CASE WHEN a.BLOCK_LAPANGAN < b.Blok THEN a.BLOCK_LAPANGAN ELSE b.Blok END) != 
                 (CASE WHEN a.BLOCK_LAPANGAN > b.Blok THEN a.BLOCK_LAPANGAN ELSE b.Blok END)
             AND a.[Activity Name] = b.[Jenis Pekerjaan] 
        THEN 'Blok'
    ELSE 'Normal' 
    END AS Note
FROM 
    CTE a
JOIN
    CTE2 b ON a.[Employee Code] = b.[Employee Code]
           AND a.Date = b.Tanggal
WHERE 
    a.[Employee Code] = '0056263'
    AND a.Date = '2024-07-12'
GROUP BY 
    a.[Employee Code], a.[Employee Name], a.Date,                     
    b.[Employee Name], a.BLOCK_LAPANGAN, b.Blok,                     
    a.Estate_Name, a.[Activity Name],
    b.[Jenis Pekerjaan];

But why has the result become like this:

Date Employee Code block_posting_sap block_posting_elhm status_anomaly note
2024-07-12 0056263 P39 P39 Normal Normal
2024-07-12 0056263 Q29 Q29 Normal Normal
2024-07-12 0056263 Q29 P29 Anomali Blok
2024-07-12 0056263 P29 Q29 Anomali Blok
 ID Date    EmployeeCode    Block   ID  Tanggal EmployeeCode    Blok    status_anomaly
1   2024-07-12  0056263 Q29 11  2024-07-12  0056263 Q29 Normal
1   2024-07-12  0056263 Q29 12  2024-07-12  0056263 P39 Anomali
1   2024-07-12  0056263 Q29 13  2024-07-12  0056263 S22 Anomali
1   2024-07-12  0056263 Q29 14  2024-07-12  0056263 S30 Anomali
2   2024-07-12  0056263 P39 11  2024-07-12  0056263 Q29 Anomali
2   2024-07-12  0056263 P39 12  2024-07-12  0056263 P39 Normal
2   2024-07-12  0056263 P39 13  2024-07-12  0056263 S22 Anomali
2   2024-07-12  0056263 P39 14  2024-07-12  0056263 S30 Anomali
3   2024-07-12  0056263 S21 11  2024-07-12  0056263 Q29 Anomali
3   2024-07-12  0056263 S21 12  2024-07-12  0056263 P39 Anomali
3   2024-07-12  0056263 S21 13  2024-07-12  0056263 S22 Anomali
3   2024-07-12  0056263 S21 14  2024-07-12  0056263 S30 Anomali
4   2024-07-12  0056263 S31 11  2024-07-12  0056263 Q29 Anomali
4   2024-07-12  0056263 S31 12  2024-07-12  0056263 P39 Anomali
4   2024-07-12  0056263 S31 13  2024-07-12  0056263 S22 Anomali
4   2024-07-12  0056263 S31 14  2024-07-12  0056263 S30 Anomali

You can see there's 2 normal and 14 anomaly hence it should be only 2 normal and 2 anomaly because I already make it in inner join.

Expected result :

Date Employee Code block_posting_sap block_posting_elhm status_anomaly note
2024-07-12 0056263 P39 P39 Normal Normal
2024-07-12 0056263 Q29 Q29 Normal Normal
2024-07-12 0056263 S21 S22 Anomali Blok
2024-07-12 0056263 S31 S30 Anomali Blok

OR

Date Employee Code block_posting_sap block_posting_elhm status_anomaly note
2024-07-12 0056263 P39 P39 Normal Normal
2024-07-12 0056263 Q29 Q29 Normal Normal
2024-07-12 0056263 S31 S22 Anomali Blok
2024-07-12 0056263 S21 S30 Anomali Blok

the main objective is to tell combination that met, and not met (Anomaly)

What is wrong with my query?


Solution

  • Like all things SQL, you need to have clear logical requirements before you attempt a query. SQL can only implement logic you specify. So if you can't come up with a logical way to solve the problem, you certainly can't write a query that magically solves it.

    That said, based on everything you've said, here is a possible solution... its terrible, and fragile, but it might just work.

    Its horribly convoluted, but maybe it does what you need.

    with cte11 as (
      select *
        -- If we can find a matching row in the other table, then its 'normal'
        , case when exists (
          select 1
          from CTE2 c2
          where c2.Tanggal = c1.[Date]
          and c2.EmployeeCode = c1.EmployeeCode
          and c2.Blok = c1.Block) then 'Normal' else 'Abnormal' end status_anomaly
      from CTE1 c1
    ), cte22 as (
      -- Allocate row numbers to all the 'abnormal' rows
      select *
        , row_number() over (partition by EmployeeCode, [Date], status_anomaly order by Block) rn1
      from cte11
    )
    select *
      -- For all 'abnormal' rows attempt to pull an unused Blok from
      -- the other table, using the row number to ensure we only
      -- use each Blok code once.
      , case when status_anomaly = 'Normal' then Block else (
          select Blok
          from (
            select c2.Blok
              , row_number() over (partition by EmployeeCode, [Date] order by Block) rn2
            from CTE2 c2
            where c2.Tanggal = c1.[Date]
            and c2.EmployeeCode = c1.EmployeeCode
            and c2.Blok not in (select c3.Block from cte22 c3 where c3.status_anomaly = 'Normal')
        ) x
        where x.rn2 = c1.rn1
      ) end
    from cte22 c1;
    

    Returns

    ID Date EmployeeCode Block status_anomaly rn1 (No column name)
    3 2024-07-12 0056263 S21 Abnormal 1 S22
    4 2024-07-12 0056263 S23 Abnormal 2 S24
    2 2024-07-12 0056263 P39 Normal 1 P39
    1 2024-07-12 0056263 Q29 Normal 2 Q29

    db<>fiddle