sqlt-sqldatabricks-sql

SQL Joins returning results from one T-SQL Query but not another


I have coded two T-SQL queries which are executed on Databricks using Databricks SQL.

The first query returns results of 144 rows:

144 Rows Returned

    SELECT DISTINCT
      PSTT.TransactionTypeID
    FROM basedrd.TransactionType PSTT
    LEFT OUTER JOIN basedrd.PMSTransaction ST
      ON PSTT.TransactionType = ST.TransactionType
    WHERE PSTT.TransactionSource = 'Pershing'
    OR (PSTT.TransactionSource = '')
    AND (ST.SEDOL = 'CASH'
          AND PSTT.IsCashTransaction = 1
          OR ST.SEDOL <> 'CASH'
          AND PSTT.IsCashTransaction = 0)

The second query doesn't return any rows

No Rows returned

SELECT DISTINCT
  PSTT.TransactionTypeID
FROM basedrd.PMSTransaction ST
LEFT OUTER JOIN basedrd.TransactionType PSTT
  ON PSTT.TransactionType = ST.TransactionType
    AND (PSTT.TransactionSource = 'Pershing'
      OR PSTT.TransactionSource = '')
    AND (ST.SEDOL = 'CASH'
      AND PSTT.IsCashTransaction = 1
      OR ST.SEDOL <> 'CASH'
      AND PSTT.IsCashTransaction = 0)

The stranges thing is that if I execute the same code on our SQL Server I get 145 rows returned onf the first query

145 rows returned

SELECT DISTINCT
  PSTT.TransactionTypeID
FROM dbo.TransactionType PSTT
LEFT OUTER JOIN dbo.PMSTransaction ST
  ON PSTT.TransactionType = ST.TransactionType
WHERE PSTT.TransactionSource = 'Pershing'
OR (PSTT.TransactionSource = '')
AND (ST.SEDOL = 'CASH'
      AND PSTT.IsCashTransaction = 1
      OR ST.SEDOL <> 'CASH'
      AND PSTT.IsCashTransaction = 0)

And whereas I wasn't getting any rows returned with the second query on Databricks above on our SQL Server I get 46 rows returned.

SELECT DISTINCT
  PSTT.TransactionTypeID
FROM dbo.PMSTransaction ST
LEFT OUTER JOIN dbo.TransactionType PSTT
  ON PSTT.TransactionType = ST.TransactionType
    AND (PSTT.TransactionSource = 'Pershing'
      OR PSTT.TransactionSource = '')
    AND (ST.SEDOL = 'CASH'
      AND PSTT.IsCashTransaction = 1
      OR ST.SEDOL <> 'CASH'
      AND PSTT.IsCashTransaction = 0)

The only difference between the query executed on Databricks SQL and SQL Server is the database name(schema) SQL Server is dbo, whereas databricks is basedrd.


Solution

  • It appears that your data doesn't match up. A query like should help confirm that the values don't line up as expected:

    with data as (
        select
            case when PSTT.TransactionSource = 'Pershing' then 'Pershing'
                 when PSTT.TransactionSource = ''         then 'Blank'
                 when ST.TransactionType is not null      then
                     case when PSTT.TransactionType is not null then 'Other' else 'Null' end
                 else 'Unmatched Row' end as TransactionSource,
            case when ST.SEDOL = 'CASH' then 'Cash'
                 when ST.SEDOL is null  then 'Null'
                 else 'Other' end as SEDOL,
            case when PSTT.IsCashTransaction = 0     then '0'
                 when PSTT.IsCashTransaction = 1     then '1'
                 when PSTT.IsCashTransaction is null then 'Null'
                 else 'Other' end as IsCashTransaction
        from <SRC>.PMSTransaction ST left outer join <SRC>.TransactionType PSTT
            on PSTT.TransactionType = ST.TransactionType
    )
    select TransactionSource, SEDOL, IsCashTransaction, count(*)
    from data
    group by TransactionSource, SEDOL, IsCashTransaction
    order by TransactionSource, SEDOL, IsCashTransaction;