I've got tables with millions (possibly billions) of rows, so I really need to be efficient with the query.
In this query, I'm joining multiple tables. The segment in question is:
LEFT JOIN
dbo.GCSOCTPS dbo_GCSOCTPS ON (GC_TBMED.MED_CLASS_NUM = dbo_GCSOCTPS.CLASS_NUM)
AND (GC_TBMED.MED_SOC_NUM = dbo_GCSOCTPS.SOC_NUM)
AND (GC_TBMED.MED_EFF_DATE = dbo_GCSOCTPS.EFF_DATE)
AND (GC_TBMED.MED_CANC_DATE = dbo_GCSOCTPS.CANC_DATE))
GC_TBMED
has dates in DATE
format, dbo_GCSOCTPS
has dates in DATETIME
format. Sadly, due to the way our company uses data, I can't change that.
What would be the most efficient way to compare these columns? CAST
? CONVERT
? I've seen people convert to Text format and compare that way. I'm hoping someone could guide me based on their experience with very large datasets.
If I run this block of code, I get a value for HDHPQ:
SELECT TOP 200
HDHPQ,
SOC_NUM,
EFF_DATE,
CLASS_NUM,
CANC_DATE
FROM
dbo.GCSOCTPS
WHERE
SOC_NUM = '25521'
AND CLASS_NUM = '37'
AND CANC_DATE IS NULL;
This is returned:
HDHPQ SOC_NUM EFF_DATE CLASS_NUM CANC_DATE
N 25521 2025-01-01 00:00:00.000 37 NULL
If I run this block of code, I get data returned as well:
SELECT TOP 200
MED_SOC_NUM,
MED_EFF_DATE,
MED_CLASS_NUM,
MED_CANC_DATE
FROM
[dbo].[AS_tblTBMED] GC_TBMED
WHERE
GC_TBMED.MED_SOC_NUM = '25521'
AND GC_TBMED.MED_CLASS_NUM = '37'
AND GC_TBMED.MED_CANC_DATE IS NULL;
This gets returned:
MED_SOC_NUM MED_EFF_DATE MED_CLASS_NUM MED_CANC_DATE
25521 2025-01-01 37 NULL
Each of those returns one row. I need to join them so that I get all of the data from the second query and the value of HDHPQ from the first query.
So I run this query:
SELECT DISTINCT TOP 200
dbo_GCSOCTPS.HDHPQ,
dbo_GCSOCTPS.SOC_NUM,
dbo_GCSOCTPS.EFF_DATE,
dbo_GCSOCTPS.CLASS_NUM,
dbo_GCSOCTPS.CANC_DATE,
GC_TBMED.MED_SOC_NUM,
GC_TBMED.MED_EFF_DATE,
GC_TBMED.MED_CLASS_NUM,
GC_TBMED.MED_CANC_DATE
FROM
[dbo].[AS_tblTBMED] GC_TBMED
LEFT JOIN
dbo.GCSOCTPS dbo_GCSOCTPS ON (GC_TBMED.MED_CLASS_NUM = dbo_GCSOCTPS.CLASS_NUM)
AND (GC_TBMED.MED_SOC_NUM = dbo_GCSOCTPS.SOC_NUM)
AND (GC_TBMED.MED_EFF_DATE = CAST(dbo_GCSOCTPS.EFF_DATE as DATE))
AND (GC_TBMED.MED_CANC_DATE = CAST(dbo_GCSOCTPS.CANC_DATE as DATE))
WHERE
GC_TBMED.MED_SOC_NUM = '25521'
AND GC_TBMED.MED_CLASS_NUM = '37'
AND GC_TBMED.MED_CANC_DATE IS NULL
AND dbo_GCSOCTPS.EFF_DATE >= '2025-01-01';
And an empty recordset is returned. If I comment out the two dates in the join, I get data. So, my assumption is that the date fields aren't properly equating, since they are equal and therefore I should get data if they're included in the query.
You have (at least) two problems here:
<anything> = NULL
will never be true so the join will fail.You probably need something along the lines of:
select DISTINCT top 200
gcs.HDHPQ,
gcs.SOC_NUM,
gcs.EFF_DATE,
gcs.CLASS_NUM,
gcs.CANC_DATE,
tbm.MED_SOC_NUM,
tbm.MED_EFF_DATE,
tbm.MED_CLASS_NUM,
tbm.MED_CANC_DATE
from [dbo].[AS_tblTBMED] tbm
LEFT JOIN dbo.GCSOCTPS gcs
ON tbm.MED_CLASS_NUM = gcs.CLASS_NUM
AND tbm.MED_SOC_NUM = gcs.SOC_NUM
AND tbm.MED_EFF_DATE = CAST(gcs.EFF_DATE as DATE)
AND gcs.CANC_DATE IS NULL
where tbm.MED_SOC_NUM = '25521'
and tbm.MED_CLASS_NUM = '37'
and tbm.MED_CANC_DATE is NULL
and tbm.MED_EFF_DATE >= '20250101';
There are a couple of more suspicious lines though:
CAST(gcs.EFF_DATE as DATE)
if EFF_DATE doesn't contain hours etc, you can skip the cast probablytbm.MED_CLASS_NUM = '37'
why do you use '37'
and not just 37
for columns which seems to be a number?