The data I am working with
Consider the following 2 database tables:
CREATE TABLE [dbo].[Contact](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Contact_UID] [uniqueidentifier] NOT NULL CONSTRAINT [DF_Contact_Contact_UID] DEFAULT (newsequentialid()),
[Name] [nvarchar](255) NOT NULL,
[ContactStatus] [nvarchar](255) NOT NULL)
CREATE TABLE [dbo].[Contact_Log](
[ID] [int] IDENTITY(1,1) NOT NULL,
[LogDate] [datetimeoffset](7) NOT NULL CONSTRAINT [DF_Contact_Log_LogDate] DEFAULT (sysdatetimeoffset()),
[Contact_UID] [uniqueidentifier] NOT NULL CONSTRAINT [DF_Contact_Log_Contact_UID] DEFAULT (newsequentialid()),
[Name] [nvarchar](255) NOT NULL,
[ContactStatus] [nvarchar](255) NOT NULL)
The Contact table is the main table for contact records. It stores the name of the contact and the status (e.g. "Alive", "Dead" or whatever).
The Contact_Log table stores all changes made to the Contact table.
So here is some sample data:
Contact:
+----+--------------------------------------+------+---------------+
| ID | Contact_UID | Name | ContactStatus |
+----+--------------------------------------+------+---------------+
| 1 | 62918AC1-1C6C-4DEB-B7F8-5D5EF913F667 | John | Dead |
+----+--------------------------------------+------+---------------+
| 2 | F7844037-2FF5-47B9-874D-C0920E7DC092 | Jane | Alive |
+----+--------------------------------------+------+---------------+
Contact_Log:
+----+--------------------------------------+------+---------------+------------+
| ID | Contact_UID | Name | ContactStatus | LogDate |
+----+--------------------------------------+------+---------------+------------+
| 1 | 62918AC1-1C6C-4DEB-B7F8-5D5EF913F667 | John | Alive | 2019-01-01 |
+----+--------------------------------------+------+---------------+------------+
| 2 | 62918AC1-1C6C-4DEB-B7F8-5D5EF913F667 | John | Dead | 2019-01-02 |
+----+--------------------------------------+------+---------------+------------+
| 3 | 62918AC1-1C6C-4DEB-B7F8-5D5EF913F667 | John | Dead | 2019-01-03 |
+----+--------------------------------------+------+---------------+------------+
| 4 | F7844037-2FF5-47B9-874D-C0920E7DC092 | Jane | Alive | 2019-01-04 |
+----+--------------------------------------+------+---------------+------------+
NOTE: At this point I haven't added any indexes or anything like that on that tables.
Test Scenario
The above is just some sample data. The data I am testing on has the following row counts:
Contact: ~10,000 rows
Contact_Log: ~3,000,000 rows
I am currently using SQL Server 2008 R2 for testing. So a solution that is supported in that and later is preferred.
What I am trying to achieve
Basically I am trying to formulate a query that can tell me the LogDate
for when the ContactStatus
field was last changed, for a specific Contact_UID
, taken from the Contact_Log
table.
For example, if the record I was interested in was "John", then the result should be "2019-01-02". As this is the date that John's ContactStatus last changed (i.e. it changed from "Alive" to "Dead").
Ultimately, I want to put this query into a function. A function that can be called by passing in the Contact_UID and the name of the field I want to check. This function could then be called as part of a more general query. For example:
SELECT Name, MyFunction('62918AC1-1C6C-4DEB-B7F8-5D5EF913F667', 'ContactStatus') AS StatusLastChanged FROM Contact
What I have tried so far
Well, I have tried a few things, and although I can get the results I want. My attempts are really struggling with performance problems.
NOTE: Although I only really want a single datetimeoffset result. Some attempts include more data/fields simply to try and validate the data is accurate.
Attempt 1:
SELECT TOP(1) a.LogDate
FROM Contact_Log AS a
WHERE a.Contact_UID = '62918AC1-1C6C-4DEB-B7F8-5D5EF913F667'
AND a.ContactStatus <>
(
SELECT TOP(1) b.ContactStatus
FROM Contact_Log AS b
WHERE b.Contact_UID = '62918AC1-1C6C-4DEB-B7F8-5D5EF913F667'
AND a.LogDate > b.LogDate
ORDER BY b.LogDate DESC
)
ORDER BY LogDate DESC
Problem 1: Too slow. I had to stop the query after nearly an hour of waiting with no results.
Attempt 2:
SELECT A.LogDate
FROM (SELECT ROW_NUMBER() OVER (ORDER BY LogDate DESC, ID DESC) AS rnum, ID, LogDate, Contact_UID, ContactStatus FROM Contact_Log) A
LEFT JOIN (SELECT ROW_NUMBER() OVER (ORDER BY LogDate DESC, ID DESC) AS rnum, ID, LogDate, Contact_UID, ContactStatus FROM Contact_Log) B
ON A.rnum = B.rnum-1
WHERE
(B.rnum IS NULL
OR (A.Contact_UID = '62918AC1-1C6C-4DEB-B7F8-5D5EF913F667'
AND B.Contact_UID = '62918AC1-1C6C-4DEB-B7F8-5D5EF913F667'
AND A.ContactStatus != B.ContactStatus))
ORDER BY A.rnum
Problem 2: This works and gives me the correct data set. However it takes 6 seconds which is too slow. Remember it will need to work as a function in a more general query (with ~10,000 rows).
Attempt 3: Now this is basically the same as attempt 2, expect I tried to apply TOP(1)
so that I can get the result that I actually want.
SELECT TOP(1) A.LogDate
FROM (SELECT ROW_NUMBER() OVER (ORDER BY LogDate DESC, ID DESC) AS rnum, ID, LogDate, Contact_UID, ContactStatus FROM Contact_Log) A
LEFT JOIN (SELECT ROW_NUMBER() OVER (ORDER BY LogDate DESC, ID DESC) AS rnum, ID, LogDate, Contact_UID, ContactStatus FROM Contact_Log) B
ON A.rnum = B.rnum-1
WHERE
(B.rnum IS NULL
OR (A.Contact_UID = '62918AC1-1C6C-4DEB-B7F8-5D5EF913F667'
AND B.Contact_UID = '62918AC1-1C6C-4DEB-B7F8-5D5EF913F667'
AND A.ContactStatus != B.ContactStatus))
ORDER BY A.rnum
Problem 3: To my surprise, this takes a lot longer than attempt 2, even though all I did was add TOP(1)
at the start. This took over 5 minutes so I stopped the query and gave up.
Question
How can I do what I want in "What I am trying to achieve", but with a reasonable amount of performance? (I would be happy getting it under 1 second at this stage).
Remember, I just want a single datetimeoffset as the result so it can be used in a function.
So far I have no specific indexes created. I am happy to consider suggestions for those as a suitable answer, if no improvement to the query are possible. Or any changes to the schema that is appropriate.
Bottom Line
I am looking for a query that will produce 1 result, with 1 datetimeoffset field. It needs to take less than 1 second to run.
You want to select the smallest date right after the highest date that does not equal the current ContactStatus. That would be something like this:
select
min(LogDate)
from Contact_Log
where
Contact_UID='62918AC1-1C6C-4DEB-B7F8-5D5EF913F667'
and ContactStatus = (
select top 1
ContactStatus
from Contact_Log where
Contact_UID='62918AC1-1C6C-4DEB-B7F8-5D5EF913F667'
order by Log_Date desc
)
and LogDate > (
select max(LogDate)
from Contact_Log
where Contact_UID='62918AC1-1C6C-4DEB-B7F8-5D5EF913F667'
and ContactStatus != (
select top 1
ContactStatus
from Contact_Log where
Contact_UID='62918AC1-1C6C-4DEB-B7F8-5D5EF913F667'
order by Log_Date desc
)
);