sqlsql-serverjoinlongitudinal

Microsoft SQL join on longitudinal data


In Microsoft SQL Server, I have one table (live table) that takes snapshot of student enrollment information on a daily basis such that one students can take multiple rows by date. In this table, we sore information like student_id, enrollmentStatus (1 or 0), attendnaceDate (date), and live_tablemakrer (all rows = 1). This table is set up to update daily in the data warehouse to bring in new enrollment data of that day.

In another static table (static table), it stores students' enrollment information on the first day of this semester. This table is static an will not change. It contains fields like student_id, enrolledonFD (all rows = 1). In this table, every student is an unique entry.

For each day, I want to compare who are in the live table and also in the static table, who are in the live table but not in the static table, and who are not in the live table but in the static table.

To achieve this goal, I'm thinking of doing a full outer join. But I got stuck on how to handle the date element in the live table. Basically, it should be a full outer join between the static table and the live table for every date.

Could someone give me some advice on how to do that?

Below is sample data and intended outcome

StaticTable

StudentID Grade
A K
B K
C PK

LiveTable

StudentID Date Attendance Grade
A 9/1 1 K
A 9/2 0 K
A 9/3 1 K
B 9/1 1 K
B 9/2 1 K
C 9/1 1 PK
C 9/3 0 PK
D 9/1 1 PK
D 9/3 1 PK
F 9/3 1 K

Intended Outcome

StudentID Date InStaticTable InLiveTable Attendance Grade
A 9/1 1 1 1 K
A 9/2 1 1 0 K
A 9/3 1 1 1 K
B 9/1 1 1 1 K
B 9/2 1 1 1 K
B 9/3 1 0 NaN K
C 9/1 1 1 1 PK
C 9/2 1 0 NaN PK
C 9/3 1 1 0 PK
D 9/1 0 1 1 PK
D 9/3 0 1 1 PK
F 9/3 0 1 1 K

In other words:

  1. For every day, check if a student in the static table is also in the live table
  2. For every day, check if a student in the live table is also in the static table
  3. If a student is included in the static table, in the final table, it should be there for every recorded date
  4. If a student is not included in the static table, in the final table, it should be there when it is recorded in the live table (D does not have 9/2 entry, F does not have 9/1 and 9/2 entry).

Solution

  • You plan to use full join. However, the full join operation is DBMS-specific. Since the DBMS was not initially specified in the question, I suggest considering an example of a query without using full join.

    1. First, use left join (all students)*(all dates) on always true - in fact cross join.
    2. Join StaticTable and LiveTable
    3. Filter out rows, where Grade is null in booth st and lt tables - this row (StudentId,date) not exists in LiveTable with this date and not exists in StaticTable (your 4. condition) See example
    select s.StudentId,d.date 
      ,case when st.Grade is not null then 1 else 0 end isStaticTable
      ,case when lt.Grade is not null then 1 else 0 end isLiveTable
      ,lt.Attendance
      ,coalesce(st.Grade,lt.Grade)  Grade
    from (  -- students
      select StudentId from StaticTable
      union
      select StudentId from LiveTable
    ) s 
    left join ( -- dates
      select distinct date from LiveTable
    )d on 1=1  
    left join StaticTable st on st.StudentId=s.StudentId
    left join LiveTable lt on lt.StudentId=s.StudentId and lt.date=d.date
    where coalesce(st.Grade,lt.Grade) is not null
    order by s.StudentId,d.date;
    
    StudentId date isStaticTable isLiveTable Attendance Grade
    A 2024-09-01 1 1 1 K
    A 2024-09-02 1 1 0 K
    A 2024-09-03 1 1 1 K
    B 2024-09-01 1 1 1 K
    B 2024-09-02 1 1 1 K
    B 2024-09-03 1 0 null K
    C 2024-09-01 1 1 1 PK
    C 2024-09-02 1 0 null PK
    C 2024-09-03 1 1 0 PK
    D 2024-09-01 0 1 1 PK
    D 2024-09-03 0 1 1 PK
    F 2024-09-03 0 1 1 K

    fiddle