sqlsql-serverlag

Nested LAG function (kind of)


I have this table

MachineCode DoCreation MachineStateCode
DM139 2024-04-03 00:32:17.377 99
DM139 2024-04-03 00:32:49.080 0
DM139 2024-04-03 01:51:14.427 99
DM139 2024-04-03 01:51:45.643 0
DM139 2024-04-03 06:07:38.357 3
DM139 2024-04-03 06:07:39.043 0
DM139 2024-04-03 06:23:37.433 100
DM139 2024-04-03 06:23:43.697 0
DM139 2024-04-03 06:24:46.153 1
DM139 2024-04-03 06:25:02.467 0
DM139 2024-04-03 06:46:50.410 100
DM139 2024-04-03 06:47:29.247 0
DM139 2024-04-03 06:47:46.397 100
DM139 2024-04-03 06:53:47.160 0
DM139 2024-04-03 06:59:41.633 1
DM139 2024-04-03 06:59:54.550 0
DM139 2024-04-03 07:00:03.203 1
DM139 2024-04-03 07:00:06.437 0
DM139 2024-04-03 07:00:14.247 1
DM139 2024-04-03 07:00:17.683 2
DM139 2024-04-03 07:00:18.153 1
DM139 2024-04-03 07:00:18.840 2
DM139 2024-04-03 07:00:26.860 1
DM139 2024-04-03 07:00:46.387 2

each records rapresent a "change of state" of the specific machine DM139 where

I need to create a new column, let's call it MachineStateCodeAdjusted where its value is equal to 3 if the previous MachineStateCode is 3 and the current MachineStateCode is not 2, otherwise it should have the current MachineStateCode value. (the next table is what I need)

MachineCode DoCreation MachineStateCode MachineStateCodeAdjusted
DM139 2024-04-03 00:32:17.377 99 99
DM139 2024-04-03 00:32:49.080 0 0
DM139 2024-04-03 01:51:14.427 99 99
DM139 2024-04-03 01:51:45.643 0 0
DM139 2024-04-03 06:07:38.357 3 3
DM139 2024-04-03 06:07:39.043 0 3
DM139 2024-04-03 06:23:37.433 100 3
DM139 2024-04-03 06:23:43.697 0 3
DM139 2024-04-03 06:24:46.153 1 3
DM139 2024-04-03 06:25:02.467 0 3
DM139 2024-04-03 06:46:50.410 100 3
DM139 2024-04-03 06:47:29.247 0 3
DM139 2024-04-03 06:47:46.397 100 3
DM139 2024-04-03 06:53:47.160 0 3
DM139 2024-04-03 06:59:41.633 1 3
DM139 2024-04-03 06:59:54.550 0 3
DM139 2024-04-03 07:00:03.203 1 3
DM139 2024-04-03 07:00:06.437 0 3
DM139 2024-04-03 07:00:14.247 1 3
DM139 2024-04-03 07:00:17.683 2 2
DM139 2024-04-03 07:00:18.153 1 1
DM139 2024-04-03 07:00:18.840 2 2
DM139 2024-04-03 07:00:26.860 1 1
DM139 2024-04-03 07:00:46.387 2 2

I used the LAG function to get the previous MachineStateCode (and it works) but to work as I want I need to get the previous MachineStateCodeAdjusted, that is not accessible for the LAG function.

The SQL query I'm executing is this:

   SELECT MachineCode
       , DoCreation
   , MachineStateCode
   , IIF(LAG(MachineStateCode, 1, MachineStateCode) OVER (ORDER BY DoCreation ASC) = 3 
  AND MachineStateCode<>2, 3, MachineStateCode)  AS MachineStateCodeAdjusted       
  FROM mch.MachineStateChanges

and the result I get clearly show me the it not works as I want

MachineCode DoCreation MachineStateCode MachineStateCodeAdjusted
DM139 2024-04-03 00:32:17.377 99 99
DM139 2024-04-03 00:32:49.080 0 0
DM139 2024-04-03 01:51:14.427 99 99
DM139 2024-04-03 01:51:45.643 0 0
DM139 2024-04-03 06:07:38.357 3 3
DM139 2024-04-03 06:07:39.043 0 3
DM139 2024-04-03 06:23:37.433 100 100 WRONG
DM139 2024-04-03 06:23:43.697 0 0 WRONG
DM139 2024-04-03 06:24:46.153 1 1 WRONG

I also tried with a recursive CTE and it works but it is very slow! The query extract in 4 minutes 2 days of work, while I need to extract 1 year!

What is the best (and fast) approach to fix it?


Solution

  • I think your explanation is incomplete. You should have indicated that once you get an "error" (3) then all states do not matter, until you get a "working" (2) again.

    In this case, this is a Gaps & Islands problem. You can use the traditional solution:

    select z.*,
      case when f = 3 or (
              lag(MachineStateCode) 
                over(partition by MachineCode order by DoCreation) = 3
              and MachineStateCode <> 2
           )
           then 3 
           else MachineStateCode
      end as MachineStateCodeAdjusted
    from (
      select y.*,
        first_value(MachineStateCode) over(partition by g order by DoCreation) as f
      from (
        select x.*, sum(i) over(partition by MachineCode order by DoCreation) as g
        from (
          select t.*, case when MachineStateCode in (2, 3) then 1 else 0 end as i
          from MachineStateChanges t
        ) x
      ) y
    ) z
    

    Result:

     MachineCode  DoCreation                   MachineStateCode  i  g  f   MachineStateCodeAdjusted 
     ------------ ---------------------------- ----------------- -- -- --- ------------------------ 
     DM139        2024-04-03 00:32:17.3770000  99                0  0  99  99                       
     DM139        2024-04-03 00:32:49.0800000  0                 0  0  99  0                        
     DM139        2024-04-03 01:51:14.4270000  99                0  0  99  99                       
     DM139        2024-04-03 01:51:45.6430000  0                 0  0  99  0                        
     DM139        2024-04-03 06:07:38.3570000  3                 1  1  3   3                        
     DM139        2024-04-03 06:07:39.0430000  0                 0  1  3   3                        
     DM139        2024-04-03 06:23:37.4330000  100               0  1  3   3                        
     DM139        2024-04-03 06:23:43.6970000  0                 0  1  3   3                        
     DM139        2024-04-03 06:24:46.1530000  1                 0  1  3   3                        
     DM139        2024-04-03 06:25:02.4670000  0                 0  1  3   3                        
     DM139        2024-04-03 06:46:50.4100000  100               0  1  3   3                        
     DM139        2024-04-03 06:47:29.2470000  0                 0  1  3   3                        
     DM139        2024-04-03 06:47:46.3970000  100               0  1  3   3                        
     DM139        2024-04-03 06:53:47.1600000  0                 0  1  3   3                        
     DM139        2024-04-03 06:59:41.6330000  1                 0  1  3   3                        
     DM139        2024-04-03 06:59:54.5500000  0                 0  1  3   3                        
     DM139        2024-04-03 07:00:03.2030000  1                 0  1  3   3                        
     DM139        2024-04-03 07:00:06.4370000  0                 0  1  3   3                        
     DM139        2024-04-03 07:00:14.2470000  1                 0  1  3   3                        
     DM139        2024-04-03 07:00:17.6830000  2                 1  2  2   2                        
     DM139        2024-04-03 07:00:18.1530000  1                 0  2  2   1                        
     DM139        2024-04-03 07:00:18.8400000  2                 1  3  2   2                        
     DM139        2024-04-03 07:00:26.8600000  1                 0  3  2   1                        
     DM139        2024-04-03 07:00:46.3870000  2                 1  4  2   2                        
    

    See running example at db<>fiddle.