sql-servert-sqljoinsubqueryredaction

Update the min values of subsets of data in TSQL


I have a redaction routine that is in need of help. I have redacted values less than 10 into a single row called "details redacted". However there are cases where even the details redacted column is still less than 10 and I must go back and get the next min number to add to the redaction until it is greater than 10. The next step is what is stumping me. After I have grabbed the next min value for redaction, I need to remove the rows that contributed to this. The problem is that I don't have the natural key of the data set at this point.

Was thinking about using a cursor on each subset, but would like to avoid this if I can.

I have 2 tables I am working with.

    #WorkingTable ( DashboardYear varchar()
                    , Institutition varchar()
                    , StudentLevel varchar()
                    , Field varchar()
                    , FieldDescription varchar()
                    , CountOfStudents varchar()
                   )
    #RedactedValues( DashboardYear Varchar()
                     , Institution Varchar()
                     , StudetnLevel Varchar()
                     , Field Varchar()
                     , FieldDescription Varchar()
                     , CountOfStudents Varchar()
                    )
    Insert INTO #RedactedValues
           SELECT DashboardYear
                  , Institution
                  , StudetnLevel
                  , Field
                  , FieldDescription
                  , CountOfStudents
             From #WorkingTable
           WHERE (CAST(CountOfStudents AS INT) < 10 and CAST(CountOfStudents AS INT) > 0)
             and Field = 'XXXX'
     -- Find Next Lowest Value for each group and add to redacted total
     UPDATE #RedactedValues
        SET CountOfStudents = CAST(r.CountOfStudents AS INT) + CAST(nextValue.nextValueToRedact AS INT)
     FROM
         (SELECT  DashboardYear
                  , Institution
                  , StudentLevel
                  , MIN(CAST(CountOfStudents AS INT)) AS nextValueToRedact
            FROM #WorkingTable t
           WHERE CAST(t.CountOfStudents AS INT) > 0
             and t.Field <> 'XXXX'
             and t.fieldDescription not like '%unknown'
          GROUP BY DashboardYear, Institution, StudentLevel
          ) nextValue
       JOIN #RedactedValues r
         on r.DashboardYear = nextValue.DashboardYear
        and r.Institution = nextValue.Institution
        and r.StudentLevel = nextValue.StudentLevel

The Above works great for finding the next lowest integer value for each group of data and adds it correctly to my detailsRedacted row. But I cannot figure out how to get back to the row that was the min value and set it to 0.

I cannot add images yet due to reputation limitations... so I cannot show you what the data looks like. Also because it is sensitive in nature I cannot post the data anywhere.

Looking at the above set of data, the details redacted column is only 5. I can get to the value within this group of 13 and add it to details redacted. But because I only have DashboardYear, Institution, and StudentLevel as the key I cannot get to the Field which is needed to define the single row that contributed to redaction.

I wanted to just do an update countofstudent = 0 select min(countofStudents) group by clause and done.. but this isn't cutting it.

Hoping there is a ninja trick I just have not seen yet that could help me. Sorry I am not posting any code, but not sure it would help..


Solution

  • Using ROW_NUMBER() WITH PARTITION and a GROUP BY

    So after trolling stack overflow, I realized I was simply asking the wrong question. I was able to fix my issue by using the following code.

        ---------------------------------------------------------------
        --- Zero added redacted row
        ---------------------------------------------------------------
        WITH e AS 
        (
        SELECT w.DashboardYear
        , w.Institution
        , w.StudentLevel
        , w.Field
        , w.CountOfStudents
        , ROW_NUMBER() OVER
                (PARTITION BY w.DashboardYear
                                , w.Institution
                                , w.StudentLevel
            ORDER BY CAST(w.CountOfStudents AS INT)) ROW
    
        FROM #WorkingTable w
        JOIN #RedactedValues r
          ON w.dashboardYear = r.DashboardYear
         AND w.Institution = r.Institution
         AND w.studentLevel = r.studentlevel
        WHERE w.field <> @RedactedFieldValue        
          AND w.countofstudents > 0
          AND w.fielddescription NOT LIKE 'Unknown'
        )
    
    
    
        UPDATE #WorkingTable
        SET CountOfStudents = '0'
        FROM e
        JOIN #WorkingTable w
          ON e.dashboardYear = w.dashboardyear
         AND e.institution = w.institution
         AND e.studentlevel = w.studentlevel
         AND e.field = w.field
        WHERE row = 1
    

    By adding a row number partition and ordering by count I was able to infer that the next lowest value for a given group of data is always going to be row # 1.