sql-server t-sql join subquery redaction

Update the min values of subsets of data in TSQL

I have a redaction routine that is in need of help. I have redacted values less than 10 into a single row called "details redacted". However there are cases where even the details redacted column is still less than 10 and I must go back and get the next min number to add to the redaction until it is greater than 10. The next step is what is stumping me. After I have grabbed the next min value for redaction, I need to remove the rows that contributed to this. The problem is that I don't have the natural key of the data set at this point.

Was thinking about using a cursor on each subset, but would like to avoid this if I can.

I have 2 tables I am working with.

    #WorkingTable ( DashboardYear varchar()
                    , Institutition varchar()
                    , StudentLevel varchar()
                    , Field varchar()
                    , FieldDescription varchar()
                    , CountOfStudents varchar()
                   )
    #RedactedValues( DashboardYear Varchar()
                     , Institution Varchar()
                     , StudetnLevel Varchar()
                     , Field Varchar()
                     , FieldDescription Varchar()
                     , CountOfStudents Varchar()
                    )
    Insert INTO #RedactedValues
           SELECT DashboardYear
                  , Institution
                  , StudetnLevel
                  , Field
                  , FieldDescription
                  , CountOfStudents
             From #WorkingTable
           WHERE (CAST(CountOfStudents AS INT) < 10 and CAST(CountOfStudents AS INT) > 0)
             and Field = 'XXXX'
     -- Find Next Lowest Value for each group and add to redacted total
     UPDATE #RedactedValues
        SET CountOfStudents = CAST(r.CountOfStudents AS INT) + CAST(nextValue.nextValueToRedact AS INT)
     FROM
         (SELECT  DashboardYear
                  , Institution
                  , StudentLevel
                  , MIN(CAST(CountOfStudents AS INT)) AS nextValueToRedact
            FROM #WorkingTable t
           WHERE CAST(t.CountOfStudents AS INT) > 0
             and t.Field <> 'XXXX'
             and t.fieldDescription not like '%unknown'
          GROUP BY DashboardYear, Institution, StudentLevel
          ) nextValue
       JOIN #RedactedValues r
         on r.DashboardYear = nextValue.DashboardYear
        and r.Institution = nextValue.Institution
        and r.StudentLevel = nextValue.StudentLevel

The Above works great for finding the next lowest integer value for each group of data and adds it correctly to my detailsRedacted row. But I cannot figure out how to get back to the row that was the min value and set it to 0.

I cannot add images yet due to reputation limitations... so I cannot show you what the data looks like. Also because it is sensitive in nature I cannot post the data anywhere.

Looking at the above set of data, the details redacted column is only 5. I can get to the value within this group of 13 and add it to details redacted. But because I only have DashboardYear, Institution, and StudentLevel as the key I cannot get to the Field which is needed to define the single row that contributed to redaction.

I wanted to just do an update countofstudent = 0 select min(countofStudents) group by clause and done.. but this isn't cutting it.

Hoping there is a ninja trick I just have not seen yet that could help me. Sorry I am not posting any code, but not sure it would help..

Solution

Using ROW_NUMBER() WITH PARTITION and a GROUP BY

So after trolling stack overflow, I realized I was simply asking the wrong question. I was able to fix my issue by using the following code.

    ---------------------------------------------------------------
    --- Zero added redacted row
    ---------------------------------------------------------------
    WITH e AS 
    (
    SELECT w.DashboardYear
    , w.Institution
    , w.StudentLevel
    , w.Field
    , w.CountOfStudents
    , ROW_NUMBER() OVER
            (PARTITION BY w.DashboardYear
                            , w.Institution
                            , w.StudentLevel
        ORDER BY CAST(w.CountOfStudents AS INT)) ROW

    FROM #WorkingTable w
    JOIN #RedactedValues r
      ON w.dashboardYear = r.DashboardYear
     AND w.Institution = r.Institution
     AND w.studentLevel = r.studentlevel
    WHERE w.field <> @RedactedFieldValue        
      AND w.countofstudents > 0
      AND w.fielddescription NOT LIKE 'Unknown'
    )



    UPDATE #WorkingTable
    SET CountOfStudents = '0'
    FROM e
    JOIN #WorkingTable w
      ON e.dashboardYear = w.dashboardyear
     AND e.institution = w.institution
     AND e.studentlevel = w.studentlevel
     AND e.field = w.field
    WHERE row = 1

By adding a row number partition and ordering by count I was able to infer that the next lowest value for a given group of data is always going to be row # 1.