sassubsetaveragedatastep

Extracting row with highest value in a column while also calculating averages by group


I have been tasked with taking the following data and creating two permanent data sets from it. One of these permanent data sets is supposed to contain the average of the "value" column for each group (meaning there should only be four rows in the end, with a new column that represents the average of respective values for A, B, C, and D). Averages should exclude missing values, meaning that if category A has a missing value, it should be divided by 3, not 4. The second permanent data set needs to be the one row with the highest overall value in the "value" column (in this case, the row with D 09JUL2021 951 should be the only row exported). I am having a tough time extracting that single row for the second data set. If you know of a way to perform these operations simultaneously, please let me know. Thank you for your time!

Example data:

data work.have;
  input type $ date DATE9. value;
datalines;
A 08JUL2021 .
A 09JUL2021 20
A 20JUL2021 55
A 20JUL2021 2
B 02JUL2021 9
B 22JUL2021 6
B 04JUL2021 8
B 07JUL2021 406
C 01JUL2021 215
C 28JUL2021 63
C 30JUL2021 78
C 21JUL2021 80
D 18JUL2021 951
D 09JUL2021 .
D 14JUL2021 54
D 08JUL2021 73
;

Here is what I tried:

data mylib.data1(keep=type date value value_avg) mylib.data2;
    set work.have;
    by type;
    if value ne . then NotMissing=1; else NotMissing=0;
    if first.type then call missing(of value_avg);
        value_avg+value;
    if first.type then call missing(of num_per_cat);
        num_per_cat+NotMissing;
    Avg=divide((value_avg+value),(num_per_cat+NotMissing));
    if last.type then output mylib.data1;
run;

This was successful for me with calculating averages, but I have no idea how to extract the row with the highest value in the "value" column to a second data set.


Solution

  • data work.have;
      input type $ date DATE9. value;
    datalines;
    A 08JUL2021 .
    A 09JUL2021 20
    A 20JUL2021 55
    A 20JUL2021 2
    B 02JUL2021 9
    B 22JUL2021 6
    B 04JUL2021 8
    B 07JUL2021 406
    C 01JUL2021 215
    C 28JUL2021 63
    C 30JUL2021 78
    C 21JUL2021 80
    D 18JUL2021 951
    D 09JUL2021 .
    D 14JUL2021 54
    D 08JUL2021 73
    ;
    
    proc summary data = have nway;
       class type;
       var value;
       output out = want_mean(drop = _:) mean = ;
    run;
    
    proc summary data = have nway;
       class type;
       var value;
       output out = want_max(drop = _:) max = ;
    run;