sas

How to determine number of rows in a dataset in the do loop


I have below SAS code

data original_data;
    input var1 $ var2;
    datalines;
A 12
B 19
C 23
D 40
;
run;
data original_data1;
    set original_data;
    do i = 1 to 4;
       new_val = var2 + 44;
    end;
run;

In the line do i = 1 to 4 I put 4 manually as the number of rows of original_data, however I wonder how can I put that information dynamically based on the number of rows in given dataset (e.g. nrow() function in R)


Solution

  • The SAS DATA step is an implicit loop. In the simple use case of a step with a single SET statement that reads one input dataset, the DATA step will automatically iterate through every row of the dataset. On each iteration, the SET statement executes and reads one row of data. On the final iteration, the SET statement will execute and attempt to read data but it will hit a (logical) end of file mark, and the step will complete.

    So this code:

    data original_data1;
        set original_data;
        do i = 1 to 4;
           new_val = var2 + 44;
        end;
    run;
    

    Does not make sense. For row #1 of the input dataset, it will execute the assignment statement four times. Then for row #2 of the dataset, it will execute the assignment statement four times, etc.

    To create a new variable, you only need to execute the assignment statement once (for each row). You can do that like:

    data want ;
      set original_data ;
      new_val = var2 + 44;
    run ;
    

    You don't need to tell SAS how many rows are in original_data. SAS just processes the data that is there, one row at a time, until there is no more data to process.

    The output datasets work.original_data1 and work.want will have the same data, other than the fact that work.orignal_datat1 will have an extra variable, i.