sasdatastep

How to sort data using Data step in SAS


I want to sort data in SAS data step. What exactly I mean is: the work of proc sort should be done in data step. Is there any solution?


Solution

  • If you're looking for a data-step only solution, you can do the work of PROC SORT with a hash table. The caveat is that you need enough memory to do it.

    If you want to do a simple sort, you would load the hash table with the ordered:'yes' option and output it to a new table. By default, ordered:yes will sort the data in ascending order. You can specify descending as well.

    Simple Sort

    data _null_;
    
        /* Sets up PDV without loading the table */
        if(0) then set sashelp.class;
    
        /* Load sashelp.class into memory ordered by Height. Do not remove duplicates. */
        dcl hash sortit(dataset:'sashelp.class', ordered:'yes', multidata:'yes');
    
            sortit.defineKey('Height');     * Order by height;
            sortit.defineData(all:'yes');   * Keep all variables in the output dataset;
    
        sortit.defineDone();
    
        /* Output to a dataset called class_sorted */
        sortit.Output(dataset:'class_sorted');
    run;
    

    De-duping

    To remove duplicates, do the exact same operation, except remove the multidata option. In the below table, observations (8, 9) and (15, 16) are duplicates of each other. Observations 9 and 16 will be eliminated.

    data _null_;
    
        /* Sets up PDV without loading the table */
        if(0) then set sashelp.class;
    
        /* Load sashelp.class into memory ordered by Height. Do not keep duplicates. */
        dcl hash sortit(dataset:'sashelp.class', ordered:'yes');
    
            sortit.defineKey('Height');     * Order by height;
            sortit.defineData(all:'yes');   * Keep all variables in the output dataset;
        sortit.defineDone();
    
        /* Output to a dataset called class_sorted */
        sortit.Output(dataset:'class_sorted');
    run;