sas

Add a flag based on another dataset


Suppose I have a dataset "DB1" with x columns (variables). You have a variable named "my_identifiers". Then you have another dataset "DB2" with one variable named "my_identifiers_subset". Then if the values of "my_identifiers_subset" from DB2 are in "my_identifiers" from DB1 you add a column flag in DB1 containing 1 otherwise 0.

I know how to deal with this when the comparison involves the same dataset but I don't know how to deal with this, when comparing with an external dataset.

          data DB1;
          input ID$ my_identifiers$;
          cards;
          1 345 
          1 45 
          2 678
          3 432
          3 432
          4 7
          ..........
          ;


          data DB2;
          input my_identifiers_subset$;
          cards;
          345  
          432
          44
          ..........
          ;

Desired output:

          data DB3;
          input ID$ my_identifiers$ Index$;
          cards;
          1 345 1
          1 45  0
          2 678 0
          3 432 1
          3 432 1
          4 7   0
          ..........
          ;

Solution

  • So using a proc sql and joining the 2 datasets might be the easiest solution. You have a couple of different options.

    First, you could use an EXISTS with a sub-query to determine whether DB1.my_identifiers is in db2.my_identifiers_subset. Or you could do a LEFT JOIN to achieve the same affect. The only caveat with the 2nd solution is is db2 might have multiple instances of the same my_identifiers_subset, in which case you'd get extra rows.

    PROC SQL;
    
        CREATE TABLE DB3_EXISTS AS
        SELECT *, 
            CASE
                WHEN EXISTS (SELECT * FROM DB2 WHERE DB2.my_identifiers_subset = DB1.my_identifiers) 
                THEN 1 
                ELSE 0
            END AS INDEX
        FROM DB1
        ;
    
        CREATE TABLE DB3_LEFT_JOIN AS
        SELECT DB1.*,
            CASE 
                WHEN DB2.my_identifiers_subset IS NULL 
                THEN 0 
                ELSE 1 
            END AS INDEX
        FROM DB1
            LEFT JOIN DB2 ON DB2.my_identifiers_subset = DB1.my_identifiers
        ORDER BY DB1.ID
        ;
    
    QUIT;