Suppose I have a dataset "DB1" with x columns (variables). You have a variable named "my_identifiers". Then you have another dataset "DB2" with one variable named "my_identifiers_subset". Then if the values of "my_identifiers_subset" from DB2 are in "my_identifiers" from DB1 you add a column flag in DB1 containing 1 otherwise 0.
I know how to deal with this when the comparison involves the same dataset but I don't know how to deal with this, when comparing with an external dataset.
data DB1;
input ID$ my_identifiers$;
cards;
1 345
1 45
2 678
3 432
3 432
4 7
..........
;
data DB2;
input my_identifiers_subset$;
cards;
345
432
44
..........
;
Desired output:
data DB3;
input ID$ my_identifiers$ Index$;
cards;
1 345 1
1 45 0
2 678 0
3 432 1
3 432 1
4 7 0
..........
;
So using a proc sql
and joining the 2 datasets might be the easiest solution. You have a couple of different options.
First, you could use an EXISTS
with a sub-query to determine whether DB1.my_identifiers
is in db2.my_identifiers_subset
. Or you could do a LEFT JOIN
to achieve the same affect. The only caveat with the 2nd solution is is db2
might have multiple instances of the same my_identifiers_subset
, in which case you'd get extra rows.
PROC SQL;
CREATE TABLE DB3_EXISTS AS
SELECT *,
CASE
WHEN EXISTS (SELECT * FROM DB2 WHERE DB2.my_identifiers_subset = DB1.my_identifiers)
THEN 1
ELSE 0
END AS INDEX
FROM DB1
;
CREATE TABLE DB3_LEFT_JOIN AS
SELECT DB1.*,
CASE
WHEN DB2.my_identifiers_subset IS NULL
THEN 0
ELSE 1
END AS INDEX
FROM DB1
LEFT JOIN DB2 ON DB2.my_identifiers_subset = DB1.my_identifiers
ORDER BY DB1.ID
;
QUIT;