In my dataset, the last name (lname) occasionally has the generational suffix attached. Regarding the generational suffix:
I tried to think simple solutions first. I couldn't think of any using Excel because all of their string solutions require having a consistent position of the values to be removed.
In SAS, PARSE requires a delimiter, and TRIM requires a consistent position.
In the syntax I've attached are four different approaches I tried. None of them were successful, and I totally admit user error. I'm not familiar with any of them other than COMPRESS, and then only for removing blanks.
Is there a way I can make a new variable for last name that doesn't have the generational suffix attached?
Thank you so much!
This first piece applies to each of the my attempts.
data want;
input id lname $ fname $;
datalines;
123456 Smith John
234567 SMITH ANDREW
345678 SmithJr Alan
456789 SMITHSR SAM
789012 smithiii robert
890123 smithIIII william
901234 Smith4th Tim
;
run;
My attempts start here.
/* COMPRESS */
data want;
set have;
lname2 = compress(lname,'Jr');
put string=;
run;
/* TRANWARD */
data want;
set have;
lname2 = tranwrd(lname,"Jr", "");
lname2 = tranwrd(lname,"Sr", "");
lname2 = tranwrd(lname,"III", "");
run;
/* PRXCHANGE */
data want;
set have;
lname2 = lname;
lname2 = prxchange('s/(.*)(jr|sr|iii|iv)$/$1/i',1,trim(lname));
run;
/* PRXMATCH */
data want;
set have;
if prxmatch('/Jr|Sr|III/',lname) then lname2 = '';
run;
I think you're fine with your prxchange method, for me it's the most reliable and easy to maintain, I would just change 2 things:
data want;
set have;
attrib lname2 format=$50.;
lname2 = prxchange('s/(.*)(jr|sr|iii|iv)$/$1/oi', 1, strip(lname));
run;