stringreplacesassuffix

How do I remove the generational suffix from last name (if there is one) using SAS?


In my dataset, the last name (lname) occasionally has the generational suffix attached. Regarding the generational suffix:

I tried to think simple solutions first. I couldn't think of any using Excel because all of their string solutions require having a consistent position of the values to be removed.

In SAS, PARSE requires a delimiter, and TRIM requires a consistent position.

In the syntax I've attached are four different approaches I tried. None of them were successful, and I totally admit user error. I'm not familiar with any of them other than COMPRESS, and then only for removing blanks.

Is there a way I can make a new variable for last name that doesn't have the generational suffix attached?

Thank you so much!

This first piece applies to each of the my attempts.

data want;
    input id lname $ fname $;
    datalines;
        123456  Smith       John
        234567  SMITH       ANDREW
        345678  SmithJr     Alan
        456789  SMITHSR     SAM
        789012  smithiii    robert
        890123  smithIIII   william
        901234  Smith4th    Tim
        ;
run;

My attempts start here.

/* COMPRESS */
data want;
    set have;
    lname2 = compress(lname,'Jr');
    put string=;
run;

/* TRANWARD */
data want;
    set have;
    lname2 = tranwrd(lname,"Jr", "");
    lname2 = tranwrd(lname,"Sr", "");
    lname2 = tranwrd(lname,"III", "");
run;

/* PRXCHANGE */
data want;
    set have;
    lname2 = lname;
    lname2 = prxchange('s/(.*)(jr|sr|iii|iv)$/$1/i',1,trim(lname));
run;

/* PRXMATCH */
data want;
    set have;
    if prxmatch('/Jr|Sr|III/',lname) then lname2 = '';
run;

Solution

  • I think you're fine with your prxchange method, for me it's the most reliable and easy to maintain, I would just change 2 things:

    1. us the 'o' modifier to compile only once the regex
    2. Use a strip instead of a trim (strip is an equivalent of ltrim + rtrim)
    data want;
        set have;
        attrib lname2 format=$50.;
        lname2 = prxchange('s/(.*)(jr|sr|iii|iv)$/$1/oi', 1, strip(lname));
    run;