Extracting all the 16 digit account numbers from a text field and creating a field for that account number

I have a dataset with a text field that contains long free flowing text values , I need to identify and extract all the 16 digit account number from that text field and create a column from those extracted account number

Data that I have

    input acct_num   txt_field  ;
    DATALINES; 
    3435436     Payment issue reported 3456123789065322 to 0909876789432123 dated 9 mar 2024  
    7789976     Data declined and assigned to 7890512323454545

Data that I need

acct_num              txt_field                                                                       acct1               acct2
    3435436     Payment issue reported 3456123789065322 to 0909876789432123 dated 9 mar 2024   3456123789065322    0909876789432123
    7789976     Data declined and assigned to 7890512323454545                                 7890512323454545

As of now I have used Prxparse and prxmatch functions but those work when you know what exactly to look for in the text field, here I am just looking for any 16 digit values

Solution

You're on the right track with regex. Use call prxnext() to iterate through all instances of 16-digit account numbers. The regex \b\d{16}\b will find these.

data want;
    set have;
    length acct_num_16 $200.;
    retain exprid;
   
    /* Generate an expression ID */
    if(_N_ = 1) then exprid = prxparse('/\b\d{16}\b/');

    /* Scan all of the text */
    start = 1;
    stop  = length(txt_field);

    /* Find the first value */
    call prxnext(exprid, start, stop, txt_field, pos, len);

    /* Keep scanning until there are no more account numbers found and 
       append it to a comma-separated list */
    do while (pos > 0);
        acct_num_16 = catx(',', acct_num_16, substr(txt_field, pos, len) );
        call prxnext(exprid, start, stop, txt_field, pos, len);
    end;

    keep acct_num txt_field acct_num_16;
run;

acct_num    txt_field                    acct_num_16
3435436     Payment issue reported ...   3456123789065322,0909876789432123
7789976     Data declined and ...        7890512323454545