
Detect a word in a character variable and create a flag variable based on its presence (SAS)


Hello, and sorry for the long title! I am working with data that contains long text strings (some observations have up to ~2000 characters). The word AB/CD may appear anywhere within a string. I am trying to detect AB/CD in the text and create a binary variable (ABCD_present) that indicates whether the word appears.

Below is some example data:

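data test;
    /* $175 fits the sample rows; the real data (~2000 characters) would need a longer length */
    length status $175;
    /* pipe-delimited input; truncover keeps short records from reading into the next line */
    infile datalines dsd dlm="|" truncover;
    input ID Status $;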

datalines;
1|This is example text I am using instead of real data. I am making the length of this text longer to mimic the long text strings of my data AB/CD
2|This is example AB/CD text I am using instead of real data. I am making the length of this text longer to mimic the long text strings of my data
3|This is example text I am using instead of real data. I AB/CD am making the length of this text longer to mimic the long text strings of my data
4|This is example text I am using instead of real data. I am making the length of this text longer to mimic the long text strings of my data
5|This is example text I am using instead of real data. I am making the length of this text longer to mimic the long text strings of my data
6|This is example text I am using instead of real data. I am making the length of this text longer to AB/CD mimic the long text strings of my data
;
run;

Any guidance on this would be lovely! I do not have a ton of experience using long text strings.

Thank you in advance


Solution

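  • You can use the FIND function. It returns the position of the first occurrence of a substring within a string, or 0 if the substring is not found, so comparing the result to 0 gives a 1/0 flag.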

    data want;
        set test;
        flag_abcd = (find(status, 'AB/CD') > 0);
    run;
    
    Status ID   flag_abcd
    ...    1    1
    ...    2    1
    ...    3    1
    ...    4    0
    ...    5    0
    ...    6    1
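
If the real data mixes case (for example ab/cd), or if AB/CD should only count when it stands alone as a word, the same idea can be adapted. The following is a minimal sketch, not part of the original answer: the dataset name want_variants, the variable names flag_ci and flag_word, and the delimiter list are assumptions chosen for illustration.

```sas
/* Sketch only: want_variants, flag_ci and flag_word are hypothetical names */
data want_variants;
    set test;

    /* 'i' modifier: ignore case, so 'ab/cd' or 'Ab/Cd' is also flagged */
    flag_ci = (find(status, 'AB/CD', 'i') > 0);

    /* FINDW matches AB/CD only as a whole word, here delimited by   */
    /* blanks or common punctuation (assumed delimiter list), so a   */
    /* longer token such as 'XAB/CDY' would not be flagged           */
    flag_word = (findw(status, 'AB/CD', ' .,;:', 'i') > 0);
run;
```

FIND is the simpler choice when any occurrence of the substring should count; FINDW is stricter and depends on the delimiter list matching how the word is actually separated in the real text.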