string position stata substr

Replace string at position


I have a string variable in Stata holding unique ID codes with different structures. I want to i) replace a specific character at a specific position (an erroneous data entry) and ii) keep the last part of the code in a new variable.

To illustrate, some of my IDs have the following structure: 000XXXX000XXXX000001234560780912340567

where the X's are string characters, some of which are wrong. For example, some IDs have a number at the 4th X (position 7), which I want to replace with the correct character.

Regarding ii), the last part is a unique number sequence that I want to keep in a new variable. The problem is that the length of this sequence varies, so it does not start at the same position in every ID. It does, however, always seem to start after "00", so that seems like a sensible place to start the extraction.

I have tried substr() and subinstr() without being able to solve it.


Solution

  • clear 
    set obs 1 
    gen whatever = "000XXXX000XXXX000001234560780912340567"

    * Overwrite the character at position 7: keep 1-6, add "Y", resume at 8
    replace whatever = substr(whatever, 1, 6) + "Y" + substr(whatever, 8, .)
    
    * Keep everything after the last occurrence of "00"
    gen wanted = substr(whatever, strrpos(whatever, "00") + 2, .)
    
    di whatever[1]
    di wanted[1]
    

    Results:

    . di whatever[1]
    000XXXY000XXXX000001234560780912340567
    
    . di wanted[1]
    1234560780912340567
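
    Since only some IDs have a number at position 7, the replacement may need to be conditional. The following is a minimal sketch, not a definitive fix: the variable name id, the second example ID and the replacement character "Y" are assumptions made for illustration, and the check simply tests whether position 7 holds a single digit.

    clear
    input str40 id
    "000XXX7000XXXX000001234560780912340567"
    "000XXXX000XXXX0000987"
    end

    * Overwrite position 7 only where it currently holds a digit
    replace id = substr(id, 1, 6) + "Y" + substr(id, 8, .) if regexm(substr(id, 7, 1), "^[0-9]$")

    * Keep everything after the last occurrence of "00"
    gen wanted = substr(id, strrpos(id, "00") + 2, .)

    list

    Note that strrpos() finds the last "00" in the string, so this only works as intended if the number sequence itself never contains "00"; if it can, a different rule for locating the start of the sequence would be needed.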