I have a string variable in Stata holding unique ID codes with different structures. I want to i) replace a specific character at a specific position (a data-entry error) and ii) keep the last part of the code in a new variable, where that part starts after some
To illustrate, some of my IDs have the following structure: 000XXXX000XXXX000001234560780912340567
where the X's are string characters, some of which are wrong. For example, some IDs have a number at the 4th X (position 7), which I want to replace with the correct character.
Regarding ii), the last part is a unique number sequence that I want to keep in a new variable. The problem is that the length of this sequence varies, so it does not always start at the same position. It does seem, however, that it always starts after "00",
so I assume that is the sensible place to start.
I have tried substr() and subinstr() without being able to solve it.
clear
set obs 1
gen whatever = "000XXXX000XXXX000001234560780912340567"
* overwrite the character at position 7: keep 1-6, add the fix, resume at 8
replace whatever = substr(whatever, 1, 6) + "Y" + substr(whatever, 8, .)
* keep everything after the last occurrence of "00"
gen wanted = substr(whatever, strrpos(whatever, "00") + 2, .)
di whatever[1]
di wanted[1]
Results:
. di whatever[1]
000XXXY000XXXX000001234560780912340567
. di wanted[1]
1234560780912340567
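One caveat with the last-"00" rule: if the number sequence itself contains "00", strrpos() will cut inside the number. The sketch below is only an illustration under assumptions that are not in the question: the second ID is invented, "Y" is again a placeholder for the correct character, position 7 is fixed only where it currently holds a digit, and the wanted sequence is taken to be the trailing run of digits with its leading zeros dropped (so it is assumed to begin with a nonzero digit).

```stata
clear
input str40 id
"000XXX7000XXXX000001234560780912340567"
"000XXXX000XXXX0001200345"
end

* Assumed rule: fix position 7 only where it currently holds a digit
replace id = substr(id, 1, 6) + "Y" + substr(id, 8, .) if regexm(substr(id, 7, 1), "[0-9]")

* Last-"00" rule, for comparison
gen wanted = substr(id, strrpos(id, "00") + 2, .)

* Assumed alternative: trailing digits, leading zeros dropped
gen wanted2 = regexs(1) if regexm(id, "0*([1-9][0-9]*)$")

list, noobs
```

For the first ID both variables should hold 1234560780912340567. For the invented second ID, wanted should come out as 345, because the last "00" falls inside the number, while wanted2 should keep 1200345. Which rule is right depends on how the IDs were actually built, so it is worth checking a sample by eye before relying on either.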