I have a dataset with many variables. For a number of observations I want to replace their values (in a large number of variables) with the values of another observation (for the same variables) based on a unique identifier. The unique identifier is not equal to the Stata observation numbers (_n).
This is similar to several other threads, but as far as I can tell, their solutions are not fully transferrable
Here is a piece of code based on the solution from the second link but adjusted to my problem (I changed the content of the square bracket in the last macro). I am aware that this code does not work because the square bracket here could only contain a Stata obs number (_n). But I think this should illustrate well, what I am trying to achieve:
In this example, I would like to replace the values of the observations whose unique_id is 25 and 38 with the values of the observation whose unique_id is 21. This should be done for all variables in my local varlist.
set obs 50
local vlist v1 v2 v3 v4 v5
foreach v of local vlist {
generate `v' = runiform()
}
gen unique_id=_n+20 // this is just to illustrate that the unique_id is not equal to _n
foreach var of local vlist {
replace `var' =`var'[unique_id==21] if unique_id==25 | unique_id==38
}
Here is one way to do it:
foreach var of local vlist {
su `var' if unique_id == 21, meanonly
replace `var' = r(mean) if unique_id==25 | unique_id==38
}
And here is another way to do it:
gen reference = unique_id == 21
sort reference
foreach var of local vlist {
replace `var' = `var'[_N] if unique_id==25 | unique_id==38
}
Of course if you looked at the data and worked out that at the moment identifier 21 is in observation 42, say, then using that as subscript that is a direct answer, but not good or robust style.
See also this paper for a review of some techniques in this territory.
I'd sympathise with anyone who thought that Stata style looks awkward for this problem. Also, this kind of question seems terribly ad hoc, but then problems like this are real too.