I have unique IDs in rows where columns are the ID
s of their 'sent' friends. To get a count of 'received' friends, I need to get a count for how many times an ID appears across all columns and rows of the dataset. This is easy in R, but I'd like to stay with Stata for this project.
ID | F1_ID | F2_ID | F3_ID | ID_mentions |
---|---|---|---|---|
1 | 2 | 3 | 4 | 4 |
2 | 4 | 1 | 4 | |
3 | 1 | 2 | 3 | |
4 | 2 | 1 | 3 | 3 |
Toy data above. Here, there are four mentions of ID #1
, three mentions of ID #4
, etc.
I want to generate a variable containing the count of how many times each ID
value in the first column is mentioned in any column of the data set. This is illustrated in the ID_mentions
column.
Turns out I wrote something in this territory. You would need to install this with ssc install tab_chi
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id f1_id f2_id f3_id)
1 2 3 4
2 4 1 .
3 1 2 .
4 2 1 3
end
tabm *id
| values
variable | 1 2 3 4 | Total
-----------+--------------------------------------------+----------
ID | 1 1 1 1 | 4
F1_ID | 1 2 0 1 | 4
F2_ID | 2 1 1 0 | 4
F3_ID | 0 0 1 1 | 2
-----------+--------------------------------------------+----------
Total | 4 4 3 3 | 14
EDIT To count all mentions:
gen mentions = .
quietly forval i = 1/`=_N' {
egen work = anycount(*id), value(`=id[`i']')
su work, meanonly
replace mentions = r(sum) in `i'
drop work
}
list