I am working in Stata with a data set generated through ODK (Open Data Kit). ODK allows questions with multiple answers. For example, my questionnaire asks "Which of these assets do you own?" and the interviewer ticked every answer that applied out of 20 options. This produced a string variable with contents such as
"1 2 3 5 11 17 20"
"3 4 8 9 11 14 15 18 20"
"1 3 9 11"
As this is difficult to analyse for several hundred participants, I want to generate one new variable per answer option, holding 1 if the option was chosen and 0 otherwise.
For the variable hou_as I tried to generate the variables hou_as_1, hou_as_2, etc. with the following code:
foreach p in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 {
local P : subinstr local p "-" ""
gen byte hou_as_`P' = strpos(hou_as, "`p'") > 0
}
For the single digits this causes a problem: hou_as_1 is also set to 1 if any of 10, 11, 12, ..., 19 is selected, even if option 1 was not chosen. Similarly, hou_as_2 is set to 1 when option 2, 12 or 20 is checked.
How can I avoid this issue?
You want 20 indicator or dummy variables. Note first that it's much easier to use forval to loop 1(1)20, e.g.
forval j = 1/20 {
    gen hou_as_`j' = 0
}
initialises 20 such variables as 0.
I think it's easier to loop over the words of your answer variables, words being here just whatever is separated by spaces. There are at most 20 words, and it is a little crude but likely to be fast enough to go
forval j = 1/20 {
    forval k = 1/20 {
        replace hou_as_`j' = 1 if word(hou_as, `k') == "`j'"
    }
}
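To see why comparing whole words avoids the problem in the question, here is a small check on a made-up literal string (the string "10 11 12" is only an illustration, not taken from your data). strpos() finds "1" anywhere inside the string, whereas word() returns a complete space-separated word that is then compared exactly.

```stata
* Run from a do-file; "10 11 12" is an illustrative literal only
display strpos("10 11 12", "1") > 0    // 1: "1" is found inside "10"
display word("10 11 12", 1) == "1"     // 0: the first word is "10", not "1"
display word("10 11 12", 2) == "11"    // 1: the second word is exactly "11"
```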
Let's put that together and try it out on your example:
clear
input str42 hou_as
"1 2 3 5 11 17 20"
"3 4 8 9 11 14 15 18 20"
"1 3 9 11"
end
forval j = 1/20 {
    gen hou_as_`j' = 0
    forval k = 1/20 {
        replace hou_as_`j' = 1 if word(hou_as, `k') == "`j'"
    }
}
Just to show that it worked:
. list in 3

     +----------------------------------------------------------------------------+
  3. |   hou_as | hou_as_1 | hou_as_2 | hou_as_3 | hou_as_4 | hou_as_5 | hou_as_6 |
     | 1 3 9 11 |        1 |        0 |        1 |        0 |        0 |        0 |
     |----------+----------+----------+----------+----------+----------+----------|
     | hou_as_7 | hou_as_8 | hou_as_9 | hou_a~10 | hou_a~11 | hou_a~12 | hou_a~13 |
     |        0 |        0 |        1 |        0 |        1 |        0 |        0 |
     |----------+----------+----------+----------+----------+----------+----------|
     | hou_a~14 | hou_a~15 | hou_a~16 | hou_a~17 | hou_a~18 | hou_a~19 | hou_a~20 |
     |        0 |        0 |        0 |        0 |        0 |        0 |        0 |
     +----------------------------------------------------------------------------+
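If you would rather keep a single loop with strpos(), one possible variation (a sketch, not the method used above) is to pad both the answer string and the search term with spaces, so that only whole codes can match. This assumes the codes in hou_as are separated by single spaces with no tabs or doubled spaces, which I have not checked against ODK output beyond your three examples.

```stata
clear
input str42 hou_as
"1 2 3 5 11 17 20"
"3 4 8 9 11 14 15 18 20"
"1 3 9 11"
end

* Pad the string with a leading and a trailing space so that every code,
* including the first and the last, is surrounded by spaces. Searching
* for " j " then cannot match inside a longer code such as 10 or 21.
* Assumption: codes are separated by exactly one space.
forval j = 1/20 {
    gen byte hou_as_`j' = strpos(" " + hou_as + " ", " `j' ") > 0
}

list hou_as hou_as_1 hou_as_2 hou_as_11 hou_as_20
```

This creates each indicator in one pass rather than 20 replace statements per variable, at the price of relying on the single-space assumption.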
Incidentally, your line
local P : subinstr local p "-" ""
does nothing useful. The local macro p only ever contains digits, so there is no punctuation at all to remove.
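A quick way to confirm this, using 12 as an arbitrary example value for p:

```stata
* Run from a do-file; 12 is an arbitrary example value
local p 12
local P : subinstr local p "-" ""
display "`P'"    // 12: there was no "-" to remove, so P equals p
```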
See also this explanation and
. search multiple responses, sj
Search of official help files, FAQs, Examples, SJs, and STBs
SJ-5-1 st0082 . . . . . . . . . . . . . . . Tabulation of multiple responses
(help _mrsvmat, mrgraph, mrtab if installed) . . . . . . . . B. Jann
Q1/05 SJ 5(1):92--122
introduces new commands for the computation of one- and
two-way tables of multiple responses
SJ-3-1 pr0008 Speaking Stata: On structure & shape: the case of mult. resp.
. . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox & U. Kohler
Q1/03 SJ 3(1):81--99 (no commands)
discussion of data manipulations for multiple response data