
Convert one to many with 2 digits


I am currently handling a data set in Stata generated through ODK, the Open Data Kit. Questions can be answered with multiple answers: e.g., for the question "Which of these assets do you own?" in my questionnaire, the interviewer ticked all applicable answers out of 20 options. This generated a string variable with contents such as

 "1 2 3 5 11 17 20"
 "3 4 8 9 11 14 15 18 20"
 "1 3 9 11"

As this is difficult to analyse for several hundred participants, I wanted to generate new variables containing 1 or 0 for each of the answer options. For the variable hou_as I tried to generate the variables hou_as_1, hou_as_2, etc. with the following code:

foreach p in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 {
local P : subinstr local p "-" ""
gen byte hou_as_`P' = strpos(hou_as, "`p'") > 0
}

For the single-digit codes this causes a problem: hou_as_1 is also set to 1 if any of 10, 11, 12, ..., 19 was chosen, even if option 1 was not. Similarly, hou_as_2 is set to 1 when option 2, 12 or 20 is checked. How can I avoid this issue?
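To see the problem in isolation: strpos() looks for a substring anywhere in the string, so the code "1" is found inside "11" and the code "2" inside "12". A minimal illustration for a do-file (the two strings are made-up examples):

```stata
* strpos(s1, s2) returns the position of s2 within s1, or 0 if it is absent
display strpos("11 17", "1")     // 1: "1" found as the first character of "11"
display strpos("3 12 20", "2")   // 4: "2" found inside "12"
```

Any non-zero result counts as a match in the code above, which is why both lines would wrongly switch an indicator on.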


Solution

  • You want 20 indicator or dummy variables. Note first that it's much easier to use forval to loop over 1(1)20 than to spell out all 20 numbers with foreach, e.g.

    forval j = 1/20 { 
        gen hou_as_`j' = 0
    } 
    

    initialises 20 such variables to 0.

    I think it's easier to loop over the words of your answer variable, "words" here being just whatever is separated by spaces. There are at most 20 words, so, although it is a little crude, it is likely to be fast enough to check all 20 word positions for each of the 20 codes:

    forval j = 1/20 { 
        forval k = 1/20 { 
            replace hou_as_`j' = 1 if word(hou_as, `k') == "`j'" 
        }
    } 
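    A variant of the same idea, sketched here under the assumption that hou_as always holds space-separated codes: instead of always visiting 20 word positions, find the largest number of words in any observation with wordcount() and loop only that far. The names nwords and maxw are just illustrative.

    ```stata
    clear
    input str42 hou_as
    "1 2 3 5 11 17 20"
    "3 4 8 9 11 14 15 18 20"
    "1 3 9 11"
    end

    * Largest number of chosen options in any observation
    gen nwords = wordcount(hou_as)
    summarize nwords, meanonly
    local maxw = r(max)
    drop nwords

    forval j = 1/20 {
        gen byte hou_as_`j' = 0
        * Only visit word positions that are occupied in at least one observation
        forval k = 1/`maxw' {
            quietly replace hou_as_`j' = 1 if word(hou_as, `k') == "`j'"
        }
    }
    ```

    With a few hundred observations the saving is small, so the simpler loop over 1/20 is a perfectly reasonable choice.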
    

    Let's put that together and try it out on your example:

    clear 
    input str42 hou_as 
     "1 2 3 5 11 17 20"
     "3 4 8 9 11 14 15 18 20"
     "1 3 9 11"
    end 
    
    forval j = 1/20 { 
        gen hou_as_`j' = 0
        forval k = 1/20 { 
            replace hou_as_`j' = 1 if word(hou_as, `k') == "`j'" 
        }
    } 
    

    Just to show that it worked:

    . list in 3 
    
         +----------------------------------------------------------------------------+
      3. |   hou_as | hou_as_1 | hou_as_2 | hou_as_3 | hou_as_4 | hou_as_5 | hou_as_6 |
         | 1 3 9 11 |        1 |        0 |        1 |        0 |        0 |        0 |
         |----------+----------+----------+----------+----------+----------+----------|
         | hou_as_7 | hou_as_8 | hou_as_9 | hou_a~10 | hou_a~11 | hou_a~12 | hou_a~13 |
         |        0 |        0 |        1 |        0 |        1 |        0 |        0 |
         |----------+----------+----------+----------+----------+----------+----------|
         | hou_a~14 | hou_a~15 | hou_a~16 | hou_a~17 | hou_a~18 | hou_a~19 | hou_a~20 |
         |        0 |        0 |        0 |        0 |        0 |        0 |        0 |
         +----------------------------------------------------------------------------+
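    A quick consistency check, sketched to run directly after the example above and assuming that no code is listed twice within one response and that every code lies between 1 and 20: the number of indicators equal to 1 in each observation should then match the number of words in hou_as. The variable name n_chosen is just illustrative.

    ```stata
    * Count the indicators equal to 1 in each observation
    egen n_chosen = rowtotal(hou_as_1-hou_as_20)
    * Every listed code should have switched on exactly one indicator
    assert n_chosen == wordcount(hou_as)
    drop n_chosen
    ```

    If the assert fails, list the offending observations to look for duplicated or out-of-range codes.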
    

    Incidentally, your line

    local P : subinstr local p "-" ""
    

    does nothing useful. The local macro p only ever contains digits, so there is no "-" (or any other punctuation) to remove.
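    If you would rather keep your original strpos() approach, one common workaround (an alternative to the loop over words, not part of it) is to pad both the response and the code with spaces, so that " 1 " can only match the code 1 as a whole word and never the 1 inside " 11 ". A minimal sketch, assuming codes are separated by spaces as in your examples; the two test strings are made up:

    ```stata
    clear
    input str42 hou_as
    "1 3 9 11"
    "2 12 20"
    end

    forval j = 1/20 {
        * Pad with spaces so that " 1 " cannot match inside " 11 " or " 17 "
        gen byte hou_as_`j' = strpos(" " + hou_as + " ", " `j' ") > 0
    }
    ```

    This needs only one gen per option and no replace loop; it would break if the codes were separated by something other than spaces.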

    See also this explanation and

    . search multiple responses, sj
    
    Search of official help files, FAQs, Examples, SJs, and STBs
    
    SJ-5-1  st0082  . . . . . . . . . . . . . . . Tabulation of multiple responses
            (help _mrsvmat, mrgraph, mrtab if installed)  . . . . . . . .  B. Jann
            Q1/05   SJ 5(1):92--122
            introduces new commands for the computation of one- and
            two-way tables of multiple responses
    
    SJ-3-1  pr0008   Speaking Stata: On structure & shape: the case of mult. resp.
            . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox & U. Kohler
            Q1/03   SJ 3(1):81--99                                   (no commands)
            discussion of data manipulations for multiple response data