
R: Convert categorical data to dummy variables, summarized by another variable


I have this data set. I posted a screenshot of the real data instead of code; sorry about that, I am new to R.

I want to convert the categorical column "13 Source" into dummy variables, summarized by "HH No", so that each "HH No" gets a single row (the desired result was shown in a second screenshot). I've tried to.dummy from varhandle and model.matrix, but both gave me a messy data set. Could anybody help me with this? Thanks a million in advance.


Solution

  • There are a number of ways to make dummy variables from factors - here is one way to create a summary presence table.

    Assume df is your data frame. Start with xtabs, which builds a frequency table from the two columns: one row per HH_No and one column per Source level.

    Comparing the counts with > 0 gives TRUE where a household has at least one record for a source and FALSE otherwise. Adding 0 then coerces the logical values to numbers, so TRUE becomes 1 and FALSE becomes 0.

    (xtabs(~ HH_No + Source, df) > 0) + 0
    

    Output

         Source
    HH_No Deep_well Rainwater
        1         1         1
        3         1         1
        4         0         1
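
    To see what each part of the one-liner does, it can be split into three steps. This is only an illustrative sketch: the tiny data frame below is made up for the example (it is not the asker's data), and the intermediate names counts, present and dummies are hypothetical.

    ```r
    # Made-up example data using the same column names as df in this answer
    df <- data.frame(
      HH_No  = c(1, 1, 3, 4),
      Source = factor(c("Rainwater", "Deep_well", "Rainwater", "Rainwater"))
    )

    # Step 1: frequency table, one row per HH_No and one column per Source level
    counts <- xtabs(~ HH_No + Source, data = df)

    # Step 2: logical table, TRUE where a household has at least one record
    present <- counts > 0

    # Step 3: adding 0 coerces TRUE/FALSE to 1/0
    dummies <- present + 0

    print(counts)
    print(dummies)
    ```

    Here household 1 has both sources, while households 3 and 4 only have Rainwater, so their Deep_well entry ends up as 0.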
    

    Data

    df <- structure(list(HH_No = c(1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 
    3, 3, 4, 4), Source = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 
    2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L), .Label = c("Deep_well", 
    "Rainwater"), class = "factor")), class = "data.frame", row.names = c(NA, 
    -16L))
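
    If a regular data frame with HH_No as its own column is more convenient than a table, the result can be converted afterwards. The following is a sketch under a few assumptions: the data is rebuilt compactly with rep() so the block runs on its own (it should match df above), the column names are the ones used in this answer (HH_No, Source) rather than the original "HH No" and "13 Source", and dummy_tab and dummy_df are hypothetical names.

    ```r
    # Same values as df above, rebuilt compactly so this block is self-contained
    df <- data.frame(
      HH_No  = rep(c(1, 3, 4), times = c(7, 7, 2)),
      Source = factor(c(rep("Rainwater", 4), rep("Deep_well", 3),
                        rep("Rainwater", 6), "Deep_well",
                        rep("Rainwater", 2)))
    )

    # Presence table: 1 if the household has the source at least once, else 0
    dummy_tab <- (xtabs(~ HH_No + Source, data = df) > 0) + 0

    # Turn the table into a wide data frame and move the row names
    # (the HH_No values) into a proper column
    dummy_df <- data.frame(
      HH_No = as.numeric(rownames(dummy_tab)),
      as.data.frame.matrix(dummy_tab),
      row.names = NULL
    )

    print(dummy_df)
    ```

    as.data.frame.matrix is used here because plain as.data.frame on a table returns the long format (one row per HH_No and Source combination with a Freq column) instead of one column per source.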