Tags: r, dataframe, tidyverse, dummy-variable

How to construct a dummy matrix from a list of data


The sample data looks like this:

data1:

x1 x2 x3 x4
1 2 3 4
2 3 -1 -1
NA NA NA NA
0 0 0 0
1 -1 -1 -1
NA NA NA NA
4 3 -1 -1
0 0 0 0

Each positive value in a row of data1 is a group number: it says which of the groups x1, x2, x3, x4 the row belongs to (a value of 2 means group x2).
-1 means that the position is blank. 0 means that the datum does not belong to the corresponding group (i.e., a 0 in x1 means the datum does not belong to group 1).
NA means missing data; NA can appear at random positions in the dataset.

Edit: For example, in the 1st row, [1,2,3,4] means the first, second, third, and fourth columns. Therefore, the 1st row of data2 will be [1,1,1,1].

In the 2nd row, [2,3,-1,-1] means the second and third columns; -1 means that there is a blank. Therefore, the 2nd row of data2 will be [0,1,1,0].
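The rule in these two examples can be written as a per-row function. This is a minimal sketch with a hypothetical helper name, row_to_dummy (it is not part of the question's code), assuming that positive values are column numbers, -1 is padding, a missing row is entirely NA, and an all-zero row belongs to no group:

```r
# Hypothetical helper (not from the question): recode one row of data1
# into the corresponding row of data2.
# Assumes positive values are column numbers, -1 is padding,
# an all-NA row is missing, and an all-0 row belongs to no group.
row_to_dummy <- function(x, n_groups = length(x)) {
  if (all(is.na(x))) return(rep(NA_real_, n_groups))  # missing row stays missing
  out <- rep(0, n_groups)                              # start with "no group"
  groups <- x[!is.na(x) & x > 0]                       # keep only real column numbers
  out[groups] <- 1                                     # flag every listed group
  out
}

row_to_dummy(c(1, 2, 3, 4))    # returns c(1, 1, 1, 1)
row_to_dummy(c(2, 3, -1, -1))  # returns c(0, 1, 1, 0)
```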

My expected outcome is:

data2:

x1 x2 x3 x4
1 1 1 1
0 1 1 0
NA NA NA NA
0 0 0 0
1 0 0 0
NA NA NA NA
0 0 1 1
0 0 0 0

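My current code is below (data1 is the data frame shown above):

library(Hmisc)  # provides the %nin% operator

data1 = data.frame(x1 = c(1, 2, NA, 0, 1, NA, 4, 0),
                   x2 = c(2, 3, NA, 0, -1, NA, 3, 0),
                   x3 = c(3, -1, NA, 0, -1, NA, -1, 0),
                   x4 = c(4, -1, NA, 0, -1, NA, -1, 0))

# Rows starting with 0 become all 0; rows starting with NA become all NA
for (i in 1:nrow(data1)) {
  if (data1$x1[i] %in% c(0)) {
    data1[i, ] = as.list(rep(0, 4))
  } else if (is.na(data1$x1[i])) {
    data1[i, ] = as.list(rep(NA, 4))
  }
}

# Replace -1 with 0
data1[data1 == -1] = 0

# This for loop creates the dummy matrix: the positive entries of each
# remaining row are column numbers, and those columns are set to 1
for (i in which(data1$x1 %nin% c(NA, 0))) {
  m = data1[i, ]
  m = m[m > 0]
  data1[i, m] = 1
}

# Replace the leftover group numbers (values greater than 1) with 0
data1[data1 > 1] = 0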

I wonder whether there is a function that can replace these for loops. Any suggestions are welcome, thank you!

Update:

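Here is a solution using purrr::map:

library(purrr)
library(magrittr)  # for %>%

data1 = matrix(c(1, 2, 3, 4,
                 2, 3, -1, -1,
                 NA, NA, NA, NA,
                 rep(0, 4),
                 1, -1, -1, -1,
                 rep(NA, 4),
                 4, 3, -1, -1,
                 rep(0, 4)), ncol = 4, byrow = TRUE)

# split() turns the matrix into a list of rows; each row b is recoded,
# then the recoded rows are bound back into a matrix
map(split(t(data1), rep(1:nrow(data1), each = ncol(data1))),
    \(b) {
      v = b[which(b > 0 | is.na(b))]  # column numbers (and any NAs) in this row
      if (sum(is.na(v)) == 0) {
        b[setdiff(1:length(b), v)] = 0
        b[v] = 1
      } else {
        b[which(is.na(v))] = NA
        b[which(!is.na(v))] = 1
      }
      return(b)
    }) %>% do.call(rbind, .)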
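The split(t(data1), rep(...)) call is what turns the matrix into a list of rows. R stores matrices column by column, so t() places each original row in consecutive positions, and rep(..., each = ncol(data1)) labels those positions with their row number. A quick check on a small illustrative matrix m (not from the question), plus base R's asplit(), which, assuming R 3.6.0 or later, splits along a margin directly:

```r
# Small illustrative matrix (not from the question): rows are 1 2 3 and 4 5 6
m <- matrix(1:6, nrow = 2, byrow = TRUE)

# t(m) stores the values row by row; the grouping vector is 1 1 1 2 2 2
rows <- split(t(m), rep(seq_len(nrow(m)), each = ncol(m)))
rows[["1"]]  # returns c(1L, 2L, 3L)
rows[["2"]]  # returns c(4L, 5L, 6L)

# asplit() splits along MARGIN = 1, i.e. one list element per row
rows2 <- asplit(m, 1)
as.vector(rows2[[2]])  # returns c(4L, 5L, 6L)
```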

Solution

  • I am still not entirely sure of the logic, but this might help. Using apply with MARGIN = 1, you can evaluate each row independently.

    First, create a vector of NA. Then, wherever a value is greater than 0, treat it as a column number and set that element of the vector to 1.

    Second, if the vector contains at least one 1, change the remaining NA elements to 0.

    Third, if no values in the row are missing and all of them are 0, set every element of the vector to 0.

    Because apply with MARGIN = 1 returns the result for each row as a column, the output is transposed with t() to give a matrix with one row per input row.

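    t(apply(
      data1,
      MARGIN = 1,
      \(x) {
        vec <- rep(NA, length(x))   # start with all NA
        vec[x[x > 0]] <- 1          # positive values are column numbers
        if (any(vec == 1, na.rm = TRUE)) vec[is.na(vec)] <- 0
        # && short-circuits, so a row mixing 0 and NA cannot produce if (NA)
        if (!anyNA(x) && all(x == 0)) vec <- rep(0, length(x))
        vec
      }
    ))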
    

    Output

         [,1] [,2] [,3] [,4]
    [1,]    1    1    1    1
    [2,]    0    1    1    0
    [3,]   NA   NA   NA   NA
    [4,]    0    0    0    0
    [5,]    1    0    0    0
    [6,]   NA   NA   NA   NA
    [7,]    0    0    1    1
    [8,]    0    0    0    0
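A loop-free variant, offered as a sketch rather than as part of the answer above, does the same recoding with matrix indexing. It assumes, as in the sample, that a missing row is entirely NA and that every positive value is a valid column number:

```r
# Sample data from the question, as a numeric matrix
data1 <- matrix(c(1, 2, 3, 4,
                  2, 3, -1, -1,
                  NA, NA, NA, NA,
                  0, 0, 0, 0,
                  1, -1, -1, -1,
                  NA, NA, NA, NA,
                  4, 3, -1, -1,
                  0, 0, 0, 0), ncol = 4, byrow = TRUE)

# Start from all zeros, then mark rows that are entirely missing as NA
data2 <- matrix(0, nrow = nrow(data1), ncol = ncol(data1))
all_na <- rowSums(!is.na(data1)) == 0
data2[all_na, ] <- NA

# (row, column) position of every positive entry; the entry's value
# is the target column in data2
pos <- which(!is.na(data1) & data1 > 0, arr.ind = TRUE)
data2[cbind(pos[, "row"], data1[pos])] <- 1
data2
```

Indexing with a two-column (row, column) matrix sets all the 1s in a single assignment, so no per-row function is needed.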