The sample data is like this:
data1:
x1 | x2 | x3 | x4 |
---|---|---|---|
1 | 2 | 3 | 4 |
2 | 3 | -1 | -1 |
NA | NA | NA | NA |
0 | 0 | 0 | 0 |
1 | -1 | -1 | -1 |
NA | NA | NA | NA |
4 | 3 | -1 | -1 |
0 | 0 | 0 | 0 |
A positive value in data1 (for example in data1[,1]) is the number of the group (x1, x2, x3, or x4) that the row belongs to.
-1 means that there is a blank.
0 means that the datum does not belong to the corresponding group (i.e., a 0 in x1 means the datum does not belong to group 1).
NA means missing data; NA values appear at random in the dataset.
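For reproducibility, here is a minimal sketch that builds the sample table in R. The column names x1 to x4 come from the table; storing it as a data frame rather than a matrix is an assumption, based on the loop code in this question accessing data1$x1.

```r
# Sample data as a data frame; each line inside c(...) is one row of the table
data1 <- as.data.frame(matrix(
  c( 1,  2,  3,  4,
     2,  3, -1, -1,
    NA, NA, NA, NA,
     0,  0,  0,  0,
     1, -1, -1, -1,
    NA, NA, NA, NA,
     4,  3, -1, -1,
     0,  0,  0,  0),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("x1", "x2", "x3", "x4"))
))
str(data1)
```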
Edit:
For example, in the 1st row, [1,2,3,4] means the first, second, third, and fourth columns.
Therefore, the 1st row of data2 will be [1,1,1,1].
In the 2nd row, [2,3,-1,-1] means the second and third columns; -1 means that there is a blank.
Therefore, the 2nd row of data2 will be [0,1,1,0].
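The rule can be checked on a single row, here [2, 3, -1, -1]. This is only a sketch of the encoding, not the full conversion; row_codes and indicator are illustrative names.

```r
row_codes <- c(2, 3, -1, -1)                 # group numbers, -1 = blank
indicator <- numeric(length(row_codes))      # start with all zeros
indicator[row_codes[row_codes > 0]] <- 1     # positive codes are column positions
indicator                                    # 0 1 1 0
```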
My expected outcome is:
data2:
x1 | x2 | x3 | x4 |
---|---|---|---|
1 | 1 | 1 | 1 |
0 | 1 | 1 | 0 |
NA | NA | NA | NA |
0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 |
NA | NA | NA | NA |
0 | 0 | 1 | 1 |
0 | 0 | 0 | 0 |
My code is as below:
# rows whose first value is 0 or NA are all 0 or all NA
for (i in seq_len(nrow(data1))) {
  if (data1$x1[i] %in% c(0)) {
    data1[i, ] = as.list(rep(0, 4))
  } else if (is.na(data1$x1[i])) {
    data1[i, ] = as.list(rep(NA, 4))
  }
}
# replace -1 with 0
data1[data1 == -1] = 0
# this for loop creates the dummy matrix
for (i in which(!(data1$x1 %in% c(NA, 0)))) {
  m = data1[i, ]
  m = m[m > 0]        # positive values are the column numbers to flag
  data1[i, m] = 1
}
# replace the numbers greater than one with zero
data1[data1 > 1] = 0
Is there a function I can use to replace the for loops? Any suggestions would be appreciated, thank you!
Update:
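A solution using purrr::map:
library(purrr)     # map()
library(magrittr)  # %>%
data1 = matrix(c(1,2,3,4,2,3,-1,-1,NA,NA,NA,NA,
                 rep(0,4),1,-1,-1,-1,
                 rep(NA,4),
                 4,3,-1,-1,
                 rep(0,4)),ncol = 4,byrow = TRUE)
# split the matrix into a list of rows, recode each row, then bind the rows back
map(split(t(data1), rep(1:nrow(data1), each = ncol(data1))),
    \(b) {
      v = b[which(b > 0 | is.na(b))]     # positive entries (column numbers) and NAs
      if (sum(is.na(v)) == 0) {
        b[setdiff(seq_along(b), v)] = 0  # columns not listed become 0
        b[v] = 1                         # listed columns become 1
      } else {
        b[which(is.na(v))] = NA
        b[which(!is.na(v))] = 1
      }
      return(b)
    }) %>% do.call(rbind, .)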
I am still not entirely sure of the logic, but this might be helpful. Using apply, you can evaluate each row independently.
First, create a vector of NA. Then, where a value is greater than 0, set the element of the vector at that position (column number) to 1.
Second, if the vector has at least one 1, change the remaining NA elements to 0.
Third, if all elements of the row are zero and none are missing, make all values in that row 0.
The end result is a matrix in this example.
t(apply(
  data1,
  MARGIN = 1,
  \(x) {
    vec <- rep(NA, length(x))                    # start with all NA
    vec[x[x > 0]] <- 1                           # positive values are column numbers
    if (any(vec == 1, na.rm = TRUE)) vec[is.na(vec)] <- 0
    if (any(!is.na(x)) & all(x == 0)) vec <- rep(0, length(x))
    vec
  }
))
Output
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] 0 1 1 0
[3,] NA NA NA NA
[4,] 0 0 0 0
[5,] 1 0 0 0
[6,] NA NA NA NA
[7,] 0 0 1 1
[8,] 0 0 0 0
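As a loop-free variant of the same idea, matrix indexing can set all the 1s in a single assignment. This is only a sketch under two assumptions: data1 is the numeric matrix from the question's update, and a row is either entirely NA or contains no NA. The names data2, pos, and all_na are illustrative.

```r
data1 <- matrix(c(1, 2, 3, 4,
                  2, 3, -1, -1,
                  rep(NA, 4),
                  rep(0, 4),
                  1, -1, -1, -1,
                  rep(NA, 4),
                  4, 3, -1, -1,
                  rep(0, 4)), ncol = 4, byrow = TRUE)

# Start from an all-zero matrix of the same shape
data2 <- matrix(0, nrow(data1), ncol(data1))

# Positions of the positive entries; which() silently drops NA comparisons
pos <- which(data1 > 0, arr.ind = TRUE)

# Each positive value is a column number: set (row, value) to 1
data2[cbind(pos[, "row"], data1[pos])] <- 1

# Rows that are entirely NA in data1 stay NA in data2
all_na <- rowSums(is.na(data1)) == ncol(data1)
data2[all_na, ] <- NA

data2
```

Indexing with a two-column matrix addresses individual cells, so no explicit loop or apply call is needed; whether that is clearer than the apply version is a matter of taste.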