I wanted to follow up on the question that I posted here. While I received base R and data.table solutions, I was trying to implement the same using cSplit_e from the splitstackshape package, as suggested in a comment on my previous post. With the modified data as below (i.e. with NA):
data1<-structure(list(reason = c("1", "1", NA, "1", "1", "4 5", "1",
"1", "1", "1", "1", "1 2 3 4", "1 2 5", NA, NA)), .Names = "reason", class = "data.frame", row.names = c(NA,
-15L))
#loading packages
library(data.table)
library(splitstackshape)
cSplit_e(setDT(data1),1," ",mode = "value") # with NA's doesn't work
Error in seq.default(min(vec), max(vec)) : 'from' must be a finite number
data2<-na.omit(setDT(data1),cols="reason") # removing NA's
cSplit_e(data2,1," ",mode = "value") # without NA's works
     reason reason_1 reason_2 reason_3 reason_4 reason_5
 1:       1        1       NA       NA       NA       NA
 2:       1        1       NA       NA       NA       NA
 3:       1        1       NA       NA       NA       NA
 4:       1        1       NA       NA       NA       NA
 5:     4 5       NA       NA       NA        4        5
 6:       1        1       NA       NA       NA       NA
 7:       1        1       NA       NA       NA       NA
 8:       1        1       NA       NA       NA       NA
 9:       1        1       NA       NA       NA       NA
10:       1        1       NA       NA       NA       NA
11: 1 2 3 4        1        2        3        4       NA
12:   1 2 5        1        2       NA       NA        5
So, the question is: does cSplit_e account for NAs in the column to be split?
This has been fixed in the bugfix release (v1.4.4) of "splitstackshape". Thanks for reporting it.
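For context, the error in the question comes from a call of the form seq.default(min(vec), max(vec)). A minimal base R sketch of how a single NA can trigger that kind of failure is below. Here vec is an illustrative vector chosen for the example, not the package's internal data, and the na.rm = TRUE line shows one possible guard, not necessarily the fix made in 1.4.4.

```r
# Illustrative values only (an assumption), with one missing entry.
vec <- c(1, NA, 4, 5)

# min() and max() return NA when any element is NA, so seq() cannot build a range.
res <- tryCatch(seq(min(vec), max(vec)), error = function(e) e)
inherits(res, "error")

# Dropping NAs before taking the range avoids the problem.
rng <- seq(min(vec, na.rm = TRUE), max(vec, na.rm = TRUE))
rng
```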
After using update.packages(), you should be able to do:
packageVersion("splitstackshape")
## [1] ‘1.4.4’
cSplit_e(data1, 1, " ", mode = "value")
##     reason reason_1 reason_2 reason_3 reason_4 reason_5
## 1        1        1       NA       NA       NA       NA
## 2        1        1       NA       NA       NA       NA
## 3     <NA>       NA       NA       NA       NA       NA
## 4        1        1       NA       NA       NA       NA
## 5        1        1       NA       NA       NA       NA
## 6      4 5       NA       NA       NA        4        5
## 7        1        1       NA       NA       NA       NA
## 8        1        1       NA       NA       NA       NA
## 9        1        1       NA       NA       NA       NA
## 10       1        1       NA       NA       NA       NA
## 11       1        1       NA       NA       NA       NA
## 12 1 2 3 4        1        2        3        4       NA
## 13   1 2 5        1        2       NA       NA        5
## 14    <NA>       NA       NA       NA       NA       NA
## 15    <NA>       NA       NA       NA       NA       NA
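Note that 1.4.4 has moved "data.table" from "Depends" to "Imports", so library(splitstackshape) no longer attaches "data.table" for you; load it yourself with library(data.table) if you call functions such as setDT() directly.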
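If updating is not an option, a similar layout can be approximated in base R. The sketch below uses a hypothetical helper, expand_values(), which is not part of splitstackshape. It assumes the column holds space-separated integers and leaves NA entries as all-NA rows.

```r
# Hypothetical helper, not part of splitstackshape.
# Assumes x holds delimiter-separated integers; NA entries become all-NA rows.
expand_values <- function(x, sep = " ", prefix = "reason") {
  # strsplit() keeps NA as NA, so as.integer() yields NA for those rows
  vals <- lapply(strsplit(as.character(x), sep, fixed = TRUE), as.integer)
  seen <- unlist(vals)
  seen <- seen[!is.na(seen)]
  if (length(seen) == 0L) stop("no non-missing values to expand")
  lv <- seq(min(seen), max(seen))  # one output column per value in the range
  out <- matrix(NA_integer_, nrow = length(x), ncol = length(lv),
                dimnames = list(NULL, paste(prefix, lv, sep = "_")))
  for (i in seq_along(vals)) {
    v <- vals[[i]][!is.na(vals[[i]])]
    # place each observed value in the column that carries its name
    if (length(v)) out[i, match(v, lv)] <- v
  }
  as.data.frame(out)
}

# Small example input (made up here, mirroring the shape of data1$reason)
res <- expand_values(c("1", NA, "4 5", "1 2 3 4"))
res
```

For a plain data.frame, something like cbind(data1, expand_values(data1$reason)) should attach the new columns to the original data.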