I am using cSplit to split a column into three separate columns. The separator is " / ".
However, one of my fields contains an embedded "/". The third element of the third line should remain "f/j" after the split.
When I try it in the following example, it creates an extra (fourth) column:
library(splitstackshape)
name <- c("abc / efg / hij", "abc / abc / hij", "efg / efg / f/j", "abd / efj / hij")
y <- c(1, 1.2, 3.4, 5)
dt <- data.frame(name, y)
dt
dt <- cSplit(dt, "name", "/", drop = FALSE)
dt
When I try it in my original data set, which has over 5,000 lines, it produces the following error:
Error in fread(x, sep[i], header = FALSE):
Expecting 3 cols, but line 2307 contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep='/' and/or '\n' characters within unbalanced unescaped quotes has failed. If quote='' doesn't help, please file an issue to figure out if the logic could be improved.
You should be able to just set fixed = FALSE and use " / " (with the surrounding spaces) as the separator:
cSplit(dt, "name", " / ", fixed = FALSE, drop = FALSE)
##               name   y name_1 name_2 name_3
## 1: abc / efg / hij 1.0    abc    efg    hij
## 2: abc / abc / hij 1.2    abc    abc    hij
## 3: efg / efg / f/j 3.4    efg    efg    f/j
## 4: abd / efj / hij 5.0    abd    efj    hij
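
If you would rather not depend on how cSplit handles the separator, the same split can be sketched in base R with strsplit(). This is only an illustration, not part of splitstackshape: the names parts, pieces and res are made up here, and the sketch assumes every row contains exactly two " / " separators.

```r
name <- c("abc / efg / hij", "abc / abc / hij", "efg / efg / f/j", "abd / efj / hij")
y <- c(1, 1.2, 3.4, 5)
dt <- data.frame(name, y, stringsAsFactors = FALSE)

# Split on the literal three-character string " / " (space, slash, space),
# so a bare "/" inside a field such as "f/j" is left alone.
parts <- strsplit(dt$name, " / ", fixed = TRUE)

# Assumption: every row splits into exactly three pieces; stop early otherwise.
stopifnot(all(lengths(parts) == 3))

# Stack the pieces into a character matrix, one row per input line.
pieces <- do.call(rbind, parts)
colnames(pieces) <- paste0("name_", 1:3)

# Keep the original columns and append the three new ones.
res <- cbind(dt, as.data.frame(pieces, stringsAsFactors = FALSE))
res
```

The stopifnot() check plays the role of the fread error in the question: a row with a different number of fields stops the script instead of silently producing misaligned columns.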