rsplitstackshapecsplit

cSplit does not work when a field has embedded the separator


I am using cSplit to split a column into three separate columns. The separator is " / "

However, one of my fields has embedded the "/" separator. The third element of the third line was supposed to be and stay as "f/j" after the split.

When I try it in the following example, it creates an extra (fourth) column

name <- c("abc / efg / hij", "abc / abc / hij", "efg / efg / f/j", "abd / efj / hij")
y <- c(1,1.2,3.4, 5)

dt <- data.frame(name,y)
dt
dt <- cSplit(dt,"name","/", drop=FALSE)
dt

When I try it in my original data set, which has over 5,000 lines, it produces the following error:

Error in fread(x, sep[i], header = FALSE):

Expecting 3 cols, but line 2307 contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep='/' and/or '\n' characters within unbalanced unescaped quotes has failed. If quote='' doesn't help, please file an issue to figure out if the logic could be improved.


Solution

  • You should be able to just set fixed = FALSE:

    cSplit(dt, "name", " / ", fixed = FALSE, drop = FALSE)
    ##               name   y name_1 name_2 name_3
    ## 1: abc / efg / hij 1.0    abc    efg    hij
    ## 2: abc / abc / hij 1.2    abc    abc    hij
    ## 3: efg / efg / f/j 3.4    efg    efg    f/j
    ## 4: abd / efj / hij 5.0    abd    efj    hij