rsplitstackshape

splitstackshape to split text based on different line separators \n for columns and observations


I have some text data which looks like:

> myData
                                                                                                                                                                keyColumn
1 \n\n\n\nCol1\n\nCol1 result.\n\n\n\nCol2\n\nResult col2.\n\n\n\nCol3\n\n-\n\n\n\nCol4\n\nresult col4\n\n\n\ncol5\n\n€result col5\n\n\n\nCol6\n\nresult col6\n\n\n\n\n\n

It is structured and I would like to split the text into columns. The structure is always the same. That is, given:

\n\n\n\nCol1\n\nCol1 result.

It follows that the column name should come after the 4 line separators and the result comes after the 2 line separators.

I have tried using the splitstackshape package.

library(splitstackshape)
cSplit(myData, splitCols = "keyColumn", sep = "\n\n\n\n", direction = "wide")

Which gives:

   keyColumn_01 keyColumn_02 keyColumn_03 keyColumn_04 keyColumn_05 keyColumn_06 keyColumn_07 keyColumn_08 keyColumn_09 keyColumn_10 keyColumn_11 keyColumn_12
1:         Col1 Col1 result.         Col2 Result col2.         Col3            -         Col4  result col4         col5 €result col5         Col6  result col6

Which isn't exactly what I wanted but close.

Expected output:

Col1           Col2            Col3     Col4          col5           Col6
Col1 result    Result col2      -      result col4   €result col5    result col6

Data:

myData <- structure(list(keyColumn = "\n\n\n\nCol1\n\nCol1 result.\n\n\n\nCol2\n\nResult col2.\n\n\n\nCol3\n\n-\n\n\n\nCol4\n\nresult col4\n\n\n\ncol5\n\n\200result col5\n\n\n\nCol6\n\nresult col6\n\n\n\n\n\n"), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                                                                    -1L))

Solution

  • In base R you could do:

    read.csv(text=gsub("\n{2,}",",",gsub("\n{4}","\n",myData$keyColumn)), header = FALSE)
        V1           V2
    1 Col1 Col1 result.
    2 Col2 Result col2.
    3 Col3            -
    4 Col4  result col4
    5 col5 €result col5
    6 Col6  result col6
    

    From this, you could be able to transpose and format it the way you want