r text delimiter read.table

How to read a text file with page break characters in R


I am quite new to R. I have a few text (.txt) files in a folder that were converted from PDF and contain a page break (form feed) character (#12, i.e. \f). I need to build a data frame from these text files in which one row represents one PDF page. In other words, a new row should start only when there is a page break (\f).

The problem is that when a text file is loaded into R, every new line becomes a new row, which is not what I want. Please help me with this. Thanks!

The methods I have tried so far are read.table and readLines.

As you can see in lines 273 and 293, there is a \f, so I need everything that comes after each \f to go into its own row (which represents a page).


Solution

  • Does something like this work?

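    library(tidyverse)
    # Read the whole file as one string, then split it on form feeds ("\f")
    read_file("mytxt.txt") %>%
      str_split("\f") %>%
      unlist() %>%
      as_tibble_col("data")  # one row per page, in a column named "data"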
    

    It reads the whole file as a single string and then splits it on the page break character. You may have to replace the splitting character with something else if your files use a different page separator.
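
Since the question mentions several files in a folder, here is a minimal base R sketch of the same idea applied to every .txt file in a directory. It is only an illustration under some assumptions: the helper name `read_pages`, the column names `file`, `page` and `text`, and the temporary demo folder are all made up here, and it assumes the pages really are separated by "\f".

```r
# Hypothetical helper: read one text file and return one data-frame row per
# page, where pages are separated by the form feed character ("\f", ASCII 12).
read_pages <- function(path) {
  # readLines() splits on line breaks only, so "\f" stays inside the lines.
  # Pasting the lines back together gives the whole file as a single string.
  txt <- paste(readLines(path, warn = FALSE), collapse = "\n")
  # Split on the literal form feed; each piece is one page.
  pages <- strsplit(txt, "\f", fixed = TRUE)[[1]]
  data.frame(
    file = rep(basename(path), length(pages)),
    page = seq_along(pages),
    text = pages,
    stringsAsFactors = FALSE
  )
}

# Small demonstration with a temporary folder holding one two-page file
# (replace demo_dir with the folder that holds your own .txt files).
demo_dir <- tempfile("pages_")
dir.create(demo_dir)
writeLines("page one, line 1\npage one, line 2\fpage two",
           file.path(demo_dir, "demo.txt"))

# Apply the helper to every .txt file and stack the results.
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
result <- do.call(rbind, lapply(files, read_pages))
print(result)
```

Line breaks within a page are kept inside the `text` column as "\n", so a new row starts only at a page break. If a file contains no "\f" at all, it comes back as a single row, which is a quick way to check whether the separator is really a form feed.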