
How to convert data in a character class object to dataframe in R


I basically have the same (unanswered) question as this fellow here -- I want to import the .raw data file at this link and convert the data within into a matrix/dataframe.

I downloaded redwt.raw to a folder, set my working directory to that folder, and followed the advice at the link above to import it via this code:

test <- readLines("redwt.raw", n = -1)

That works to create a character class object called test, where I can see my data. But I am not sure how to convert the data held within that object into a dataframe. The output should be a 2-column matrix; in its current form, each odd column is an ID, and each even column is the sampling weight for the ID that came before it. So I'd like to loop through the numbers in the .raw file to create a ?x2 matrix or dataframe.


Solution

  • That data looks really simple to import, though doing so rests on several assumptions. The notable one is that the values are alternating ID/weight pairs (and not rows of observations).

    If the data looks like this:

           1     437       2     437       3     437       4     437       5     437
           7     437       8     437       9     437      10     437      11     437
          12     437      13     437      14     437      15     437      16     437
          17     437      18     437      19     437      20     437      21     707
    

    Then we can do this:

    mydat <- readLines("https://adfdell.pstc.brown.edu/arisreds_data/public82/redwt.raw") |>
      # or `readLines("redwt.raw")` if already downloaded, no need to download repeatedly
      trimws() |>
      strsplit(" +") |>
      unlist() |>
      as.numeric() |>  # strsplit() yields strings; convert so both columns are numeric
      matrix(ncol=2, byrow=TRUE) |>
      data.frame()
    head(mydat)
    #   X1  X2
    # 1  1 437
    # 2  2 437
    # 3  3 437
    # 4  4 437
    # 5  5 437
    # 6  7 437
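
Because strsplit() returns character strings, the values need a numeric conversion at some point before they can be used as IDs and weights. As an alternative sketch, base R's scan() splits on whitespace and returns numbers in one step. The inline sample below only mimics the layout shown above (it is not the real file), and the column names id and weight are hypothetical labels, not something the data source defines.

```r
# Sample text mimicking the layout shown above (not the full file).
# For the real data, scan("redwt.raw", quiet = TRUE) should behave the same way.
raw_lines <- c(
  "       1     437       2     437       3     437",
  "       7     437       8     437       9     437"
)

# scan() splits on whitespace and returns a numeric vector by default
vals <- scan(text = raw_lines, quiet = TRUE)

# Assumption: the values strictly alternate ID, weight, so the count must be even
stopifnot(length(vals) %% 2 == 0)

# Fill the matrix row by row so each row holds one (ID, weight) pair
weights <- as.data.frame(matrix(vals, ncol = 2, byrow = TRUE))
names(weights) <- c("id", "weight")  # hypothetical column names

str(weights)
```

The even-length check is a cheap guard: if a value were missing, matrix() would recycle values with only a warning and silently misalign the pairs.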