
Best way to break up fixed-width strings in R


Below is the sample data.

 string1 <- c(320100000020220311020210)
 string2 <- c(320100000020220312120211)

 testitem <- data.frame(string1, string2)

My first question: what is the best way to break the strings at specified positions? For example, the breaks should fall after characters 2, 4, 10, 14, 16, 18, 19, and 23. For the first string, the end result should look like this:

 32    01    000000    2022    03    11    0    2021    0

Solution

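  • If those are character strings (quoted, as in the data at the end of this answer), use read.fwf from base R (the utils package). Left as unquoted numbers, 24-digit values exceed what a double can represent exactly, so digits would be lost before any splitting.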

    read.fwf(textConnection(unlist(testitem)),
        widths = c(2, 2, 6, 4, 2, 2, 1, 4, 1), colClasses = "character")
    

    -output

      V1 V2     V3   V4 V5 V6 V7   V8 V9
    1 32 01 000000 2022 03 11  0 2021  0
    2 32 01 000000 2022 03 12  1 2021  1
    

    Another option is separate from tidyr, which takes the break positions directly as sep

    library(stringr)
    library(tidyr)
    library(dplyr)
    tibble(col1 = unlist(testitem)) %>% 
      separate(col1, into = str_c("V", 1:9), sep = c(2, 4, 10, 14, 16, 18, 19, 23))

    -output

    # A tibble: 2 × 9
      V1    V2    V3     V4    V5    V6    V7    V8    V9   
      <chr> <chr> <chr>  <chr> <chr> <chr> <chr> <chr> <chr>
    1 32    01    000000 2022  03    11    0     2021  0    
    2 32    01    000000 2022  03    12    1     2021  1 
    

    data

    string1 <- c("320100000020220311020210")
    string2 <- c("320100000020220312120211")
    testitem <- data.frame(string1, string2)
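
The question states break positions, while read.fwf wants field widths. A minimal base R sketch of how the two relate, and of splitting with substring alone, might look like the following. The helper name split_at_breaks is hypothetical (not part of any package), and the sketch assumes every string has the same length.

```r
# Break positions from the question: a break falls after each listed character.
breaks <- c(2, 4, 10, 14, 16, 18, 19, 23)

x <- c("320100000020220311020210",
       "320100000020220312120211")

# Field widths are the differences between successive break positions,
# with 0 prepended and the string length appended.
widths <- diff(c(0, breaks, nchar(x[1])))

# Hypothetical helper: cut each string in `x` after the given positions.
split_at_breaks <- function(x, breaks) {
  starts <- c(1, breaks + 1)
  ends   <- c(breaks, max(nchar(x)))
  # vapply returns one column per string; transpose to one row per string.
  pieces <- t(vapply(x, function(s) substring(s, starts, ends),
                     character(length(starts)), USE.NAMES = FALSE))
  colnames(pieces) <- paste0("V", seq_along(starts))
  as.data.frame(pieces, stringsAsFactors = FALSE)
}

fields <- split_at_breaks(x, breaks)
print(widths)
print(fields)
```

The widths vector computed here can be passed as the widths argument to read.fwf, so the break positions only need to be written down once.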
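
As a variation on the separate approach, tidyr 1.3.0 and later provide separate_wider_position, which takes named widths instead of break positions. The sketch below assumes that tidyr version is installed; the names col1, field_widths, and V1 to V9 are illustrative choices, not anything the function requires.

```r
library(tidyr)  # assumes tidyr >= 1.3.0 for separate_wider_position()

strings <- data.frame(
  col1 = c("320100000020220311020210",
           "320100000020220312120211"),
  stringsAsFactors = FALSE
)

# Each name becomes an output column; the widths sum to the string length (24).
field_widths <- c(V1 = 2, V2 = 2, V3 = 6, V4 = 4, V5 = 2,
                  V6 = 2, V7 = 1, V8 = 4, V9 = 1)

# By default this errors if a string is shorter or longer than the widths imply.
result <- separate_wider_position(strings, col1, widths = field_widths)
print(result)
```

Because the default is to raise an error on strings of the wrong length, malformed records surface immediately instead of being silently truncated or padded.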