r dataframe data-retrieval reformat

Retrieve numbers from a string in the columns of a data.frame


I have these 4 columns from a specific file format:

=1  =1  =1  =1
4G  4B  4d  2g
4E  8cL 4e  .
.   8BJ .   .
4F# 4A  4d  4dd
=2  =2  =2  =2
4G  4G  2d  4.b
4D  4F# .   .
.   .   .   8a
4E  4G  4B  4g

I want to convert it into the following data.frame:

    1   1   1   1
    4   4   4   2
    4   8   4   .
    .   8   .   .
    4   4   4   4
    2   2   2   2
    4   4   2   4
    4   4   .   .
    .   .   .   8
    4   4   4   4

I assume there is a library for this kind of task. I tried writing a function for it, but it does not work properly. Any contribution will be rewarded.


Solution

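  • We can use parse_number from the readr package. It keeps the first number found in each string, and cells that contain no digits (the "." entries) come back as NA.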

    library(readr)
    library(dplyr)
    
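    # df holds the four columns as character vectors (V1..V4)
    # across() is the current dplyr idiom; mutate_all(parse_number) is equivalent
    df %>%
      mutate(across(everything(), parse_number))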
    
    
    #   V1 V2 V3 V4
    #1   1  1  1  1
    #2   4  4  4  2
    #3   4  8  4 NA
    #4  NA  8 NA NA
    #5   4  4  4  4
    #6   2  2  2  2
    #7   4  4  2  4
    #8   4  4 NA NA
    #9  NA NA NA  8
    #10  4  4  4  4
    

    We can also use base R's lapply to apply the function to all columns, without dplyr:

    df[] <- lapply(df, parse_number)
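
If the "." placeholders should be kept as in the desired output from the question, rather than turned into NA, a base R variant is one option. The following is only a minimal sketch: it assumes the input is whitespace-separated text, and the names `txt` and `first_digits` are made up for illustration. Note that `comment.char = ""` is needed when reading, because otherwise `read.table` treats the `#` in `4F#` as the start of a comment.

```r
# Example input copied from the question (assumed whitespace-separated).
txt <- "=1  =1  =1  =1
4G  4B  4d  2g
4E  8cL 4e  .
.   8BJ .   .
4F# 4A  4d  4dd
=2  =2  =2  =2
4G  4G  2d  4.b
4D  4F# .   .
.   .   .   8a
4E  4G  4B  4g"

# Read everything as character; disable comment and quote handling so
# tokens such as "4F#" are read whole.
df <- read.table(text = txt, colClasses = "character",
                 comment.char = "", quote = "")

# Hypothetical helper: keep the first run of digits in each string and
# leave strings without any digit (here ".") unchanged.
first_digits <- function(x) {
  m <- regexpr("[0-9]+", x)            # position of first digit run, -1 if none
  out <- x
  out[m != -1] <- regmatches(x, m)     # regmatches() returns only the matches
  out
}

# Apply to every column while keeping the data.frame structure.
df[] <- lapply(df, first_digits)
print(df)
```

The result stays a character data.frame, since "." cannot be stored in a numeric column. If numeric columns are needed, the "." cells have to become NA, which is what parse_number does.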