r dataframe grepl

Extract only numbers followed by a + in a text column and return the numbers that follow the pattern in a new column [R]


I have data like this:

sample_data <- data.frame(
  txtnumbers = c("text stuff +300.5","other stuff 40+ more stuff","text here -30 here too","30- text here","50+","stuff here 500+","400.5-"),
  stringsAsFactors = FALSE
)

I want to extract numbers where they are FOLLOWED by a + symbol and insert the values into a new column, ignoring the rest of the text and returning NA where there is no number followed by a +:

desired_data <- data.frame(
  txtnumbers = c("text stuff +300.5","other stuff 40+ more stuff","text here -30 here too","30- text here","50+","stuff here 500+","400.5-"),
  desired_col = c(NA,40,NA,NA,50,500,NA),
  stringsAsFactors = FALSE
)

Can someone help me with an efficient function to do this? I could parse the number using parse_numeric but returning only numbers followed by a + is giving me issues. Thanks!


Solution

  • Here is one option using stringr::str_extract (the group argument requires stringr 1.5.0 or later):

    stringr::str_extract(sample_data$txtnumbers, "(\\d+)\\+", group = 1)
    #[1] NA    "40"  NA    NA    "50"  "500" NA
    

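    Right now, they are extracted as strings. You can wrap the call in as.integer() (or as.numeric() if decimals are possible) to turn them into numbers. Note that \\d+ matches only runs of digits, so a value such as "40.5+" would yield "5" with this pattern.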
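
    For completeness, here is a minimal base R sketch of the same idea that also writes the result into a new column. It is an alternative to the stringr call above, not a replacement for it: it swaps the capture group for a lookahead and, as an assumption beyond the original answer, allows an optional decimal part so that something like "40.5+" would be kept whole. The names pattern, m and extracted are only illustrative.

```r
sample_data <- data.frame(
  txtnumbers = c("text stuff +300.5", "other stuff 40+ more stuff",
                 "text here -30 here too", "30- text here", "50+",
                 "stuff here 500+", "400.5-"),
  stringsAsFactors = FALSE
)

# Digits with an optional decimal part, matched only when a "+" comes
# immediately after. The lookahead (?=\\+) checks for the "+" without
# including it in the match. The decimal part is an assumption added here.
pattern <- "\\d+(?:\\.\\d+)?(?=\\+)"

# regexpr() returns -1 for elements with no match
m <- regexpr(pattern, sample_data$txtnumbers, perl = TRUE)

# Start with all NA, then fill in only the rows that matched
extracted <- rep(NA_character_, nrow(sample_data))
extracted[m != -1] <- regmatches(sample_data$txtnumbers, m)

# Convert the extracted strings to numbers; rows without a match stay NA
sample_data$desired_col <- as.numeric(extracted)
```

    Only the first match per row is returned; if a string could contain several numbers followed by a +, gregexpr() would be the base R function to look at instead.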