I am trying to conditionally parse numbers from text strings in a data frame and assign each parsed number to the corresponding row of the last column. The condition is grepl("apple", df$col1). Not every row will meet the condition, so the corresponding cell in the last column can be NA. This is perhaps easier to show than to explain:
col1 <- c("I have 1 apple", NA, "I have 2 apples", NA, "I have 3 apples", NA, "I have 4 apples")
col2 <- c(7:13)
df <- as.data.frame(cbind(col1, col2))
df$col3 <- NA
This gets close to my desired result:
df$col3 <- unlist(lapply(df$col1, readr::parse_number))
However, I want to parse and assign to df$col3 only the rows that meet the condition grepl("apple", df$col1), because my actual dataset contains numbers within text strings that I do not want to parse. Would an if_else with lapply be a solution?
A solution with tidyverse:
library(tidyverse)
col1 <- c("I have 1 apple", NA, "I have 2 apples", NA, "I have 3 apples", NA, "I have 4 pears")
col2 <- c(7:13)
df <- as.data.frame(cbind(col1, col2))
df %>%
  mutate(col3 = ifelse(str_detect(col1, "apple"),
                       str_extract(col1, "\\d+"), NA))
#>              col1 col2 col3
#> 1  I have 1 apple    7    1
#> 2            <NA>    8 <NA>
#> 3 I have 2 apples    9    2
#> 4            <NA>   10 <NA>
#> 5 I have 3 apples   11    3
#> 6            <NA>   12 <NA>
#> 7  I have 4 pears   13 <NA>
Created on 2024-12-31 with reprex v2.0.2
The function str_extract applies regex pattern matching to extract the first sequence of digits from a string. If the strings contain floating-point numbers with a decimal point, the pattern can be expanded to "[\\d\\.]+".
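As a rough sketch of the floating-point case, assuming stringr is installed: the sample strings below are made up for illustration and are not from the question. Note that str_extract returns character values, so as.numeric is needed if col3 should be numeric.

```r
library(stringr)

# Hypothetical sample strings, used only for illustration
x <- c("I have 1.5 apples", "I have 2 apples", NA)

# Extract the first run of digits and dots; NA input stays NA
extracted <- str_extract(x, "[\\d\\.]+")

# str_extract() returns character, so convert to numeric explicitly
num <- as.numeric(extracted)
num
```

The character class is permissive: it would also accept something like "1.2.3", which as.numeric would turn into NA with a warning. If that matters for your data, a stricter pattern such as "\\d+(?:\\.\\d+)?" may be a better fit.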
If you know that apple(s) always comes directly after the number, you can use a lookahead assertion in the regex:
df %>%
  mutate(col3 = str_extract(col1, "\\d+(?= apple)"))
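For comparison, the same conditional assignment can be sketched in base R using the grepl condition from the question. This is only one possible approach and swaps in logical indexing plus sub() for the stringr calls; the helper name is_apple is made up here, and the pattern assumes the number is immediately followed by " apple".

```r
# Sample data as in the answer above
col1 <- c("I have 1 apple", NA, "I have 2 apples", NA,
          "I have 3 apples", NA, "I have 4 pears")
df <- data.frame(col1 = col1, col2 = 7:13)

# Rows meeting the condition; grepl() returns FALSE for NA input
is_apple <- grepl("apple", df$col1)

# Start with NA everywhere, then fill only the matching rows
df$col3 <- NA_real_
df$col3[is_apple] <- as.numeric(
  # Keep only the digits that directly precede " apple"
  sub(".*?(\\d+) apple.*", "\\1", df$col1[is_apple], perl = TRUE)
)
df$col3
```

Rows that contain "apple" without a preceding number would be left unchanged by sub() and become NA (with a warning) in as.numeric, so this is worth checking against the real data.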