Tags: r, string, pattern-matching, stringr

Count how often words from a vector occur in a string


I have a string of text and a vector of words:

String: "Auch ein blindes Huhn findet einmal ein Korn."
Vector: "auch", "ein"

I want to check how often each word in the vector is contained in the string and calculate the sum of the frequencies. For the example, the correct result would be 3.

So far I can check which words occur in the string and sum the result:

library(stringr)
deu <- c("\\bauch\\b", "\\bein\\b")
str_detect(tolower("Auch ein blindes Huhn findet einmal ein Korn."), deu)

[1] TRUE TRUE

sum(str_detect(tolower("Auch ein blindes Huhn findet einmal ein Korn."), deu))

[1] 2

Unfortunately, str_detect only reports whether each word occurs in the string (TRUE, TRUE), not how many times it occurs (1, 2), so the sum of its output is the number of distinct words found, not the total number of occurrences.

Is there a function in R similar to preg_match_all in PHP?

preg_match_all("/\bauch\b|\bein\b/i", "Auch ein blindes Huhn findet einmal ein Korn.", $matches);
print_r($matches);

Array
(
    [0] => Array
        (
            [0] => Auch
            [1] => ein
            [2] => ein
        )

)

echo preg_match_all("/\bauch\b|\bein\b/i", "Auch ein blindes Huhn findet einmal ein Korn.", $matches);

3

I would like to avoid loops.


I have looked at a lot of similar questions, but they either don't count the number of occurrences or do not use a vector of patterns to search. I may have overlooked a question that answers mine, but before you mark this as duplicate, please make sure that the "duplicate" actually asks the exact same thing. Thank you.


Solution

  • You can use str_count, which returns the number of matches for each pattern; wrap the call in sum() to get the total:

    stringr::str_count(tolower("Auch ein blindes Huhn findet mal ein Korn"), paste0("\\b", tolower(c("ein","Huhn")), "\\b"))
    [1] 2 1
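
Applied to the sentence and word vector from the question, the same idea might look like the sketch below. It assumes stringr is installed; the variable names (text, words, patterns, counts, matches) are only illustrative. The last call uses str_extract_all with a single alternation pattern, which is probably the closest stringr counterpart to preg_match_all.

```r
library(stringr)

text  <- "Auch ein blindes Huhn findet einmal ein Korn."
words <- c("auch", "ein")

# Wrap each word in word boundaries so "ein" does not match inside "einmal"
patterns <- paste0("\\b", words, "\\b")

# One count per pattern; lower-casing the text makes the match case-insensitive
counts <- str_count(tolower(text), patterns)
counts       # 1 2
sum(counts)  # 3

# Return the matched substrings themselves, similar to PHP's preg_match_all
matches <- str_extract_all(
  text,
  regex("\\b(auch|ein)\\b", ignore_case = TRUE)
)[[1]]
matches          # "Auch" "ein" "ein"
length(matches)  # 3
```

Both calls are vectorised, so no explicit loop is needed. str_count keeps the per-word breakdown, while the single alternation pattern only gives the overall matches.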