Thanks for reading. For a research project, I'm doing some text analysis. We are analyzing large texts (company reports), and I want to count keyword frequencies within those texts.
However, I have two lists of keywords, and I don't want to count the number of occurrences of these words individually, but the number of times any two words from these lists appear within a certain distance of each other in the main text.
text <- c("The house is blue. The car is very big and red.")
words1 <- c("car", "house")
words2 <- c("blue", "red")
The desired functionality should, for example, return 1 for distance 3 (the number of combinations of any pair from the two lists within the given distance).
I know about the text_count function from the stringb package and kwic from the quanteda package. However, that's not really what I'm looking for.
Thanks, any help is appreciated.
The quanteda package has the function fcm() that counts the frequency of co-occurrences of features within a context window.
require(quanteda)
txt <- c("The house is blue. The car is very big and red.")
toks <- tokens(txt) %>% tokens_tolower()
# context = "window" is required for the window argument to take effect
# (the default, context = "document", counts co-occurrence anywhere in the
# same document); tri = FALSE returns the full symmetric matrix
fcm(toks, context = "window", window = 3, tri = FALSE)
#> Feature co-occurrence matrix of: 10 by 10 features.
#>          features
#> features  the house is blue . car very big and red
#>    the      0     1  3    2 1   1    1   0   0   0
#>    house    1     0  1    1 1   0    0   0   0   0
#>    is       3     1  0    1 2   1    1   1   1   0
#>    blue     2     1  1    0 1   1    0   0   0   0
#>    .        1     1  2    1 0   1    0   1   1   1
#>    car      1     0  1    1 1   0    1   1   0   0
#>    very     1     0  1    0 0   1    0   1   1   1
#>    big      0     0  1    0 1   1    1   0   1   1
#>    and      0     0  1    0 1   0    1   1   0   1
#>    red      0     0  0    1 1   0    1   1   1   0
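To get the single number the question asks for (pairs with one word from each list within the window), you can subset the co-occurrence matrix by the two keyword lists and sum. One caveat: with a single document, the window can span sentence boundaries (e.g. "blue" and "car" are only 3 tokens apart across the period), so it may help to reshape the corpus to sentences first. A sketch, assuming quanteda's corpus_reshape() and standard sparse-matrix indexing by feature names:

```r
library(quanteda)

txt <- c("The house is blue. The car is very big and red.")
words1 <- c("car", "house")
words2 <- c("blue", "red")

# one document per sentence, so the window cannot cross sentence boundaries
sents <- corpus_reshape(corpus(txt), to = "sentences")
toks  <- tokens_tolower(tokens(sents, remove_punct = TRUE))

fcmat <- fcm(toks, context = "window", window = 3, tri = FALSE)

# keep only the rows for words1 and the columns for words2, then sum
sum(fcmat[words1, words2])
#> [1] 1
```

Here only "house"/"blue" fall within 3 tokens of each other ("car"/"red" are 5 apart), so the result is 1, matching the desired output. If some keywords may be absent from the text, intersect the lists with featnames(fcmat) before indexing to avoid subscript errors.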