algorithm n-gram qwerty text-classification

Detecting random keyboard hits considering QWERTY keyboard layout


The winner of a recent Wikipedia vandalism detection competition suggests that detection could be improved by "detecting random keyboard hits considering QWERTY keyboard layout".

Example: woijf qoeoifwjf oiiwjf oiwj pfowjfoiwjfo oiwjfoewoh

Is there any software that does this already (preferably free and open source)?

If not, is there an active FOSS project whose goal is to achieve this?

If not, how would you suggest implementing such software?


Solution

  • If a bigram in the analyzed text consists of two letters that are close on a QWERTY keyboard but has a near-zero frequency in English (like "fg" or "cd"), there is a chance that random keyboard hits are involved. The more such bigrams are found, the more likely this becomes.

    To account for bashing with both hands, also test letters that are separated by one other letter: check the outer pair for QWERTY closeness, but check the two bigrams between them (or the trigram they form) for frequency. For example, in the text "flsjf" you would check F and S for QWERTY distance, but the bigrams FL and LS (or the trigram FLS) for frequency.
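Here is a minimal sketch of the heuristic described above. It assumes the following, none of which come from the answer: approximate key positions on a US QWERTY layout, with a row stagger of 0, 0.25 and 0.75 key widths; "close" meaning a distance of at most 2.0 key widths (picked so that F and S from the example count as close); "near-zero frequency" meaning a count of at most `max_count` in bigram counts you build yourself from a reference English text. For the two-hand variant, it flags a pair only when *every* bigram between the two letters is rare. That is one reading of the answer, which also suggests trigrams. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Approximate key centres on a US QWERTY layout, in key widths. The row
# offsets (0, 0.25, 0.75) model the usual stagger and are an assumption.
_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")
_ROW_OFFSETS = (0.0, 0.25, 0.75)
KEY_POS = {
    ch: (col + _ROW_OFFSETS[row], float(row))
    for row, keys in enumerate(_ROWS)
    for col, ch in enumerate(keys)
}


def words_of(text: str) -> list[str]:
    """Lower-case runs of the letters a-z; everything else is a separator."""
    return re.findall(r"[a-z]+", text.lower())


def key_distance(a: str, b: str) -> float:
    """Euclidean distance between two letter keys, in key widths."""
    (x1, y1), (x2, y2) = KEY_POS[a], KEY_POS[b]
    return math.hypot(x1 - x2, y1 - y2)


def bigram_counts(reference: str) -> Counter:
    """Count within-word letter bigrams in a reference English text."""
    counts = Counter()
    for word in words_of(reference):
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts


def is_suspicious(word: str, i: int, gap: int, counts: Counter,
                  max_dist: float = 2.0, max_count: int = 0) -> bool:
    """Flag word[i] and word[i + gap] if the keys are close and every bigram
    spanning them is rare (count <= max_count).
    gap=1 is the adjacent-letter test; gap=2 is the two-hand variant."""
    if key_distance(word[i], word[i + gap]) > max_dist:
        return False
    span = word[i:i + gap + 1]
    # Counter returns 0 for unseen bigrams without inserting them.
    return all(counts[span[j:j + 2]] <= max_count for j in range(gap))


def keyboard_mash_score(text: str, counts: Counter,
                        max_dist: float = 2.0, max_count: int = 0) -> float:
    """Fraction of checked letter pairs (gaps 1 and 2) that look suspicious."""
    hits = checked = 0
    for word in words_of(text):
        for gap in (1, 2):
            for i in range(len(word) - gap):
                checked += 1
                hits += is_suspicious(word, i, gap, counts, max_dist, max_count)
    return hits / checked if checked else 0.0
```

In practice you would build `counts` from a large English corpus so that legitimate but uncommon bigrams are not flagged. You would then compare `keyboard_mash_score` for an edit such as the question's "woijf qoeoifwjf oiiwjf ..." against a threshold tuned on known-good text. Both the threshold and `max_count` would need tuning on real data.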