ruby regex dictionary nlp aspell

Character classes used in ffi-aspell


I am trying to use the ffi-aspell gem to spell-check a text. To do that, it seems that I have to extract the words myself. I am trying to do so by applying String#scan to the text with a regex, but it is not straightforward.

What is the easiest way to define the class of characters that may appear in an ffi-aspell dictionary for a given language? I want it to work for languages other than English, so things like /[a-zA-Z']/ for the character (or /[a-zA-Z']+/ for the word) do not work. /[[:word:]]/ seems to capture characters that are not in the dictionary, such as numerals, and it does not match the apostrophe (single quote), which frequently appears inside a word. Is there any documentation that defines the character set used in an ffi-aspell dictionary?
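To illustrate the problem, here is a small sketch. The sample string is made up for the example; it only shows how the two patterns mentioned above behave on a non-English letter, a numeral, and an apostrophe.

```ruby
# Made-up sample text containing an apostrophe, a numeral and a non-ASCII letter.
text = "don't 42 naïve"

# ASCII-only class: keeps the apostrophe but breaks words at the "ï".
ascii_words = text.scan(/[a-zA-Z']+/)

# POSIX word class: handles "ï", but also matches digits and splits at the apostrophe.
posix_words = text.scan(/[[:word:]]+/)

p ascii_words  # ["don't", "na", "ve"]
p posix_words  # ["don", "t", "42", "naïve"]
```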


Solution

  • I guess it would be easier to scan the ffi-aspell dictionary for its entries first, collect the unique characters that occur in them, and then combine those into a pattern with Regexp.union.
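A minimal sketch of that idea follows. It assumes you already have the dictionary entries as an array of strings; how you obtain them is left open here (one possibility might be the aspell command-line tool's `dump master` output, which is not part of ffi-aspell). The helper name `word_regex_from` and the tiny word list are hypothetical, chosen only for illustration.

```ruby
# Hypothetical helper: build a regex that matches runs of characters
# which occur somewhere in the given word list.
def word_regex_from(words)
  chars = words.flat_map(&:chars).uniq  # every distinct character in the list
  char_re = Regexp.union(chars)         # alternation of the (escaped) characters
  /(?:#{char_re})+/                     # one or more of those characters
end

# Stand-in for real dictionary entries (assumption: loaded elsewhere).
words = ["don't", "naïve", "cats", "eat"]
word_re = word_regex_from(words)

tokens = "don't eat 42 naïve cats".scan(word_re)
p tokens  # ["don't", "eat", "naïve", "cats"]
```

Note that the pattern only knows the characters it has seen, so it should be built from the full dictionary of the target language rather than from a sample. Each token can then be passed to the spell checker.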