hunspell

Modify the affix file to accept spelling variations


In Hunspell, I have a .dic file that contains all correct words, for example:

and

That correct word is suggested if the user types something like this:

adn

Let's assume there is a language that allows the use of "d" instead of "n" and "n" instead of "d". In such a case, I could simply add two words in the .dic file, for example:

and
adn

But is there any other way to achieve this by keeping only a single word "and" in the .dic file and modifying the .aff file?

I need both "and" and "adn" to be accepted as correct words without adding both of them to the .dic file. There should be only one word in the .dic file and a rule in the affix file that achieves this. I checked the REP and MAP tags, but they are only used for suggestions.

Is this possible in Hunspell?


Solution

  • You can try the TRY and KEY options in the .aff file to influence which characters Hunspell tries when it builds suggestions. Both options affect suggestions only: they do not make "adn" pass as a correct word, so at best they are a partial solution that helps "and" show up among the corrections.

    TRY: lists the characters Hunspell tries when a suggestion differs from the typed word by one character. If you frequently need to substitute 'd' for 'n' and vice versa, include these characters here.
    KEY: lists characters that sit next to each other on the keyboard, with groups separated by "|". Hunspell suggests words in which one character is replaced by a neighbor from the same group.

    Sample AFF file

    SET UTF-8
    
    # Try 'd' and 'n' when looking for corrections.
    TRY dn
    
    # Treat 'd' and 'n' as neighboring keys that are often mistyped for each other.
    KEY dn
    
    # ... the rest of your settings go here
    

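    My solution: given these limitations, a custom step is probably required. One approach is to preprocess the text, or post-process Hunspell's output, to handle the specific character substitutions you need. The function below swaps every 'd' and 'n', so "adn" becomes "and". Note that it also turns "and" into "adn", so check both the original and the swapped form and accept the word if either one passes.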

    def preprocess_text(text):
        # Swap every 'd' and 'n', using '\x00' as a temporary placeholder
        text = text.replace('d', '\x00')
        text = text.replace('n', 'd')
        text = text.replace('\x00', 'n')
        return text
    
    # Example usage
    original_text = "adn"
    preprocessed_text = preprocess_text(original_text)
    print(preprocessed_text)  # and
    # ...
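
A whole-string swap only covers the case where every 'd' and 'n' is exchanged at once. If the two letters are interchangeable position by position, a minimal sketch along these lines may be closer to what you want: it generates every d/n variant of the typed word and accepts the word if any variant is known. The names `variants`, `is_accepted` and `is_known` are hypothetical, and the plain Python set stands in for the real dictionary lookup; in practice you would pass in the check function of whichever Hunspell binding you use.

```python
from itertools import product

# Assumption: 'd' and 'n' are the only interchangeable letters.
SWAPPABLE = {"d": "dn", "n": "dn"}

def variants(word):
    """Yield every spelling obtained by choosing 'd' or 'n' at each d/n position."""
    choices = [SWAPPABLE.get(ch, ch) for ch in word]
    for combo in product(*choices):
        yield "".join(combo)

def is_accepted(word, is_known):
    """Accept the word if any of its d/n variants passes the underlying check."""
    return any(is_known(v) for v in variants(word))

# Stand-in for the real lookup (hypothetical): a set holding the single entry "and".
dictionary = {"and"}
is_known = dictionary.__contains__

print(is_accepted("and", is_known))  # True
print(is_accepted("adn", is_known))  # True
print(is_accepted("abc", is_known))  # False
```

Two caveats: this also accepts "add" and "ann", since they are d/n variants of "and", and the number of variants doubles with every 'd' or 'n' in the word, so very long words may need a cap.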