python spell-checking

How to detect incorrect spellings in a text file using Python?


I am doing an exercise where I have to find the incorrectly spelled entries in a text dataset using Python. I have checked multiple blogs, but all of them show how to autocorrect misspellings. I don't want to autocorrect anything; I just want to separate the lines with incorrect spellings from the rest of the dataset.

Sample Dataset:

1. Kurtas for women
2. parti wear dresses
3. denim jeans
4. overcot

Expected Output:

1. parti wear dresses
2. overcot

Solution

  • Using pyspellchecker, you can check each line for unknown words and, if there are any, write that line to a new file. You can also add custom words (such as Kurtas) to the dictionary with load_words so that they are not flagged as misspelled.

    # pip install pyspellchecker
    from spellchecker import SpellChecker
    
    sp = SpellChecker()  # language="en" by default
    
    # add custom words that should not be flagged
    sp.word_frequency.load_words(["Kurtas"])
    
    # parenthesized context managers are officially supported from Python 3.10
    with (
        open("file.txt", "r") as in_f,
        open("newf.txt", "w") as out_f
    ):
        for line in in_f:
            # unknown() returns the set of words missing from the dictionary
            if sp.unknown(line.split()):
                out_f.write(line)
    

    Output (newf.txt):

    parti wear dresses
    overcot
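
If you also want to see which words caused a line to be flagged, a minimal sketch along these lines might help. It assumes words can be extracted with a simple regex (so list numbering and attached punctuation are ignored), and it takes the checker as a callable. The names `find_misspelled_lines` and `unknown` are hypothetical and not part of pyspellchecker.

```python
import re

# Hypothetical helper (not part of pyspellchecker): pairs each flagged
# line with the words the checker does not recognise.
def find_misspelled_lines(lines, unknown):
    """Return (line, unknown_words) for every line with an unknown word.

    `unknown` is any callable that takes a list of words and returns the
    ones it does not know, e.g. SpellChecker().unknown.
    """
    flagged = []
    for line in lines:
        # Assumption: words are runs of letters/apostrophes; digits and
        # punctuation such as "2." are dropped before checking.
        words = re.findall(r"[A-Za-z']+", line)
        bad = unknown(words)
        if bad:
            flagged.append((line.strip(), sorted(bad)))
    return flagged
```

With the `sp` object from the snippet above, this could be called as `find_misspelled_lines(open("file.txt"), sp.unknown)`. Passing the checker in as a callable keeps the helper independent of any particular spell-checking library.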