python com ms-word word-2007 win32com

How can I use Microsoft Word's spelling/grammar checker programmatically?


I want to run a medium to large number of text snippets through a spelling/grammar checker to get a rough approximation and ranking of their "quality." Speed is not a concern, so I think the easiest way is to write a script that passes the snippets to Microsoft Word (2007) and runs its spelling and grammar checker on them.

Is there a way to do this from a script (specifically, Python)? What is a good resource for learning about controlling Word programmatically?

If not, I suppose I can try one of the suggestions from the Stack Overflow question "Open Source Grammar Checker".

Update

In response to Chris' answer, is there at least a way to a) open a file (containing the snippet(s)), b) run a VBA script from inside Word that calls the spelling and grammar checker, and c) return some indication of the "score" of the snippet(s)?

Update 2

I've added an answer which seems to work, but if anyone has other suggestions I'll keep this question open for some time.


Solution

  • It took some digging, but I think I found a useful solution. Following the advice at http://www.nabble.com/Edit-a-Word-document-programmatically-td19974320.html, I'm using the win32com module, which allows access to Word's COM objects. If the SourceForge download doesn't work, the module is part of the pywin32 package and can be installed with pip. The following code demonstrates the approach:

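    import os
    import win32com.client

    wdDoNotSaveChanges = 0  # value of Word's wdDoNotSaveChanges constant
    path = os.path.abspath('snippet.txt')

    snippet = 'Jon Skeet lieks ponies.  I can haz reputashunz?  '
    snippet += 'This is a correct sentence.'
    # Write the snippet to a file that Word can open.
    with open(path, 'w') as f:
        f.write(snippet)

    # Start Word through COM and open the snippet file.
    app = win32com.client.gencache.EnsureDispatch('Word.Application')
    doc = app.Documents.Open(path)
    # Report how many errors Word's proofing tools flag in the document.
    print("Grammar: %d" % (doc.GrammaticalErrors.Count,))
    print("Spelling: %d" % (doc.SpellingErrors.Count,))

    # Quit Word without saving any changes.
    app.Quit(wdDoNotSaveChanges)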
    

    which produces

    Grammar: 2
    Spelling: 3
    

    which matches the results of invoking the check manually from Word.
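
For the ranking part of the question, the counts can be turned into a rough score. The following is only a sketch under a few assumptions: the helper names (`count_errors`, `error_density`, `rank_snippets`, `main`) are made up for illustration, "errors per word" is just one arbitrary choice of score, and the snippet is pasted into a blank document created with `Documents.Add()` instead of going through a temporary file. I have not benchmarked this against a large batch.

```python
def count_errors(app, text):
    """Return (spelling, grammar) error counts for `text`.

    `app` is assumed to be a Word.Application COM object, e.g. the one
    returned by win32com.client.gencache.EnsureDispatch.
    """
    doc = app.Documents.Add()      # blank document, no temp file needed
    try:
        doc.Content.Text = text
        return doc.SpellingErrors.Count, doc.GrammaticalErrors.Count
    finally:
        doc.Close(0)               # 0 == wdDoNotSaveChanges


def error_density(text, spelling, grammar):
    """Errors per whitespace-separated word (hypothetical score; lower is better)."""
    words = len(text.split())
    return (spelling + grammar) / float(words) if words else 0.0


def rank_snippets(app, snippets):
    """Return (density, snippet) pairs, cleanest snippet first."""
    scored = []
    for snippet in snippets:
        spelling, grammar = count_errors(app, snippet)
        scored.append((error_density(snippet, spelling, grammar), snippet))
    return sorted(scored, key=lambda pair: pair[0])


def main():
    # Imported here so the helpers above can be exercised without Word installed.
    import win32com.client
    app = win32com.client.gencache.EnsureDispatch('Word.Application')
    try:
        snippets = ['Jon Skeet lieks ponies.', 'This is a correct sentence.']
        for density, snippet in rank_snippets(app, snippets):
            print("%.2f  %s" % (density, snippet))
    finally:
        app.Quit(0)                # 0 == wdDoNotSaveChanges
```

Calling `main()` on a Windows machine with Word and pywin32 installed should print the snippets with the cleanest one first. Reusing one Word instance for the whole batch avoids starting Word once per snippet.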