perl vba text-files docx doc

Convert Word doc or docx files into text files?


I need a way to convert .doc or .docx files to .txt without installing anything. I also don't want to open Word manually to do this; the conversion should run automatically.

I was thinking that either Perl or VBA could do the trick, but I can't find anything online for either.

Any suggestions?


Solution

  • Note that an excellent source of information for Microsoft Office applications is the Object Browser. You can access it via Tools > Macro > Visual Basic Editor. Once you are in the editor, press F2 to browse the interfaces, methods, and properties provided by Microsoft Office applications.

    Here is an example using Win32::OLE:

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    use File::Spec::Functions qw( catfile );
    
    use Win32::OLE;
    use Win32::OLE::Const 'Microsoft Word';
    $Win32::OLE::Warn = 3;    # die on any OLE error
    
    my $word = get_word();
    $word->{Visible} = 0;     # keep the Word window hidden
    
    my $doc = $word->{Documents}->Open(catfile $ENV{TEMP}, 'test.docx');
    
    $doc->SaveAs(
        catfile($ENV{TEMP}, 'test.txt'),
        wdFormatTextLineBreaks
    );
    
    $doc->Close(0);    # 0: close without saving changes to the document
    
    sub get_word {
        my $word;
        eval {
            $word = Win32::OLE->GetActiveObject('Word.Application');
        };
    
        die "$@\n" if $@;
    
        unless(defined $word) {
            $word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit })
                or die "Oops, cannot start Word: ",
                       Win32::OLE->LastError, "\n";
        }
        return $word;
    }
    __END__
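
Since the question asks for something that runs unattended, the same approach can probably be extended to several files in one go. The following is only a minimal sketch under a few assumptions: the helper names `txt_path_for` and `convert_all` are hypothetical, the files to convert are passed on the command line, and the numeric value 3 is used in place of the `wdFormatTextLineBreaks` constant so that `Win32::OLE::Const` does not have to be loaded. It has the same requirements as the example above (Windows with Word installed).

```perl
#!/usr/bin/perl

use strict;
use warnings;

use File::Spec;

# Hypothetical helper: swap a trailing .doc/.docx for .txt.
# Names without such an extension are returned unchanged.
sub txt_path_for {
    my ($in) = @_;
    (my $out = $in) =~ s/\.docx?\z/.txt/i;
    return $out;
}

# Hypothetical helper: convert every given file using one Word instance.
sub convert_all {
    my @files = @_;

    # Loaded at run time so the script still compiles where Win32::OLE is absent.
    require Win32::OLE;
    Win32::OLE->Option(Warn => 3);    # die on any OLE error

    my $word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit })
        or die "Cannot start Word: ", Win32::OLE->LastError, "\n";
    $word->{Visible} = 0;

    for my $file (@files) {
        # Word resolves relative paths against its own working directory,
        # so hand it absolute paths.
        my $in  = File::Spec->rel2abs($file);
        my $doc = $word->{Documents}->Open($in);
        $doc->SaveAs(txt_path_for($in), 3);    # 3 == wdFormatTextLineBreaks
        $doc->Close(0);                        # 0: do not save changes
    }
    return;
}

# Only .doc/.docx arguments are converted; anything else is skipped.
my @docs = grep { /\.docx?\z/i } @ARGV;
convert_all(@docs) if @docs;
```

Saved under a name such as doc2txt.pl (hypothetical), it could be run as `perl doc2txt.pl report.docx notes.doc`; each .txt file is written next to its source document.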