
How do I read a .docx file in Google Colab?


I'm trying to read a .docx file into Google Colab, since my main computer with Anaconda is away for maintenance. I'm trying to use the python-docx module, but to my knowledge I can't just pip install python-docx in Google Colab.

    import docx

    def getText(filename):
        doc = docx.Document(filename)
        fullText = []
        for para in doc.paragraphs:
            fullText.append(para.text)
        return '\n'.join(fullText)

    docxString = getText("week_8_document1.docx")

Any ideas?


Solution

  • Try the following:

    # Install python-docx
    !pip install python-docx  # <-- Yes, you can install it directly in Colab
    
    # Import the tools
    import docx
    from google.colab import files
    
    uploaded = files.upload()  # <-- Select the file you want to upload
    file_name = next(iter(uploaded))  # <-- Name of the first uploaded file (the keys of `uploaded` are file names)
    doc = docx.Document(file_name)
    

    Once the document is loaded, you can access its text through doc.paragraphs, doc.tables, and so on. Good luck!
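
    As a rough sketch of that last step, assuming `doc` is the object returned by `docx.Document(...)`: the helper below collects the paragraph text first and then the table cells row by row. The name `extract_text` is made up for illustration. It only relies on the `paragraphs`, `tables`, `rows`, `cells`, and `text` attributes that python-docx exposes.

```python
def extract_text(doc):
    """Return paragraph text followed by table text from a python-docx Document.

    `extract_text` is a hypothetical helper name, not part of python-docx.
    """
    # One line per paragraph, in document order.
    lines = [para.text for para in doc.paragraphs]

    # One line per table row, with cell texts separated by tabs.
    for table in doc.tables:
        for row in table.rows:
            lines.append("\t".join(cell.text for cell in row.cells))

    return "\n".join(lines)

# Usage in Colab, after the upload step (assumes `doc` was loaded with docx.Document):
# print(extract_text(doc))
```

    Note that this puts all table text after the paragraph text, so it does not preserve where each table sat in the original document.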