Tags: python, excel, pandas, pydrive, google-colaboratory

Load an .xlsx file from Drive in Colaboratory


How can I import an MS Excel (.xlsx) file from Google Drive into Colaboratory?

excel_file = drive.CreateFile({'id':'some id'})

does work (drive is a pydrive.drive.GoogleDrive object). But

print(excel_file.FetchContent())

returns None. And

excel_file.content()

throws:

TypeError                                 Traceback (most recent call last)
in ()
----> 1 excel_file.content()

TypeError: '_io.BytesIO' object is not callable

Given a valid file ID, I want to load the file as an io object that pandas read_excel() can read, and end up with a pandas DataFrame.


Solution

  • You'll want to use excel_file.GetContentFile to save the file locally. Then, you can use the Pandas read_excel method after you !pip install -q xlrd.

    Here's a full example: https://colab.research.google.com/notebook#fileId=1SU176zTQvhflodEzuiacNrzxFQ6fWeWC

    What I did in more detail:

    I created a new spreadsheet in Google Sheets.

    Next, I exported it as an .xlsx file and uploaded that file back to Drive. The URL is: https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM

    Note the file ID. In my case it's 1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM.
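
    If you only have the share link, the ID can be pulled out of its id query parameter. This is a minimal sketch using only the standard library; file_id_from_url is a hypothetical helper name, and it assumes the open?id=... link form shown here (other Drive link formats are not handled):

```python
from urllib.parse import urlparse, parse_qs


def file_id_from_url(url):
    """Return the value of the `id` query parameter of a Drive share link.

    Hypothetical helper; only handles links of the form
    https://drive.google.com/open?id=<file id>.
    """
    query = parse_qs(urlparse(url).query)
    ids = query.get('id')
    if not ids:
        raise ValueError('no id parameter in URL: %r' % url)
    return ids[0]


file_id = file_id_from_url(
    'https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM')
print(file_id)
```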

    Then, in Colab, I tweaked the Drive download snippet to download the file. The key bits are:

    file_id = '1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM'
    downloaded = drive.CreateFile({'id': file_id})
    downloaded.GetContentFile('exported.xlsx')
    

    Finally, to create a Pandas DataFrame:

    !pip install -q xlrd
    import pandas as pd
    df = pd.read_excel('exported.xlsx')
    df
    

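    The !pip install... line installs the xlrd library, which pandas needed to read Excel files when this was written. Note that xlrd 2.0 and later reads only .xls files; with a current pandas, install openpyxl instead to read .xlsx.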
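
    If you would rather not write a local copy, which is what the question was originally after, reading from memory should also work. The sketch below assumes, based on the behaviour reported in the question, that FetchContent() returns None and stores the downloaded bytes in the content attribute as a BytesIO; content is an attribute, so it is read without parentheses. The function names are hypothetical, and read_excel still needs an Excel reader library installed.

```python
def fetch_buffer(drive_file):
    """Download a PyDrive file and return its in-memory buffer.

    Assumes drive_file comes from drive.CreateFile({'id': file_id}).
    """
    # FetchContent() fills drive_file.content as a side effect and
    # returns None, which is why printing its result shows None.
    drive_file.FetchContent()
    buffer = drive_file.content  # an attribute, not a method: no ()
    buffer.seek(0)               # make sure reading starts at byte 0
    return buffer


def read_drive_excel(drive_file, **kwargs):
    """Read an .xlsx file from Drive into a DataFrame, with no local copy."""
    # Imported here so fetch_buffer stays usable without pandas.
    import pandas as pd
    return pd.read_excel(fetch_buffer(drive_file), **kwargs)
```

    In Colab this would be called as df = read_drive_excel(drive.CreateFile({'id': file_id})). Any extra keyword arguments, such as sheet_name, are passed straight through to read_excel.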