python google-drive-api google-sheets-api google-api-python-client

Troubleshooting Google Drive API Issue: Google Cloud Python Function Fails to Detect PDF Files in Shared Folder


I developed a Python script that runs on Google Cloud Platform. The script uses the Google Drive API and the Google Sheets API to access a folder in a Google Drive belonging to a company, extract data from the PDF files in that folder, and then write the extracted data to a Google Sheet.

To make this work, I set up a service account and enabled the necessary APIs. I also integrated Secret Manager to connect the function to Google Drive and Google Sheets.

I granted access to the Drive folders by sharing them with the service account's email address.

However, when I run the script, the Drive API fails to detect the PDF files in the shared folders. Surprisingly, the API does not return any error message; the file list is simply empty.

def list_files_in_folder(drive,folder_id):
  #print(folder_id)
  # List files in the specified folder
  query = f"parents = '{folder_id}'"
  files = []
  response = drive.files().list(q = query).execute()
  #print(f'response:{response}')
  files = response.get('files')
  #print(f'First page files: {files}')
  next_page_token = response.get('nextPageToken')

  while next_page_token:
    response = drive.files().list(q=query,nextPageToken=next_page_token).execute()
    files.extend(response.get('files'))
    next_page_token = response.get('nextPageToken')

  return files

To troubleshoot the issue, I tested the script against a different Google Drive account, separate from the company's primary drive, which is accessed by multiple accounts. When I created a folder containing PDF files there and shared it with the same service account email address, the script listed the folder contents without any errors.
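
One way to check what differs between the two folders is to read the folder's own metadata. The sketch below is not part of the original script: `describe_folder` is a hypothetical helper, and it assumes `drive` is a Drive API v3 client built with the service account credentials. In the v3 File resource, `driveId` is populated only for items that live in a shared drive, so its presence would suggest that the company folder is on a shared drive rather than in a user's My Drive.

```python
def describe_folder(drive, folder_id):
    """Return basic metadata for a folder (hypothetical diagnostic helper).

    Assumes `drive` is a Drive API v3 client object.
    """
    meta = drive.files().get(
        fileId=folder_id,
        fields="id, name, mimeType, driveId",
        # Without this flag, items in shared drives cannot be fetched.
        supportsAllDrives=True,
    ).execute()
    return {
        "name": meta.get("name"),
        "is_folder": meta.get("mimeType") == "application/vnd.google-apps.folder",
        # driveId is only present for items stored in a shared drive.
        "in_shared_drive": "driveId" in meta,
        "drive_id": meta.get("driveId"),
    }
```

If `in_shared_drive` comes back true for the company folder and false for the test folder, the listing call most likely needs the shared-drive parameters.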


Solution

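  • Since you mentioned that you are using a shared drive, your script needs to be modified. By default, files.list does not include items stored in shared drives, which is why the request returned an empty list instead of an error. Please modify it as follows and test it again.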

    Modified script:

    def list_files_in_folder(drive, folder_id):
        # List every non-trashed file whose parent is the given folder.
        query = f"'{folder_id}' in parents and trashed=false"
        response = drive.files().list(
            q=query,
            pageSize=1000,
            # The next three parameters are needed for items in shared drives.
            supportsAllDrives=True,
            includeItemsFromAllDrives=True,
            corpora="allDrives"
        ).execute()
        files = response.get('files', [])
        next_page_token = response.get('nextPageToken')

        # Keep requesting pages until no further page token is returned.
        while next_page_token:
            response = drive.files().list(
                q=query,
                pageSize=1000,
                # files.list takes the token as pageToken, not nextPageToken.
                pageToken=next_page_token,
                supportsAllDrives=True,
                includeItemsFromAllDrives=True,
                corpora="allDrives"
            ).execute()
            files.extend(response.get('files', []))
            next_page_token = response.get('nextPageToken')

        return files
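
Since only the PDF files are needed, the same request can also filter by MIME type. The following is a minimal sketch under assumptions, not part of the original answer: `iter_pdfs_in_folder` is a hypothetical name, `drive` is assumed to be a Drive API v3 client, and the `fields` mask is an optional choice to keep responses small.

```python
def iter_pdfs_in_folder(drive, folder_id, page_size=1000):
    """Yield metadata dicts for PDF files directly inside a folder.

    Hypothetical helper; assumes `drive` is a Drive API v3 client object.
    """
    query = (
        f"'{folder_id}' in parents "
        "and mimeType = 'application/pdf' "
        "and trashed = false"
    )
    page_token = None  # no token on the first request
    while True:
        response = drive.files().list(
            q=query,
            pageSize=page_size,
            pageToken=page_token,
            fields="nextPageToken, files(id, name, mimeType)",
            # Required so that items in shared drives are returned.
            supportsAllDrives=True,
            includeItemsFromAllDrives=True,
            corpora="allDrives",
        ).execute()
        yield from response.get("files", [])
        page_token = response.get("nextPageToken")
        if not page_token:
            break
```

It can be consumed with a plain loop, for example `for f in iter_pdfs_in_folder(drive, folder_id): print(f["name"])`. Using a generator here is a design choice: it lets the caller start processing the first page before later pages are fetched.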
    
