python, google-cloud-platform, google-cloud-storage

How do I list all the top-level folders in a given GCS bucket?


I start with:

    from google.cloud import storage

    client = storage.Client()
    bucket = client.get_bucket(BUCKET_NAME)

    # What's next? I need something like client.list_folders(path)

I know how to:

  1. list all the blobs (including blobs in sub-sub-sub-folders, at any depth) with bucket.list_blobs()

  2. list all the blobs recursively in a given folder with bucket.list_blobs(prefix=<path to subfolder>)

But what if my bucket has 100 top-level folders, each containing thousands of files? Is there an efficient way to get only those 100 top-level folder names without listing all the blobs inside them?


Solution

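  • Google Cloud Storage does not have real folders or subdirectories. Object names live in a flat namespace, and the library just creates the illusion of a hierarchical file tree from the "/" characters in those names, so I do not know of a dedicated call for listing folders. The straightforward approach is to list the blobs and collect the first path segment of each name, which does mean iterating over every blob in the bucket.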

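    I used this simple code:

    from google.cloud import storage

    storage_client = storage.Client()
    # list_blobs() takes a bucket name (or Bucket object), not a project ID.
    blobs = storage_client.list_blobs('my-bucket')
    res = []
    seen = set()  # set lookups avoid rescanning the list for every blob

    for blob in blobs:
        # Names without '/' are top-level objects, not folders; skip them.
        if '/' not in blob.name:
            continue
        top = blob.name.split('/')[0]
        if top not in seen:
            seen.add(top)
            res.append(top)

    print(res)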
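
If scanning every blob is too slow, it may be worth trying the delimiter argument of list_blobs() instead. With delimiter="/", the listing returns only the objects directly at the top level and reports the distinct name prefixes ("folders") separately on the iterator's prefixes attribute, which is only filled in once the pages have been consumed. The sketch below is a minimal version of that idea, not a tested production solution; the function name top_level_folders and the bucket name "my-bucket" are placeholders I made up for illustration.

```python
def top_level_folders(client, bucket_name):
    """Return the sorted top-level "folder" names of a bucket.

    `client` is expected to behave like google.cloud.storage.Client.
    """
    # With a delimiter, names are grouped at the first '/' and the
    # groups are reported as prefixes instead of individual blobs.
    iterator = client.list_blobs(bucket_name, delimiter="/")

    # iterator.prefixes is populated page by page, so the iterator has
    # to be drained first. Only top-level objects are yielded here.
    for _ in iterator:
        pass

    # Prefixes look like "logs/"; strip the trailing delimiter.
    return sorted(prefix.rstrip("/") for prefix in iterator.prefixes)


# Usage (needs google-cloud-storage installed and credentials configured):
#   from google.cloud import storage
#   print(top_level_folders(storage.Client(), "my-bucket"))
```

The loop still iterates over objects that sit directly at the top level of the bucket, but not over the thousands of objects inside each folder, so for the layout described in the question it should need far fewer results than a full listing.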