python azure

How to efficiently list all files in an Azure blob container using Python?


I need to list all files in an Azure blob container using Python. Currently I use the code below. This worked well when there were few files, but now I have a large number of files and the script runs for more than an hour. The time-consuming part is the for loop. How can this be done faster?

from azure.storage.blob import BlobServiceClient

connect_str = "************"

blob_service_client = BlobServiceClient.from_connection_string(connect_str)

# get_container_client() takes the name of the *container*
container_client = blob_service_client.get_container_client("blobName")

# list_blobs() returns a lazy pager of BlobProperties objects; the SDK
# requests the next page from the service as the loop advances.
l = []
for blob in container_client.list_blobs():
    l.append(blob.name)
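If only the names are needed, the v12 loop above does not have to build a full BlobProperties object for every blob. Recent azure-storage-blob 12.x releases add ContainerClient.list_blob_names(), which yields plain name strings; the sketch below checks for that method and falls back to list_blobs() on versions that lack it. The helper name collect_blob_names, the environment variable AZURE_STORAGE_CONNECTION_STRING and the container name "blobName" are placeholders. How much time this saves depends on the container: the service still returns at most 5000 entries per page, so the listing remains one HTTP round trip per page.

```python
import os
from typing import Iterable, List


def collect_blob_names(container_client) -> List[str]:
    """Return every blob name in the container as a list of strings.

    Uses ContainerClient.list_blob_names() (names only) when the installed
    azure-storage-blob version provides it, otherwise list_blobs().
    """
    if hasattr(container_client, "list_blob_names"):
        names: Iterable[str] = container_client.list_blob_names()
    else:
        names = (blob.name for blob in container_client.list_blobs())
    # The SDK pages through results transparently; list() drains every page.
    return list(names)


if __name__ == "__main__":
    # Hypothetical environment variable and container name; adjust to your setup.
    conn_str = os.environ.get("AZURE_STORAGE_CONNECTION_STRING")
    if conn_str:
        from azure.storage.blob import ContainerClient

        client = ContainerClient.from_connection_string(conn_str, container_name="blobName")
        all_names = collect_blob_names(client)
        print(f"{len(all_names)} blobs")
```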

Solution

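  • I was able to achieve this using the list_blobs method of BlockBlobService. After reproducing the issue on my end, I observed that the list_blobs method of the v12 ContainerClient (obtained from BlobServiceClient) returns BlobProperties objects carrying all the properties of each blob, which takes more time to process, whereas BlockBlobService returns Blob objects that were quicker to process in my test. Note that BlockBlobService belongs to the legacy azure-storage-blob 2.x SDK and was removed in v12, so this requires an older package version such as azure-storage-blob==2.1.0. Below is the code that worked for me.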

    import datetime

    # Legacy SDK only: BlockBlobService exists in azure-storage-blob <= 2.1.0
    from azure.storage.blob import BlockBlobService

    ACCOUNT_NAME = "<YOUR_ACCOUNT_NAME>"
    CONTAINER_NAME = "<YOUR_CONTAINER_NAME>"
    SAS_TOKEN = "<YOUR_SAS_TOKEN>"

    block_blob_service = BlockBlobService(account_name=ACCOUNT_NAME, sas_token=SAS_TOKEN)

    # List all blobs; the generator follows the service's continuation markers lazily
    print("\nList blobs in the container")
    print("start: " + str(datetime.datetime.now()))
    l = []
    for blob in block_blob_service.list_blobs(CONTAINER_NAME):
        l.append(blob.name)

    # Timestamp taken before printing so it measures only the listing
    print("end:   " + str(datetime.datetime.now()))
    print(l)
    

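Another option, not taken from the answer above, keeps the v12 SDK and splits the listing across virtual directories, running the per-directory listings concurrently, since a single listing is a chain of sequential page requests. This is a minimal sketch, assuming blob names use "/" as a separator and are spread over several top-level prefixes (a flat container gains nothing from it). It relies on the v12 ContainerClient.walk_blobs() and list_blobs(name_starts_with=...); the helper names, the default of 8 threads and the AZURE_STORAGE_CONNECTION_STRING variable are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List


def list_names_under(container_client, prefix: str) -> List[str]:
    """Sequentially list every blob name that starts with `prefix`."""
    return [blob.name for blob in container_client.list_blobs(name_starts_with=prefix)]


def list_names_parallel(container_client, delimiter: str = "/", max_workers: int = 8) -> List[str]:
    """List all blob names, issuing one listing per top-level prefix in parallel."""
    prefixes: List[str] = []
    names: List[str] = []
    # With a delimiter, walk_blobs() yields root-level blobs plus "virtual
    # directory" prefixes. Any name containing the delimiter is rolled up into
    # a prefix ending with it, so a trailing delimiter identifies a prefix.
    for item in container_client.walk_blobs(delimiter=delimiter):
        if item.name.endswith(delimiter):
            prefixes.append(item.name)  # e.g. "2023/"
        else:
            names.append(item.name)     # a blob at the container root
    # Each prefix is listed in its own worker thread.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for chunk in pool.map(lambda p: list_names_under(container_client, p), prefixes):
            names.extend(chunk)
    return names


if __name__ == "__main__":
    # Hypothetical environment variable and container name; adjust to your setup.
    conn_str = os.environ.get("AZURE_STORAGE_CONNECTION_STRING")
    if conn_str:
        from azure.storage.blob import ContainerClient

        client = ContainerClient.from_connection_string(conn_str, container_name="blobName")
        print(f"{len(list_names_parallel(client))} blobs")
```

The thread count is a guess: the storage account's request limits and the available network bandwidth decide how much concurrency actually helps, so it is worth timing a few values against the sequential loop.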