Python Azure SDK: Using list_blobs to get more than 5,000 results


I'm having trouble with the Python Azure SDK and haven't found anything about it on either Stack Overflow or the MSDN forums.

I want to use the Azure SDK's list_blobs() to get a list of blobs. There are more than 5,000 blobs, which is the maximum number of results a single call returns.

Looking at the code in the SDK itself, I see the following signature:

def list_blobs(self, container_name, prefix=None, marker=None,
               maxresults=None, include=None, delimiter=None):

The docstring describes marker as follows:

marker:
    Optional. A string value that identifies the portion of
    the list to be returned with the next list operation.
    The operation returns a marker value within the response
    body if the list returned was not complete. The marker
    value may then be used in a subsequent call to request
    the next set of list items. The marker value is opaque
    to the client.
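To make that contract concrete, here is a minimal simulation in Python. It is only a sketch under stated assumptions: ALL_BLOBS and list_page are hypothetical stand-ins, not part of the Azure SDK, the marker here happens to encode an offset (the real one is opaque), and a finished listing is assumed to return an empty marker. The client simply passes each returned marker back unchanged until none comes back.

```python
# Hypothetical in-memory stand-in for a paged list API; NOT the Azure SDK.
ALL_BLOBS = ["blob-%04d" % i for i in range(12)]


def list_page(marker=None, maxresults=5):
    """Return (items, next_marker); next_marker is '' once the list is complete."""
    # The client must treat the marker as opaque; here it happens to be an offset.
    start = int(marker) if marker else 0
    end = start + maxresults
    items = ALL_BLOBS[start:end]
    next_marker = str(end) if end < len(ALL_BLOBS) else ""
    return items, next_marker


def list_everything():
    """Collect every item by feeding each returned marker into the next call."""
    results = []
    marker = None  # no marker on the first request
    while True:
        items, marker = list_page(marker=marker, maxresults=5)
        results.extend(items)
        # Stop when no continuation marker comes back (None or empty string).
        if not marker:
            break
    return results


print(len(list_everything()))  # 12
```

With 12 items and pages of 5, the loop makes three calls and receives the markers "5", "10" and "" in turn.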

My problem is that I don't know how to use the marker to get the next set of 5,000 results. If I try something like this:

blobs = blobservice.list_blobs(target_container, prefix=prefix)
print(blobs.marker)

Then the marker is always empty, which I assume is because list_blobs() already parses the blobs out of the response.

But if that is the case, how do I actually use the marker in a meaningful way?


Solution

  • The SDK returns the continuation token in an attribute called next_marker on the result of list_blobs(). Pass it as the marker argument of the next call to get the next set of blobs. See the code below as an example. Here I'm listing 100 blobs from a container at a time:

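    from azure.storage import BlobService

    blob_service = BlobService(account_name='<accountname>', account_key='<accountkey>')
    next_marker = None
    while True:
        # Fetch up to 100 blobs, resuming where the previous page ended
        blobs = blob_service.list_blobs('<containername>', maxresults=100, marker=next_marker)
        print(len(blobs))
        # Continuation token for the next page
        next_marker = blobs.next_marker
        print(next_marker)
        # When the listing is complete, next_marker may come back as an
        # empty string rather than None, so test for both
        if not next_marker:
            break
    print("done")

    P.S. An earlier version of this loop tested "if next_marker is None" and threw an exception on the last iteration. A likely cause is that next_marker is an empty string rather than None once the listing is complete, so that check never ends the loop; "if not next_marker" covers both cases.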
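If you need this in more than one place, the same loop can be wrapped in a generator. This is a sketch, not an SDK function: iter_all_blobs is a hypothetical helper, and it assumes the legacy BlobService.list_blobs(container_name, prefix=..., marker=..., maxresults=...) signature from the question and a result object that is iterable and exposes next_marker. StubPage and StubBlobService are hypothetical stand-ins used only to exercise the generator without an Azure account.

```python
def iter_all_blobs(blob_service, container_name, prefix=None, page_size=100):
    """Yield every blob in a container, following next_marker page by page."""
    marker = None  # no marker on the first request
    while True:
        page = blob_service.list_blobs(container_name, prefix=prefix,
                                       marker=marker, maxresults=page_size)
        for blob in page:
            yield blob
        marker = page.next_marker
        # Treat both None and '' as "no more pages".
        if not marker:
            return


class StubPage(list):
    """Hypothetical stand-in for the SDK's result object: a list plus next_marker."""

    def __init__(self, items, next_marker):
        super().__init__(items)
        self.next_marker = next_marker


class StubBlobService:
    """Hypothetical stand-in for BlobService; markers are offsets into a name list."""

    def __init__(self, names):
        self.names = names

    def list_blobs(self, container_name, prefix=None, marker=None, maxresults=None):
        names = [n for n in self.names if prefix is None or n.startswith(prefix)]
        start = int(marker) if marker else 0
        end = start + (maxresults or 5000)
        next_marker = str(end) if end < len(names) else ""
        return StubPage(names[start:end], next_marker)


service = StubBlobService(["logs/%d" % i for i in range(250)] + ["img/a", "img/b"])
names = list(iter_all_blobs(service, "container", prefix="logs/"))
print(len(names))  # 250
```

Because it is a generator, the caller can stop early (for example, after finding a particular blob) without fetching the remaining pages. In the newer azure-storage-blob (v12) package, ContainerClient.list_blobs() returns an iterator that follows continuation tokens automatically, so a manual loop like this is mainly needed with the older BlobService API.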