
Dask - "'coroutine' object is not iterable" trying to read parquet from S3


I am using this code to read multi-part parquet files with the prefix '/data/key' from a private S3 bucket (an S3-compatible store, not AWS):

import dask.dataframe as dd
dd.read_parquet(
    's3://ns1/data/key',
    storage_options={
        'key': 'key',
        'secret': 'secret',
        'client_kwargs': {'endpoint_url': 'https://s3.sample-private-cloud.com'}
    }
)

Why am I getting this error?

TypeError: 'coroutine' object is not iterable

I was able to download the file using the boto3 client, but I am unable to read it using Dask. The Dask documentation doesn't mention asynchronous processing anywhere (await, async), so I am not sure why I am getting this error.


Solution

  • "Using this code to read multi-part parquet files with the prefix '/data/key'"

    If you are trying to load all files whose keys start with the prefix 'data/key', add a * to the end of the pattern, like this: 'data/key*'.

    import dask.dataframe as dd
    dd.read_parquet(
        's3://ns1/data/key*',
        storage_options={
            'key': 'key',
            'secret': 'secret',
            'client_kwargs': {'endpoint_url': 'https://s3.sample-private-cloud.com'}
        }
    )
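
To see what the trailing * changes, here is a minimal sketch of glob-style matching using only Python's standard library. The key names are hypothetical, and fnmatch is only a stand-in: it is not what Dask or s3fs use to resolve paths, and its * also matches '/', whereas fsspec-style globs normally keep * within a single path segment. It only illustrates why a bare prefix and a prefix followed by * select different sets of names.

```python
from fnmatch import fnmatchcase

# Hypothetical object keys inside the bucket 'ns1' (illustrative only).
keys = [
    "data/key.part.0.parquet",
    "data/key.part.1.parquet",
    "data/other.parquet",
]


def select(pattern, names):
    """Return the names that match a glob-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Without a wildcard, only a name that is exactly 'data/key' would match.
exact = select("data/key", keys)

# With a trailing '*', every name starting with 'data/key' matches.
prefixed = select("data/key*", keys)

print(exact)
print(prefixed)
```

If the pattern still does not pick up the expected files, listing the bucket with the same credentials and endpoint and comparing the key names against the pattern is a reasonable next check.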