
pyarrow.lib.ArrowIOError: Invalid Parquet file size is 0 bytes

I'm trying to read a list of files from an S3 bucket into a pyarrow table.

If I specify the filename I can do:

from pyarrow.parquet import ParquetDataset
import s3fs
dataset = ParquetDataset(

And everything works as expected. However, if I do:

dataset = ParquetDataset(

I get:

pyarrow/_parquet.pyx:1036: in
pyarrow.lib.ArrowIOError: Invalid Parquet file size is 0 bytes


  • This happened to me because empty "success" marker files sat under the same S3 prefix as my Parquet files, and pyarrow tried to read them as Parquet. I resolved it by listing the prefix first and keeping only the paths that end in ".parquet":

    from pyarrow.parquet import ParquetDataset
    import s3fs
    s3 = s3fs.S3FileSystem()
    paths = [path for path in s3.ls("s3://path/to/file/") if path.endswith(".parquet")]
    dataset = ParquetDataset(paths, filesystem=s3)
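A zero-byte object whose name happens to end in ".parquet" would presumably raise the same error, so it may be worth filtering on size as well as on the suffix. The following is only a sketch under that assumption: `select_parquet_paths` and `load_dataset` are hypothetical helper names, and the filter relies on the `name`, `size` and `type` keys that fsspec-style filesystems such as s3fs return from `ls(path, detail=True)`.

```python
def select_parquet_paths(entries):
    """Return the names of non-empty files ending in ".parquet".

    `entries` is a list of dicts with "name", "size" and "type" keys,
    the shape returned by an fsspec-style `ls(path, detail=True)`.
    """
    return [
        entry["name"]
        for entry in entries
        if entry.get("type") == "file"            # skip directories / prefixes
        and entry["name"].endswith(".parquet")    # skip marker files
        and (entry.get("size") or 0) > 0          # skip zero-byte objects
    ]


def load_dataset(prefix):
    """Hypothetical wrapper: list `prefix`, filter it, build the dataset."""
    # Imported here so the filter above can be used without S3 access.
    import s3fs
    from pyarrow.parquet import ParquetDataset

    s3 = s3fs.S3FileSystem()
    paths = select_parquet_paths(s3.ls(prefix, detail=True))
    return ParquetDataset(paths, filesystem=s3)
```

Calling `load_dataset("s3://path/to/file/")` would then stand in for the last three lines of the snippet above. Keeping the filter separate from the S3 calls makes it easy to check against a hand-written listing.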