python pandas fastparquet

Python: OSError: [Errno 22] Invalid argument when trying to use pandas.read_parquet


I have this simple code:

import pandas as pd

file = pd.read_parquet('file.rot',engine='fastparquet')

file.rot is a table of data (float numbers) with 5 columns.

When I run it, this error appears:

  File ~\miniconda3\Lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
    exec(code, globals, locals)

  File c:\users\josé\onedrive\ambiente de trabalho\draft.py:10
    file = pd.read_parquet('file.rot',sep='=',engine='fastparquet')

  File ~\miniconda3\Lib\site-packages\pandas\io\parquet.py:667 in read_parquet
    return impl.read(

  File ~\miniconda3\Lib\site-packages\pandas\io\parquet.py:402 in read
    parquet_file = self.api.ParquetFile(path, **parquet_kwargs)

  File ~\miniconda3\Lib\site-packages\fastparquet\api.py:135 in __init__
    self._parse_header(fn, verify)

  File ~\miniconda3\Lib\site-packages\fastparquet\api.py:215 in _parse_header
    f.seek(-(head_size + 8), 2)

OSError: [Errno 22] Invalid argument

I don't know what I'm doing wrong, or whether I did something wrong when installing fastparquet on miniconda.


Solution

  • For those interested, here is what actually happens when fastparquet tries to read a file as Parquet. According to the Parquet spec, the last four bytes of the file should be b"PAR1", and the four bytes before that give the size of the footer in bytes. You can pass verify=True to check for the magic bytes:

    >>> fastparquet.ParquetFile('file.rot', verify=True)
    ParquetException: File parse failed
    

    This check is not the default, and pandas does not enable it. So fastparquet trusts the size stored in the four bytes preceding the trailing magic bytes, which for a file that is not Parquet is probably some arbitrary large number. The seek() call then fails because the inferred offset lies before the start of the file.
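
The footer layout described above can be checked by hand using only the standard library. What follows is a minimal sketch, not fastparquet's own code: the helper name inspect_parquet_footer is hypothetical, and it assumes the footer length is stored as a little-endian 32-bit unsigned integer, as the Parquet format describes.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"


def inspect_parquet_footer(path):
    """Return (looks_like_parquet, footer_size) for the file at `path`.

    Hypothetical helper for illustration only; it does not parse the
    footer metadata, it only checks the magic bytes and the stored size.
    """
    file_size = os.path.getsize(path)
    # Leading magic (4) + footer length (4) + trailing magic (4).
    if file_size < 12:
        return False, None

    with open(path, "rb") as f:
        head = f.read(4)
        # The last 8 bytes are: footer length, then the trailing magic.
        f.seek(-8, os.SEEK_END)
        footer_size = struct.unpack("<I", f.read(4))[0]
        tail = f.read(4)

    looks_ok = (
        head == PARQUET_MAGIC
        and tail == PARQUET_MAGIC
        # The footer must fit between the two magic markers.
        and footer_size + 12 <= file_size
    )
    return looks_ok, footer_size
```

Calling inspect_parquet_footer('file.rot') on a plain-text table returns False as its first value, and the second value is whatever the four bytes before the last four happen to decode to. If that number is larger than the file itself, a seek to -(footer_size + 8) from the end of the file points before the start of the file, which is the situation that raises the OSError in the traceback.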