I'm trying to read a gzip jsonl file using scan_ndjson()
. However, I encountered the error
"TypeError: expected str, bytes or os.PathLike object, not int"
despite passing a string. read_ndjson()
works fine but I have memory issues so using a LazyFrame would be helpful.
Here's what I'm trying to do:
import gzip
with gzip.open(file_path, 'rb') as file:
df = pl.scan_ndjson(file.read(), ignore_errors=True))
Since scan_ndjson() only accepts filepaths, I created a temporary file.
import tempfile
import shutil
temporary_file = tempfile.NamedTemporaryFile(mode='w+', delete=False)
with gzip.open(file_path, 'rb') as file_in:
shutil.copyfileobj(file_in, temporary_file)
pl.scan_ndjson(
temporary_file.name,
ignore_errors=True,
low_memory=True,
rechunk=True
)
temporary_file.close()
os.unlink(temporary_file.name)