
Error while reading a JSON file with Python Polars


I am trying to read a GeoJSON document with Python Polars, like this:

import polars as pl
myfile = '{"type":"GeometryCollection","geometries":[{"type":"Linestring","coordinates":[[10,11.2],[10.5,11.9]]},{"type":"Point","coordinates":[10,20]}]}'
pl.read_json(myfile) 

The error I get is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...\local-packages\Python39\site-packages\polars\functions.py", line 631, in read_json
    return DataFrame.read_json(source)  # type: ignore
  File "...\local-packages\Python39\site-packages\polars\frame.py", line 346, in read_json
    self._df = PyDataFrame.read_json(file)
RuntimeError: Other("Error(\"missing field `columns`\", line: 1, column: 143)")

I have also tried putting the same content into a file, and got a similar error.

As suggested on GitHub, I tried reading the file with pandas and converting the result, like this:

import pandas as pd
import polars as pl

# file_path is the path to the GeoJSON file
initial_df = pl.from_pandas(pd.read_json(file_path))

The error I get is:

  File "...\file_splitter.py", line 13, in split_file
    initial_df = pl.from_pandas(pd.read_json(file_path))
  File "...\local-packages\Python39\site-packages\polars\functions.py", line 566, in from_pandas
    data[name] = _from_pandas_helper(s)
  File "...\local-packages\Python39\site-packages\polars\functions.py", line 534, in _from_pandas_helper
    return pa.array(a)
  File "pyarrow\array.pxi", line 302, in pyarrow.lib.array
  File "pyarrow\array.pxi", line 83, in pyarrow.lib._ndarray_to_array
  File "pyarrow\error.pxi", line 97, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: cannot mix list and non-list, non-null values

What can I do to read the GeoJSON file?


Solution

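  • Update: The example now works in Polars as expected. Recent versions treat a plain str passed to pl.read_json as a file path, so pass the JSON text as bytes (or wrap it in io.StringIO):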

    pl.read_json(myfile.encode())
    
    shape: (1, 2)
    ┌────────────────────┬─────────────────────────────────┐
    │ type               ┆ geometries                      │
    │ ---                ┆ ---                             │
    │ str                ┆ list[struct[2]]                 │
    ╞════════════════════╪═════════════════════════════════╡
    │ GeometryCollection ┆ [{"Linestring",[[10.0, 11.2], … │
    └────────────────────┴─────────────────────────────────┘
    

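    If you read the file with pandas, you get columns of dtype object, for which Arrow has no matching type (an object column could hold anything), so pyarrow has to infer one from the values. In this file the LineString coordinates are a list of lists while the Point coordinates are a flat list, which is what triggers "cannot mix list and non-list, non-null values".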
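Which values clash can be checked with the standard library alone. The sketch below measures how many list levels wrap the numbers in each geometry's "coordinates"; nesting_depth is a hypothetical helper written for this illustration.

```python
import json

myfile = '{"type":"GeometryCollection","geometries":[{"type":"Linestring","coordinates":[[10,11.2],[10.5,11.9]]},{"type":"Point","coordinates":[10,20]}]}'


def nesting_depth(value):
    """Count list levels by following the first element (hypothetical helper)."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        if not value:  # empty list: nothing deeper to inspect
            break
        value = value[0]
    return depth


geometries = json.loads(myfile)["geometries"]
depths = {g["type"]: nesting_depth(g["coordinates"]) for g in geometries}
print(depths)  # {'Linestring': 2, 'Point': 1}
```

Different depths in the same field mean no single Arrow list type fits every row.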

    If we cast the columns to str, Arrow and Polars can handle them; each geometry is then stored as the string form of the Python dict:

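    import io
    import pandas as pd
    import polars as pl

    myfile = '{"type":"GeometryCollection","geometries":[{"type":"Linestring","coordinates":[[10,11.2],[10.5,11.9]]},{"type":"Point","coordinates":[10,20]}]}'
    # Recent pandas versions deprecate passing literal JSON to read_json, so wrap it in StringIO
    print(pl.from_pandas(pd.read_json(io.StringIO(myfile)).astype(str)))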
    
    shape: (2, 2)
    ┌────────────────────┬─────────────────────────────────────┐
    │ type               ┆ geometries                          │
    │ ---                ┆ ---                                 │
    │ str                ┆ str                                 │
    ╞════════════════════╪═════════════════════════════════════╡
    │ GeometryCollection ┆ {'type': 'Linestring', 'coordina... │
    │ GeometryCollection ┆ {'type': 'Point', 'coordinates':... │
    └────────────────────┴─────────────────────────────────────┘