I am trying to read a GeoJSON document with Polars in Python, like this:
import polars as pl
myfile = '{"type":"GeometryCollection","geometries":[{"type":"Linestring","coordinates":[[10,11.2],[10.5,11.9]]},{"type":"Point","coordinates":[10,20]}]}'
pl.read_json(myfile)
The error I get is:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "...\local-packages\Python39\site-packages\polars\functions.py", line 631, in read_json
return DataFrame.read_json(source) # type: ignore
File "...\local-packages\Python39\site-packages\polars\frame.py", line 346, in read_json
self._df = PyDataFrame.read_json(file)
RuntimeError: Other("Error(\"missing field `columns`\", line: 1, column: 143)")
I also tried putting the same content into a file and got a similar error.
As suggested on GitHub, I then tried to read the file via pandas, like this:
import pandas as pd
initial_df = pl.from_pandas(pd.read_json(file_path))
The error I get is:
File "...\file_splitter.py", line 13, in split_file
initial_df = pl.from_pandas(pd.read_json(file_path))
File "...\local-packages\Python39\site-packages\polars\functions.py", line 566, in from_pandas
data[name] = _from_pandas_helper(s)
File "...\local-packages\Python39\site-packages\polars\functions.py", line 534, in _from_pandas_helper
return pa.array(a)
File "pyarrow\array.pxi", line 302, in pyarrow.lib.array
File "pyarrow\array.pxi", line 83, in pyarrow.lib._ndarray_to_array
File "pyarrow\error.pxi", line 97, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: cannot mix list and non-list, non-null values
What can I do to read the GeoJSON file?
Update: The example now works in Polars as expected.
pl.read_json(myfile.encode())
shape: (1, 2)
┌────────────────────┬─────────────────────────────────┐
│ type ┆ geometries │
│ --- ┆ --- │
│ str ┆ list[struct[2]] │
╞════════════════════╪═════════════════════════════════╡
│ GeometryCollection ┆ [{"Linestring",[[10.0, 11.2], … │
└────────────────────┴─────────────────────────────────┘
If you read the file with pandas, you get columns of dtype object, one of which holds values whose type Arrow cannot infer (an object column could contain anything).
If we cast the columns to strings, we know that Arrow and Polars can handle them.
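from io import StringIO  # newer pandas versions expect a file-like object rather than a literal JSON string
myfile = '{"type":"GeometryCollection","geometries":[{"type":"Linestring","coordinates":[[10,11.2],[10.5,11.9]]},{"type":"Point","coordinates":[10,20]}]}'
print(pl.from_pandas(pd.read_json(StringIO(myfile)).astype(str)))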
shape: (2, 2)
┌────────────────────┬─────────────────────────────────────┐
│ type ┆ geometries │
│ --- ┆ --- │
│ str ┆ str │
╞════════════════════╪═════════════════════════════════════╡
│ GeometryCollection ┆ {'type': 'Linestring', 'coordina... │
│ GeometryCollection ┆ {'type': 'Point', 'coordinates':... │
└────────────────────┴─────────────────────────────────────┘
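A likely reason for the "cannot mix list and non-list" error is that the two geometries nest their coordinates to different depths, so Arrow cannot settle on a single type for that field. The sketch below is only an illustration of that idea using the standard library: `nesting_depth` and `rows` are names made up for this example, and serializing with `json.dumps` is an alternative to `astype(str)` that keeps each geometry as valid JSON instead of a Python repr.

```python
import json

myfile = '{"type":"GeometryCollection","geometries":[{"type":"Linestring","coordinates":[[10,11.2],[10.5,11.9]]},{"type":"Point","coordinates":[10,20]}]}'


def nesting_depth(value):
    """Count how many list levels wrap the innermost value (hypothetical helper)."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0] if value else None
    return depth


collection = json.loads(myfile)
geometries = collection["geometries"]

# Map each geometry type to the nesting depth of its coordinates.
depths = {g["type"]: nesting_depth(g["coordinates"]) for g in geometries}
print(depths)

# Serialize every geometry to a JSON string so the column has one uniform type.
rows = [
    {"type": collection["type"], "geometry": json.dumps(g)}
    for g in geometries
]
```

Assuming Polars is installed, `pl.DataFrame(rows)` should then give a two-row frame with two string columns, and each `geometry` value can be parsed back with `json.loads` when needed.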