I have some big geojson (see California.geojson
has an example https://github.com/microsoft/USBuildingFootprints?tab=readme-ov-file#download-links). That I want to convert to a .csv
(faster to import in Postgres with \copy
). I am passing it to STDIN.
They are "type": "Polygon"
but for my needs this is mostly point data and I do not need the full geometry (and neither the attributes). jq
is perfect for this task.
Sadly, some of the biggest file seems to be big to be stored in memory (the process generated a "Killed" message). I tried the --stream
argument but I failed to understand it or the process seems to be very slow (more than 3 hours and still "running").
A sample can be produced thanks to (see at the bottom of this post to get a copy of it):
jq '.features = .features[:5]' data/Alabama.geojson > sample.geojson
This is working great for "smaller" geojson (< 1.4 GB):
jq '.features | map(.geometry.coordinates) | map(.[]) | map(first) | .[] | {"long": first, "lat": last} | [.long, .lat] | @csv' small.geojson
But I got a "Killed" message (I am assuming I am running out of memory)
Then I tried the --stream
, I am not sure that I understand it correctly (this post and this issue were a big help)
This is my version with --stream (lot of "hacking")
cat sample.geojson | jq --stream "fromstream(1|truncate_stream(inputs))" | jq ' map(.geometry.coordinates) | map(.[]) | map(first) | .[] | {"long": first, "lat": last} | [.long, .lat] | @csv'
It works against sample.geojson but failed on a big geojson (say 'Ohio.geojson'). Any Ideas?
I also tried to write in a file but without any more success.
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-84.959634,
32.421887
],
[
-84.95982,
32.421889
],
[
-84.959822,
32.421797
],
[
-84.959767,
32.421796
],
[
-84.959767,
32.421771
],
[
-84.959636,
32.421769
],
[
-84.959634,
32.421887
]
]
]
},
"properties": {
"release": 2,
"capture_dates_range": "3/26/2020-7/22/2020"
}
},
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-84.959636,
32.42095
],
[
-84.959715,
32.42095
],
[
-84.959714,
32.420984
],
[
-84.959816,
32.420985
],
[
-84.959818,
32.420849
],
[
-84.959637,
32.420848
],
[
-84.959636,
32.42095
]
]
]
},
"properties": {
"release": 2,
"capture_dates_range": "3/26/2020-7/22/2020"
}
},
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-84.959998,
32.235231
],
[
-84.959877,
32.235231
],
[
-84.959877,
32.235288
],
[
-84.959998,
32.235288
],
[
-84.959998,
32.235231
]
]
]
},
"properties": {
"release": 1,
"capture_dates_range": ""
}
},
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-84.960253,
32.422248
],
[
-84.960069,
32.422245
],
[
-84.960067,
32.422321
],
[
-84.960165,
32.422323
],
[
-84.960164,
32.422364
],
[
-84.96025,
32.422365
],
[
-84.960253,
32.422248
]
]
]
},
"properties": {
"release": 2,
"capture_dates_range": "3/26/2020-7/22/2020"
}
},
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-84.961602,
32.419206
],
[
-84.961599,
32.419354
],
[
-84.961707,
32.419355
],
[
-84.961708,
32.419291
],
[
-84.961794,
32.419292
],
[
-84.961796,
32.419208
],
[
-84.961602,
32.419206
]
]
]
},
"properties": {
"release": 2,
"capture_dates_range": "3/26/2020-7/22/2020"
}
}
]
}
Your original filter can be simplified as follows:
.features
remains as-ismap(.geometry.coordinates) | map(.[]) | map(first)
is actually just map(.geometry.coordinates | .[] | first)
map(…) | .[]
can then be reduced to .[] | …
{"long": first, "lat": last} | [.long, .lat]
builds an object, and immediately turns it into an array. This can be simplified to [first,last]
, but as the array only had two items to begin with, this part actually just returns the input, and can be dropped entirely@csv
remains as-isIn summary, with further contractions like .[] | first
to .[][0]
, you can achieve the same using
.features[].geometry.coordinates[][0] | @csv
This goes five levels deep, then picks the first item, and turns it into a CSV output. So, this can be translated to a --stream
version as folloews:
fromstream(5|truncate_stream(inputs))[0] | @csv
-114.127454,34.265674
-114.127694,34.260939
-114.127988,34.264977
-114.129007,34.260229
-114.129611,34.261105
-114.130311,34.263922
-114.131834,34.284069
-114.132183,34.28509
-114.132634,34.281492
-114.133764,34.282816
: